dimRed
Live March 15, 2018

Dimensionality reduction algorithms in Python, Scala & PySpark

ML Python PySpark Scala

The Problem

High-dimensional datasets are difficult to visualize, interpret, and compare across experiments. Teams often need consistent dimensionality reduction implementations across Python, Scala, and Spark to move from exploration to production.

What I Built

A compact library of dimensionality reduction implementations in Python, Scala, and PySpark, built to support both notebook exploration and large-scale Spark jobs. The repo focuses on clear, reproducible implementations that make it easy to compare approaches across stacks.
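As a rough illustration of the kind of technique involved, here is a minimal PCA sketch in NumPy. It is not code from the repo: the function name `pca_reduce` and the synthetic dataset are assumptions made for this example, and PCA is just one common dimensionality reduction method.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project rows of X onto the top principal components (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    # Center each feature so the components describe variance, not the mean.
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Fraction of total variance captured by each component.
    explained = (singular_values ** 2) / np.sum(singular_values ** 2)
    return X_centered @ components.T, explained[:n_components]


rng = np.random.default_rng(0)
# Assumed synthetic data: 200 samples in 10 dimensions that vary
# mostly along 2 latent directions, plus a little noise.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 10))

X_2d, explained_ratio = pca_reduce(X, n_components=2)
print(X_2d.shape)
print(explained_ratio.sum())
```

Using the SVD of the centered data avoids forming the covariance matrix explicitly, which tends to be the more numerically stable choice for a single-machine sketch like this one.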

Key Results

  • 62 GitHub stars, with the repo serving as a community reference
  • Cross-language implementations that mirror each other for easier benchmarking
  • Spark-friendly workflows for scaling to larger datasets