This repository implements and compares Stochastic Variance Reduced Gradient (SVRG) and Stochastic Gradient Descent (SGD) on convex optimization problems, with a focus on theoretical properties, convergence behavior, and empirical performance.
The project emphasizes clarity, correctness, and alignment with classical optimization theory.
Stochastic Gradient Descent (SGD) is a foundational optimization method in machine learning due to its low per-iteration cost. However, the variance of its stochastic gradients prevents fast convergence near the optimum, requiring diminishing step sizes and resulting in sub-linear convergence rates.
Stochastic Variance Reduced Gradient (SVRG) addresses this limitation by constructing a variance-reduced gradient estimator using periodic full-gradient computations. This enables the use of constant step sizes and leads to significantly faster convergence on smooth and strongly convex objectives.
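Concretely, given a snapshot $\tilde{w}$ with full gradient $\nabla F(\tilde{w}) = \frac{1}{n}\sum_{i=1}^{n} \nabla f_i(\tilde{w})$, each inner iteration of SVRG samples an index $i_t$ and updates $w_{t+1} = w_t - \eta\, g_t$ with the estimator

$$g_t = \nabla f_{i_t}(w_t) - \nabla f_{i_t}(\tilde{w}) + \nabla F(\tilde{w}),$$

which is unbiased, $\mathbb{E}[g_t] = \nabla F(w_t)$, and whose variance shrinks as both $w_t$ and $\tilde{w}$ approach the optimum; this is what permits a constant step size $\eta$.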
This repository provides:
- A clean, from-scratch implementation of SGD and SVRG
- A controlled empirical comparison of their convergence behavior
- A practical demonstration of variance reduction in stochastic optimization
SGD:
- Uses stochastic gradients computed from individual samples or mini-batches

- Requires diminishing learning rates for convergence
- Exhibits sub-linear convergence on strongly convex objectives
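A minimal SGD loop with a diminishing step size, on a least-squares objective, might look like the following. This is an illustrative sketch, not the repository's implementation; the function name, step-size schedule, and objective are assumptions.

```python
import numpy as np

def sgd(A, b, n_epochs=30, eta0=0.1, seed=0):
    """SGD on (1/2n) * ||A w - b||^2 with diminishing step size eta0 / (1 + t)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    w = np.zeros(d)
    t = 0
    for _ in range(n_epochs):
        for i in rng.permutation(n):
            # Gradient of the single-sample loss 0.5 * (a_i^T w - b_i)^2
            grad_i = (A[i] @ w - b[i]) * A[i]
            w -= eta0 / (1 + t) * grad_i  # step size must decay for convergence
            t += 1
    return w
```

The decaying schedule is what caps SGD at sub-linear rates: the step size must shrink to average out gradient noise, which also slows progress near the optimum.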
SVRG:
- Employs periodic full-gradient snapshots
- Uses variance-reduced stochastic gradient estimators
- Achieves linear (geometric) convergence under standard smoothness and convexity assumptions
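The SVRG scheme above can be sketched on the same least-squares objective as follows. Again this is illustrative, not the repository's code; the inner-loop length `m`, step size, and variable names are assumptions.

```python
import numpy as np

def svrg(A, b, n_epochs=20, m=None, eta=0.01, seed=0):
    """SVRG on (1/2n) * ||A w - b||^2 with a constant step size eta."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    m = m or 2 * n                        # inner-loop length per snapshot
    w = np.zeros(d)
    for _ in range(n_epochs):
        w_snap = w.copy()
        mu = A.T @ (A @ w_snap - b) / n   # full gradient at the snapshot
        for _ in range(m):
            i = rng.integers(n)
            # Variance-reduced estimator: grad_i(w) - grad_i(w_snap) + mu
            g = (A[i] @ w - b[i]) * A[i] - (A[i] @ w_snap - b[i]) * A[i] + mu
            w -= eta * g                  # constant step size is admissible
    return w
```

Because the estimator's variance vanishes as the iterates approach the optimum, the constant step size never has to be decayed, which is the source of the linear rate.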
The algorithms are evaluated on:
- Logistic Regression
- Ridge Regression
Experiments are conducted on both synthetic and real-world datasets to highlight differences in convergence speed, stability, and final optimization error.
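The gradients these two objectives feed into the optimizers can be sketched as below. Function names and the regularization convention are assumptions for illustration, not the repository's API; labels are taken in {-1, +1} and `lam` denotes the l2 penalty.

```python
import numpy as np

def logistic_grad(w, X, y, lam=0.0):
    """Average gradient of (1/n) * sum log(1 + exp(-y_i x_i^T w)) + (lam/2)||w||^2."""
    z = y * (X @ w)
    s = -y / (1.0 + np.exp(z))        # d/dz of log(1 + exp(-z)) times chain rule sign
    return X.T @ s / len(y) + lam * w

def ridge_grad(w, X, y, lam=0.1):
    """Gradient of (1/2n) * ||X w - y||^2 + (lam/2) * ||w||^2."""
    return X.T @ (X @ w - y) / len(y) + lam * w
```

Both objectives are smooth, and with `lam > 0` (or a well-conditioned design) strongly convex, which is exactly the regime in which SVRG's linear-rate guarantee applies.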
- `src/models.py` — Loss functions and gradient computations
- `src/optimizers.py` — Implementations of SGD and SVRG
- `src/utils.py` — Data loading and preprocessing utilities
- `notebooks/comparison_demo.ipynb` — Empirical evaluation and visualization
- Install dependencies: `pip install -r requirements.txt`
- Johnson, R., & Zhang, T. (2013). Accelerating Stochastic Gradient Descent using Predictive Variance Reduction. Advances in Neural Information Processing Systems.