A hands-on tutorial project that implements fundamental machine learning and data science algorithms from scratch using pure Python — no scikit-learn, no TensorFlow, no PyTorch for the core algorithms. The goal is to build a deep understanding of how these algorithms actually work under the hood.
**Who this is for**

- Students learning ML/DS who want to understand the math and mechanics behind the algorithms
- Developers who want to go beyond calling `model.fit()` and understand what happens inside
- Anyone preparing for ML interviews where algorithmic understanding is tested
**Prerequisites**

- Python 3.6+
- Basic understanding of linear algebra (vectors, matrices, dot products)
- Familiarity with calculus (derivatives, chain rule)
- Basic probability and statistics knowledge
**Getting started**

- Clone this repository
- Follow the chapters in numerical order (0 through 19)
- Each chapter contains Jupyter notebooks (`.ipynb`) for theory and Python scripts (`.py`) for implementations
- The `X.Kaggle_Practice_Projects/` folder contains end-to-end projects applying each algorithm to real datasets
- Datasets are stored in `Y.Kaggle_Data/`
**Statistics & Data Handling (Ch. 0-3)**

| Chapter | Topic | Key Concepts |
|---|---|---|
| 0. Statistics Supplement | Descriptive & inferential statistics | Mean, median, mode, variance, hypothesis testing, odds & log-odds |
| 1. Finding and Reading Data | Data I/O | CSV parsing, string-to-float conversion |
| 2. Data Preprocessing | Data preparation | Min-max normalization, z-score standardization, feature engineering |
| 3. Resampling Methods | Train/test strategies | Train/test split, k-fold cross-validation |
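For a taste of the pure-Python style used in these chapters, here is a minimal sketch of min-max normalization and z-score standardization — illustrative code only, not the repository's actual implementation:

```python
import math

def minmax_scale(column):
    """Rescale values to [0, 1] using the column's min and max."""
    lo, hi = min(column), max(column)
    return [(x - lo) / (hi - lo) for x in column]

def zscore_standardize(column):
    """Center to mean 0 and scale by the (population) standard deviation."""
    mean = sum(column) / len(column)
    var = sum((x - mean) ** 2 for x in column) / len(column)
    std = math.sqrt(var)
    return [(x - mean) / std for x in column]

data = [50.0, 30.0, 70.0, 90.0, 10.0]
print(minmax_scale(data))        # -> [0.5, 0.25, 0.75, 1.0, 0.0]
print(zscore_standardize(data))  # mean ~0, variance ~1
```

The chapters apply the same idea column-by-column to full CSV datasets.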
**Evaluation Metrics (Ch. 4-7)**

| Chapter | Topic | Key Concepts |
|---|---|---|
| 4. Evaluating Accuracy | Classification & advanced metrics | Accuracy, precision, recall, F1-score, ROC curve, AUC |
| 5. Confusion Matrix | Classification evaluation | Multi-class confusion matrix |
| 6. MAE and RMSE | Regression evaluation | Mean Absolute Error, Root Mean Squared Error, R-squared |
| 7. Baseline Models | Benchmarking | Random prediction, ZeroR algorithm |
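A hedged sketch of how these metrics fall out of raw prediction counts (illustrative, not the repo's code; `positive` marks which label counts as the positive class):

```python
def precision_recall_f1(actual, predicted, positive=1):
    """Compute precision, recall, and F1 for the given positive label."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == positive and p == positive)
    fp = sum(1 for a, p in zip(actual, predicted) if a != positive and p == positive)
    fn = sum(1 for a, p in zip(actual, predicted) if a == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]
print(precision_recall_f1(y_true, y_pred))  # -> (0.75, 0.75, 0.75)
```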
**Linear Models (Ch. 8-10)**

| Chapter | Topic | Key Concepts |
|---|---|---|
| 8. Linear Regression | Regression | OLS, covariance, correlation, regularization (Ridge/Lasso) |
| 9. Stochastic Gradient Descent | Optimization | SGD algorithm, learning rate, convergence |
| 10. Logistic Regression | Binary classification | Sigmoid function, maximum likelihood, regularization |
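The covariance/variance route to simple OLS can be sketched in a few lines of pure Python (an illustration of the idea, not the chapter's exact code):

```python
def simple_linear_regression(xs, ys):
    """Fit y = b0 + b1*x by ordinary least squares on one feature."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # slope = covariance(x, y) / variance(x), intercept from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    b1 = cov / var
    b0 = mean_y - b1 * mean_x
    return b0, b1

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]  # exactly y = 1 + 2x
print(simple_linear_regression(xs, ys))  # -> (1.0, 2.0)
```

Chapters 9-10 replace this closed-form fit with iterative SGD updates, which generalize to many features and to logistic regression.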
**Classic ML (Ch. 11-15)**

| Chapter | Topic | Key Concepts |
|---|---|---|
| 11. Perceptron | Linear classifier | Step function, perceptron learning rule, linear separability |
| 12. Decision Trees | Tree-based models | CART, Gini impurity, recursive splitting, pruning |
| 13. Naive Bayes | Probabilistic classifier | Bayes theorem, conditional independence, Gaussian NB |
| 14. K-Nearest Neighbor | Instance-based learning | Euclidean distance, choosing k, lazy learning |
| 15. Learning Vector Quantization | Prototype-based | Codebook vectors, BMU, competitive learning |
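As one example from this group, k-nearest neighbors fits in a dozen lines of standard-library Python (a minimal sketch with a hypothetical `(features, label)` row format, not the repo's exact code):

```python
import math

def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among the k nearest training rows.
    Each training row is a (features, label) pair."""
    neighbors = sorted(train, key=lambda row: euclidean(row[0], query))[:k]
    labels = [label for _, label in neighbors]
    return max(set(labels), key=labels.count)

train = [([1.0, 1.0], 'a'), ([1.5, 1.2], 'a'), ([0.9, 1.1], 'a'),
         ([5.0, 5.0], 'b'), ([5.2, 4.8], 'b')]
print(knn_predict(train, [1.1, 1.0], k=3))  # -> 'a'
```

This is "lazy learning" in action: there is no training step at all, only a distance computation at prediction time.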
**Neural Networks, Unsupervised & Advanced (Ch. 16-19)**

| Chapter | Topic | Key Concepts |
|---|---|---|
| 16. Neural Networks | Deep learning foundations | Forward/backward propagation, sigmoid, weight updates |
| 17. K-Means Clustering | Unsupervised learning | Centroid initialization, cluster assignment, elbow method, K-Means++ |
| 18. PCA | Dimensionality reduction | Covariance matrix, eigendecomposition, explained variance |
| 19. Support Vector Machine | Maximum margin classifier | Hinge loss, kernel trick, soft margin, SGD-based SVM |
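The assign-then-update loop at the heart of K-Means can be sketched as follows (Lloyd's algorithm with plain random initialization; K-Means++ and the elbow method, covered in the chapter, refine this — illustrative code only):

```python
import math
import random

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=10, seed=0):
    """Lloyd's algorithm: alternate cluster assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # naive init; K-Means++ improves this
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: euclidean(p, centroids[i]))
            clusters[idx].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster went empty
                centroids[i] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids

points = [[1.0, 1.0], [1.2, 0.8], [0.8, 1.1],
          [8.0, 8.0], [8.2, 7.9], [7.9, 8.1]]
print(sorted(kmeans(points, k=2)))  # two centroids, one near each blob
```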
**Ensemble Methods (PlusPlus)**

| Chapter | Topic | Key Concepts |
|---|---|---|
| Ensemble Algorithms | Model combination | Bootstrap, bagging, random forests, boosting concepts |
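The two building blocks of bagging — bootstrap resampling and majority voting — can be sketched like this (a hedged illustration; the `model(row) -> label` callable interface is hypothetical, not the repo's API):

```python
import random

def bootstrap_sample(dataset, rng):
    """Draw len(dataset) rows with replacement (a bootstrap resample)."""
    return [rng.choice(dataset) for _ in dataset]

def bagged_predict(models, row):
    """Majority vote across an ensemble; each model is any callable row -> label."""
    votes = [model(row) for model in models]
    return max(set(votes), key=votes.count)

rng = random.Random(42)
data = list(range(10))
print(bootstrap_sample(data, rng))  # same length, rows repeated/omitted at random
```

Random forests add one more twist on top of bagging: each tree also considers only a random subset of features at each split.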
**Kaggle Practice Projects**

| Project | Algorithm | Dataset |
|---|---|---|
| case00 | Simple Linear Regression | Insurance costs |
| case01 | Linear Regression via SGD | Wine quality |
| case02 | Logistic Regression | Diabetes prediction |
| case03 | Perceptron | Sonar classification |
| case04 | CART Decision Tree | Banknote authentication |
| case05 | KNN | Abalone age prediction |
| case06 | LVQ | Ionosphere radar signals |
| case07 | Neural Network | Wheat seed classification |
| case08 | Bagging | Sonar classification |
| case09 | Random Forest | Sonar classification |
**Learning path**

```
Statistics & Data Handling (Ch. 0-3)
                 |
                 v
    Evaluation Metrics (Ch. 4-7)
                 |
                 v
      Linear Models (Ch. 8-10)
                 |
            +----+----+
            |         |
            v         v
       Classic ML   Neural Networks
      (Ch. 11-15)     (Ch. 16)
            |         |
            +----+----+
                 |
                 v
 Unsupervised & Advanced (Ch. 17-19)
                 |
                 v
    Ensemble Methods (PlusPlus)
                 |
                 v
Practice Projects (X.Kaggle_Practice_Projects)
```
**Design philosophy**

- No black boxes: Every algorithm is implemented step-by-step so you can see exactly how it works
- Pure Python first: Core algorithms use only Python's standard library (`math`, `random`, `csv`)
- Optional visualization: Some notebooks use `matplotlib`/`seaborn` for plots, but these are optional and wrapped in try/except blocks
- Learn by doing: Each chapter includes working code you can run, modify, and experiment with
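The optional-visualization pattern can be sketched like this (a hypothetical illustration of the try/except guard, not the notebooks' exact code):

```python
try:
    import matplotlib.pyplot as plt  # optional dependency
    HAS_MPL = True
except ImportError:
    HAS_MPL = False

def maybe_plot(xs, ys):
    """Plot if matplotlib is available; otherwise degrade gracefully."""
    if not HAS_MPL:
        print("matplotlib not installed; skipping plot")
        return False
    plt.plot(xs, ys)
    plt.close()  # close the figure so scripts stay non-blocking
    return True

maybe_plot([1, 2, 3], [2, 4, 6])
```

Because the core algorithms never touch `plt`, every chapter runs to completion with or without the plotting libraries installed.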
**Quick start**

```bash
git clone https://github.com/your-username/Pure_Python_for_DS_ML.git
cd Pure_Python_for_DS_ML
pip install -r requirements.txt  # optional, only for visualization
jupyter notebook
```

This project is for educational purposes. Feel free to use and modify for learning.
William