MLfromScratch is a library designed to help you learn and understand machine learning algorithms by building them from scratch using only NumPy! No black-box libraries, no hidden magic, just pure Python and math. It's perfect for beginners who want to see what's happening behind the scenes of popular machine learning models.
Explore the Documentation
Our package structure is designed to look like scikit-learn, so if you're familiar with that, you'll feel right at home!
- LinearRegression: Imagine drawing a straight line through a set of points to predict future values. Linear Regression helps in predicting something like house prices based on size.
- SGDRegressor: Fits a linear regression with Stochastic Gradient Descent, updating the weights a few samples at a time. Well suited to large datasets.
- SGDClassifier: A linear classifier trained with Stochastic Gradient Descent that predicts categories like "spam" or "not spam".
- DecisionTreeClassifier: Think of this as playing 20 questions to guess something. A decision tree asks yes/no questions to classify data.
- DecisionTreeRegressor: Uses the same yes/no questions to predict a continuous number (like tomorrow's temperature) from input features.
- KNeighborsClassifier: Classifies a new point by majority vote among its 'k' nearest neighbors.
- KNeighborsRegressor: Instead of classifying, it predicts a number by averaging the values of the 'k' nearest data points.
- GaussianNB: A Naive Bayes classifier that works well when each feature follows a normal distribution (bell-shaped curve) within each class.
- MultinomialNB: A Naive Bayes classifier for count data, ideal for text classification tasks like spam detection.
- AgglomerativeClustering: Starts with every point as its own cluster and keeps merging the most similar clusters, down to a single large cluster or until the requested number remains.
- DBSCAN: Groups points that are packed closely together and labels isolated points as noise. No need to specify the number of clusters!
- MeanShift: Shifts data points toward areas of high density to find clusters.
- RandomForestClassifier: Combines many decision trees, each trained on a random sample of the data, and lets them vote on the class.
- RandomForestRegressor: Predicts continuous values by averaging an ensemble of decision trees.
- GradientBoostingClassifier: Builds trees sequentially, each correcting errors made by the ones before it.
- VotingClassifier: Combines the results of multiple models to make a final prediction.
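To give a feel for what "from scratch with NumPy" means in a scikit-learn-like shape, here is a minimal sketch of a least-squares linear regressor with `fit` and `predict` methods. The class name `TinyLinearRegression`, its attributes, and the toy data are made up for illustration; this is not the library's own implementation, which may be organized differently.

```python
import numpy as np


class TinyLinearRegression:
    """Illustrative least-squares regressor (hypothetical, not MLfromScratch's class)."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        # Append a column of ones so the intercept is learned with the weights.
        X_b = np.hstack([X, np.ones((X.shape[0], 1))])
        # Solve min ||X_b w - y||^2 with NumPy's least-squares solver.
        w, *_ = np.linalg.lstsq(X_b, y, rcond=None)
        self.coef_ = w[:-1]
        self.intercept_ = w[-1]
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        return X @ self.coef_ + self.intercept_


# Made-up "house price" data that follows price = 3 * size + 10 exactly.
sizes = np.array([[1.0], [2.0], [3.0], [4.0]])
prices = 3.0 * sizes[:, 0] + 10.0

model = TinyLinearRegression().fit(sizes, prices)
prediction = model.predict(np.array([[5.0]]))
print(model.coef_, model.intercept_, prediction)
```

Because the toy data lies exactly on a line, the fitted slope and intercept should recover 3 and 10 up to floating-point error. Real data will be noisier, and the fitted line will then be the one that minimizes the squared error.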
Measure your model's performance:
- accuracy_score: Measures the fraction of predictions your model got right.
- f1_score: Balances precision and recall into a single score.
- roc_curve: Shows the trade-off between the true positive rate and the false positive rate as the decision threshold changes.
- train_test_split: Splits your data into training and test sets.
- KFold: Splits the data into 'k' folds so the model can be trained and validated 'k' times, each time holding out a different fold.
- StandardScaler: Standardizes your data so each feature has a mean of 0 and a standard deviation of 1.
- LabelEncoder: Converts text labels into numerical labels (e.g., "cat" and "dog" become 0 and 1).
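Several of the utilities above come down to a few lines of NumPy. The sketch below shows one possible way to compute accuracy, a binary F1 score, column-wise standardization, and unshuffled k-fold index splits. The function names (`accuracy`, `f1`, `standardize`, `kfold_indices`) are hypothetical stand-ins chosen for this example and are not the library's actual signatures.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))


def f1(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for one positive class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return float(2 * precision * recall / (precision + recall))


def standardize(X):
    """Rescale each column to mean 0 and standard deviation 1."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant columns are only centered, not scaled
    return (X - X.mean(axis=0)) / std


def kfold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs, one per fold, without shuffling (k >= 2)."""
    folds = np.array_split(np.arange(n_samples), k)
    for i in range(k):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train, folds[i]


y_true = np.array([1, 0, 1, 1, 0, 1])
y_pred = np.array([1, 0, 0, 1, 1, 1])
acc = accuracy(y_true, y_pred)
score = f1(y_true, y_pred)
scaled = standardize([[1.0, 5.0], [3.0, 5.0]])
splits = list(kfold_indices(6, 3))
print(acc, score)
```

In this toy example four of six predictions are correct, and precision and recall are both 3/4, so the F1 score is also 3/4. Each of the three folds is held out once for validation while the other two are used for training.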
Dimensionality Reduction helps in simplifying data while retaining most of its valuable information. By reducing the number of features (dimensions) in a dataset, it makes data easier to visualize and speeds up machine learning algorithms.
- PCA (Principal Component Analysis): PCA reduces the number of dimensions by finding new uncorrelated variables called principal components. It projects your data onto a lower-dimensional space while retaining as much variance as possible.
  - How It Works: PCA finds the orthogonal axes (principal components) along which your data varies the most. The first principal component captures the most variance, and each subsequent component captures the largest share of the variance that remains.
  - Use Case: Use PCA when you have many features and you want to simplify your dataset for better visualization or faster computation. It is particularly useful when features are highly correlated.
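The PCA steps described above can be sketched in NumPy as follows: center the data, take a singular value decomposition, keep the leading axes, and project. The function name `pca`, its return values, and the synthetic data are assumptions made for this illustration; the library's own PCA class may expose a different interface.

```python
import numpy as np


def pca(X, n_components):
    """Illustrative PCA via SVD (hypothetical helper, not the library's API)."""
    X = np.asarray(X, dtype=float)
    # Center each feature so variance is measured around the mean.
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Squared singular values are proportional to the variance along each axis.
    explained_variance = (S ** 2) / (X.shape[0] - 1)
    ratio = explained_variance / explained_variance.sum()
    # Project the centered data onto the kept axes.
    projected = X_centered @ components.T
    return projected, components, ratio[:n_components]


# Two highly correlated synthetic features: the second is roughly twice the first.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t + 0.1 * rng.normal(size=200)])

projected, components, ratio = pca(X, n_components=1)
print(projected.shape, ratio)
```

Since the two features are almost perfectly correlated, a single component should retain nearly all of the variance, which is the situation where PCA is most useful.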
- Learning-First Approach: If you're a beginner and want to understand machine learning, this is the library for you. No hidden complexity, just code.
- No Hidden Magic: Everything is written from scratch, so you can see exactly how each algorithm works.
- Lightweight: Uses only NumPy, making it fast and easy to run.
# Clone the repository
git clone https://github.com/adityajn105/MLfromScratch.git
# Navigate to the project directory
cd MLfromScratch
# Install the required dependencies
pip install -r requirements.txt
This project is maintained by Aditya Jain
Contributor: Subrahmanya Gaonkar
We welcome contributions from everyone, especially beginners! If you're new to open-source, don't worry. Feel free to ask questions, open issues, or submit a pull request.
- Fork the repository.
- Create a new branch (git checkout -b feature-branch).
- Make your changes and commit (git commit -m "Added new feature").
- Push the changes (git push origin feature-branch).
- Submit a pull request and explain your changes.
This project is licensed under the MIT License - see the LICENSE file for details.