This repository contains solutions to various machine learning tasks completed by the Machine Learners team. The tasks are organized into four main categories:
- Regression 📉 - Predicting continuous values (Dairy Goods Sales Dataset)
- Classification 🔍 - Predicting discrete labels from input features (Amazon Products Dataset)
- Unsupervised Learning 🔎 - Extracting meaningful patterns from unlabeled data (Customer Support on Twitter Dataset)
- ML 100 Min Challenge ⏱️ - Solving multiple machine learning challenges in under 100 minutes
Machine-Learners/
├── Regression/ # Contains regression models 📈
│ ├── dairy_dataset.csv # Dataset for regression task (Dairy Goods Sales) 🧀
│ └── Regression_MachineLearners.ipynb # Jupyter Notebook for regression task 📝
│
├── Classification/ # Contains classification models 🛍️
│ ├── Amazon-Products.zip # Raw dataset for classification (Amazon Products) 📦
│ └── Classification_T5.ipynb # Jupyter Notebook for classification task 🧑💻
│
├── Unsupervised/ # Contains unsupervised learning tasks 🧠
│ └── T5-Unsupervised.ipynb # Jupyter Notebook for unsupervised learning task 🔍
│
├── 'ML Challenge'/ # ML 100 Min Challenge folder ⏱️
│ ├── ML_Challenge1_T5.ipynb # Jupyter Notebook for first ML challenge 🏆
│ └── ML_Challenge2_T5.ipynb # Jupyter Notebook for second ML challenge 🏅
│
└── README.md # This file 📄
- 202418013 - Darshita Dwivedi
- 202418025 - Kelvi Bhesdadiya
- 202418057 - Eric Thomas
- 202418058 - Ujjwal Bhansali
This subproject focuses on predicting continuous values using machine learning. We use a Dairy Goods Sales Dataset to apply regression models.
- dairy_dataset.csv: The dataset contains information on dairy product sales. The goal is to predict continuous values such as sales amounts.
- Regression_MachineLearners.ipynb: The Jupyter notebook where data is processed, various regression models are trained, and predictions are made on sales values in the dairy goods industry.
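As a rough illustration of the train/predict/evaluate flow described above, here is a minimal sketch using scikit-learn. It runs on synthetic data, and the column names (`quantity_sold`, `price_per_unit`, `sales_amount`) are hypothetical stand-ins, not the actual columns of dairy_dataset.csv. The notebook may use different models and features.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; column names are hypothetical,
# not taken from dairy_dataset.csv.
rng = np.random.default_rng(42)
n_rows = 200
df = pd.DataFrame({
    "quantity_sold": rng.integers(1, 100, size=n_rows),
    "price_per_unit": rng.uniform(10.0, 100.0, size=n_rows),
})
# Continuous target: a noisy linear function of the two features.
df["sales_amount"] = (
    3.0 * df["quantity_sold"]
    + 2.0 * df["price_per_unit"]
    + rng.normal(0.0, 5.0, size=n_rows)
)

X = df[["quantity_sold", "price_per_unit"]]
y = df["sales_amount"]

# Hold out 20% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MAE: {mae:.2f}, R^2: {r2:.3f}")
```

To try this on the real data, replace the synthetic frame with `pd.read_csv("dairy_dataset.csv")` and substitute the dataset's own feature and target columns.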
This subproject aims to classify e-commerce products into categories based on product names. We use the Amazon Products Dataset for this task.
- Amazon-Products.zip: A dataset that contains product names and categories from Amazon.
- Classification_T5.ipynb: This notebook covers the steps of text cleaning, feature extraction (e.g., TF-IDF), and training classification models (e.g., Logistic Regression, Random Forest) to predict product categories.
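The steps named above (text cleaning, TF-IDF features, a Logistic Regression classifier) can be sketched as a single scikit-learn pipeline. The product names and labels below are invented for illustration, and `clean_text` is a hypothetical helper; the notebook's actual cleaning rules and model settings may differ.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def clean_text(text: str) -> str:
    """Lowercase, replace non-alphanumeric characters with spaces, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


# Toy examples invented for illustration; not rows from Amazon-Products.zip.
names = [
    "Wireless Bluetooth Headphones",
    "USB-C Charging Cable",
    "Portable Bluetooth Speaker",
    "Men's Cotton T-Shirt",
    "Women's Denim Jacket",
    "Cotton Socks (Pack of 3)",
]
labels = [
    "electronics", "electronics", "electronics",
    "clothing", "clothing", "clothing",
]

# TF-IDF turns each cleaned name into a sparse vector;
# Logistic Regression learns a category from those vectors.
clf = make_pipeline(
    TfidfVectorizer(preprocessor=clean_text),
    LogisticRegression(max_iter=1000),
)
clf.fit(names, labels)

predicted = clf.predict(["Bluetooth Headphones with Mic"])
print(predicted[0])
```

Wrapping the vectorizer and the model in one pipeline keeps the same cleaning and vocabulary at training and prediction time. A Random Forest could be swapped in for the final step without changing the rest.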
The Unsupervised Learning subproject aims to identify meaningful patterns in unlabeled data. The dataset used involves customer support interactions on Twitter.
- T5-Unsupervised.ipynb: This notebook applies unsupervised learning techniques like clustering, dimensionality reduction, and pattern recognition to customer support interactions on Twitter.
- Dataset: Customer Support on Twitter
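One possible shape for the clustering and dimensionality-reduction steps mentioned above is sketched here with TF-IDF, TruncatedSVD, and KMeans. The example messages are invented, and the choice of two components and two clusters is an assumption made for brevity; the notebook may use other algorithms or parameters.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy support messages invented for illustration; not taken from the dataset.
tweets = [
    "my order has not arrived yet",
    "where is my package delivery",
    "delivery is late again",
    "package still not delivered",
    "cannot log in to my account",
    "password reset link not working",
    "account locked after login attempt",
    "login page keeps failing",
]

# Represent each message as a sparse TF-IDF vector.
tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(tweets)

# Reduce the sparse matrix to two dense components.
svd = TruncatedSVD(n_components=2, random_state=42)
X_reduced = svd.fit_transform(X)

# Group the reduced vectors into two clusters (cluster count is an assumption).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
cluster_ids = kmeans.fit_predict(X_reduced)

for cluster_id, tweet in zip(cluster_ids, tweets):
    print(cluster_id, tweet)
```

TruncatedSVD is used here because it accepts sparse input directly. On real data, the number of components and clusters would need to be chosen by inspection or with a metric such as the silhouette score.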
This folder contains solutions to the ML 100 Min Challenge, where we solve multiple machine learning tasks in under 100 minutes.
- ML_Challenge1_T5.ipynb: The first challenge in the ML 100 Min Challenge, where we apply a machine learning model to solve the problem.
- ML_Challenge2_T5.ipynb: The second challenge in the ML 100 Min Challenge, continuing from the first with a new dataset and task.
To run the notebooks, install the required dependencies, preferably inside a virtual environment:
pip install -r requirements.txt
The requirements.txt file includes essential libraries such as:
numpy
pandas
scikit-learn (import name: sklearn)
matplotlib
seaborn
plotly
nltk
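After installing, a quick way to confirm the environment is ready is to check that each library can be located. This is a small sketch using only the standard library; `find_missing` is a hypothetical helper name, and the list uses import names (scikit-learn is imported as `sklearn`).

```python
from importlib.util import find_spec

# Import names for the libraries listed above.
REQUIRED_MODULES = [
    "numpy", "pandas", "sklearn", "matplotlib", "seaborn", "plotly", "nltk",
]


def find_missing(modules):
    """Return the module names that cannot be located in the current environment."""
    return [name for name in modules if find_spec(name) is None]


missing = find_missing(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All listed libraries are importable.")
```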
- Navigate to the respective folder (e.g., Regression, Classification, or Unsupervised) depending on your task.
- Open the relevant Jupyter Notebook (.ipynb) in a Jupyter notebook environment (e.g., JupyterLab or Google Colab).
- Execute the cells step-by-step to see the outcomes of each stage in the machine learning pipeline.
- dairy_dataset.csv: Contains data related to dairy goods sales, used for regression tasks.
- Regression_MachineLearners.ipynb: This notebook handles data analysis, model training, and sales predictions in the dairy goods sector.
- Amazon-Products.zip: A dataset with product information such as names and categories for classification tasks.
- Classification_T5.ipynb: This notebook involves text preprocessing, feature extraction, and model training (Logistic Regression, Random Forest) to classify products.
- T5-Unsupervised.ipynb: Explores unsupervised learning techniques, such as clustering and dimensionality reduction, applied to customer support data.
- Dataset: Customer Support on Twitter
- ML_Challenge1_T5.ipynb: Solution for the first ML challenge task.
- ML_Challenge2_T5.ipynb: Solution for the second ML challenge task.
- Dataset Sources:
  - Amazon Products: Kaggle - Amazon Products Dataset
  - Dairy Goods Sales: Kaggle - Dairy Goods Sales Dataset
  - Customer Support on Twitter: Kaggle - Customer Support on Twitter
- Libraries Used: numpy, pandas, scikit-learn (sklearn), matplotlib, seaborn, plotly, nltk
- Classification: Experiment with deep learning models like CNNs or LSTMs to potentially enhance performance.
- ML Challenge: Continue tackling additional challenges and applying more advanced machine learning techniques.
- Regression: Incorporate additional features to improve the prediction accuracy.
- Unsupervised Learning: Test different clustering algorithms and dimensionality reduction techniques to better understand data patterns.