eric157/Machine-Learners

🌟 Machine Learners: Machine Learning Group Assignments & ML 100 Min Challenge 🌟

Overview 📚

This repository contains solutions to various machine learning tasks completed by the Machine Learners team. The tasks are organized into four main categories:

  1. Regression 📉 - Predicting continuous values (Dairy Goods Sales Dataset)
  2. Classification 🔍 - Predicting discrete labels from input features (Amazon Products Dataset)
  3. Unsupervised Learning 🔎 - Extracting meaningful patterns from unlabeled data (Customer Support on Twitter Dataset)
  4. ML 100 Min Challenge ⏱️ - Solving multiple machine learning challenges in under 100 minutes

Project Structure 📁

Machine-Learners/
├── Regression/                           # Contains regression models 📈
│   ├── dairy_dataset.csv                # Dataset for regression task (Dairy Goods Sales) 🧀
│   └── Regression_MachineLearners.ipynb  # Jupyter Notebook for regression task 📝
│
├── Classification/                       # Contains classification models 🛍️
│   ├── Amazon-Products.zip               # Raw dataset for classification (Amazon Products) 📦
│   └── Classification_T5.ipynb           # Jupyter Notebook for classification task 🧑‍💻
│
├── Unsupervised/                        # Contains unsupervised learning tasks 🧠
│   └── T5-Unsupervised.ipynb            # Jupyter Notebook for unsupervised learning task 🔍
│
├── 'ML Challenge'/                       # ML 100 Min Challenge folder ⏱️
│   ├── ML_Challenge1_T5.ipynb           # Jupyter Notebook for first ML challenge 🏆
│   └── ML_Challenge2_T5.ipynb           # Jupyter Notebook for second ML challenge 🏅
│
└── README.md                            # This file 📄

Team Members 👨‍💻👩‍💻

  • 202418013 - Darshita Dwivedi
  • 202418025 - Kelvi Bhesdadiya
  • 202418057 - Eric Thomas
  • 202418058 - Ujjwal Bhansali

Subprojects Overview 🔍

1. Regression 📊

This subproject focuses on predicting continuous values using machine learning. We use a Dairy Goods Sales Dataset to apply regression models.

  • dairy_dataset.csv: The dataset contains information on dairy product sales. The goal is to predict continuous values such as sales amounts.
  • Regression_MachineLearners.ipynb: The Jupyter notebook where data is processed, various regression models are trained, and predictions are made on sales values in the dairy goods industry.
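The general shape of such a regression workflow can be sketched as follows. This is a minimal illustration, not the code from the notebook: the data is synthetic, the column names (quantity_sold, price_per_unit, sales_amount) are hypothetical and not taken from dairy_dataset.csv, and plain linear regression stands in for whichever models the notebook actually trains.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for sales data; column names are hypothetical
# and are NOT taken from dairy_dataset.csv.
rng = np.random.default_rng(0)
n_rows = 200
df = pd.DataFrame({
    "quantity_sold": rng.integers(1, 100, size=n_rows),
    "price_per_unit": rng.uniform(10.0, 100.0, size=n_rows),
})
# Continuous target: a linear function of the features plus Gaussian noise.
df["sales_amount"] = (
    5.0 * df["quantity_sold"]
    + 2.0 * df["price_per_unit"]
    + rng.normal(0.0, 10.0, size=n_rows)
)

X = df[["quantity_sold", "price_per_unit"]]
y = df["sales_amount"]

# Hold out 20% of the rows to evaluate on data the model has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MAE: {mae:.2f}, R^2: {r2:.3f}")
```

Swapping LinearRegression for another scikit-learn regressor leaves the split, fit, and evaluation steps unchanged.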

2. Classification 🏷️

This subproject aims to classify e-commerce products into categories based on product names. We use the Amazon Products Dataset for this task.

  • Amazon-Products.zip: A dataset that contains product names and categories from Amazon.
  • Classification_T5.ipynb: This notebook covers the steps of text cleaning, feature extraction (e.g., TF-IDF), and training classification models (e.g., Logistic Regression, Random Forest) to predict product categories.
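The TF-IDF plus Logistic Regression step described above might look roughly like this. It is a sketch under stated assumptions: the product names and the two categories are invented for illustration and do not come from the Amazon Products Dataset, and the notebook's own cleaning steps and hyperparameters may differ.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy product names and categories, invented for illustration only.
names = [
    "wireless bluetooth headphones",
    "usb c charging cable",
    "noise cancelling earbuds",
    "portable bluetooth speaker",
    "cotton crew neck t-shirt",
    "slim fit denim jeans",
    "hooded winter jacket",
    "cotton summer dress",
]
labels = ["electronics"] * 4 + ["clothing"] * 4

# The pipeline lowercases and vectorizes the text with TF-IDF (unigrams
# and bigrams), then fits a logistic regression classifier on the result.
clf = make_pipeline(
    TfidfVectorizer(lowercase=True, ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
clf.fit(names, labels)

# Classify product names the model has not seen before.
new_names = ["bluetooth headphones with usb cable", "cotton denim jacket"]
predictions = clf.predict(new_names)
for name, category in zip(new_names, predictions):
    print(f"{name!r} -> {category}")
```

Keeping the vectorizer and the classifier in one pipeline means the same text transformation is applied at training and prediction time.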

3. Unsupervised Learning 🧠

The Unsupervised Learning subproject aims to identify meaningful patterns in unlabeled data. The dataset used involves customer support interactions on Twitter.

  • T5-Unsupervised.ipynb: This notebook applies unsupervised learning techniques like clustering, dimensionality reduction, and pattern recognition to customer support interactions on Twitter.
  • Dataset: Customer Support on Twitter
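As an illustration of clustering and dimensionality reduction on short texts, here is a minimal sketch. The messages are invented, and the choice of TF-IDF, KMeans, and TruncatedSVD is an assumption made for the example; the notebook may use different techniques and settings.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Invented support-style messages; not taken from the real dataset.
tweets = [
    "my order has not arrived yet",
    "where is my package delivery",
    "delivery is late again",
    "cannot log in to my account",
    "password reset link does not work",
    "account locked after password change",
]

# Turn each message into a sparse TF-IDF vector.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(tweets)

# Group the messages into two clusters (the cluster count is a guess).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X)

# Project the vectors to 2 dimensions, e.g. for a scatter plot.
svd = TruncatedSVD(n_components=2, random_state=0)
coords = svd.fit_transform(X)

for text, label, (x, y) in zip(tweets, cluster_labels, coords):
    print(f"cluster {label} at ({x:.2f}, {y:.2f}): {text}")
```

On real data the number of clusters would normally be chosen with a diagnostic such as the silhouette score rather than fixed in advance.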

4. ML 100 Min Challenge ⏱️

This folder contains solutions to the ML 100 Min Challenge, where we solve multiple machine learning tasks in under 100 minutes.

  • ML_Challenge1_T5.ipynb: The first challenge in the ML 100 Min Challenge, where we apply a machine learning model to solve the problem.
  • ML_Challenge2_T5.ipynb: The second challenge in the ML 100 Min Challenge, continuing from the first with a new dataset and task.

How to Run the Project 🚀

1. Install Dependencies ⚙️

To run the notebooks, install the required dependencies. It is recommended to use a virtual environment:

pip install -r requirements.txt

The requirements.txt includes essential libraries such as:

  • numpy
  • pandas
  • scikit-learn (imported as sklearn)
  • matplotlib
  • seaborn
  • plotly
  • nltk
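Before opening a notebook, a quick check like the one below can confirm that the listed libraries are importable in the current environment. It is a small helper sketch that is not part of the repository; the function name find_missing is hypothetical. Note that scikit-learn is imported under the name sklearn.

```python
from importlib.util import find_spec

# Import names for the libraries listed above.
# scikit-learn is installed as "scikit-learn" but imported as "sklearn".
REQUIRED_MODULES = [
    "numpy", "pandas", "sklearn", "matplotlib", "seaborn", "plotly", "nltk",
]


def find_missing(modules):
    """Return the module names that cannot be found in this environment."""
    return [name for name in modules if find_spec(name) is None]


missing = find_missing(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are available.")
```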

2. Running the Notebooks 💻

  • Navigate to the respective folder (e.g., Regression, Classification, or Unsupervised) depending on your task.
  • Open the relevant Jupyter Notebook (.ipynb) in a Jupyter notebook environment (e.g., JupyterLab or Google Colab).
  • Execute the cells step-by-step to see the outcomes of each stage in the machine learning pipeline.

Description of Files 🗂️

Regression Folder 📉

  • dairy_dataset.csv: Contains data related to dairy goods sales, used for regression tasks.
  • Regression_MachineLearners.ipynb: This notebook handles data analysis, model training, and sales predictions in the dairy goods sector.

Classification Folder 🛒

  • Amazon-Products.zip: A dataset with product information such as names and categories for classification tasks.
  • Classification_T5.ipynb: This notebook involves text preprocessing, feature extraction, and model training (Logistic Regression, Random Forest) to classify products.

Unsupervised Folder 🔍

  • T5-Unsupervised.ipynb: Explores unsupervised learning techniques, such as clustering and dimensionality reduction, applied to customer support data.
  • Dataset: Customer Support on Twitter

ML Challenge Folder ⏱️

  • ML_Challenge1_T5.ipynb: Solution for the first ML challenge task.
  • ML_Challenge2_T5.ipynb: Solution for the second ML challenge task.

Acknowledgements 🙏


Future Work 🚀

  • Classification: Experiment with deep learning models like CNNs or LSTMs to potentially enhance performance.
  • ML Challenge: Continue tackling additional challenges and applying more advanced machine learning techniques.
  • Regression: Incorporate additional features to improve prediction accuracy.
  • Unsupervised Learning: Test different clustering algorithms and dimensionality reduction techniques to better understand data patterns.
