Data Science x Logistic Regression

This project introduces the basics of data science and logistic regression while exploring tools commonly used in machine learning.

Objectives

  • Read the dataset and preprocess it.
  • Create visualizations to detect defects or anomalies.
  • Implement a logistic regression model.

Getting started

Installing

Clone the repository.

git clone https://github.com/framdani/dslr.git
cd dslr

Install the required packages.

pip install -r requirements.txt

Usage

Data Exploration

Use the describe.py script to get an overview of the dataset.

python3 describe.py dataset_train.csv
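
As a rough illustration of what such an overview might contain, the sketch below computes common summary statistics for one numeric column. It is not the code from describe.py: the function name `describe`, the choice of statistics, and the linear-interpolation percentile rule are all assumptions made for this example.

```python
import math


def describe(values):
    """Return summary statistics for a list of numbers.

    Hypothetical helper: assumes missing values were already filtered out.
    """
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("cannot describe an empty column")

    mean = sum(data) / n
    # Sample variance (n - 1 denominator); undefined for a single value.
    var = sum((x - mean) ** 2 for x in data) / (n - 1) if n > 1 else float("nan")

    def percentile(p):
        # Linear interpolation between the two closest ranks.
        k = (n - 1) * p
        lo, hi = math.floor(k), math.ceil(k)
        return data[lo] + (data[hi] - data[lo]) * (k - lo)

    return {
        "count": n,
        "mean": mean,
        "std": math.sqrt(var),
        "min": data[0],
        "25%": percentile(0.25),
        "50%": percentile(0.50),
        "75%": percentile(0.75),
        "max": data[-1],
    }
```

Calling `describe` once per numeric column and printing the resulting dictionaries side by side would give a table similar in spirit to a dataset overview.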

Data visualization

Use the following scripts to create visualizations:

  • histogram.py : Displays a histogram that answers the question, "Which Hogwarts course has a homogeneous score distribution between all houses?"
  • scatter_plot.py : Displays a scatter plot that answers the question, "What are the features that are similar?"
  • pair_plot.py : Displays a pair plot that helps identify the features to use in the logistic regression.
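
The histogram question above can be approached by grouping one course's scores by house and overlaying the distributions. The following is a minimal sketch, not the repository's histogram.py: the function names, the `"Hogwarts House"` column name, and the dictionary-per-row input (as produced by `csv.DictReader`) are assumptions for illustration.

```python
from collections import defaultdict


def scores_by_house(rows, course, house_column="Hogwarts House"):
    """Group the scores of one course by house, skipping missing values.

    `rows` is an iterable of dicts, e.g. from csv.DictReader.
    The column names are assumptions about the dataset layout.
    """
    groups = defaultdict(list)
    for row in rows:
        value = row.get(course)
        if value is None or value == "":
            continue  # missing score, nothing to plot
        groups[row[house_column]].append(float(value))
    return dict(groups)


def plot_histogram(groups, course):
    """Overlay one semi-transparent histogram per house."""
    # Imported lazily so the grouping helper works without matplotlib.
    import matplotlib.pyplot as plt

    for house, scores in groups.items():
        plt.hist(scores, bins=20, alpha=0.5, label=house)
    plt.title(course)
    plt.xlabel("Score")
    plt.ylabel("Number of students")
    plt.legend()
    plt.show()
```

A course whose per-house histograms overlap almost completely is a candidate answer, since its scores do not help distinguish one house from another.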

Logistic Regression

Use the logreg_train.py script to train the model and generate the weights.

python3 logreg_train.py dataset_train.csv
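
To make the training step more concrete, here is a minimal sketch of multi-class logistic regression trained one-vs-all with batch gradient descent. The README does not state which strategy or optimizer logreg_train.py uses, so the approach, the function names, and the hyperparameters (`lr`, `epochs`) are assumptions rather than a description of the actual script.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_binary(X, y, lr=0.1, epochs=1000):
    """Fit weights [bias, w1, ..., wd] for labels y in {0, 1}."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = sigmoid(z) - yi  # derivative of the log loss w.r.t. z
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        # Average the gradient over the batch and take one step.
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def train_one_vs_all(X, labels, lr=0.1, epochs=1000):
    """Train one binary classifier per class (that class vs. the rest)."""
    return {
        c: train_binary(X, [1.0 if l == c else 0.0 for l in labels], lr, epochs)
        for c in sorted(set(labels))
    }
```

In this sketch the features are expected to be numeric and roughly on the same scale. Standardizing each column before training usually helps gradient descent converge.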

Use the logreg_predict.py script to make predictions on the test set using the weights generated by the training script.

python3 logreg_predict.py dataset_test.csv weights.csv
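
Prediction with per-class weights can be sketched as scoring a sample against every class and keeping the most probable one. This example assumes a one-vs-all model held in memory as a dictionary mapping each class to `[bias, w1, ..., wd]`. The actual layout of weights.csv is not documented here, so loading it is left out, and the function name `predict` is hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, x):
    """Return the class whose classifier gives x the highest probability.

    `weights` maps class label -> [bias, w1, ..., wd] (assumed layout).
    """
    scores = {
        label: sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))
        for label, w in weights.items()
    }
    return max(scores, key=scores.get)
```

Any preprocessing applied to the training features, such as scaling or filling missing values, would need to be applied identically to the test features before calling `predict`.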

Note

This project is for educational purposes. You are welcome to clone, modify, and use it for your own learning.