This repository contains a data science project based on the Car Features and MSRP dataset from Kaggle. The dataset contains 11914 car models sold in the U.S. between 1990 and 2018. Each car is described by 16 variables. The main goal of the study is to build a model that predicts car prices as accurately as possible from this information. The project is structured as follows:
- Introduction
- Development environment preparation
- Data structure and data cleaning
- Exploratory data analysis
- Feature selection
- Modelling
- Conclusion
The project was created to test and solidify skills in pandas, scikit-learn and seaborn. It also serves as a portfolio project.
- Python 3.9.2
- pandas 1.1.3
- numpy 1.19.1
- scikit-learn 0.24.1
- seaborn 0.11.0
- matplotlib 3.3.3
The remaining required packages are listed in the requirements.txt file.
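To see how a local environment compares with the versions listed above, a small check along the following lines may help. It is only a sketch: the helper names `compare_versions` and `print_report` are made up for illustration and are not part of the repository, and a version mismatch does not necessarily mean the notebook will fail.

```python
from importlib import metadata

# Versions this README lists as the tested environment.
TESTED_VERSIONS = {
    "pandas": "1.1.3",
    "numpy": "1.19.1",
    "scikit-learn": "0.24.1",
    "seaborn": "0.11.0",
    "matplotlib": "3.3.3",
}


def compare_versions(expected):
    """Map each package name to (expected, installed).

    `installed` is None when the package is not installed.
    Hypothetical helper, not part of the project.
    """
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed)
    return report


def print_report(report):
    """Print one line per package describing how it compares."""
    for name, (wanted, installed) in report.items():
        if installed is None:
            status = "not installed"
        elif installed == wanted:
            status = "matches"
        else:
            status = f"differs (installed {installed})"
        print(f"{name}: tested with {wanted}, {status}")
```

Calling `print_report(compare_versions(TESTED_VERSIONS))` after installing the requirements prints one line per package.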
To run the project:
- Download the entire repository and unzip it.
- Install the required packages included in the requirements.txt file:
  - Launch any command-line interface (e.g. Anaconda Prompt).
  - Optionally, create and activate a virtual environment in which the project will run.
  - Change the working directory to the project folder: "cd 'your destination path to project'"
  - Type "pip install -r requirements.txt".
- Start Jupyter Notebook or JupyterLab and open the file Car_price_notebook.ipynb
- Run all cells
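The environment and installation steps above could also be scripted. The sketch below is one possible way to do that with the standard `venv` module, not something shipped with the repository: the function names `setup_commands` and `run_setup` and the environment folder name `.venv` are assumptions chosen for illustration.

```python
import subprocess
import sys
from pathlib import Path


def setup_commands(project_dir, env_name=".venv"):
    """Build the commands that create a virtual environment inside the
    project folder and install requirements.txt into it.

    Hypothetical helper; `.venv` is an assumed folder name.
    """
    project_dir = Path(project_dir)
    env_dir = project_dir / env_name
    # venv places the interpreter in Scripts\ on Windows and bin/ elsewhere.
    if sys.platform == "win32":
        env_python = env_dir / "Scripts" / "python.exe"
    else:
        env_python = env_dir / "bin" / "python"
    requirements = project_dir / "requirements.txt"
    return [
        [sys.executable, "-m", "venv", str(env_dir)],
        [str(env_python), "-m", "pip", "install", "-r", str(requirements)],
    ]


def run_setup(project_dir):
    """Run the setup commands in order, stopping at the first failure."""
    for command in setup_commands(project_dir):
        subprocess.run(command, check=True)
```

Calling `run_setup("path/to/project")` would create the environment and install the packages; Jupyter Notebook or JupyterLab then needs to be started from that environment for the notebook to see them.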
The project was developed and tested in Jupyter Notebook.