To run this project you need a working Jupyter Notebook environment on your computer, with the following packages (libraries) installed. You can install them via conda or pip.
- NumPy
- Pandas
- Matplotlib
- Seaborn
- scikit-learn
- The distribution of price was the main problem: it was right-skewed, which led to large errors on both the train and test data. To address this, I applied a log transformation to the price.
- Correlation matrix: this gives a brief idea of how the variables relate to each other (the target variable here is msrp).
- Final Results on Train and Test Data
Prediction Vs Actual Distribution on Train Data
Prediction Vs Actual Distribution on Test Data
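The log transformation and the correlation matrix mentioned above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the notebook's actual code: the `horsepower` and `age` columns, the data-generating formula, and the use of `np.log1p` (rather than a plain log) are all assumptions made for the example. Only the target name `msrp` comes from this README.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the car data. `horsepower` and `age` are
# hypothetical feature names; only `msrp` is taken from the README.
rng = np.random.default_rng(0)
n = 1000
horsepower = rng.uniform(80, 500, size=n)
age = rng.integers(0, 20, size=n)

# Multiplicative noise produces a right-skewed price (assumed shape).
msrp = 5000 * np.exp(0.005 * horsepower - 0.03 * age + rng.normal(0, 0.3, size=n))
df = pd.DataFrame({"horsepower": horsepower, "age": age, "msrp": msrp})

# Log-transform the skewed target; log1p(x) = log(1 + x) also handles zeros.
df["log_msrp"] = np.log1p(df["msrp"])

# Sample skewness before and after the transformation.
skew_before = df["msrp"].skew()
skew_after = df["log_msrp"].skew()
print(f"skew before: {skew_before:.2f}, after: {skew_after:.2f}")

# Pearson correlation matrix; the target column shows how strongly
# each variable is linearly related to the (log) price.
corr = df.corr()
print(corr["log_msrp"].sort_values(ascending=False))
```

The `corr` DataFrame can be passed to `seaborn.heatmap` to draw the matrix. Predictions made on the log scale can be mapped back to prices with `np.expm1`.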
- Metric used:
- Root Mean Square Error (RMSE): 0.0708 on the train dataset and 0.1078 on the test dataset.
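For reference, RMSE on train and test splits could be computed along these lines. This is a hedged sketch: the README does not name the model, so `LinearRegression`, the synthetic features, and the 80/20 split are assumptions chosen only to make the example runnable. It also assumes the model is fit on a log-transformed target, in which case the RMSE is expressed in log units rather than in currency.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic features and a hypothetical log-price target (assumption).
rng = np.random.default_rng(42)
X = rng.uniform(0, 1, size=(500, 3))
y_log = 9.0 + X @ np.array([1.5, -0.8, 0.4]) + rng.normal(0, 0.1, size=500)

# Hold out 20% of the rows as test data (assumed split ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y_log, test_size=0.2, random_state=0
)

# Placeholder model; the notebook's actual model may differ.
model = LinearRegression().fit(X_train, y_train)


def rmse(y_true, y_pred):
    """Root mean square error: sqrt(mean((y_true - y_pred)^2))."""
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))


rmse_train = rmse(y_train, model.predict(X_train))
rmse_test = rmse(y_test, model.predict(X_test))
print(f"RMSE train: {rmse_train:.4f}, test: {rmse_test:.4f}")
```

A test RMSE noticeably above the train RMSE, as in the figures reported above, is a common sign of mild overfitting.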
Credit goes to Kaggle for the data. The licensing and other descriptive information for the data can be found on Kaggle.