In this project, a dataset of car advertisements from a website is analysed.
The purpose of the analysis is to determine what factors correlate with a car's price.
Exploratory Data Analysis (EDA) is the approach used to observe patterns and draw conclusions on how parameters correlate with each other.
The dataset comprises 51,525 observations and 13 characteristics. It contains several missing values and no explicit duplicates, and some of the data are stored with a mismatched type.
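The checks described above (missing values, duplicates, data types) can be sketched with pandas. This is a minimal illustration, not the project's actual code: the column names, the toy values, and the helper `summarise_quality` are assumptions made up for the example.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the advertisement data; column names and values are invented.
df = pd.DataFrame({
    "price": [9400, 25500, 5500, 1500, 14900],
    "model_year": [2011.0, np.nan, 2013.0, 2003.0, 2017.0],
    "odometer": [145000.0, 88705.0, 110000.0, np.nan, 80903.0],
    "date_posted": ["2018-06-23", "2018-10-19", "2019-02-07",
                    "2019-03-22", "2019-04-02"],
})


def summarise_quality(frame):
    """Return missing-value counts, duplicate-row count, and column dtypes."""
    return {
        "missing": frame.isna().sum().to_dict(),       # NaN count per column
        "duplicates": int(frame.duplicated().sum()),   # fully identical rows
        "dtypes": frame.dtypes.astype(str).to_dict(),  # storage type per column
    }


report = summarise_quality(df)
print(report["missing"])
print(report["duplicates"])
print(report["dtypes"])
```

In this toy frame the date column is stored as text, which is the kind of type mismatch that `pd.to_datetime` would typically be used to repair.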
Steps carried out in the project:
- Data pre-processing:
  - Handling missing values
  - Improving data quality
- Observing and analysing core parameters
- Handling data outliers
- Observing and analysing core parameters without data outliers
- Analysing correlations between core parameters
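The outlier and correlation steps could look roughly like the sketch below. The summary does not state which outlier rule was used, so the 1.5 × IQR rule here is an assumption, as are the column names, the helper `drop_iqr_outliers`, and the invented sample values; the resulting numbers say nothing about the real dataset.

```python
import pandas as pd

# Invented sample; the last row is a deliberately extreme listing.
df = pd.DataFrame({
    "price": [9000, 12000, 7000, 15000, 5000, 11000, 300000],
    "age": [7, 4, 9, 2, 12, 5, 1],
    "odometer": [120000, 60000, 150000, 30000, 190000, 80000, 500],
})


def drop_iqr_outliers(frame, column, k=1.5):
    """Keep rows whose `column` value lies within k * IQR of the quartiles."""
    q1 = frame[column].quantile(0.25)
    q3 = frame[column].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return frame[frame[column].between(lower, upper)]


clean = drop_iqr_outliers(df, "price")

# Pearson correlation matrix of the remaining numeric columns.
corr = clean.corr()
print(len(df), "rows before,", len(clean), "rows after")
print(corr["price"].round(2))
```

Comparing `df.corr()` with `clean.corr()` shows how much a single extreme listing can distort the coefficients, which is the reason for repeating the analysis without outliers.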
Conclusions:
- Odometer and price are moderately correlated, with a correlation coefficient of around 0.5.
- Age and price are moderately correlated, with a correlation coefficient of around 0.6.
- Condition and price correlate weakly.
- Among sedans, cars with automatic transmission sell for more; among SUVs, cars with manual transmission do.
- Orange cars sell for more on average than cars of any other color, while green cars sell for the least.
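Comparisons across categories such as transmission and color are typically made by grouping and averaging. The sketch below shows one way to do this with pandas; the column names (`type`, `transmission`, `paint_color`) and all values are hypothetical and chosen only to make the example run.

```python
import pandas as pd

# Invented listings; not taken from the real dataset.
ads = pd.DataFrame({
    "type": ["sedan", "sedan", "sedan", "SUV", "SUV", "SUV"],
    "transmission": ["automatic", "manual", "automatic",
                     "automatic", "manual", "manual"],
    "paint_color": ["orange", "green", "white", "white", "orange", "green"],
    "price": [8000, 5000, 9000, 12000, 15000, 7000],
})

# Mean price for each combination of vehicle type and transmission.
by_transmission = ads.pivot_table(
    index="type", columns="transmission", values="price", aggfunc="mean"
)

# Mean price per color, most expensive first.
by_color = (
    ads.groupby("paint_color")["price"].mean().sort_values(ascending=False)
)

print(by_transmission)
print(by_color)
```

With real data it is worth checking the number of listings in each group as well, since an average over only a handful of advertisements is not very reliable.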