- Project Overview
- Data description
- Notebooks
- Applied Models
- The work process
- Model score comparison
The goal of this project is to develop a classification model for a dataset that holds the specifications of 2,000 mobile phones, and to predict each phone's price range by applying various machine learning algorithms.
- battery_power: Total energy a battery can store in mAh
- blue: Has Bluetooth or not
- clock_speed: Speed at which microprocessor executes instructions
- dual_sim: Has dual sim support or not
- fc: Front Camera mega pixels
- four_g: Has 4G or not
- int_memory: Internal Memory in Gigabytes
- m_dep: Mobile Depth in cm
- mobile_wt: Weight of mobile phone
- n_cores: Number of cores of processor
- pc: Primary Camera mega pixels
- px_height: Pixel Resolution Height
- px_width: Pixel Resolution Width
- ram: Random Access Memory in Megabytes
- sc_h: Screen Height of mobile in cm
- sc_w: Screen Width of mobile in cm
- talk_time: Longest time that a single battery charge will last during a call
- three_g: Has 3G or not
- touch_screen: Has touch screen or not
- wifi: Has Wi-Fi or not
This GitHub repository includes a Jupyter notebooks folder containing:
- EDA_Mobile_Price_Classification.ipynb:
  - A notebook where I performed extensive EDA to explore the variables.
- Mobile_Price_Classification_ML.ipynb:
  - I began with baseline models and continued through random forest, K-nearest neighbors, and decision tree models, and finally stacking models.
- Check for duplicated and null values and clean them.
- Visualize the data and check for outliers.
- The price range classes are balanced: each class has the same number of samples.
- The data includes outdated specifications.
- RAM has the biggest effect on price:
  - In class 0 (low cost), RAM values range from 0 to 2000 MB.
  - In class 1 (medium cost), RAM values range from 0 to 3000 MB.
  - In class 2 (high cost), RAM values range from 1000 to 4000 MB.
  - In class 3 (very high cost), RAM values range from 2000 to 4000 MB (mostly 3500 to 4000 MB).
- X for all features except price_range
- y for the target (price_range)
- to make sure the model that we will use is appropriate for the problem
- Random forest model (rf)
- K nearest neighbor model (knn)
- Decision tree model (dt)
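
The cleaning, feature/target split, and baseline models listed above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the DataFrame is synthetic stand-in data with only two of the feature columns, the toy `price_range` is derived from RAM quartiles, and the split ratio and default hyperparameters are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real dataset (hypothetical values, not the project's data).
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "ram": rng.integers(256, 4000, size=n),
    "battery_power": rng.integers(500, 2000, size=n),
})
# Toy target: four balanced price classes derived from RAM quartiles.
df["price_range"] = pd.qcut(df["ram"], 4, labels=False)

# Clean: drop duplicated rows and rows with missing values.
df = df.drop_duplicates().dropna()

# X holds every feature except the target; y is the target.
X = df.drop(columns="price_range")
y = df["price_range"]

# Hold out a stratified test set (the 25% ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Baseline models with default hyperparameters.
models = {
    "rf": RandomForestClassifier(random_state=0),
    "knn": KNeighborsClassifier(),
    "dt": DecisionTreeClassifier(random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # accuracy on the held-out set

print(scores)
```

With the real data, `df` would instead be loaded from the project's CSV file, and `X` would contain all twenty feature columns described earlier.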
Why do we need cross-validation in machine learning?
Cross-validation is primarily used to estimate the skill of a machine learning model on unseen data. That is, it uses a limited sample to estimate how the model is expected to perform in general when making predictions on data not used during training.
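
As a rough illustration of this idea, the sketch below runs 5-fold cross-validation with scikit-learn's `cross_val_score`. The data comes from `make_classification` as a synthetic four-class stand-in for the phone dataset, and the fold count and forest size are assumptions rather than the notebook's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic four-class data standing in for the phone dataset (assumption).
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=6, n_classes=4, random_state=0
)

rf = RandomForestClassifier(n_estimators=50, random_state=0)

# Each of the 5 folds is held out once while the model trains on the other 4.
scores = cross_val_score(rf, X, y, cv=5)

print("fold accuracies:", scores)
print("mean accuracy:", scores.mean(), "std:", scores.std())
```

The mean of the fold scores is the estimate of performance on unseen data; the standard deviation gives a sense of how much that estimate varies between splits.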
Stacking, also known as stacked generalization, is an ensemble method in which the predictions of several models are combined by another machine learning algorithm.
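
A minimal sketch of stacking with scikit-learn's `StackingClassifier` might look like the following. The base estimators mirror the three models named in this README, but the logistic regression meta-learner, the synthetic data, and all hyperparameters are assumptions, since the notebook's exact configuration is not given here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic four-class data standing in for the phone dataset (assumption).
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=6, n_classes=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Base models whose predictions become inputs to the final estimator.
base_models = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("knn", KNeighborsClassifier()),
    ("dt", DecisionTreeClassifier(random_state=0)),
]

# The final estimator (an assumed choice) learns how to combine the base predictions.
stacked = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stacked.fit(X_train, y_train)

acc = stacked.score(X_test, y_test)
print("stacked accuracy:", acc)
```

The `cv=5` argument means the final estimator is trained on out-of-fold predictions of the base models, which helps keep it from simply memorizing their training-set outputs.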
Grid search is an exhaustive hyperparameter search: it evaluates every combination from a list of parameter options that you provide and selects the best-scoring one, automating the trial-and-error process.
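
As a hedged example, the sketch below tunes a small stacking model with scikit-learn's `GridSearchCV`. The parameter grid, the two base estimators, and the synthetic data are illustrative assumptions; the grid actually used in the notebook is not stated in this README.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic four-class data standing in for the phone dataset (assumption).
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=6, n_classes=4, random_state=0
)

stacked = StackingClassifier(
    estimators=[
        ("knn", KNeighborsClassifier()),
        ("dt", DecisionTreeClassifier(random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=3,
)

# Nested parameters are addressed as "<estimator name>__<parameter>".
param_grid = {
    "knn__n_neighbors": [3, 7],
    "dt__max_depth": [3, None],
}

# Every combination (2 x 2 = 4) is scored with 3-fold cross-validation.
grid = GridSearchCV(stacked, param_grid, cv=3, scoring="accuracy")
grid.fit(X, y)

print("best parameters:", grid.best_params_)
print("best CV accuracy:", grid.best_score_)
```

After fitting, `grid.best_estimator_` is the stacking model refitted on all the data with the winning parameter combination.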
- Random forest model (rf)
- K nearest neighbor model (knn)
- Decision tree model (dt)
- Stacking model (staked)
- Stacking model after optimizing (grid)