- Dataset
- Objective
- Steps to Run the Notebooks
- Data Preprocessing
- Feature Detection
- BMI Prediction
- Gender Classification
- Distribution of Offences
- Authors
The dataset contains details of the inmates, including sex, weight, height, front face images, and side face images. It also contains the offences committed by each inmate.
The objective is to train models on the given dataset to perform the following tasks:

- Predict gender from the front face image of a person
- Predict BMI from the front face image of a person

Additionally, plot each offence and the number of times it was committed.
- Clone the repository.
git clone https://github.com/PranaavPrasad/PRML_Project.git
cd PRML_Project
- Download the dataset and place it inside the repository.
- Create a folder called `test_images` inside the repository and place all the pictures you want predictions for in it.
- To create the custom dataset (`custom_dataset.pkl`), run all cells in the notebook `custom_dataset.ipynb`.
- Run all cells in `bmi.ipynb` to predict BMI values.
- Run all cells in `gender.ipynb` to predict gender.
- Run all cells in `distribution.ipynb` to visualize the number of offences committed.

- The gender of each inmate is read.
- The BMI of each inmate is calculated using the formula `BMI = weight(kg) / height(m)^2`.
- The front and side images of an inmate are read only if they exist.
- An inmate is recorded only if both the front and side images are available.
- The number of males is capped at 4000 to match the number of females.
- The images are resized to (512, 512).
- The images are converted to grayscale.
- The images are flattened before being stored in the pickle file.
- The dataset is saved as a pickle file, `custom_dataset.pkl`, for further use.
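
The BMI formula and the record-filtering rules above can be sketched as follows. This is a minimal illustration, not the code from `custom_dataset.ipynb`: the record field names (`sex`, `weight_kg`, `height_m`, `front_image`, `side_image`) and the `"Male"` label are hypothetical, and the sketch assumes weight and height have already been converted to kilograms and metres.

```python
MAX_MALES = 4000  # cap on male records, as stated in the list above


def bmi(weight_kg, height_m):
    """BMI = weight (kg) / height (m)^2."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def select_inmates(records, max_males=MAX_MALES):
    """Keep records that have both images, capping the number of males.

    Field names are hypothetical; adapt them to the actual dataset.
    """
    kept, males = [], 0
    for rec in records:
        # Skip inmates that are missing the front or the side image.
        if not (rec.get("front_image") and rec.get("side_image")):
            continue
        if rec["sex"] == "Male":
            if males >= max_males:
                continue
            males += 1
        kept.append({**rec, "bmi": bmi(rec["weight_kg"], rec["height_m"])})
    return kept
```

For example, `bmi(80, 2.0)` evaluates to `20.0`.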
Feature detection is implemented in both notebooks, `bmi.ipynb` and `gender.ipynb`.

The 68 facial landmarks are used as features. The x and y coordinates of the landmarks give a total of 68 * 2 = 136 dimensions. The `dlib` library is used to extract these features.
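
As a rough sketch of how 68 landmarks become a 136-dimensional feature vector, the helper below flattens any object that exposes `num_parts` and `part(i)` with `x` and `y` attributes. The interleaved ordering (`x0, y0, x1, y1, ...`) is an assumption, since the notebooks may order the coordinates differently, and `Point` and `FakeShape` are stand-ins used only to make the example runnable without a landmark model file.

```python
from collections import namedtuple


def landmarks_to_vector(shape):
    """Flatten landmarks into [x0, y0, x1, y1, ...] (ordering is an assumption)."""
    vector = []
    for i in range(shape.num_parts):
        point = shape.part(i)
        vector.extend((point.x, point.y))
    return vector


# Stand-ins that mimic the num_parts / part(i) interface, for illustration only.
Point = namedtuple("Point", ["x", "y"])


class FakeShape:
    def __init__(self, points):
        self._points = points
        self.num_parts = len(points)

    def part(self, i):
        return self._points[i]


demo = FakeShape([Point(10, 20), Point(30, 40)])
print(landmarks_to_vector(demo))  # [10, 20, 30, 40]
```

With `dlib`, the object returned by a shape predictor offers the same `num_parts` and `part(i)` interface, so it could be passed to this helper directly.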
Principal Component Analysis (PCA) is performed on these 136 dimensions to reduce the number of dimensions to 23. This captures 99.9% of the variance.
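
To illustrate how a variance target translates into a number of components, the sketch below picks the smallest number of principal components whose cumulative explained variance reaches a given fraction. The function name and the toy eigenvalues are hypothetical; the real eigenvalues come from the covariance of the 136-dimensional landmark features.

```python
def components_for_variance(eigenvalues, target=0.999):
    """Return the smallest k such that the top-k eigenvalues explain
    at least `target` of the total variance."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for k, value in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += value
        if cumulative / total >= target:
            return k
    return len(eigenvalues)


# Toy example: the first two components explain 90% of the variance.
print(components_for_variance([8, 1, 1], target=0.85))  # 2
```

scikit-learn's `PCA` can perform a similar selection when `n_components` is given as a fraction between 0 and 1, although whether the notebooks use that option is not stated here.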
A linear regression model is used to predict BMI, with an 80-20 train-test split. The model is evaluated on the following metrics:

- MAE
- MSE
- R2
- Pearson coefficient

The model achieved an MAE of 4.32.
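
For reference, the four metrics listed above can be computed from true and predicted values as in the following sketch. It uses plain Python rather than the library calls in `bmi.ipynb`, and the sample values are made up for illustration.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, R2 and the Pearson correlation coefficient."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n

    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    ss_pred = sum((p - mean_p) ** 2 for p in y_pred)

    # R2 = 1 - (residual sum of squares / total sum of squares)
    r2 = 1.0 - sum(e * e for e in errors) / ss_tot

    # Pearson = covariance / (std of y_true * std of y_pred)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    pearson = cov / math.sqrt(ss_tot * ss_pred)

    return {"mae": mae, "mse": mse, "r2": r2, "pearson": pearson}


# Made-up BMI values, for illustration only.
metrics = regression_metrics([20.0, 25.0, 30.0], [21.0, 25.0, 29.0])
```

Note that a high Pearson coefficient does not imply a low MAE: predictions can be perfectly correlated with the targets and still be offset or scaled.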
A support vector machine (SVM) model is used to classify gender, with an 80-20 train-test split. The model is evaluated on the following metrics:

- MAE
- MSE
- R2
- Pearson coefficient

The model achieved an accuracy of 87.43%.
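
The 80-20 split and the accuracy figure can be illustrated with the small sketch below. It is a stand-in written in plain Python, not the splitting or scoring code from `gender.ipynb`; the fixed seed and the labels are assumptions made for the example.

```python
import random


def train_test_split_80_20(samples, seed=0):
    """Shuffle the samples and split them 80% / 20%."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 4 // 5  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


train, test = train_test_split_80_20(range(10))
print(len(train), len(test))  # 8 2
print(accuracy(["M", "F", "F", "M"], ["M", "F", "M", "M"]))  # 0.75
```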
The number of times each offence was committed is displayed. As there are many distinct offences, only the offences that make up 75% of the data are plotted for easier visualisation. Additionally, the counts of all offences are also plotted.
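
One way to select the offences that make up 75% of the data is sketched below. It assumes the selection means taking the most frequent offences until their cumulative count covers 75% of all records; `distribution.ipynb` may define the cutoff differently, and the offence labels here are placeholders.

```python
from collections import Counter


def top_offences(offences, coverage=0.75):
    """Return (offence, count) pairs, most frequent first, until the
    selected offences cover at least `coverage` of all records."""
    counts = Counter(offences)
    total = sum(counts.values())
    selected, covered = [], 0
    for offence, count in counts.most_common():
        if covered >= coverage * total:
            break
        selected.append((offence, count))
        covered += count
    return selected


# Placeholder labels: A covers 60% of records, A and B together cover 80%.
sample = ["A"] * 6 + ["B"] * 2 + ["C"] + ["D"]
print(top_offences(sample))  # [('A', 6), ('B', 2)]
```

The resulting pairs can be passed to a bar chart, with the full `Counter` used for the plot of all offences.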