- Vanessa Jimenez
- Luis Millet
- Rayner Morla
- Rad Joe
This project uses machine learning (ML) to predict the likelihood of cervical cancer from patient data. Early detection of cervical cancer can significantly improve treatment outcomes and reduce mortality. By analyzing a dataset of demographic data, medical history, and test results, we aim to build a predictive model that assists in early diagnosis.
How accurately can we predict the likelihood of a patient developing cervical cancer using demographic data and medical history?
- Dataset: Cervical Cancer Risk Factors dataset from the UCI Machine Learning Repository.
- Link: UCI Cervical Cancer Risk Factors Dataset
- Data Cleaning:
  - Handled missing values.
  - Corrected data types.
  - Removed outliers.
- Data Wrangling:
  - Formatted the data to be suitable for ML models.
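The cleaning steps above can be sketched with pandas. This is a minimal illustration, not the project's actual pipeline: the function name `clean_risk_factors`, the median imputation, and the 1.5 × IQR rule applied to `Age` are all assumptions chosen for the example, as is the premise that missing entries arrive as non-numeric text such as `?`.

```python
import pandas as pd


def clean_risk_factors(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative cleaning pass (hypothetical helper, not the project's code)."""
    # Correct data types: convert every column to numeric.
    # Assumption: missing entries are non-numeric text (e.g. "?"),
    # so errors="coerce" turns them into NaN.
    cleaned = df.apply(pd.to_numeric, errors="coerce")

    # Handle missing values: median imputation is one simple option.
    cleaned = cleaned.fillna(cleaned.median())

    # Remove outliers: keep rows whose Age lies within 1.5 * IQR
    # of the quartiles (a common rule of thumb, assumed here).
    q1, q3 = cleaned["Age"].quantile([0.25, 0.75])
    iqr = q3 - q1
    in_range = cleaned["Age"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return cleaned[in_range].reset_index(drop=True)
```

Median imputation is used here because most columns are skewed counts or binary flags; other strategies, such as dropping sparse columns, may suit the data better.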
The dataset contains the following columns:
- Age: Age of the patient.
- Number of sexual partners: Number of sexual partners the patient has had.
- First sexual intercourse: Age at first sexual intercourse.
- Num of pregnancies: Number of pregnancies the patient has had.
- Smokes: Binary indicator if the patient smokes (0: No, 1: Yes).
- Smokes (years): Number of years the patient has smoked.
- Smokes (packs/year): Average packs of cigarettes the patient smoked per year.
- Hormonal Contraceptives: Binary indicator if the patient uses hormonal contraceptives (0: No, 1: Yes).
- Hormonal Contraceptives (years): Number of years the patient has used hormonal contraceptives.
- IUD: Binary indicator if the patient uses an intrauterine device (0: No, 1: Yes).
- IUD (years): Number of years the patient has used an intrauterine device.
- STDs: Binary indicator if the patient has had sexually transmitted diseases (0: No, 1: Yes).
- STDs (number): Number of sexually transmitted diseases the patient has had.
- STDs:condylomatosis: Binary indicator if the patient has had condylomatosis (0: No, 1: Yes).
- STDs:cervical condylomatosis: Binary indicator if the patient has had cervical condylomatosis (0: No, 1: Yes).
- STDs:vaginal condylomatosis: Binary indicator if the patient has had vaginal condylomatosis (0: No, 1: Yes).
- STDs:vulvo-perineal condylomatosis: Binary indicator if the patient has had vulvo-perineal condylomatosis (0: No, 1: Yes).
- STDs:syphilis: Binary indicator if the patient has had syphilis (0: No, 1: Yes).
- STDs:pelvic inflammatory disease: Binary indicator if the patient has had pelvic inflammatory disease (0: No, 1: Yes).
- STDs:genital herpes: Binary indicator if the patient has had genital herpes (0: No, 1: Yes).
- STDs:molluscum contagiosum: Binary indicator if the patient has had molluscum contagiosum (0: No, 1: Yes).
- STDs:AIDS: Binary indicator if the patient has had AIDS (0: No, 1: Yes).
- STDs:HIV: Binary indicator if the patient has had HIV (0: No, 1: Yes).
- STDs:Hepatitis B: Binary indicator if the patient has had Hepatitis B (0: No, 1: Yes).
- STDs:HPV: Binary indicator if the patient has had Human Papillomavirus (0: No, 1: Yes).
- STDs: Number of diagnosis: Number of STD diagnoses the patient has received.
- STDs: Time since first diagnosis: Time in years since the patient's first STD diagnosis.
- STDs: Time since last diagnosis: Time in years since the patient's last STD diagnosis.
- Dx:Cancer: Binary indicator if the patient has been diagnosed with cancer (0: No, 1: Yes).
- Dx:CIN: Binary indicator if the patient has been diagnosed with cervical intraepithelial neoplasia (0: No, 1: Yes).
- Dx:HPV: Binary indicator if the patient has been diagnosed with HPV (0: No, 1: Yes).
- Dx: Binary indicator if the patient has been diagnosed with any of the above conditions (0: No, 1: Yes).
- Hinselmann: Binary indicator of a positive Hinselmann test, a colposcopy examination (0: No, 1: Yes).
- Schiller: Binary indicator of a positive Schiller test, an iodine staining examination of the cervix (0: No, 1: Yes).
- Citology: Binary indicator of a positive cytology (Pap smear) result; the column name keeps the dataset's spelling (0: No, 1: Yes).
- Biopsy: Binary indicator of a positive cervical biopsy result (0: No, 1: Yes).
The list above describes every column in the dataset and the variables used in the cervical cancer prediction model.
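To show how this column list maps onto a modeling setup, here is a minimal sketch that separates features from a label. It assumes the four test-result columns at the end of the list (`Hinselmann`, `Schiller`, `Citology`, `Biopsy`) are treated as prediction targets and that `Biopsy` is the default label; both the helper `split_features_target` and that choice are hypothetical, not confirmed by the project.

```python
import pandas as pd

# Column names as listed above. Treating these four test results as
# prediction targets is an assumption about how the model is set up.
TARGET_COLUMNS = ["Hinselmann", "Schiller", "Citology", "Biopsy"]


def split_features_target(df: pd.DataFrame, target: str = "Biopsy"):
    """Return (X, y): non-target columns as features, one target as the label."""
    if target not in TARGET_COLUMNS:
        raise ValueError(f"unknown target column: {target!r}")
    # Drop every target column so no test result leaks into the features.
    X = df.drop(columns=[c for c in TARGET_COLUMNS if c in df.columns])
    y = df[target].astype(int)
    return X, y
```

Dropping all four target columns from `X`, not just the selected one, avoids leaking closely related test results into the features.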
Link - https://www.canva.com/design/DAGHw-5U4jg/D4Zp5sPMn3U5NrrIl9gpcQ/edit