This project uses the K-Nearest Neighbors (K-NN) algorithm to classify cancer cells as benign or malignant. The model reaches an accuracy of 96.50% and is intended for medical diagnosis and research purposes.
The project uses the Cancer_Data.csv dataset. This dataset contains 569 entries with 30 features, and each entry is labeled as benign (B) or malignant (M).
The project is organized into the following sections:
- Library and Input File: This section imports the necessary libraries and loads the dataset.
- Data Loading and Editing: The dataset is loaded, unnecessary columns (such as 'Unnamed: 32' and 'id') are removed, and the 'diagnosis' column is converted to numerical values (1 for 'M', 0 for 'B').
- Normalization: Every feature is scaled to the range 0 to 1, so that features with large numeric ranges do not dominate the distance computation that K-NN relies on.
- Train Test Split: The dataset is divided into training and testing sets for model training and evaluation.
- Initialize K-NN Classifier: The K-NN classifier is initialized with a specified number of neighbors (k).
- Model Training: The K-NN model is fitted on the training data (for K-NN, fitting simply stores the training samples for later distance lookups).
- Model Evaluation: The model's performance is evaluated, including a confusion matrix.
- Model Result: The final results are presented and the model's performance is assessed.
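The loading, editing, and normalization steps above could look roughly like the following minimal sketch. It assumes the CSV uses the column names mentioned in this README ('id', 'Unnamed: 32', 'diagnosis'); the helper name `preprocess` is hypothetical and not taken from the repository. Normalization is shown as per-feature min-max scaling, (x - min) / (max - min), and a tiny made-up DataFrame stands in for the real file.

```python
import numpy as np
import pandas as pd


def preprocess(df: pd.DataFrame) -> tuple[pd.DataFrame, np.ndarray]:
    """Hypothetical helper: clean the raw data and min-max normalize the features."""
    # Drop the empty trailing column and the identifier, if present.
    df = df.drop(columns=["Unnamed: 32", "id"], errors="ignore")
    # Encode the label: malignant (M) -> 1, benign (B) -> 0.
    y = df["diagnosis"].map({"M": 1, "B": 0}).to_numpy()
    x = df.drop(columns=["diagnosis"])
    # Per-column min-max scaling maps each feature to [0, 1].
    # Assumes no constant columns (max == min would divide by zero).
    x = (x - x.min()) / (x.max() - x.min())
    return x, y


# With the real file (assumed to be in the working directory):
# x, y = preprocess(pd.read_csv("Cancer_Data.csv"))

# Made-up demo rows, for illustration only.
demo = pd.DataFrame({
    "id": [1, 2, 3],
    "diagnosis": ["M", "B", "B"],
    "radius_mean": [20.0, 10.0, 15.0],
    "Unnamed: 32": [np.nan, np.nan, np.nan],
})
x, y = preprocess(demo)
print(x["radius_mean"].tolist(), y.tolist())  # [1.0, 0.0, 0.5] [1, 0, 0]
```

Scaling matters here because K-NN compares samples by distance; without it, a feature measured in hundreds (such as an area) would outweigh one measured in fractions (such as a smoothness score).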
To run the project, make sure you have the following Python libraries installed:
- NumPy
- pandas
- scikit-learn
- seaborn
- matplotlib
You can install these libraries using pip:
pip install numpy
pip install pandas
pip install scikit-learn
pip install seaborn
pip install matplotlib
- Clone the project repository:
git clone https://github.com/Prometheussx/Kaggle-Notebook-Cancer-Prediction-ACC96.5-With-K-NN.git
cd Kaggle-Notebook-Cancer-Prediction-ACC96.5-With-K-NN
- Ensure you have Python and the required libraries installed.
- Download the Cancer_Data.csv dataset and place it in the project directory.
- Follow the code in the "Data Loading and Editing" section to load and preprocess the dataset.
- Run the Python code in the repository to train the K-NN model and perform the classification.
Example: knn_classification(x_train, y_train, x_test, y_test, k=5)
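The repository's own `knn_classification` helper is not reproduced here; the following is a minimal sketch of what a function with that signature might do, assuming it wraps scikit-learn's `KNeighborsClassifier`. It fits the classifier, predicts on the test set, and returns the accuracy and confusion matrix. The synthetic, well-separated data is an assumption that stands in for the normalized cancer features.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def knn_classification(x_train, y_train, x_test, y_test, k=5):
    """Hypothetical sketch: fit K-NN with k neighbors and evaluate it on the test set."""
    model = KNeighborsClassifier(n_neighbors=k)
    # "Training" a K-NN model only stores the training samples.
    model.fit(x_train, y_train)
    # Each test sample gets the majority label among its k nearest training samples.
    y_pred = model.predict(x_test)
    acc = accuracy_score(y_test, y_pred)
    # Rows are true labels, columns are predicted labels, ordered [0 (B), 1 (M)].
    cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
    return acc, cm


# Synthetic data standing in for the real features (assumption, not the project's data).
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0.2, 0.05, (50, 3)), rng.normal(0.8, 0.05, (50, 3))])
y = np.array([0] * 50 + [1] * 50)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=42, stratify=y
)
acc, cm = knn_classification(x_train, y_train, x_test, y_test, k=5)
print(f"accuracy = {acc:.4f}")
print(cm)
```

The choice of k trades noise against smoothness: a small k follows individual training points closely, while a larger k averages over more neighbors.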
The model's performance is evaluated with metrics such as accuracy and a confusion matrix.
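Accuracy can be read directly off the confusion matrix: it is the diagonal (correct predictions) divided by the total. The sketch below uses made-up counts, not the project's actual results, and also shows recall for the malignant class, which the same matrix makes easy to compute.

```python
import numpy as np

# Made-up confusion matrix for illustration only (rows = true, columns = predicted),
# ordered [benign (0), malignant (1)], following scikit-learn's layout.
cm = np.array([[60, 2],
               [3, 35]])

# Accuracy: correctly classified samples (the diagonal) over all samples.
accuracy = np.trace(cm) / cm.sum()
# Recall (sensitivity) for malignant: true positives over all truly malignant samples.
recall_malignant = cm[1, 1] / cm[1].sum()

print(f"accuracy = {accuracy:.4f}")                  # accuracy = 0.9500
print(f"malignant recall = {recall_malignant:.4f}")  # malignant recall = 0.9211
```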
The project reaches an accuracy of 96.50% in classifying cancer cells. The results are visualized in the README.
This project is released under the MIT License.
- Email Address: Erdem Taha Sokullu
- LinkedIn Profile: Erdem Taha Sokullu
- GitHub Profile: Prometheussx
- Kaggle Profile: @erdemtaha
Feel free to reach out if you have any questions or need further information about the project.