Lung and Colon Cancer Classification

We implemented a machine learning model to distinguish cancerous from benign lung cells in histopathological images, with the aim of supporting earlier diagnosis of lung cancer. The project was carried out within AI4ALL's Ignite program and draws on machine learning, data analysis, and computational biology as an example of how AI can be applied in healthcare.

Problem Statement

This project uses machine learning to distinguish between cancerous and benign cells in human lungs. It is motivated by the urgent need to improve early detection and diagnosis of lung cancer. Lung cancer remains one of the leading causes of cancer-related deaths worldwide, and its high mortality rate is often attributed to late-stage diagnosis.

Key Results

The model is available at https://huggingface.co/ebmonser/lung-cancer-image-classification. It achieves the following results on the test set:

Loss: 0.0177
Precision: 0.9963
Recall: 0.9963
F1: 0.9963
Accuracy: 0.9963
Confusion matrix:
1245     1     4
   0  1250     0
   9     0  1241
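The summary metrics can be cross-checked against the confusion matrix itself. The sketch below reads the nine reported values as a 3x3 matrix in row-major order and assumes that rows are true classes and columns are predicted classes (the scikit-learn convention; the source does not state the orientation). It also assumes macro averaging, which the source does not specify.

```python
import numpy as np

# Confusion matrix as reported above, read row by row.
# Assumption: rows = true class, columns = predicted class.
cm = np.array([
    [1245, 1, 4],
    [0, 1250, 0],
    [9, 0, 1241],
])

true_positives = np.diag(cm)

# Per-class precision divides by column totals (everything predicted as that class).
precision = true_positives / cm.sum(axis=0)
# Per-class recall divides by row totals (everything that truly is that class).
recall = true_positives / cm.sum(axis=1)
f1 = 2 * precision * recall / (precision + recall)

# Accuracy is the share of all samples that sit on the diagonal.
accuracy = true_positives.sum() / cm.sum()

print(f"accuracy        : {accuracy:.4f}")
print(f"macro precision : {precision.mean():.4f}")
print(f"macro recall    : {recall.mean():.4f}")
print(f"macro F1        : {f1.mean():.4f}")
```

Under these assumptions all four values round to 0.9963, matching the figures listed above. Because every class has the same number of test samples (1250), swapping the row and column interpretation only exchanges precision and recall, so the rounded results are unchanged.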

Methodologies

We started by using the Kaggle API in a Python environment on Google Colab to download lung histopathological images from Kaggle. We then used pandas to preprocess and organize the data, ensuring uniformity across the dataset.
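As a rough illustration of the organizing step, the sketch below indexes a folder of downloaded images into a pandas DataFrame and holds out a stratified test set. The function names, the one-subfolder-per-class layout, and the 25% test fraction are assumptions made for this example, not details taken from the project notebook. The download itself is only shown as a comment because it needs Kaggle credentials.

```python
from pathlib import Path

import pandas as pd
from sklearn.model_selection import train_test_split

# Typical download with the Kaggle CLI (requires an API token), for reference:
#   kaggle datasets download -d andrewmvd/lung-and-colon-cancer-histopathological-images --unzip


def build_index(root, extensions=(".jpeg", ".jpg", ".png")):
    """Return a DataFrame with one row per image: file path and class label.

    Assumption: each class lives in its own subfolder, so the parent
    folder name is used as the label.
    """
    root = Path(root)
    rows = [
        {"path": str(p), "label": p.parent.name}
        for p in sorted(root.rglob("*"))
        if p.is_file() and p.suffix.lower() in extensions
    ]
    return pd.DataFrame(rows, columns=["path", "label"])


def holdout_split(df, test_frac=0.25, seed=0):
    """Split the index into train and test frames, preserving class balance.

    The 25% default is illustrative only.
    """
    train_df, test_df = train_test_split(
        df, test_size=test_frac, stratify=df["label"], random_state=seed
    )
    return train_df.reset_index(drop=True), test_df.reset_index(drop=True)
```

Calling `build_index` on the unzipped image folder and passing the result to `holdout_split` yields two DataFrames whose `path` and `label` columns can feed a PyTorch dataset.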

For model training, we fine-tuned a Vision Transformer (ViT) model using PyTorch. During training, we used NumPy for numerical computations and Matplotlib to visualize training progress.

After training, we evaluated the model on a separate test set, using scikit-learn to compute precision, recall, F1 score, and accuracy. We also used Matplotlib to plot a confusion matrix, which shows how the model's predictions are distributed across the classes.
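The evaluation step can be sketched with standard scikit-learn calls as follows. The helper names `evaluate` and `plot_confusion_matrix` are hypothetical, and macro averaging is an assumption, since the source does not say which averaging mode was used.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for scripts; omit this line in a notebook
import matplotlib.pyplot as plt
from sklearn.metrics import (
    ConfusionMatrixDisplay,
    accuracy_score,
    confusion_matrix,
    precision_recall_fscore_support,
)


def evaluate(y_true, y_pred, average="macro"):
    """Compute the summary metrics and the confusion matrix for a test set."""
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average=average, zero_division=0
    )
    return {
        "precision": float(precision),
        "recall": float(recall),
        "f1": float(f1),
        "accuracy": float(accuracy_score(y_true, y_pred)),
        # Rows are true classes, columns are predicted classes.
        "confusion_matrix": confusion_matrix(y_true, y_pred),
    }


def plot_confusion_matrix(cm, class_names):
    """Draw the confusion matrix with Matplotlib and return the figure."""
    fig, ax = plt.subplots(figsize=(5, 5))
    display = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=class_names)
    display.plot(ax=ax, cmap="Blues")
    ax.set_title("Confusion matrix (test set)")
    fig.tight_layout()
    return fig
```

Here `y_true` and `y_pred` are integer class labels collected over the whole test set, for example by taking the `argmax` of the model's logits for each batch.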

Finally, we visualized the evaluation results with Matplotlib to make the model's performance easier to interpret. Together with TensorBoard for monitoring training, this workflow supported the development and assessment of an image classification model for lung histopathological images.
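For the monitoring part, one common way to send training metrics to TensorBoard from PyTorch is `torch.utils.tensorboard.SummaryWriter`. The sketch below is an assumption about how such logging could be wired up; the helper name `log_history`, the tag names, and the log directory are illustrative and do not come from the project.

```python
from torch.utils.tensorboard import SummaryWriter


def log_history(log_dir, history):
    """Write one scalar per metric and epoch to a TensorBoard log directory.

    `history` is a list with one dict per epoch, mapping a tag such as
    "loss/train" to a float value.
    """
    writer = SummaryWriter(log_dir=log_dir)
    for epoch, record in enumerate(history):
        for tag, value in record.items():
            writer.add_scalar(tag, value, global_step=epoch)
    writer.close()
```

After logging, the curves can be viewed by running `tensorboard --logdir <log_dir>` locally or by loading the TensorBoard extension in a notebook.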

Data Sources

Lung and Colon Cancer Histopathological Images: https://www.kaggle.com/datasets/andrewmvd/lung-and-colon-cancer-histopathological-images/code

Technologies Used

Python
pandas
scikit-learn
Matplotlib
PyTorch
NumPy

Authors

This project was completed in collaboration with:

Gabriel Wild (gabew2024@tamu.edu)
Elizabeth Monser (emonser@terpmail.umd.edu)
Sandeep Kandrigi (skandrigi1@tamu.edu)

skandrigi/Ignite_LungColonTumor_Classification
