forked from elmira-amiri/CGM

Implementation of CGM (a multi-gene machine learning-based risk classification for improving prognosis in breast cancer) in Python and R


KISysBio/CGM


Cancer Grade Model: A Multi-gene Machine learning-based Risk Classification for Improving Prognosis in Breast Cancer

E. Amiri Souri1, A. Chenoweth2,3, A. Cheung2,3, S. N. Karagiannis2,3, S. Tsoka1

1 Department of Informatics, Faculty of Natural and Mathematical Sciences, King's College London, Bush House, London WC2B 4BG, United Kingdom
2 St. John’s Institute of Dermatology, School of Basic & Medical Biosciences, King’s College London, & NIHR Biomedical Research Centre at Guy’s and St. Thomas’ Hospitals and King’s College London, Guy’s Hospital, King’s College London, London SE1 9RT, United Kingdom
3 Breast Cancer Now Research Unit, School of Cancer & Pharmaceutical Sciences, King’s College London, Guy’s Cancer Centre, London SE1 9RT, United Kingdom

British Journal of Cancer (2021) | https://www.nature.com/articles/s41416-021-01455-1


Abstract

Background: Prognostic stratification of breast cancers remains a challenge for improving clinical decision making. We employ machine learning on breast cancer transcriptomics from multiple studies to link the expression of specific genes to histological grade and to classify tumours into a more or less aggressive prognostic type.

Materials and methods: Microarray data from 5031 untreated breast tumours, spanning 33 published datasets, were integrated with the corresponding clinical data. A machine learning model based on gradient boosted trees was trained on histological grade-1 and grade-3 samples. The resulting predictive model (Cancer Grade Model, CGM) was then applied to the grade-2 and unknown-grade samples (3029) for prognostic risk classification.
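The train-on-extremes, classify-the-rest workflow described above can be sketched in a few lines. This is a minimal illustration only, not the repository's implementation: the two-gene expression values are synthetic, the function names (`fit_stump`, `train_boosted_stumps`, `predict_high_risk_prob`) are hypothetical, and depth-1 trees (stumps) stand in for the full gradient boosted trees used by CGM.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def fit_stump(X, residuals):
    """Least-squares fit of a one-split tree (stump) to the residuals."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    return best[1:]

def train_boosted_stumps(X, y, n_rounds=20, lr=0.5):
    """Gradient boosting on log loss: each stump fits the residuals y - p."""
    scores = [0.0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        j, thr, lm, rm = fit_stump(X, residuals)
        stumps.append((j, thr, lm, rm))
        scores = [s + lr * (lm if row[j] <= thr else rm) for row, s in zip(X, scores)]
    return stumps

def predict_high_risk_prob(stumps, row, lr=0.5):
    score = sum(lr * (lm if row[j] <= thr else rm) for j, thr, lm, rm in stumps)
    return sigmoid(score)

# Synthetic toy samples: expression of two hypothetical genes per tumour.
samples = [
    {"grade": 1, "expr": [0.2, 1.9]},
    {"grade": 1, "expr": [0.4, 2.1]},
    {"grade": 1, "expr": [0.3, 1.7]},
    {"grade": 3, "expr": [1.8, 0.5]},
    {"grade": 3, "expr": [2.2, 0.3]},
    {"grade": 3, "expr": [1.9, 0.6]},
    {"grade": 2, "expr": [0.5, 1.8]},
    {"grade": 2, "expr": [2.0, 0.4]},
    {"grade": None, "expr": [1.7, 0.7]},  # unknown grade
]

# Train only on the unambiguous extremes: grade 1 (label 0) vs grade 3 (label 1).
train = [s for s in samples if s["grade"] in (1, 3)]
X = [s["expr"] for s in train]
y = [1 if s["grade"] == 3 else 0 for s in train]
stumps = train_boosted_stumps(X, y)

# Apply the trained model to grade-2 and unknown-grade samples.
to_classify = [s for s in samples if s["grade"] not in (1, 3)]
risk_labels = [
    "high" if predict_high_risk_prob(stumps, s["expr"]) >= 0.5 else "low"
    for s in to_classify
]
print(risk_labels)
```

The point of the design is that grade-2 tumours never enter training: the model learns the contrast between the well-separated grades and then assigns each intermediate or ungraded tumour to the side it most resembles.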

Results: A 70-gene signature for assessing clinical risk was identified and was shown to be 90% accurate when tested on samples of known histological grade. The predictive framework was validated through survival analysis and showed robust prognostic performance. CGM was cross-referenced with existing genomic tests and demonstrated competitive power in predicting tumour risk.
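To make the survival-analysis validation concrete, the sketch below compares two predicted risk groups with a Kaplan-Meier estimate. The abstract does not name the survival method, so Kaplan-Meier is an assumption here; the follow-up times and event flags are synthetic, and `kaplan_meier` is a hypothetical helper rather than code from this repository.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each time where at least one event occurred.

    times:  follow-up durations
    events: 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Events and censored subjects both leave the risk set after time t.
        at_risk -= removed
    return curve

# Synthetic follow-up data (years) for two predicted risk groups.
curve_low = kaplan_meier([5, 8, 10, 12], [0, 1, 0, 0])
curve_high = kaplan_meier([2, 3, 6, 9], [1, 1, 0, 1])

print("low risk: ", curve_low)
print("high risk:", curve_high)
```

A prognostically useful classifier should yield clearly separated curves, with the high-risk group's estimated survival falling faster than the low-risk group's.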

Conclusions: CGM can classify tumours into better-defined prognostic categories without using information on tumour size, stage, or subgroups. The model offers a means to improve prognosis and to support clinical decision making and precision treatment, thereby potentially helping to prevent underdiagnosis of high-risk tumours and to minimise over-treatment of low-risk disease.

