HarvardX: PH125.9x: Data Science - Capstone Breast Cancer Diagnosis Project
This repo stores and shares the second of two projects completed for the HarvardX Data Science Professional Certificate (see https://courses.edx.org/dashboard/programs/3c32e3e0-b6fe-4ee4-bd4f-210c6339e074/).
The objective of this project was to train several algorithms to diagnose breast cancer accurately by predicting whether a given sample of cells came from a malignant (cancerous) or benign (non-cancerous) tumour mass. The algorithms were trained and tested on the Breast Cancer Wisconsin (Diagnostic) dataset, which is available to download from the UCI Machine Learning Repository (see https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29).
This repo includes a script file (.R) containing all of the code used for the exploratory analyses and for developing, testing and presenting the results of each model; an R Markdown file (.Rmd); and the final report (.pdf) generated from that .Rmd file. The .Rmd file refers to two supporting files: preamble.tex, which was created to relax the LaTeX rules on floating figures and tables in the PDF report, and references.bib, which lists the references cited in the report in BibTeX format. Both of these files are included in the repo for reference.