A data science project that combines data exploration, data engineering, and machine learning engineering to detect diseases in patients.
More information about the competition is available here: https://www.kaggle.com/competitions/icr-identify-age-related-conditions/overview
I applied many techniques, including different types of imputation, vectorization, one-hot encoding, correlation analysis, decision tree models, deep learning, gradient boosting, ensembling, different types of k-fold cross-validation, probability curve adjustment, and more.
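To illustrate one of the techniques listed above, here is a minimal sketch of stratified k-fold splitting in plain Python. It is not taken from the notebooks, which may well rely on a library implementation instead. The function name `stratified_k_fold`, its parameters, and the toy labels are assumptions made for this example only.

```python
from collections import defaultdict
import random


def stratified_k_fold(labels, n_splits=5, seed=0):
    """Yield (train_idx, val_idx) pairs whose class ratios mirror the full data."""
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Deal each class's shuffled indices round-robin into the folds,
    # so every fold receives a near-equal share of each class.
    folds = [[] for _ in range(n_splits)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)

    # Each fold serves as the validation set exactly once.
    for k in range(n_splits):
        val_idx = sorted(folds[k])
        train_idx = sorted(
            i for j, fold in enumerate(folds) if j != k for i in fold
        )
        yield train_idx, val_idx


# Hypothetical imbalanced labels: 1 = condition present, 0 = absent.
labels = [1] * 10 + [0] * 40
splits = list(stratified_k_fold(labels, n_splits=5))
for train_idx, val_idx in splits:
    positives = sum(labels[i] for i in val_idx)
    print(f"validation size={len(val_idx)}, positives={positives}")
```

With imbalanced classes, a plain random split can leave a validation fold with very few positive cases. Stratifying keeps the positive rate in each fold close to the rate in the full dataset, which makes fold-to-fold scores easier to compare.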
To improve disease detection, I tried several approaches. Approaches.md contains a brief description of each one. The working code is in the notebooks named approach x - descriptive title.