This repository demonstrates classical machine-learning and deep-learning techniques for classifying molecule-protein binding on the LEASH-BELKA Kaggle dataset. It covers:
- Classical machine-learning techniques in combination with FCHL featurization
- Understanding the use of FCHL-based descriptors
- Using FCHLs to build the kernel matrix of all building-block molecules
- FCHL-based feature matrix construction
- FCHL feature matrix combined with logistic regression, random forest and gradient boosting classifiers
- FCHL feature matrix combined with XGBoost and CatBoost (TBD)
- Extending the feature matrix beyond molecular building blocks (including core information, Dy point-of-attachment information, etc.) (TBD)
- FCHLs combined with fully connected feed-forward neural networks (TBD)
- Transformer-based architectures (TBD)
- Graph neural networks (TBD)
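
The kernel-matrix, feature-matrix and classifier steps listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this repository: random vectors stand in for real FCHL representations, a plain Gaussian kernel stands in for the FCHL kernel, the labels are random placeholders, and building each molecule's feature vector by concatenating the kernel rows of its three building blocks is one plausible reading of "feature matrix construction". All names (`block_repr`, `gaussian_kernel`, `block_ids`) are hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# ASSUMPTION: random vectors stand in for FCHL representations of the
# unique building blocks (one row per building block).
n_blocks, n_dims = 20, 8
block_repr = rng.normal(size=(n_blocks, n_dims))


def gaussian_kernel(X, sigma=2.0):
    """Gaussian kernel matrix between all rows of X (stand-in for an FCHL kernel)."""
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)  # clamp small negative round-off
    return np.exp(-sq_dists / (2.0 * sigma**2))


# Kernel matrix over all building blocks: shape (n_blocks, n_blocks).
K = gaussian_kernel(block_repr)

# ASSUMPTION: each molecule is described by the indices of its three
# building blocks; its feature vector is the three kernel rows concatenated.
n_molecules = 200
block_ids = rng.integers(0, n_blocks, size=(n_molecules, 3))
X = K[block_ids].reshape(n_molecules, -1)  # shape (n_molecules, 3 * n_blocks)
y = rng.integers(0, 2, size=n_molecules)   # placeholder binding labels

X_train, X_test = X[:150], X[150:]
y_train = y[:150]

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(n_estimators=50, random_state=0),
}

# Fit each classifier and keep the predicted probability of the "binds" class.
probs = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    probs[name] = model.predict_proba(X_test)[:, 1]
```

Because the labels here are random, the predicted probabilities carry no signal; the sketch only shows how the pieces fit together. With real data, `block_repr` and `gaussian_kernel` would be replaced by actual FCHL representations and kernels, and `y` by the binding labels from the dataset.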