The objective of the competition is to build the best possible model for relating molecular information to an actual biological response, as far as this data allows.
Problem: build a classifier that predicts biological activity.
This project tunes the hyperparameters of two popular classification models, logistic regression and random forest, using several methods. It includes:
- loading the data and splitting it into samples (the data has already been prepared)
- building a logistic regression and a random forest based on default parameters
- tuning of hyperparameters using GridSearchCV
- tuning of hyperparameters using RandomizedSearchCV
- tuning of hyperparameters using Hyperopt
- tuning of hyperparameters using Optuna
- analysis of results
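The first steps above (a default baseline, then GridSearchCV and RandomizedSearchCV) can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the synthetic dataset, the F1 metric, the search ranges, and the choice to tune only logistic regression are all assumptions made for the example.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import (
    GridSearchCV,
    RandomizedSearchCV,
    train_test_split,
)

# Synthetic stand-in for the descriptor matrix (assumption: not the project data).
X, y = make_classification(n_samples=300, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Baseline: logistic regression with default hyperparameters
# (max_iter is raised only to avoid convergence warnings).
baseline = LogisticRegression(max_iter=1000).fit(X_train, y_train)
baseline_f1 = f1_score(y_test, baseline.predict(X_test))

# Grid search: tries every combination in a small, illustrative grid.
grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1, 10], "class_weight": [None, "balanced"]},
    scoring="f1",
    cv=5,
)
grid.fit(X_train, y_train)

# Randomized search: samples a fixed number of candidates from distributions.
rand = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions={
        "C": loguniform(1e-3, 1e2),
        "class_weight": [None, "balanced"],
    },
    n_iter=10,
    scoring="f1",
    cv=5,
    random_state=42,
)
rand.fit(X_train, y_train)

print(f"baseline F1 on the test split: {baseline_f1:.3f}")
print("grid search best params:", grid.best_params_)
print("randomized search best params:", rand.best_params_)
```

Grid search evaluates all 8 combinations, while randomized search evaluates only the 10 sampled candidates, which is why it tends to scale better as the number of hyperparameters grows. Hyperopt and Optuna follow the same fit-and-score loop but choose each new candidate based on the results of earlier trials.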
Project structure:
- data - the folder with the original tabular data
- plotly - a folder with charts that can be opened in the browser
- predictingTheBiologicalResponse.ipynb - the Jupyter notebook containing the main project code
- requirements.txt - a file with the versions of the modules used, for reproducibility of the code
The data is provided in comma-separated values (CSV) format. Each row in this data set represents a molecule. The first column contains experimental data describing an actual biological response: the molecule either was seen to elicit this response (1) or was not (0). The remaining columns represent molecular descriptors (d1 through d1776). These are calculated properties that can capture some of the characteristics of the molecule, for example size, shape, or elemental constitution. The descriptor matrix has been normalized.
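Given this layout, separating the response from the descriptors comes down to splitting off the first column. The sketch below assumes pandas and uses a tiny in-memory CSV with hypothetical column names (`Activity`, `D1`, ...) in place of the real file. The helper `split_response_and_descriptors` is illustrative and not part of the project.

```python
import io

import pandas as pd


def split_response_and_descriptors(df):
    """Return (X, y): the first column as the target, the rest as features."""
    y = df.iloc[:, 0]
    X = df.iloc[:, 1:]
    return X, y


# Tiny in-memory sample; column names and values are made up for illustration.
sample_csv = io.StringIO(
    "Activity,D1,D2,D3\n"
    "1,0.00,0.50,0.12\n"
    "0,0.37,0.10,0.90\n"
    "1,0.25,0.75,0.33\n"
)
df = pd.read_csv(sample_csv)

X, y = split_response_and_descriptors(df)

# Share of each class, useful for deciding whether to stratify the split.
class_balance = y.value_counts(normalize=True)
print(X.shape, y.tolist())
print(class_balance)
```

Selecting by position rather than by name keeps the helper independent of the actual header of the file in the data folder.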
- Python (3.11.1):
git clone https://github.com/StartrexII/tuningHyperparameters
All other information is presented in the Jupyter notebook predictingTheBiologicalResponse.ipynb.
If the graphs are not displayed on GitHub, you can open them in the browser. They are in the plotly/ folder.
If this project seems interesting or useful to you, I would be very grateful if you starred the repository and profile ⭐️⭐️⭐️:)