Incorrect feature scaling and other issues

Hello. As pointed out previously in another issue, most of the datasets for the classification tasks appear to be already scaled, with the exception of GS-LGG and GS-GBM. This can create problems in machine learning applications: the scaling parameters should be fitted on the training data only, not on the entire dataset, to prevent data leakage. I believe this is also why I could not reproduce the scores presented in your paper; I get extremely high scores even with Logistic Regression. In addition, the README has no description for the GBM cancer, and there are some inconsistencies between the baseline models presented in the paper and those available in the repo.
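To make the leakage concern concrete, here is a minimal sketch of what I mean by scaling on the training data only. It uses synthetic data as a stand-in for the unscaled features (it is not the repo's data or preprocessing code), and the helper name `split_then_scale` is hypothetical.

```python
import numpy as np

def split_then_scale(X, train_idx, test_idx):
    """Standardize features using statistics from the training rows only.

    Hypothetical helper for illustration; not part of the repo.
    """
    X_train, X_test = X[train_idx], X[test_idx]
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    # The test rows are transformed with the training statistics,
    # so no information from the test set reaches the scaler.
    return (X_train - mean) / std, (X_test - mean) / std

# Synthetic stand-in for an unscaled feature matrix (assumption).
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(100, 3))

# Split first, then scale.
idx = rng.permutation(len(X))
train_idx, test_idx = idx[:80], idx[80:]
X_train_s, X_test_s = split_then_scale(X, train_idx, test_idx)
```

With scikit-learn, placing a `StandardScaler` in a `Pipeline` ahead of the classifier gives the same behavior, because the scaler is fitted only on the data passed to `fit`. That only works if the released datasets are unscaled to begin with.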


Incorrect feature scaling and other issues #7
