Incorrect feature scaling and other issues #7

@ilacosta14

Hello, as pointed out previously in another issue, it appears that most of the datasets for the classification tasks are already scaled, except for GS-LGG and GS-GBM. This can cause problems in machine learning applications: the scaling parameters should be fit on the training data only and then applied to the test data, not computed on the entire dataset, to prevent data leakage. I believe this is also the reason why I could not reproduce the scores presented in your paper; I get extremely high scores even with Logistic Regression. In addition, the README has no description for the GBM cancer, and there are some inconsistencies between the baseline models presented in the paper and what is available in the repo.
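To make the leakage point concrete, here is a minimal sketch using only the Python standard library. The numbers are made up for illustration and are not taken from the repository's datasets, and `fit_scaler` and `transform` are hypothetical helper names. It standardizes a held-out sample once with statistics computed on the training rows only, and once with statistics computed on the full dataset, to show that the two procedures give different test features.

```python
import statistics


def fit_scaler(rows):
    """Return per-feature (means, stds) computed from the given rows only."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Population standard deviation; use 1.0 for constant features to avoid dividing by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def transform(rows, means, stds):
    """Standardize rows with previously fitted means and stds."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Made-up toy data: three training samples and one test sample, two features each.
train = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
test = [[10.0, 100.0]]

# Leakage-free: statistics come from the training split only.
means, stds = fit_scaler(train)
train_scaled = transform(train, means, stds)
test_scaled = transform(test, means, stds)

# Leaky: statistics come from train and test together.
leaky_means, leaky_stds = fit_scaler(train + test)
leaky_test_scaled = transform(test, leaky_means, leaky_stds)

print("test scaled with train-only statistics:  ", test_scaled)
print("test scaled with full-dataset statistics:", leaky_test_scaled)
```

With scikit-learn, the same effect is usually obtained by placing `StandardScaler` inside a `Pipeline`, so that the scaler is refit on the training portion of every split. When a dataset is distributed already scaled, this cannot be done downstream, since the original values are no longer available.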
