HystonKayange/Feature-Selection-Tool

📊 Feature Selector Tool

Python 3.7+ License: MIT


🚀 Overview

Feature Selector Tool is a Python application designed for automatic feature selection in machine learning datasets.

It handles missing values, encodes categorical variables, analyzes feature importance, and visualizes the results. Both classification and regression tasks are supported.


🛠️ Key Features

  • Automatic Handling of Missing Values
  • Automatic Detection and Encoding of Categorical Variables
  • Insights into Feature Importance and Distribution
  • Support for both CSV and TXT datasets
  • Designed for classification and regression workflows
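
This README does not show the tool's internals. As a rough illustration of the steps listed above, the sketch below imputes missing values, encodes categorical columns, and ranks features by importance. The toy data, the column names, and the choice of median/mode imputation, integer category codes, and `RandomForestClassifier` are all assumptions made for illustration, not necessarily what the tool does.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy dataset with a missing value in each feature type.
df = pd.DataFrame({
    "age": [63, 45, np.nan, 52, 38, 70, 41, 59],
    "sex": ["M", "F", "F", None, "M", "M", "F", "M"],
    "chol": [233, 180, 210, 250, 170, 290, 190, 240],
    "target": [1, 0, 0, 1, 0, 1, 0, 1],
})

X = df.drop(columns="target")
y = df["target"]

# Detect categorical columns by dtype (assumption: non-numeric means categorical).
cat_cols = X.select_dtypes(exclude="number").columns
num_cols = X.select_dtypes(include="number").columns

# Impute: median for numeric columns, most frequent value for categorical ones.
X[num_cols] = X[num_cols].fillna(X[num_cols].median())
for col in cat_cols:
    X[col] = X[col].fillna(X[col].mode().iloc[0])

# Encode each categorical column as integer codes.
for col in cat_cols:
    X[col] = X[col].astype("category").cat.codes

# Rank features by impurity-based importance (scores sum to 1).
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
importances = pd.Series(model.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

For a regression target, `RandomForestRegressor` exposes the same `feature_importances_` attribute, so the ranking step would look the same.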

📈 Visualizations

The Feature Selector Tool includes plots for inspecting dataset characteristics:

🔹 Feature Importance

🔹 Feature Distribution
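
The tool's own plotting code is not reproduced here. As a minimal sketch of what the two plot types convey, the following draws a feature-importance bar chart and a single-feature histogram with matplotlib. The importance scores, the `chol` feature, and the output file name `feature_plots.png` are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical importance scores; in practice these would come from a fitted model.
importances = {"chol": 0.45, "age": 0.35, "sex": 0.20}
# Hypothetical feature values for the distribution plot.
rng = np.random.default_rng(0)
chol = rng.normal(loc=220, scale=40, size=200)

fig, (ax_imp, ax_dist) = plt.subplots(1, 2, figsize=(10, 4))

# Horizontal bar chart; sorting ascending puts the most important feature at the top.
names = sorted(importances, key=importances.get)
ax_imp.barh(names, [importances[n] for n in names])
ax_imp.set_xlabel("Importance")
ax_imp.set_title("Feature Importance")

# Histogram of a single feature.
ax_dist.hist(chol, bins=20)
ax_dist.set_xlabel("chol")
ax_dist.set_ylabel("Count")
ax_dist.set_title("Feature Distribution")

fig.tight_layout()
fig.savefig("feature_plots.png")
```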


Usage

1. Navigate to the project directory:
   cd feature_selection_tool

2. Run the tool:
   python main.py

3. Choose a dataset using the file explorer when prompted.

4. Follow the on-screen instructions to perform feature selection.
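
How the tool reads the chosen file is not documented here. One way to load both CSV and TXT files through a single code path is to let pandas detect the delimiter, as in this sketch. The helper name `load_table` and the sample file contents are hypothetical.

```python
import os
import tempfile
import pandas as pd

def load_table(path):
    """Load a delimited text file (.csv or .txt) into a DataFrame.

    sep=None with the Python engine asks pandas to sniff the delimiter,
    so comma-, tab-, or semicolon-separated files can share one code path.
    """
    ext = os.path.splitext(path)[1].lower()
    if ext not in {".csv", ".txt"}:
        raise ValueError(f"Unsupported file type: {ext!r}")
    return pd.read_csv(path, sep=None, engine="python")

# Usage: write a small tab-separated .txt file and read it back.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.txt")
    with open(sample, "w") as f:
        f.write("age\tchol\ttarget\n63\t233\t1\n45\t180\t0\n")
    df = load_table(sample)

print(df.shape)
```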

📦 Prerequisites

  • Python >= 3.7
  • pandas == 2.1.2
  • numpy == 1.26.1
  • scikit-learn == 1.3.2
  • feature-engine == 1.6.2
  • torch == 2.1.0
  • matplotlib == 3.8.1
  • seaborn == 0.13.0

Install all requirements via:

pip install -r requirements.txt

⚡ Limitations

  • Large Datasets: May cause memory or computation issues. Consider subsampling large datasets.
  • Outliers: Handle outliers in preprocessing before running the tool.
  • Dataset Characteristics: Some datasets may require extra preprocessing depending on complexity.
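
The first two limitations can be worked around before the data reaches the tool. The sketch below shows one common approach: random subsampling with `DataFrame.sample` and clipping numeric columns to an interquartile-range band. The helper names, the 1.5 multiplier, and the synthetic data are assumptions for illustration; other outlier strategies may suit a given dataset better.

```python
import numpy as np
import pandas as pd

def subsample(df, max_rows, seed=0):
    """Return at most max_rows randomly chosen rows (hypothetical helper)."""
    if len(df) <= max_rows:
        return df
    return df.sample(n=max_rows, random_state=seed)

def clip_outliers_iqr(df, k=1.5):
    """Clip numeric columns to [Q1 - k*IQR, Q3 + k*IQR] (hypothetical helper)."""
    out = df.copy()
    num_cols = out.select_dtypes(include="number").columns
    q1 = out[num_cols].quantile(0.25)
    q3 = out[num_cols].quantile(0.75)
    iqr = q3 - q1
    out[num_cols] = out[num_cols].clip(lower=q1 - k * iqr, upper=q3 + k * iqr, axis=1)
    return out

# Synthetic data with one injected outlier.
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.normal(size=10_000)})
df.loc[0, "x"] = 1_000.0

small = subsample(df, max_rows=1_000)
clean = clip_outliers_iqr(df)
print(len(small), clean["x"].max())
```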

🔮 Future Improvements

  • Enhanced support for large datasets
  • Improved outlier detection and handling
  • Expanded visualization capabilities
  • Support beyond tabular datasets (the tool currently handles only tabular data)

📂 Tested Datasets

| Dataset | Description | Link |
| --- | --- | --- |
| Heart Disease Dataset | Features related to patient heart health. | Dataset Link |
| Diabetes Dataset | Health indicators for diabetes prediction. | Dataset Link |
| MovieLens Dataset | Movie recommendation system data. | Dataset Link |

🤝 Acknowledgements

Special thanks to all contributors and users who have helped test and improve the Feature Selector Tool!

