The function here is designed for binning a continuous independent variable in the way that minimizes the total entropy of the corresponding response.
It can also plot how the entropy changes during the process.
The test dataset is from: https://www.kaggle.com/rashikrahmanpritom/heart-attack-analysis-prediction-dataset
- Make sure you put the test file in the correct path, which is "sample_data/heart.csv" in Google Colab.
- If the response variable is not a dummy (0/1) variable, you should transform it first (see the snippet after this list).
- The function here is designed for a binary response only. If your response is not binary, you should change the entropy base and revise the probability and entropy calculations on your own, which is not difficult, so that tutorial is skipped here.
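For example, a categorical response can be dummy-encoded with pandas before calling the function. This is only a minimal sketch: the column name "response" and the label "yes" are placeholders, not columns of heart.csv (whose "output" column is already 0/1).

```python
import pandas as pd

# Hypothetical data: "response" and the label "yes" are placeholders.
df = pd.DataFrame({"response": ["yes", "no", "yes"]})
df["response"] = (df["response"] == "yes").astype(int)  # now a 0/1 dummy
```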
In every round we cut a leaf node of the binary tree into two bins so as to minimize entropy, then insert these two bins under that node, where they become new leaf nodes. This process is repeated recursively until the entropy is optimized; a stopping threshold can also be set on the bin count or on the information gained.
scipy.stats.entropy is used to calculate the entropy of every bin, and each bin is weighted by the length of the bin divided by the total length of the data. In every round of binning, the function chooses the split with the largest information gain, i.e. the original entropy minus the new entropy.
Given a set of samples S, if S is partitioned into two intervals S1 and S2 using the boundary T, the entropy after partitioning is:

E(S, T) = (|S1| / |S|) * Entropy(S1) + (|S2| / |S|) * Entropy(S2)
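A minimal sketch of this calculation with scipy.stats.entropy, assuming a binary 0/1 response; the names bin_entropy and partition_entropy are illustrative, not taken from the actual code:

```python
import numpy as np
from scipy.stats import entropy

def bin_entropy(y):
    """Entropy (base 2) of a single bin of binary 0/1 responses."""
    counts = np.bincount(y, minlength=2)
    return entropy(counts, base=2)  # scipy normalizes the counts itself

def partition_entropy(y1, y2):
    """E(S, T) = |S1|/|S| * Entropy(S1) + |S2|/|S| * Entropy(S2)."""
    n = len(y1) + len(y2)
    return len(y1) / n * bin_entropy(y1) + len(y2) / n * bin_entropy(y2)

# Example: splitting [0,0,1,1,1,1] into [0,0] and [1,1,1,1] gives entropy 0.
print(partition_entropy(np.array([0, 0]), np.array([1, 1, 1, 1])))
```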
A binary tree stores the data: every node is a double list containing the independent and response variables, the root node holds the whole dataset, and the leaf nodes are the remaining bins at that point in time. In every round, further binning is applied to the leaf nodes.
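A minimal sketch of such a tree and of the greedy splitting rounds described above, assuming a binary 0/1 response; the names Bin, best_split, leaves, total_entropy and grow_until are illustrative assumptions, not the ones used in the actual code:

```python
import numpy as np
from scipy.stats import entropy

def bin_entropy(y):
    """Entropy (base 2) of a single bin of binary 0/1 responses."""
    return entropy(np.bincount(y, minlength=2), base=2)

class Bin:
    """A node: a 'double list' of the independent variable x and the response y."""
    def __init__(self, x, y):
        order = np.argsort(x)
        self.x, self.y = np.asarray(x)[order], np.asarray(y)[order]
        self.left = self.right = None  # children created when this bin is split

    def is_leaf(self):
        return self.left is None

def best_split(node):
    """Find the boundary that minimizes the weighted entropy inside this bin."""
    best, n = None, len(node.y)
    for i in range(1, n):
        if node.x[i] == node.x[i - 1]:
            continue  # identical x values cannot be separated
        w_entropy = ((i / n) * bin_entropy(node.y[:i])
                     + ((n - i) / n) * bin_entropy(node.y[i:]))
        if best is None or w_entropy < best[0]:
            best = (w_entropy, i)
    return best  # (weighted entropy after the cut, split index) or None

def leaves(root):
    """Collect the current bins, i.e. the leaf nodes of the tree."""
    if root.is_leaf():
        return [root]
    return leaves(root.left) + leaves(root.right)

def total_entropy(root, n_total):
    """Entropy of the current binning, each bin weighted by its length / total length."""
    return sum(len(b.y) / n_total * bin_entropy(b.y) for b in leaves(root))

def grow_until(root, max_bins=4):
    """Each round, split the leaf whose cut gives the largest information gain."""
    n_total = len(root.y)
    history = [total_entropy(root, n_total)]
    while len(leaves(root)) < max_bins:
        candidates = []
        for leaf in leaves(root):
            split = best_split(leaf)
            if split is not None:
                w_entropy, i = split
                # Gain = this leaf's current contribution minus its contribution after the cut.
                gain = len(leaf.y) / n_total * (bin_entropy(leaf.y) - w_entropy)
                candidates.append((gain, leaf, i))
        if not candidates:
            break  # no leaf can be split any further
        gain, leaf, i = max(candidates, key=lambda c: c[0])
        leaf.left = Bin(leaf.x[:i], leaf.y[:i])
        leaf.right = Bin(leaf.x[i:], leaf.y[i:])
        history.append(total_entropy(root, n_total))
    return history  # total entropy after each round, useful for plotting
```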
In this case, we set the threshold to a bin count of 4 and keep the process running until the entropy is optimized, in order to draw the change of the entropy.
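A rough usage sketch, building on the Bin and grow_until names from the sketch above and assuming the "age" and "output" columns of heart.csv (not the actual plotting code of the function):

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("sample_data/heart.csv")
root = Bin(df["age"].to_numpy(), df["output"].to_numpy())
history = grow_until(root, max_bins=4)

plt.plot(range(len(history)), history, marker="o")
plt.xlabel("binning round")
plt.ylabel("total entropy")
plt.show()
```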
The output is shown below: