Repo for a Python port of R's leaps package

tinosai/pyleaps

PYLEAPS

This package aims to be a simple port of the regsubsets function from R's leaps package. Unlike leaps in R, the package is not optimized yet, and the code still needs work to improve its readability. For now, I have implemented forward, backward, and best subset selection for linear regression, building on top of the excellent statsmodels package.

In addition, for now, the user needs to code the contrasts for categorical variables manually. This will be fixed in the future.
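
Until categorical variables are handled automatically, one way to code the contrasts by hand is treatment (dummy) coding with pandas. The snippet below is a minimal sketch on a made-up data frame: the column names `y`, `x`, and `color` are hypothetical, and whether pyleaps expects exactly this encoding is an assumption.

```python
import pandas as pd

# Hypothetical toy data: "color" is a categorical predictor.
df = pd.DataFrame(
    {
        "y": [1.0, 2.0, 3.0, 4.0],
        "x": [0.5, 1.5, 2.5, 3.5],
        "color": ["red", "green", "blue", "green"],
    }
)

# Treatment coding: one 0/1 column per level, dropping the first level
# (alphabetically "blue") so it acts as the reference category.
encoded = pd.get_dummies(df, columns=["color"], drop_first=True, dtype=float)

# Candidate predictors are all columns except the response.
predictors = [c for c in encoded.columns if c != "y"]
print(predictors)
```

The encoded frame and the predictor list could then presumably be passed to regsubsets in the same way as in the usage example further down.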

The package can be installed through pip. See https://pypi.org/project/pyleaps/ for details.

TO INSTALL

pip install pyleaps

Any required dependencies that are missing from the environment should be installed automatically.

COLLABORATION

Any help or collaboration is very welcome. Let me know what kind of edits you propose and I will be happy to discuss them.

TO DOs

This section lists planned improvements:

  1. Improve general code readability.
  2. Figure out a way to speed up best subset selection. So far, it is much slower than its R counterpart.
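
Part of the cost in the second item is combinatorial. The sketch below counts how many models a naive exhaustive search fits compared with a stepwise search over p candidate predictors. It assumes every subset is fitted once, which may not match what pyleaps does internally; R's leaps avoids visiting every subset by using a branch-and-bound search. The function names are illustrative only.

```python
from math import comb


def n_fits_best_subset(p: int) -> int:
    """Models fitted by a naive exhaustive search: every non-empty subset."""
    return sum(comb(p, k) for k in range(1, p + 1))  # equals 2**p - 1


def n_fits_stepwise(p: int) -> int:
    """Models fitted by forward (or backward) selection run to completion.

    Step k compares p - k + 1 candidate models, so the total is
    p + (p - 1) + ... + 1.
    """
    return p * (p + 1) // 2


for p in (5, 10, 20):
    print(p, n_fits_best_subset(p), n_fits_stepwise(p))
```

With the 5 predictors of the usage example the difference is small (31 fits versus 15), but at 20 predictors it is already 1,048,575 versus 210.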

USAGE EXAMPLE

This section demonstrates how to use the package. The example uses the airfoil self-noise data set from the popular UCI Machine Learning Repository. Please visit https://archive.ics.uci.edu/ml for further details.

import pandas as pd
import pyleaps
import matplotlib.pyplot as plt

Loading the data set

df = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/00291/airfoil_self_noise.dat", sep="\t", header=None)
df.columns = ["freq", "aoa", "ch_len", "u", "suc_thick", "sound_db"]

1. Best Model Selection

pyleaps.regsubsets(df, "sound_db", df.columns.to_list(), intercept=True, method="full").summary
   r2        r2_adj    bic           aic           ssr             vars
1  0.152655  0.152091  9835.55871    9824.928273   60570.206223    [intercept, freq]
2  0.323783  0.322882  9503.806546   9487.860891   48337.58386     [intercept, freq, suc_thick]
3  0.43992   0.438799  9227.905534   9206.64466    40035.855465    [intercept, freq, ch_len, suc_thick]
4  0.484574  0.483198  9110.342833   9083.766741   36843.885086    [intercept, u, freq, aoa, ch_len]
5  0.51571   0.514092  9024.006788   8992.115478   34618.219133    [intercept, u, freq, aoa, ch_len, suc_thick]

pyleaps.regsubsets(df, "sound_db", df.columns.to_list(), intercept=False, method="full").summary
   r2        r2_adj    bic           aic           ssr             vars
1  0.915417  0.915361  15074.73871   15069.423491  1987206.653961  [u]
2  0.925304  0.925204  14895.227275  14884.596838  1754927.356895  [u, ch_len]
3  0.937199  0.937073  14641.845804  14625.900149  1475469.985069  [u, aoa, ch_len]
4  0.938688  0.938524  14613.094355  14591.833481  1440485.372375  [u, freq, aoa, ch_len]
5  0.939991  0.93979   14588.128228  14561.552136  1409876.596034  [u, freq, aoa, ch_len, suc_thick]
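
Once a summary is available, a final model can be picked by an information criterion. The sketch below rebuilds part of the intercept=True table above by hand and selects the row with the lowest BIC. It assumes that `.summary` is a pandas DataFrame indexed by model size with the column names shown in the output, which the printed tables suggest but which is not confirmed here.

```python
import pandas as pd

# Values copied from the intercept=True best subset output above.
summary = pd.DataFrame(
    {
        "r2_adj": [0.152091, 0.322882, 0.438799, 0.483198, 0.514092],
        "bic": [9835.55871, 9503.806546, 9227.905534, 9110.342833, 9024.006788],
        "vars": [
            ["intercept", "freq"],
            ["intercept", "freq", "suc_thick"],
            ["intercept", "freq", "ch_len", "suc_thick"],
            ["intercept", "u", "freq", "aoa", "ch_len"],
            ["intercept", "u", "freq", "aoa", "ch_len", "suc_thick"],
        ],
    },
    index=[1, 2, 3, 4, 5],  # assumed to be the number of predictors
)

# Lower BIC is better, so take the index label of the smallest value.
best_size = summary["bic"].idxmin()
best_vars = summary.loc[best_size, "vars"]
print(best_size, best_vars)
```

For this data set the BIC keeps decreasing as predictors are added, so the full five-predictor model is selected.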

2. Forward Selection

pyleaps.regsubsets(df, "sound_db", df.columns.to_list(), intercept=True, method="forward").summary
   r2        r2_adj    bic           aic           ssr             vars
1  0.152655  0.152091  9835.55871    9824.928273   60570.206223    [intercept, freq]
2  0.323783  0.322882  9503.806546   9487.860891   48337.58386     [intercept, freq, suc_thick]
3  0.43992   0.438799  9227.905534   9206.64466    40035.855465    [intercept, freq, suc_thick, ch_len]
4  0.477646  0.476251  9130.411111   9103.835019   37339.129014    [intercept, freq, suc_thick, ch_len, u]
5  0.51571   0.514092  9024.006788   8992.115478   34618.219133    [intercept, freq, suc_thick, ch_len, u, aoa]

pyleaps.regsubsets(df, "sound_db", df.columns.to_list(), intercept=False, method="forward").summary
   r2        r2_adj    bic           aic           ssr             vars
1  0.915417  0.915361  15074.73871   15069.423491  1987206.653961  [u]
2  0.925304  0.925204  14895.227275  14884.596838  1754927.356895  [u, ch_len]
3  0.937199  0.937073  14641.845804  14625.900149  1475469.985069  [u, ch_len, aoa]
4  0.938688  0.938524  14613.094355  14591.833481  1440485.372375  [u, ch_len, aoa, freq]
5  0.939991  0.93979   14588.128228  14561.552136  1409876.596034  [u, ch_len, aoa, freq, suc_thick]
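
For readers unfamiliar with the method, the idea behind forward selection can be sketched in a few lines of NumPy: start from the intercept-only model and, at each step, add the predictor that reduces the residual sum of squares the most. This is an illustration on synthetic data, not the pyleaps implementation (which builds on statsmodels); the helper names `ssr` and `forward_selection` are hypothetical.

```python
import numpy as np


def ssr(design, y):
    """Residual sum of squares of the least-squares fit of y on design."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)


def forward_selection(X, y, names):
    """Greedy forward selection with an intercept.

    Returns a list of (selected names, SSR) pairs, one per model size.
    """
    n = X.shape[0]
    selected, remaining, path = [], list(range(X.shape[1])), []
    while remaining:
        # Try each remaining column on top of the ones already selected.
        scores = []
        for j in remaining:
            design = np.column_stack([np.ones(n), X[:, selected + [j]]])
            scores.append((ssr(design, y), j))
        best_ssr, best_j = min(scores)
        selected.append(best_j)
        remaining.remove(best_j)
        path.append(([names[k] for k in selected], best_ssr))
    return path


# Synthetic data: y depends strongly on "c", weakly on "a", not on "b".
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 2] + 1.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

path = forward_selection(X, y, ["a", "b", "c"])
for chosen, value in path:
    print(chosen, round(value, 3))
```

Because each step only extends the previous model, the selected sets are nested. That is why the four-variable forward model with an intercept above differs from the four-variable best subset model, which is free to drop suc_thick.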

3. Backward Selection

pyleaps.regsubsets(df, "sound_db", df.columns.to_list(), intercept=True, method="backward").summary
   r2        r2_adj    bic           aic           ssr             vars
5  0.51571   0.514092  9024.006788   8992.115478   34618.219133    [intercept, u, freq, aoa, ch_len, suc_thick]
4  0.484574  0.483198  9110.342833   9083.766741   36843.885086    [intercept, u, freq, aoa, ch_len]
3  0.427997  0.426852  9259.565127   9238.304253   40888.126119    [intercept, freq, aoa, ch_len]
2  0.227202  0.226172  9704.462269   9688.516614   55241.410754    [intercept, freq, aoa]
1  0.152655  0.152091  9835.55871    9824.928273   60570.206223    [intercept, freq]

pyleaps.regsubsets(df, "sound_db", df.columns.to_list(), intercept=False, method="backward").summary
   r2        r2_adj    bic           aic           ssr             vars
5  0.939991  0.93979   14588.128228  14561.552136  1409876.596034  [u, freq, aoa, ch_len, suc_thick]
4  0.938688  0.938524  14613.094355  14591.833481  1440485.372375  [u, freq, aoa, ch_len]
3  0.937199  0.937073  14641.845804  14625.900149  1475469.985069  [u, aoa, ch_len]
2  0.925304  0.925204  14895.227275  14884.596838  1754927.356895  [u, ch_len]
1  0.915417  0.915361  15074.73871   15069.423491  1987206.653961  [u]
