PYLEAPS

This package aims to be a simple port of the regsubsets function from R's leaps package. Unlike leaps, it is not optimized yet, and the code still needs work to improve readability. For now, I have implemented forward, backward, and best subset selection for linear regression, building on top of the excellent statsmodels package.

In addition, for now, categorical variables are not handled automatically: the user needs to code their contrasts manually before calling the package. This will be fixed in the future.
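As a minimal sketch of what manual coding can look like, the snippet below builds treatment (dummy) contrasts with pandas before any model selection is run. The toy data frame and its column names are hypothetical, and whether pyleaps needs any further preparation of these columns is an assumption, not something this README states.

```python
import pandas as pd

# Hypothetical toy data: one numeric response and one categorical predictor.
df = pd.DataFrame({
    "y": [1.0, 2.0, 3.0, 4.0],
    "color": ["red", "green", "blue", "green"],
})

# Treatment coding: one 0/1 column per level, dropping the first level
# ("blue", alphabetically) so it acts as the reference category.
encoded = pd.get_dummies(df, columns=["color"], drop_first=True, dtype=int)

# Every column except the response is a candidate predictor.
predictors = [c for c in encoded.columns if c != "y"]
print(encoded)
print(predictors)
```

The resulting `encoded` frame and `predictors` list could then presumably be passed to `pyleaps.regsubsets` in the same way as the numeric columns in the usage example.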

The package can be installed with pip. See https://pypi.org/project/pyleaps/ for details.

TO INSTALL

pip install pyleaps

Any dependencies that are missing from the environment should be installed automatically.

COLLABORATION

Any help/collaboration is very welcome. Just let me know what kind of edits you propose and I will be very happy to discuss them.

TO DOs

This section contains a list of future edits:

Improve general code readability.
Figure out a way to speed up best subset selection. So far, it is much slower than its R counterpart.
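To give a feel for why best subset selection is the expensive method, the sketch below counts how many models a naive exhaustive search has to fit compared with a greedy forward search. It assumes plain enumeration of every non-empty subset. Whether pyleaps enumerates subsets exactly this way is an assumption, and the helper names are made up for illustration.

```python
from itertools import combinations
from math import comb


def n_best_subset_fits(p):
    """Number of non-empty predictor subsets of p predictors (2**p - 1)."""
    return sum(comb(p, k) for k in range(1, p + 1))


def n_forward_fits(p):
    """Fits made by greedy forward selection: p + (p - 1) + ... + 1."""
    return sum(p - k for k in range(p))


# The five predictors of the usage example.
names = ["freq", "aoa", "ch_len", "u", "suc_thick"]
all_subsets = [
    subset
    for k in range(1, len(names) + 1)
    for subset in combinations(names, k)
]

print(len(all_subsets), n_best_subset_fits(5), n_forward_fits(5))
print(n_best_subset_fits(20), n_forward_fits(20))
```

With five predictors the difference is small, but the exhaustive count doubles with every added predictor, while the greedy count only grows quadratically.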

USAGE EXAMPLE

This section demonstrates how to use the package. The example uses the airfoil self-noise data set from the UCI Machine Learning Repository. Please visit https://archive.ics.uci.edu/ml for further details.

import pandas as pd
import pyleaps
import matplotlib.pyplot as plt

Loading the data set

df = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/00291/airfoil_self_noise.dat", sep="\t", header=None)
df.columns = ["freq", "aoa", "ch_len", "u", "suc_thick", "sound_db"]

1. Best Model Selection

pyleaps.regsubsets(df, "sound_db", df.columns.to_list(), intercept=True, method="full").summary

	r2	r2_adj	bic	aic	ssr	vars
1	0.152655	0.152091	9835.55871	9824.928273	60570.206223	[intercept, freq]
2	0.323783	0.322882	9503.806546	9487.860891	48337.58386	[intercept, freq, suc_thick]
3	0.43992	0.438799	9227.905534	9206.64466	40035.855465	[intercept, freq, ch_len, suc_thick]
4	0.484574	0.483198	9110.342833	9083.766741	36843.885086	[intercept, u, freq, aoa, ch_len]
5	0.51571	0.514092	9024.006788	8992.115478	34618.219133	[intercept, u, freq, aoa, ch_len, suc_thick]
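The table above lists one model per subset size, so a final step is usually to pick a size using one of the criteria. The sketch below does this for the bic and r2_adj columns. It assumes that `.summary` is a pandas DataFrame laid out as printed. To stay self-contained, it rebuilds the relevant columns by hand from the values shown above instead of calling pyleaps.

```python
import pandas as pd

# Values copied from the intercept=True best subset table above.
summary = pd.DataFrame(
    {
        "r2_adj": [0.152091, 0.322882, 0.438799, 0.483198, 0.514092],
        "bic": [9835.55871, 9503.806546, 9227.905534, 9110.342833, 9024.006788],
    },
    index=[1, 2, 3, 4, 5],  # number of predictors, excluding the intercept
)

# Lower BIC is better; higher adjusted R^2 is better.
best_by_bic = summary["bic"].idxmin()
best_by_r2_adj = summary["r2_adj"].idxmax()
print(best_by_bic, best_by_r2_adj)
```

For this data set both criteria agree on the full five-predictor model.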

pyleaps.regsubsets(df, "sound_db", df.columns.to_list(), intercept=False, method="full").summary

	r2	r2_adj	bic	aic	ssr	vars
1	0.915417	0.915361	15074.73871	15069.423491	1987206.653961	[u]
2	0.925304	0.925204	14895.227275	14884.596838	1754927.356895	[u, ch_len]
3	0.937199	0.937073	14641.845804	14625.900149	1475469.985069	[u, aoa, ch_len]
4	0.938688	0.938524	14613.094355	14591.833481	1440485.372375	[u, freq, aoa, ch_len]
5	0.939991	0.93979	14588.128228	14561.552136	1409876.596034	[u, freq, aoa, ch_len, suc_thick]

Note that the r2 and r2_adj values of the models without an intercept are not comparable with those of the models that include one. For models without a constant, statsmodels reports the uncentered R², which is why r2 is much higher here even though ssr is far larger.

2. Forward Selection

pyleaps.regsubsets(df, "sound_db", df.columns.to_list(), intercept=True, method="forward").summary

	r2	r2_adj	bic	aic	ssr	vars
1	0.152655	0.152091	9835.55871	9824.928273	60570.206223	[intercept, freq]
2	0.323783	0.322882	9503.806546	9487.860891	48337.58386	[intercept, freq, suc_thick]
3	0.43992	0.438799	9227.905534	9206.64466	40035.855465	[intercept, freq, suc_thick, ch_len]
4	0.477646	0.476251	9130.411111	9103.835019	37339.129014	[intercept, freq, suc_thick, ch_len, u]
5	0.51571	0.514092	9024.006788	8992.115478	34618.219133	[intercept, freq, suc_thick, ch_len, u, aoa]

pyleaps.regsubsets(df, "sound_db", df.columns.to_list(), intercept=False, method="forward").summary

	r2	r2_adj	bic	aic	ssr	vars
1	0.915417	0.915361	15074.73871	15069.423491	1987206.653961	[u]
2	0.925304	0.925204	14895.227275	14884.596838	1754927.356895	[u, ch_len]
3	0.937199	0.937073	14641.845804	14625.900149	1475469.985069	[u, ch_len, aoa]
4	0.938688	0.938524	14613.094355	14591.833481	1440485.372375	[u, ch_len, aoa, freq]
5	0.939991	0.93979	14588.128228	14561.552136	1409876.596034	[u, ch_len, aoa, freq, suc_thick]
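For readers unfamiliar with the method, here is a minimal sketch of greedy forward selection. It is not the pyleaps implementation: it uses NumPy least squares instead of statsmodels, always includes an intercept, picks the next variable by residual sum of squares, and runs on synthetic data. All function and variable names are hypothetical.

```python
import numpy as np


def ssr(design, y):
    """Residual sum of squares of the least-squares fit of y on design."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)


def forward_selection(X, y, names):
    """At each step, add the column that lowers the SSR the most."""
    n = X.shape[0]
    selected, remaining, path = [], list(range(X.shape[1])), []
    while remaining:
        scores = []
        for j in remaining:
            # Candidate model: intercept + already selected columns + column j.
            design = np.column_stack([np.ones(n), X[:, selected + [j]]])
            scores.append((ssr(design, y), j))
        best_ssr, best_j = min(scores)
        selected.append(best_j)
        remaining.remove(best_j)
        path.append(([names[i] for i in selected], best_ssr))
    return path


# Synthetic data: only columns "c" (strong) and "a" (weaker) drive y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 2] + 1.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

path = forward_selection(X, y, ["a", "b", "c", "d"])
for variables, value in path:
    print(variables, round(value, 3))
```

Because variables are never removed once added, each model in the path contains the previous one, which is the nesting visible in the vars column of the tables above.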

3. Backward Selection

pyleaps.regsubsets(df, "sound_db", df.columns.to_list(), intercept=True, method="backward").summary

	r2	r2_adj	bic	aic	ssr	vars
5	0.51571	0.514092	9024.006788	8992.115478	34618.219133	[intercept, u, freq, aoa, ch_len, suc_thick]
4	0.484574	0.483198	9110.342833	9083.766741	36843.885086	[intercept, u, freq, aoa, ch_len]
3	0.427997	0.426852	9259.565127	9238.304253	40888.126119	[intercept, freq, aoa, ch_len]
2	0.227202	0.226172	9704.462269	9688.516614	55241.410754	[intercept, freq, aoa]
1	0.152655	0.152091	9835.55871	9824.928273	60570.206223	[intercept, freq]

pyleaps.regsubsets(df, "sound_db", df.columns.to_list(), intercept=False, method="backward").summary

	r2	r2_adj	bic	aic	ssr	vars
5	0.939991	0.93979	14588.128228	14561.552136	1409876.596034	[u, freq, aoa, ch_len, suc_thick]
4	0.938688	0.938524	14613.094355	14591.833481	1440485.372375	[u, freq, aoa, ch_len]
3	0.937199	0.937073	14641.845804	14625.900149	1475469.985069	[u, aoa, ch_len]
2	0.925304	0.925204	14895.227275	14884.596838	1754927.356895	[u, ch_len]
1	0.915417	0.915361	15074.73871	15069.423491	1987206.653961	[u]
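Backward selection runs the same idea in reverse. The sketch below starts from the full model and repeatedly drops the variable whose removal increases the residual sum of squares the least. As with any such sketch, this is an illustration under assumptions (NumPy least squares, an intercept that is always kept, synthetic data, hypothetical names) and not the pyleaps code.

```python
import numpy as np


def ssr(design, y):
    """Residual sum of squares of the least-squares fit of y on design."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)


def backward_selection(X, y, names):
    """Start with all columns; drop the least useful one at each step."""
    n = X.shape[0]
    selected = list(range(X.shape[1]))
    full = np.column_stack([np.ones(n), X[:, selected]])
    path = [([names[i] for i in selected], ssr(full, y))]
    while len(selected) > 1:
        scores = []
        for j in selected:
            # Candidate model: everything currently selected except column j.
            kept = [c for c in selected if c != j]
            design = np.column_stack([np.ones(n), X[:, kept]])
            scores.append((ssr(design, y), j))
        best_ssr, drop_j = min(scores)
        selected.remove(drop_j)
        path.append(([names[i] for i in selected], best_ssr))
    return path


# Synthetic data: only columns "c" (strong) and "a" (weaker) drive y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 2] + 1.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

path = backward_selection(X, y, ["a", "b", "c", "d"])
for variables, value in path:
    print(variables, round(value, 3))
```

The path is listed from the largest model down to the smallest, matching the descending row order of the backward tables above.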
