Super demo for nbdev. We'll write a discretizer.
Normally this bit would describe the package, give Install instructions, and then some Examples.
But since this is an `nbdev` demo, I'll start by talking about using `nbdev`. I'll assume by this time you have:
- Switched to a suitable Python virtual environment
- Installed jupyter
- Installed nbdev
- Found and maybe cloned this repo.
This README is written in `index.ipynb`, which generates `README.md` and the `index.html` page in `docs/`. Yes, you get to write README in Jupyter!
- GitHub renders `README.md` as the main package description.
- `nbdev` also generates full package documentation, which GitHub hosts via GitHub Pages.
- Or you can view the docs locally if you install and run Jekyll.
In `index.ipynb`, assuming you've imported your module(s) up top (`from mydemo.core import *`), you can generate the `mydemo` Python package and docs with these two commands:
$ nbdev_build_lib && nbdev_build_docs
(If you have `make` installed, just type `make`!)
The module used here is `core`. It's defined in `00_core.ipynb`, which generates `mydemo/core.py`, which we import here with `from mydemo.core import *` up in the first cell.
# ModuleName
> One-line module description
`settings.ini` can be tricky. Remember to list key packages in its `requirements`, or your package may not install correctly in a new environment (and GitHub's Continuous Integration will fail).
Write your install instructions here. Typically something like:
`pip install your_project_name` <-- replace with `mydemo`
...
Note: `nbdev` makes it easy to upload your package to PyPI (for `pip`) or to `conda`, but if you're doing this for work, check with your employer first! The same goes for GitHub etc. (Though you can configure `nbdev` to use private repositories.)
Fill me in please! Don't forget code examples.
OK, let's use our module's data-grabbing function to get car crash data.
df = getCrashes()
df.sample(5)
|  | total | speeding | alcohol | not_distracted | no_previous | ins_premium | ins_losses | abbrev |
|---|---|---|---|---|---|---|---|---|
39 | 11.1 | 3.774 | 4.218 | 10.212 | 8.769 | 1148.99 | 148.58 | RI |
2 | 18.6 | 6.510 | 5.208 | 15.624 | 17.856 | 899.47 | 110.35 | AZ |
13 | 12.8 | 4.608 | 4.352 | 12.032 | 12.288 | 803.11 | 139.15 | IL |
44 | 11.3 | 4.859 | 1.808 | 9.944 | 10.848 | 809.38 | 109.48 | UT |
41 | 19.4 | 6.014 | 6.402 | 19.012 | 16.684 | 669.31 | 96.87 | SD |
`core` defines a few handy functions, such as `is_numeric()`. Try it:
df.apply(is_numeric)
total True
speeding True
alcohol True
not_distracted True
no_previous True
ins_premium True
ins_losses True
abbrev False
dtype: bool
It's common to put `assert` statements in some cells as tests, so `nbdev` can check them when it runs your notebooks (e.g. via `nbdev_test_nbs`). (This is more common in the modules than in the index/README.) Here:
assert is_numeric(df['speeding'])
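`is_numeric()` lives in `mydemo/core.py`, whose source isn't shown in this README. A minimal sketch, assuming it simply checks whether a column has a numeric pandas dtype, might look like this. `is_numeric_sketch` is hypothetical, and the three toy rows are copied from the sample table above. The asserts are the same kind of test cell `nbdev` runs: if one fails, the cell raises `AssertionError` and the notebook test fails.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def is_numeric_sketch(col: pd.Series) -> bool:
    """Hypothetical stand-in for mydemo.core.is_numeric: True if the column's dtype is numeric."""
    return is_numeric_dtype(col)


# Three rows copied from the crash-data sample (RI, AZ, IL)
toy = pd.DataFrame({
    "speeding": [3.774, 6.510, 4.608],
    "abbrev": ["RI", "AZ", "IL"],
})

# Notebook-style tests: a failing assert raises AssertionError
assert is_numeric_sketch(toy["speeding"])
assert not is_numeric_sketch(toy["abbrev"])

# DataFrame.apply passes each column in turn, giving one bool per column name
print(toy.apply(is_numeric_sketch))
```

The resulting boolean Series can also select columns, e.g. `toy.loc[:, toy.apply(is_numeric_sketch)]` keeps only the numeric ones.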
The main function is `discretize()`. It reports its actions and then returns the discretized dataframe, suitable for passing on to your Bayes Net learning algorithm, etc.
help(discretize)
Help on function discretize in module mydemo.core:
discretize(df, nbins=10, cut=<function qcut at 0x7ff4a04a6440>, verbose=2, drop_useless=True)
Discretize columns in {df} to have at most {nbins} categories.
* Categorical columns: take the Top n-1 plus "Other"
* Continuous columns: cut into {nbins} using {cut}.
Returns a new discretized dataframe with the same column names.
Promotes discrete columns to categories.
Parameters
-----------
df: Dataframe to discretize
nbins: Max number of bins to use. May return fewer.
cut: Cutting method. Default `pd.qcut`. Consider pd.cut, or write your own.
verbose: 0: silent, 1: colnames, 2: (Default) top N for each column
drop_useless: Removes columns that have < 2 unique values.
Replaces numerical NA values with 'NA'.
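The docstring is detailed enough to sketch the algorithm. What follows is a minimal, hypothetical version written only from that docstring: `discretize_sketch` is not part of `mydemo`, and the real code in `mydemo/core.py` may differ. It assumes pandas' `is_numeric_dtype` decides which columns are continuous, and it keeps the top `nbins - 1` categorical values literally as the docstring says. So it won't necessarily pick the same states as the real output below, which names four states for `nbins=4`.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype


def discretize_sketch(df, nbins=10, cut=pd.qcut, verbose=2, drop_useless=True):
    """Hypothetical sketch of discretize(), written only from its docstring.

    Numeric columns are cut into at most `nbins` bins with `cut`; other columns keep
    their top `nbins - 1` values and lump the rest into "Other". Every output column
    becomes a pandas category.
    """
    out = pd.DataFrame(index=df.index)
    dropped = []
    for name in df.columns:
        col = df[name]
        # Columns with fewer than 2 distinct values carry no information
        if drop_useless and col.nunique() < 2:
            dropped.append(name)
            continue
        if is_numeric_dtype(col) and not is_bool_dtype(col):
            # duplicates="drop" lets the cut return fewer bins if edges coincide
            binned = cut(col, nbins, duplicates="drop")
            labels = [str(iv) for iv in binned.cat.categories]
            if col.isna().any():
                labels.append("NA")
            # Label each value by its interval text; missing numbers become 'NA'
            values = pd.Series(
                [str(v) if pd.notna(v) else "NA" for v in binned], index=df.index
            )
            new = values.astype(pd.CategoricalDtype(labels))
        else:
            obj = col.astype(object)
            top = obj.value_counts().index[: nbins - 1]
            new = obj.where(obj.isin(top), "Other").astype("category")
        out[name] = new
        if verbose == 1:
            print(name)
        elif verbose >= 2:
            print(f"{name}:")
            print(new.value_counts(sort=False).to_string())
    if verbose:
        print(f"DROPPED {dropped} because < 2 vals each.")
    return out
```

Passing `cut=pd.cut` instead switches numeric columns to equal-width bins, as the docstring suggests; both `pd.qcut` and `pd.cut` accept the `duplicates` keyword used here.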
discretize(df, nbins=4)
total:
(5.899, 12.75] 13
(12.75, 15.6] 13
(15.6, 18.5] 12
(18.5, 23.9] 13
speeding:
(1.7910000000000001, 3.766] 13
(3.766, 4.608] 13
(4.608, 6.439] 12
(6.439, 9.45] 13
alcohol:
(1.592, 3.894] 13
(3.894, 4.554] 13
(4.554, 5.604] 12
(5.604, 10.038] 13
not_distracted:
(1.7590000000000001, 10.478] 13
(10.478, 13.857] 13
(13.857, 16.14] 12
(16.14, 23.661] 13
no_previous:
(5.899, 11.348] 13
(11.348, 13.775] 13
(13.775, 16.755] 12
(16.755, 21.28] 13
ins_premium:
(641.9590000000001, 768.43] 13
(768.43, 858.97] 13
(858.97, 1007.945] 12
(1007.945, 1301.52] 13
ins_losses:
(82.749, 114.645] 13
(114.645, 136.05] 13
(136.05, 151.87] 12
(151.87, 194.78] 13
abbrev:
LA 1
MI 1
MO 1
WY 1
Other 47
DROPPED [] because < 2 vals each.
|  | total | speeding | alcohol | not_distracted | no_previous | ins_premium | ins_losses | abbrev |
|---|---|---|---|---|---|---|---|---|
0 | (18.5, 23.9] | (6.439, 9.45] | (5.604, 10.038] | (16.14, 23.661] | (13.775, 16.755] | (768.43, 858.97] | (136.05, 151.87] | Other |
1 | (15.6, 18.5] | (6.439, 9.45] | (3.894, 4.554] | (16.14, 23.661] | (16.755, 21.28] | (1007.945, 1301.52] | (114.645, 136.05] | Other |
2 | (18.5, 23.9] | (6.439, 9.45] | (4.554, 5.604] | (13.857, 16.14] | (16.755, 21.28] | (858.97, 1007.945] | (82.749, 114.645] | Other |
3 | (18.5, 23.9] | (3.766, 4.608] | (5.604, 10.038] | (16.14, 23.661] | (16.755, 21.28] | (768.43, 858.97] | (136.05, 151.87] | Other |
4 | (5.899, 12.75] | (3.766, 4.608] | (1.592, 3.894] | (10.478, 13.857] | (5.899, 11.348] | (858.97, 1007.945] | (151.87, 194.78] | Other |
5 | (12.75, 15.6] | (4.608, 6.439] | (1.592, 3.894] | (10.478, 13.857] | (11.348, 13.775] | (768.43, 858.97] | (136.05, 151.87] | Other |
6 | (5.899, 12.75] | (4.608, 6.439] | (1.592, 3.894] | (1.7590000000000001, 10.478] | (5.899, 11.348] | (1007.945, 1301.52] | (151.87, 194.78] | Other |
7 | (15.6, 18.5] | (4.608, 6.439] | (4.554, 5.604] | (13.857, 16.14] | (13.775, 16.755] | (1007.945, 1301.52] | (136.05, 151.87] | Other |
8 | (5.899, 12.75] | (1.7910000000000001, 3.766] | (1.592, 3.894] | (1.7590000000000001, 10.478] | (5.899, 11.348] | (1007.945, 1301.52] | (114.645, 136.05] | Other |
9 | (15.6, 18.5] | (1.7910000000000001, 3.766] | (4.554, 5.604] | (16.14, 23.661] | (16.755, 21.28] | (1007.945, 1301.52] | (136.05, 151.87] | Other |
10 | (12.75, 15.6] | (1.7910000000000001, 3.766] | (3.894, 4.554] | (13.857, 16.14] | (13.775, 16.755] | (858.97, 1007.945] | (136.05, 151.87] | Other |
11 | (15.6, 18.5] | (6.439, 9.45] | (5.604, 10.038] | (13.857, 16.14] | (13.775, 16.755] | (858.97, 1007.945] | (114.645, 136.05] | Other |
12 | (12.75, 15.6] | (4.608, 6.439] | (3.894, 4.554] | (10.478, 13.857] | (13.775, 16.755] | (641.9590000000001, 768.43] | (82.749, 114.645] | Other |
13 | (12.75, 15.6] | (3.766, 4.608] | (3.894, 4.554] | (10.478, 13.857] | (11.348, 13.775] | (768.43, 858.97] | (136.05, 151.87] | Other |
14 | (12.75, 15.6] | (1.7910000000000001, 3.766] | (3.894, 4.554] | (10.478, 13.857] | (11.348, 13.775] | (641.9590000000001, 768.43] | (82.749, 114.645] | Other |
15 | (15.6, 18.5] | (1.7910000000000001, 3.766] | (3.894, 4.554] | (13.857, 16.14] | (11.348, 13.775] | (641.9590000000001, 768.43] | (82.749, 114.645] | Other |
16 | (15.6, 18.5] | (4.608, 6.439] | (3.894, 4.554] | (10.478, 13.857] | (13.775, 16.755] | (768.43, 858.97] | (114.645, 136.05] | Other |
17 | (18.5, 23.9] | (3.766, 4.608] | (4.554, 5.604] | (16.14, 23.661] | (13.775, 16.755] | (858.97, 1007.945] | (136.05, 151.87] | Other |
18 | (18.5, 23.9] | (6.439, 9.45] | (5.604, 10.038] | (13.857, 16.14] | (16.755, 21.28] | (1007.945, 1301.52] | (151.87, 194.78] | LA |
19 | (12.75, 15.6] | (4.608, 6.439] | (3.894, 4.554] | (10.478, 13.857] | (11.348, 13.775] | (641.9590000000001, 768.43] | (82.749, 114.645] | Other |
20 | (5.899, 12.75] | (3.766, 4.608] | (3.894, 4.554] | (1.7590000000000001, 10.478] | (11.348, 13.775] | (1007.945, 1301.52] | (151.87, 194.78] | Other |
21 | (5.899, 12.75] | (1.7910000000000001, 3.766] | (1.592, 3.894] | (1.7590000000000001, 10.478] | (5.899, 11.348] | (1007.945, 1301.52] | (114.645, 136.05] | Other |
22 | (12.75, 15.6] | (1.7910000000000001, 3.766] | (3.894, 4.554] | (10.478, 13.857] | (5.899, 11.348] | (1007.945, 1301.52] | (151.87, 194.78] | MI |
23 | (5.899, 12.75] | (1.7910000000000001, 3.766] | (1.592, 3.894] | (1.7590000000000001, 10.478] | (5.899, 11.348] | (768.43, 858.97] | (114.645, 136.05] | Other |
24 | (15.6, 18.5] | (1.7910000000000001, 3.766] | (4.554, 5.604] | (1.7590000000000001, 10.478] | (16.755, 21.28] | (858.97, 1007.945] | (151.87, 194.78] | Other |
25 | (15.6, 18.5] | (6.439, 9.45] | (4.554, 5.604] | (13.857, 16.14] | (11.348, 13.775] | (768.43, 858.97] | (136.05, 151.87] | MO |
26 | (18.5, 23.9] | (6.439, 9.45] | (5.604, 10.038] | (16.14, 23.661] | (16.755, 21.28] | (768.43, 858.97] | (82.749, 114.645] | Other |
27 | (12.75, 15.6] | (1.7910000000000001, 3.766] | (4.554, 5.604] | (10.478, 13.857] | (11.348, 13.775] | (641.9590000000001, 768.43] | (114.645, 136.05] | Other |
28 | (12.75, 15.6] | (4.608, 6.439] | (4.554, 5.604] | (13.857, 16.14] | (13.775, 16.755] | (1007.945, 1301.52] | (136.05, 151.87] | Other |
29 | (5.899, 12.75] | (3.766, 4.608] | (1.592, 3.894] | (1.7590000000000001, 10.478] | (5.899, 11.348] | (641.9590000000001, 768.43] | (114.645, 136.05] | Other |
30 | (5.899, 12.75] | (1.7910000000000001, 3.766] | (1.592, 3.894] | (1.7590000000000001, 10.478] | (5.899, 11.348] | (1007.945, 1301.52] | (151.87, 194.78] | Other |
31 | (15.6, 18.5] | (1.7910000000000001, 3.766] | (4.554, 5.604] | (10.478, 13.857] | (16.755, 21.28] | (858.97, 1007.945] | (114.645, 136.05] | Other |
32 | (5.899, 12.75] | (3.766, 4.608] | (1.592, 3.894] | (10.478, 13.857] | (5.899, 11.348] | (1007.945, 1301.52] | (136.05, 151.87] | Other |
33 | (15.6, 18.5] | (6.439, 9.45] | (4.554, 5.604] | (13.857, 16.14] | (11.348, 13.775] | (641.9590000000001, 768.43] | (114.645, 136.05] | Other |
34 | (18.5, 23.9] | (4.608, 6.439] | (5.604, 10.038] | (16.14, 23.661] | (16.755, 21.28] | (641.9590000000001, 768.43] | (82.749, 114.645] | Other |
35 | (12.75, 15.6] | (3.766, 4.608] | (4.554, 5.604] | (13.857, 16.14] | (11.348, 13.775] | (641.9590000000001, 768.43] | (114.645, 136.05] | Other |
36 | (18.5, 23.9] | (4.608, 6.439] | (5.604, 10.038] | (16.14, 23.661] | (16.755, 21.28] | (858.97, 1007.945] | (151.87, 194.78] | Other |
37 | (12.75, 15.6] | (3.766, 4.608] | (1.592, 3.894] | (1.7590000000000001, 10.478] | (11.348, 13.775] | (768.43, 858.97] | (82.749, 114.645] | Other |
38 | (15.6, 18.5] | (6.439, 9.45] | (5.604, 10.038] | (16.14, 23.661] | (13.775, 16.755] | (858.97, 1007.945] | (151.87, 194.78] | Other |
39 | (5.899, 12.75] | (3.766, 4.608] | (3.894, 4.554] | (1.7590000000000001, 10.478] | (5.899, 11.348] | (1007.945, 1301.52] | (136.05, 151.87] | Other |
40 | (18.5, 23.9] | (6.439, 9.45] | (5.604, 10.038] | (16.14, 23.661] | (16.755, 21.28] | (768.43, 858.97] | (114.645, 136.05] | Other |
41 | (18.5, 23.9] | (4.608, 6.439] | (5.604, 10.038] | (16.14, 23.661] | (13.775, 16.755] | (641.9590000000001, 768.43] | (82.749, 114.645] | Other |
42 | (18.5, 23.9] | (3.766, 4.608] | (5.604, 10.038] | (13.857, 16.14] | (13.775, 16.755] | (641.9590000000001, 768.43] | (151.87, 194.78] | Other |
43 | (18.5, 23.9] | (6.439, 9.45] | (5.604, 10.038] | (16.14, 23.661] | (16.755, 21.28] | (858.97, 1007.945] | (151.87, 194.78] | Other |
44 | (5.899, 12.75] | (4.608, 6.439] | (1.592, 3.894] | (1.7590000000000001, 10.478] | (5.899, 11.348] | (768.43, 858.97] | (82.749, 114.645] | Other |
45 | (12.75, 15.6] | (3.766, 4.608] | (3.894, 4.554] | (10.478, 13.857] | (11.348, 13.775] | (641.9590000000001, 768.43] | (82.749, 114.645] | Other |
46 | (5.899, 12.75] | (1.7910000000000001, 3.766] | (1.592, 3.894] | (10.478, 13.857] | (5.899, 11.348] | (768.43, 858.97] | (151.87, 194.78] | Other |
47 | (5.899, 12.75] | (3.766, 4.608] | (1.592, 3.894] | (1.7590000000000001, 10.478] | (5.899, 11.348] | (858.97, 1007.945] | (82.749, 114.645] | Other |
48 | (18.5, 23.9] | (6.439, 9.45] | (5.604, 10.038] | (16.14, 23.661] | (16.755, 21.28] | (858.97, 1007.945] | (151.87, 194.78] | Other |
49 | (12.75, 15.6] | (4.608, 6.439] | (3.894, 4.554] | (1.7590000000000001, 10.478] | (11.348, 13.775] | (641.9590000000001, 768.43] | (82.749, 114.645] | Other |
50 | (15.6, 18.5] | (6.439, 9.45] | (4.554, 5.604] | (13.857, 16.14] | (13.775, 16.755] | (768.43, 858.97] | (114.645, 136.05] | WY |
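Every numeric column in the report above splits 13/13/12/13, presumably because the default `cut` is `pd.qcut`, which makes roughly equal-frequency bins (51 rows / 4 ≈ 12.75 per bin). The small illustration below uses made-up, skewed data (the squares 0, 1, 4, …, 2500, not the crash data) to contrast it with the equal-width `pd.cut` that the docstring mentions as an alternative:

```python
import pandas as pd

# Made-up, right-skewed data: the squares of 0..50 (51 values, like the 51 crash rows)
x = pd.Series([i ** 2 for i in range(51)])

# Equal-frequency bins: edges at the quartiles, so each bin holds about 51 / 4 values
print(pd.qcut(x, 4).value_counts(sort=False).tolist())  # [13, 13, 12, 13]

# Equal-width bins: edges evenly spaced over 0..2500, so the many small values share one bin
print(pd.cut(x, 4).value_counts(sort=False).tolist())  # [26, 10, 8, 7]
```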