Use grid_search in notebook and add visualization
Addresses issues with example notebook brought up at July 26 meetup:

1. Standardize training and testing separately
2. Use AUROC on continuous rather than binary predictions
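
A minimal sketch of why point 2 matters, using only the standard library. The
`auroc` helper and the toy labels and scores are hypothetical, not taken from
the notebook. It computes AUROC as the fraction of positive/negative pairs that
are ranked correctly, then compares continuous scores with the same scores
thresholded to 0/1.

```python
def auroc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and continuous classifier scores.
y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.10, 0.40, 0.35, 0.80, 0.55, 0.90]

# Thresholding collapses the scores to two values and discards the ranking
# within each predicted class.
y_binary = [1 if s >= 0.5 else 0 for s in y_score]

auroc_continuous = auroc(y_true, y_score)
auroc_binary = auroc(y_true, y_binary)
print(auroc_continuous, auroc_binary)
```

On this toy data the two values differ because the binary predictions create
many ties, so the metric no longer reflects how well the model ranks samples.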

Clean up variable names. Simplify to testing/training terminology. No more
"hold out".

Use sklearn.grid_search.GridSearchCV to optimize hyperparameters. Expand the
ranges of l1_ratio and alpha. Specify random_state in GridSearchCV, which should
remove the need to set the seed manually using the random module. Grid search
should also allow a more modular architecture: different algorithms can be
swapped in as long as their `param_grid` is defined.
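
The grid search described above might look roughly like the sketch below. The
synthetic data, grid values, and `cv=3` are assumptions for illustration, not
the notebook's settings. It imports from `sklearn.model_selection` because
`sklearn.grid_search` was removed in later scikit-learn releases, and it sets
`random_state` on the estimator because `GridSearchCV` itself takes no such
argument. `n_jobs=-1` asks scikit-learn to run the cross-validation fits in
parallel through joblib.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (assumption: the real notebook uses its own dataset).
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hypothetical grid; the ranges chosen in the commit are not shown in the message.
param_grid = {
    "alpha": [10.0 ** e for e in range(-4, 1)],
    "l1_ratio": [0.0, 0.25, 0.5, 0.75, 1.0],
}

# Any estimator can be swapped in here as long as param_grid matches its parameters.
clf = SGDClassifier(penalty="elasticnet", random_state=0)

# scoring="roc_auc" evaluates continuous decision values, not binary predictions.
search = GridSearchCV(clf, param_grid, scoring="roc_auc", cv=3, n_jobs=-1)
search.fit(X_train, y_train)

best_params = search.best_params_
test_scores = search.decision_function(X_test)  # continuous scores for AUROC
print(best_params)
```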

Add exploratory analysis of predictions.

Add parallel processing using joblib to speed up cross validation.

Remove median absolute deviation feature selection. This step had to be removed
or modified because it used testing data for feature selection.
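
For reference, one way such a step could be modified rather than removed is to
compute the median absolute deviation (MAD) on the training rows only and reuse
the chosen columns for the testing rows. This is a standard-library sketch with
hypothetical helper names and toy data, not code from the commit, which removes
the step entirely.

```python
from statistics import median


def mad(values):
    """Median absolute deviation of a list of numbers."""
    center = median(values)
    return median([abs(v - center) for v in values])


def select_top_mad_features(train_rows, n_keep):
    """Return indices of the n_keep columns with the highest MAD in train_rows."""
    n_features = len(train_rows[0])
    scores = [mad([row[j] for row in train_rows]) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:n_keep])


def take_columns(rows, columns):
    """Keep only the given column indices in every row."""
    return [[row[j] for j in columns] for row in rows]


# Toy data: column 2 is constant in training but varies widely in testing.
train = [
    [1.0, 10.0, 5.0],
    [1.1, 20.0, 5.0],
    [0.9, 30.0, 5.0],
    [1.0, 40.0, 5.0],
]
test = [[1.0, 1000.0, 9.0], [1.0, 1000.0, 1.0]]

kept = select_top_mad_features(train, n_keep=2)  # testing rows are never consulted
train_selected = take_columns(train, kept)
test_selected = take_columns(test, kept)
print(kept)
```

Because the selection sees only `train`, the spread of column 2 in `test` has
no influence on which features are kept.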
dhimmel committed Jul 28, 2016
1 parent 7ab14ad commit 9557e47
Showing 2 changed files with 746 additions and 388 deletions.