SKLL 1.5
This is a major new release of SKLL.
What's new
- Several new scikit-learn learners are included, along with reasonable default parameter grids for tuning where appropriate (issues #256 & #375, PR #377):
  - `BayesianRidge`
  - `DummyRegressor`
  - `HuberRegressor`
  - `Lars`
  - `MLPRegressor`
  - `RANSACRegressor`
  - `TheilSenRegressor`
  - `DummyClassifier`
  - `MLPClassifier`
  - `RidgeClassifier`
- Allow computing any number of additional evaluation metrics in addition to the tuning objective (issue #350, PR #384).
- Rename the `cv_folds_file` configuration option to `folds_file`. The former is still supported with a deprecation warning but will be removed in the next release (PR #367).
- Add a new configuration option `use_folds_file_for_grid_search`, which controls whether the inner-loop grid search in a cross-validation experiment with a custom folds file also uses the folds from the file. It is set to `True` by default. Setting it to `False` means that the inner loop uses regular 3-fold cross-validation and ignores the file (PR #367).
- Also add a keyword argument called `use_custom_folds_for_grid_search` to the `Learner.cross_validate()` method (PR #367).
- Learning curves can now be plotted from existing summary files using the new `plot_learning_curves` command line utility (issue #346, PR #396).
- Overhaul logging in SKLL. All messages are now logged both to the console (if running interactively) and to log files. Read more about the SKLL log files in the Output Files section of the documentation (issue #369, PR #380).
- `neg_log_loss` is now available as an objective function for classification (issue #327, PR #392).
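Taken together, the new and renamed configuration options above might appear in an experiment configuration file as follows. This is an illustrative sketch only: the option names come from these notes, but the section placement and value formats are assumptions based on typical SKLL configuration conventions, not something these notes specify.

```ini
; Illustrative SKLL experiment configuration fragment (hypothetical file
; names and values; section placement is an assumption).
[Input]
; replaces the deprecated cv_folds_file option
folds_file = my_folds.csv

[Tuning]
grid_search = true
; set to false to make the inner grid-search loop use regular
; 3-fold cross-validation instead of the folds from folds_file
use_folds_file_for_grid_search = true

[Output]
; additional evaluation metrics computed alongside the tuning objective
metrics = ["accuracy", "neg_log_loss"]
```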
Changes
- SKLL now supports Python 3.6. Although Python 3.4 and 3.5 will still work, 3.6 is now the officially supported Python 3 version. Python 2.7 is still supported. (issue #355, PR #360).
- The required version of scikit-learn has been bumped up to 0.19.1 (issue #328, PR #330).
- The learning curve y-limits are now computed a bit more intelligently (issue #389, PR #390).
- Raise a warning if the ablation flag is used for an experiment that uses `train_file`/`test_file`, since this is not supported (issue #313, PR #392).
- Raise a warning if both `fixed_parameters` and `param_grids` are specified (issue #185, PR #297).
- Disable grid search if no default parameter grids are available in SKLL and the user does not provide parameter grids either (issue #376, PR #378).
- SKLL has a copy of scikit-learn's `DictVectorizer` because it needs some custom functionality. Most (but not all) of our modifications have now been merged into scikit-learn, so our custom version is now significantly condensed down to just a single method (issue #263, PR #374).
- Improved outputs for cross-validation tasks (issues #349 & #371, PRs #365 & #372):
  - When a folds file is specified, the log no longer erroneously shows the full folds dictionary.
  - Show the number of cross-validation folds in the results as coming from the folds file when one is specified.
  - Show the grid search folds in the results as coming from the folds file when the grid search ends up using it.
  - Do not show the stratified folds information in the results when a folds file is specified.
  - Show the value of `use_folds_file_for_grid_search` in the results when appropriate.
  - Show grid-search-related information in the results only when grid search is actually performed.
- The Travis CI plan was broken up into multiple jobs in order to get around the 50 minute limit (issue #385, PR #387).
- For the conda package, some of the dependencies are now sourced from the `conda-forge` channel.
Bugfixes
- Fix a bug that caused the inner grid-search loop of a cross-validation experiment to use a single job instead of the number specified via `grid_search_jobs` (issue #363, PR #367).
- Fix an unbound variable in `readers.py` (issue #340, PR #392).
- Fix a bug when running a learning curve experiment via `gridmap` (issue #386, PR #390).
- Fix a mismatch between the default number of grid search folds and the default number of slots requested via `gridmap` (issue #342, PR #367).