An anytime implementation of scikit-learn GridSearchCV.
Waiting for GridSearchCV to finish can take a long time. An anytime approach lets the search run in the background and exposes an endpoint that can be queried for the best result found so far.
The project consists of the following parts:
- A web application for creating and displaying searches and results through a REST API
- A distributed cluster for running the searches
The project requires:
- Python (>=3.6)
- Django (>=2.1)
- PostgreSQL (>=9.6)
- distributed (>=1.25)
# clone repo
git clone https://github.com/OryJonay/anytime-gridsearch.git anytimegridsearch
# create and activate a virtual environment (Python >= 3.6, per the requirements above)
cd anytimegridsearch
virtualenv -p python3 .
source bin/activate
# install dependencies
pip install -r requirements.txt
# run tests
python manage.py test
Alternatively, use Docker Compose to run everything:
docker-compose up
The web application will then be available on localhost, port 8000.
The old way of using scikit-learn's GridSearchCV (taken from the examples part of the documentation):
from sklearn import svm, datasets
from sklearn.model_selection import GridSearchCV
iris = datasets.load_iris()
parameters = {'kernel':('linear', 'rbf'), 'C':[1, 10]}
svc = svm.SVC()
clf = GridSearchCV(svc, parameters)
clf.fit(iris.data, iris.target)
print(clf.best_params_['kernel'])
This runs every grid point (4 in this example) and returns, printing the best kernel, only after all of them have been fitted and cross-validated. With this project, the same search looks like this:
from sklearn import svm, datasets
from AnyTimeGridSearchCV.grids.anytime_search import AnyTimeGridSearchCV as GridSearchCV
iris = datasets.load_iris()
parameters = {'kernel':('linear', 'rbf'), 'C':[1, 10]}
svc = svm.SVC()
clf = GridSearchCV(svc, parameters)
clf.fit(iris.data, iris.target)
print(clf.best_params_['kernel'])
And that's (mostly) it: just swap in the new search algorithm and you're done. The call to the search algorithm is non-blocking, so the search can be queried before all the grid points have been cross-validated.
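To make the non-blocking behaviour concrete, here is a minimal sketch of the anytime pattern using only the Python standard library. It is not this project's implementation (which runs searches on a distributed cluster). `AnytimeSearchSketch`, `best_so_far`, `wait`, and `toy_score` are hypothetical names made up for the illustration, and a toy scoring function stands in for cross-validation.

```python
import itertools
import threading
from concurrent.futures import ThreadPoolExecutor


class AnytimeSearchSketch:
    """Hypothetical illustration of an anytime grid search (not the project's class)."""

    def __init__(self, score_fn, param_grid, max_workers=2):
        self.score_fn = score_fn
        keys = sorted(param_grid)
        # Expand the grid into one dict per parameter combination.
        self.grid = [
            dict(zip(keys, values))
            for values in itertools.product(*(param_grid[k] for k in keys))
        ]
        self._lock = threading.Lock()
        self._results = []
        self._futures = []
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def fit(self):
        """Submit every grid point to the pool and return immediately."""
        self._futures = [self._pool.submit(self._evaluate, p) for p in self.grid]
        return self

    def _evaluate(self, params):
        score = self.score_fn(**params)
        with self._lock:
            self._results.append((score, params))

    def best_so_far(self):
        """Return (score, params) for the best finished grid point, or None."""
        with self._lock:
            if not self._results:
                return None
            return max(self._results, key=lambda result: result[0])

    def wait(self):
        """Block until every grid point has been evaluated."""
        for future in self._futures:
            future.result()
        self._pool.shutdown()
        return self


def toy_score(C, kernel):
    # Stand-in for a cross-validated score; assumption for the example only.
    return (1.0 if kernel == "rbf" else 0.5) - abs(C - 10) / 100


search = AnytimeSearchSketch(toy_score, {"kernel": ["linear", "rbf"], "C": [1, 10]}).fit()
partial = search.best_so_far()       # may be None or an intermediate best
final = search.wait().best_so_far()  # best over the whole grid
print(final)
```

The key design point is that `fit` only schedules work, while `best_so_far` reads a lock-protected list of finished results, so it can be called at any time during the search.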
API documentation can be found on the web server at /docs/.
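A quick way to check that the web server is up is to request the docs page. The sketch below assumes the Docker Compose setup described above (localhost, port 8000); `docs_url` and `docs_reachable` are hypothetical helper names, not part of the project.

```python
from urllib.request import urlopen


def docs_url(host="localhost", port=8000):
    """Build the URL of the API docs page (defaults assume the Docker Compose setup)."""
    return f"http://{host}:{port}/docs/"


def docs_reachable(url, timeout=5.0):
    """Return True if the docs page answers with HTTP 200, False on any network or HTTP error."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError and HTTPError are both subclasses of OSError.
        return False
```

With the server running, `docs_reachable(docs_url())` should return `True`.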
Things to do in the future (not sorted by priority):
- Home page for project