A collection of AutoML experiments that can be executed in Docker and can use Kafka as a streaming data source.
Required: Docker
Strongly recommended: Docker Compose, Make
Useful: kafkacat
All containers at once:
make up
Individual containers:
docker-compose up auto-sklearn zookeeper broker
OpenML dataset:
make publish-openml-dataset
For any other dataset:
cat ./datasets/covtype.csv | kafkacat -P -b localhost -t covtype
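To check that the messages actually arrived on the topic, you can consume a few records back with kafkacat (same broker and topic as in the publish command above):

```shell
# Consume (-C) the first 5 messages from the covtype topic, then exit (-e)
kafkacat -C -b localhost -t covtype -c 5 -e
```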
make train-scikit-multiflow-kafka
Or directly using Docker Compose:
docker-compose exec auto-sklearn python training/scikit-multiflow-kafka.py
Alternatively, you can run a single container using only `docker run`.
Find the right port for the experiment/service in docker-compose.yml
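If you go the plain `docker run` route, a rough sketch is below. The image name follows Compose's default `<project>_<service>` naming and is an assumption here; check `docker images` and docker-compose.yml for the actual image and port values.

```shell
# Run a single service image built by Compose, exposing its port on the host.
# The image name and port mapping are assumptions -- verify with `docker images`
# and the ports section of docker-compose.yml.
docker run --rm -it -p 8888:8888 automl-experiments_auto-sklearn
```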
Navigate to localhost:&lt;port&gt;, for example: localhost:8888
Get the Jupyter token by running
docker-compose logs <service_name>
For example:
docker-compose logs auto-sklearn
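The token appears in the Jupyter startup log as part of a URL, so filtering the log output is a quick way to find it (the `grep` pattern assumes the usual `?token=...` form of Jupyter's log line):

```shell
# Jupyter prints a URL like http://127.0.0.1:8888/?token=<token> on startup
docker-compose logs auto-sklearn | grep token
```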
Copy the token and use it to log in to Jupyter.
All containers at once:
make down
For developing the experiments it is useful to have the dependencies installed locally in a virtualenv. It helps IDEs provide autocompletion information.
- Create and activate a virtualenv
- Install some or all dependencies from dev-requirements.txt
pip install -r dev-requirements.txt
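The first step can be sketched with Python 3's built-in `venv` module (the directory name `.venv` is just a common convention, not something the repo mandates):

```shell
# Create a virtualenv in .venv and activate it in the current shell
python3 -m venv .venv
source .venv/bin/activate
```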