The MNIST dataset is useful for those who want to try learning techniques and pattern recognition methods on real-world data. The classification task is tackled using classical Machine Learning and Deep Learning approaches. On top of the training loop, there is an experiment tracker that will allow the data practitioner to decide which approach is better.
It is recommended to use virtualenvwrapper. Find here the instructions to install and use it.
Set the MLflow environment variables as follows:
export MLFLOW_TRACKING_URI=sqlite:///experiment/mlflow/db/mydb.sqlite
export ARTIFACT_ROOT=./experiment/mlflow/mlruns/
You can also find them in mlflow.cfg.
To enable experiment tracking and the model registry, start the MLflow server as follows:
mlflow server --default-artifact-root $ARTIFACT_ROOT --backend-store-uri $MLFLOW_TRACKING_URI
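The same command can be assembled from Python, which is convenient when the server is started from a script. This is a minimal sketch: `build_server_command` is a hypothetical helper (not part of MLflow or of this repository), and the fallback values are copied from the export lines above.

```python
import os
import shlex

# Fallbacks copied from the export lines in this README.
DEFAULT_TRACKING_URI = "sqlite:///experiment/mlflow/db/mydb.sqlite"
DEFAULT_ARTIFACT_ROOT = "./experiment/mlflow/mlruns/"


def build_server_command(env=None):
    """Return the `mlflow server` argument list built from environment variables.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [
        "mlflow",
        "server",
        "--default-artifact-root",
        env.get("ARTIFACT_ROOT", DEFAULT_ARTIFACT_ROOT),
        "--backend-store-uri",
        env.get("MLFLOW_TRACKING_URI", DEFAULT_TRACKING_URI),
    ]


if __name__ == "__main__":
    # Only print the command; pass the list to subprocess.run to actually start the server.
    print(shlex.join(build_server_command()))
```

Building the command as a list avoids shell quoting problems if either path contains spaces.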
The MNIST dataset contains 70,000 small grayscale images (28x28 pixels) of labeled handwritten digits, from 0 to 9. This problem is often called the "Hello World" of Machine Learning because anyone who learns Machine Learning tackles it sooner or later.
Further information about the dataset can be found on the following web pages:
Some examples of the digits are shown below.
Basic components:
- ETL
- Model
- Training
- Evaluation
- Deployment
Depending on the model evaluation or the data practitioner's criteria, the ETL or the Model may need to change.
A simplified approach using binary classification, a 5-detector:
- Classical Machine Learning
- Scikit-Learn: Stochastic Gradient Descent
- Scikit-Learn: Random Forest
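A 5-detector reduces the ten digit classes to a yes/no target: is this image a 5 or not? The sketch below shows only that label conversion. `to_five_detector_targets` is a hypothetical helper name, and the sample labels are made up for illustration.

```python
def to_five_detector_targets(labels):
    """Map digit labels (0-9) to booleans: True for a 5, False for any other digit.

    int() is applied so the helper also accepts labels stored as strings.
    """
    return [int(label) == 5 for label in labels]


# Made-up labels for illustration, not taken from the dataset.
digits = [5, 0, "4", 1, 9, "5"]
targets = to_five_detector_targets(digits)
print(targets)
```

The resulting boolean targets can then be used as the `y` argument when fitting a binary classifier such as scikit-learn's `SGDClassifier` or `RandomForestClassifier`.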
A multiclass classification with the proper architecture:
- Convolutional Neural Network
- Using scaled images
- Using n_pca_components to keep 95% of the explained variance
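The two preprocessing options above can be sketched in plain Python. Two assumptions are made here: pixel intensities are integers in the range 0 to 255, and `n_pca_components` means the number of principal components kept. Both helper names are hypothetical, and the variance ratios in the example are invented.

```python
from itertools import accumulate


def scale_pixels(image, max_value=255.0):
    """Rescale pixel intensities from [0, max_value] to [0, 1].

    Assumes `image` is a list of rows, each a list of numeric pixel values.
    """
    return [[pixel / max_value for pixel in row] for row in image]


def components_for_variance(explained_variance_ratio, target=0.95):
    """Return the smallest number of leading components whose cumulative
    explained variance ratio reaches `target`.

    Assumes the ratios are sorted in decreasing order, as PCA reports them.
    """
    for count, cumulative in enumerate(accumulate(explained_variance_ratio), start=1):
        if cumulative >= target:
            return count
    # Target not reached: keep every component.
    return len(explained_variance_ratio)


# Invented ratios for illustration only.
ratios = [0.5, 0.3, 0.1, 0.06, 0.04]
print(components_for_variance(ratios))
print(scale_pixels([[0, 255], [51, 102]]))
```

With scikit-learn, passing a float such as `PCA(n_components=0.95)` performs this selection internally, so a helper like this is mainly useful for inspecting how many components a given threshold implies.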
This hands-on experience with common Computer Vision projects was inspired by TensorFlow in Practice by Laurence Moroney (Coursera) and by the concepts explained in Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow by Aurélien Géron (O'Reilly).
- TensorFlow in Practice - Coursera Specialization
- Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow - O'Reilly Book
Sections of code were taken from both sources stated in Acknowledgments and all the datasets used in this notebook are open-sourced.