Keras implementation of Hinton's knowledge distillation (KD), a way of transferring knowledge from a large model into a smaller model.
I use the Caltech-256 dataset to demonstrate the technique.
- I resize all images to 299x299.
- For validation I use 20 images from each category.
- For training I use 100 images from each category.
- I use random crops and color augmentation to balance the dataset.
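A minimal sketch of such an input pipeline using Keras' `ImageDataGenerator` is shown below. The directory names, batch size, and the particular augmentation settings are assumptions rather than the repository's actual values, and random crops are approximated here with shifts and zoom:

```python
from keras.preprocessing.image import ImageDataGenerator

# Assumed directory layout: data/train/<category>/*.jpg, data/val/<category>/*.jpg
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    # random crops approximated with small shifts and zoom,
    # color augmentation via channel shifts
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.1,
    channel_shift_range=30.0,
    horizontal_flip=True,
)
val_datagen = ImageDataGenerator(rescale=1.0 / 255)

train_generator = train_datagen.flow_from_directory(
    'data/train', target_size=(299, 299),
    batch_size=64, class_mode='categorical',
)
val_generator = val_datagen.flow_from_directory(
    'data/val', target_size=(299, 299),
    batch_size=64, class_mode='categorical',
)
```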
- Teacher: Xception
- Student: SqueezeNet
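The core of the technique is the distillation objective from Hinton et al. (2015): the student minimizes a weighted sum of the usual cross-entropy with the hard labels and a cross-entropy with the teacher's temperature-softened predictions. The sketch below is illustrative, not the repository's exact code; the temperature `T`, the weight `alpha`, the target layout, and the assumption that the student outputs raw logits are mine:

```python
import keras.backend as K

NUM_CLASSES = 256  # assumed: the 256 object categories of Caltech-256
T = 5.0            # softening temperature (assumed value)
alpha = 0.9        # weight on the soft-target term (assumed value)

def knowledge_distillation_loss(y_true, y_pred):
    """y_true is assumed to be [one-hot labels, teacher logits] concatenated
    along the last axis; y_pred is assumed to be the student's logits."""
    hard_labels = y_true[:, :NUM_CLASSES]
    teacher_logits = y_true[:, NUM_CLASSES:]

    # temperature-softened teacher and student distributions
    soft_targets = K.softmax(teacher_logits / T)
    soft_student = K.softmax(y_pred / T)

    # standard cross-entropy with the hard labels
    hard_loss = K.categorical_crossentropy(hard_labels, K.softmax(y_pred))
    # cross-entropy with the teacher's softened distribution
    soft_loss = K.categorical_crossentropy(soft_targets, soft_student)

    # the T**2 factor keeps the soft-target gradients on a comparable
    # scale to the hard-label gradients (Hinton et al., 2015)
    return (1.0 - alpha) * hard_loss + alpha * (T ** 2) * soft_loss
```

With this layout, the student model would be compiled with `loss=knowledge_distillation_loss`, and the generator would yield targets that concatenate the one-hot labels with the teacher's (possibly precomputed) logits.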
Please complete the tutorial if you want to learn how to use knowledge distillation to train a student model on unlabeled data.
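With unlabeled data, the hard-label term drops out and the student only matches the teacher's softened predictions. A minimal sketch, with illustrative names and an assumed temperature:

```python
import keras.backend as K

T = 5.0  # softening temperature (assumed value)

def soft_targets_only_loss(teacher_logits, student_logits):
    """Distillation loss when no ground-truth labels are available:
    only the teacher's softened distribution is matched."""
    soft_targets = K.softmax(teacher_logits / T)
    soft_student = K.softmax(student_logits / T)
    return (T ** 2) * K.categorical_crossentropy(soft_targets, soft_student)
```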
- An Azure subscription ID. Make sure your subscription allows you to create compute targets (compare VM sizes).
- Knowledge of basic data science and machine learning concepts.
- Moderate skills in coding with Python and in machine learning using Python.
- Prior experience with Azure Machine Learning (AML) services is recommended.
- A recent installation of Miniconda (or Anaconda)
- An IDE for editing code, such as VS Code: https://code.visualstudio.com/
- Buciluǎ, Cristian, Rich Caruana, and Alexandru Niculescu-Mizil. "Model compression." Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2006.
- Hinton, Geoffrey, Oriol Vinyals, and Jeff Dean. "Distilling the knowledge in a neural network." arXiv preprint arXiv:1503.02531 (2015).