diff --git a/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_ptq_mnist.ipynb b/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_ptq_mnist.ipynb
index d895005ab..768f145c9 100644
--- a/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_ptq_mnist.ipynb
+++ b/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_ptq_mnist.ipynb
@@ -7,80 +7,21 @@
 "id": "7cf96fb4"
 },
 "source": [
- "# Quantization using the Model Compression Toolkit - example in Pytorch"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "59ed8f02",
- "metadata": {
- "id": "59ed8f02"
- },
- "source": [
- "[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_ptq_mnist.ipynb)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "822944a1",
- "metadata": {
- "id": "822944a1"
- },
- "source": [
- "## Overview"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "743dbc3d",
- "metadata": {
- "id": "743dbc3d"
- },
- "source": [
- "This quick start guide covers how to use the Model Compression Toolkit (MCT) for quantizing a PyTorch model. We will do so by giving an end-to-end example, training a model from scratch on MNIST data, then quantizing it using the MCT."
- ]
- },
- {
- "cell_type": "markdown",
- "id": "59e2eeae",
- "metadata": {
- "id": "59e2eeae"
- },
- "source": [
- "## Summary"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "1daf577a",
- "metadata": {
- "id": "1daf577a"
- },
- "source": [
- "In this tutorial we will cover:\n",
- "1. Training a Pytorch model from scratch on MNIST.\n",
- "2. Quantizing the model in a hardware-friendly manner (symmetric quantization, power-of-2 thresholds) using 8-bit activations and weights.\n",
- "3. We will examine the output quantized model, evaluate it and compare its performance to the original model.\n",
- "4. We will approximate the compression gains due to quantization."
- ]
- },
- {
- "cell_type": "markdown",
- "id": "8b3396bf",
- "metadata": {
- "id": "8b3396bf"
- },
- "source": [
- "## Setup"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "5e7690ef",
- "metadata": {
- "id": "5e7690ef"
- },
- "source": [
+ "# Quantization using the Model Compression Toolkit - example in Pytorch\n",
+ "[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_ptq_mnist.ipynb)\n",
+ "\n",
+ "## Overview\n",
+ "This quick-start guide explains how to use the Model Compression Toolkit (MCT) to quantize a PyTorch model. We'll provide an end-to-end example, starting with training a model from scratch on the MNIST dataset and then applying MCT for quantization.\n",
+ "\n",
+ "## Summary\n",
+ "In this tutorial, we will explore the following:\n",
+ "\n",
+ "**1. Training a PyTorch model from scratch on MNIST:** We'll start by building a basic PyTorch model and training it on the MNIST dataset.\n",
+ "**2. Quantizing the model using 8-bit activations and weights:** We'll employ a hardware-friendly quantization technique, such as symmetric quantization with power-of-2 thresholds.\n",
+ "**3. Evaluating the quantized model:** We'll compare the performance of the quantized model to the original model, focusing on accuracy.\n",
+ "**4. Analyzing compression gains:** We'll estimate the compression achieved by quantization.\n",
+ "\n",
+ "## Setup\n",
 "Install the relevant packages:"
 ]
 },
@@ -93,12 +34,23 @@
 },
 "outputs": [],
 "source": [
- "! pip install -q model-compression-toolkit\n",
 "! pip install -q torch\n",
 "! pip install -q torchvision\n",
 "! pip install -q onnx"
 ]
 },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "5441efd2978cea5a",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import importlib\n",
+ "if not importlib.util.find_spec('model_compression_toolkit'):\n",
+ "    !pip install model_compression_toolkit"
+ ]
+ },
 {
 "cell_type": "code",
 "execution_count": null,
@@ -108,25 +60,12 @@
 },
 "outputs": [],
 "source": [
- "from __future__ import print_function\n",
- "import argparse\n",
 "import torch\n",
 "import torch.nn as nn\n",
 "import torch.nn.functional as F\n",
 "import torch.optim as optim\n",
 "from torchvision import datasets, transforms\n",
- "from torch.optim.lr_scheduler import StepLR\n",
- "import model_compression_toolkit as mct"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "1653425b",
- "metadata": {
- "id": "1653425b"
- },
- "source": [
- "## Train a Pytorch classifier model on MNIST"
+ "from torch.optim.lr_scheduler import StepLR"
 ]
 },
 {
@@ -136,7 +75,8 @@
 "id": "02312089"
 },
 "source": [
- "Let us define the network and some helper functions to train and evaluate the model. These are taken from the official Pytorch examples https://github.com/pytorch/examples/blob/main/mnist/main.py"
+ "## Train a Pytorch classifier model on MNIST\n",
+ "Let's define the network and a few helper functions to train and evaluate the model. The following code snippets are adapted from the official PyTorch examples: https://github.com/pytorch/examples/blob/main/mnist/main.py"
 ]
 },
 {
@@ -213,7 +153,6 @@
 "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n",
 "torch.backends.cudnn.enabled = False\n",
 "torch.manual_seed(random_seed)\n",
- "dataset_folder = '/datasets/mnist/images'\n",
 "epochs = 2\n",
 "gamma = 0.7\n",
 "lr = 1.0"
@@ -226,7 +165,7 @@
 "id": "c24d3c5a"
 },
 "source": [
- "Let us define the dataset loaders, and optimizer and train the model for 2 epochs."
+ "Let's define the dataset loaders and optimizer, then train the model for 2 epochs."
 ]
 },
 {
@@ -242,12 +181,13 @@
 "    transforms.ToTensor(),\n",
 "    transforms.Normalize((0.1307,), (0.3081,))\n",
 "    ])\n",
- "dataset1 = datasets.MNIST(dataset_folder, train=True, download=True,\n",
+ "dataset_folder = './mnist'\n",
+ "train_dataset = datasets.MNIST(dataset_folder, train=True, download=True,\n",
 "    transform=transform)\n",
- "dataset2 = datasets.MNIST(dataset_folder, train=False,\n",
+ "test_dataset = datasets.MNIST(dataset_folder, train=False,\n",
 "    transform=transform)\n",
- "train_loader = torch.utils.data.DataLoader(dataset1, num_workers=1, pin_memory=True, batch_size=batch_size, shuffle=True)\n",
- "test_loader = torch.utils.data.DataLoader(dataset2, num_workers=1, pin_memory=True, batch_size=test_batch_size, shuffle=False)\n",
+ "train_loader = torch.utils.data.DataLoader(train_dataset, num_workers=1, pin_memory=True, batch_size=batch_size, shuffle=True)\n",
+ "test_loader = torch.utils.data.DataLoader(test_dataset, num_workers=1, pin_memory=True, batch_size=test_batch_size, shuffle=False)\n",
 "\n",
 "model = Net().to(device)\n",
 "optimizer = optim.Adadelta(model.parameters(), lr=lr)\n",
@@ -259,26 +199,6 @@
 "    scheduler.step()"
 ]
 },
- {
- "cell_type": "markdown",
- "id": "69366614",
- "metadata": {
- "id": "69366614"
- },
- "source": [
- "After training for 2 epochs we get an accuracy of 98.5%. Not bad."
- ]
- },
- {
- "cell_type": "markdown",
- "id": "e9cd25a7",
- "metadata": {
- "id": "e9cd25a7"
- },
- "source": [
- "## Hardware-friendly quantization using MCT"
- ]
- },
 {
 "cell_type": "markdown",
 "id": "c0321aad",
@@ -286,8 +206,8 @@
 "id": "c0321aad"
 },
 "source": [
- "Now we would like to quantize this model using the Model Compression Toolkit.\n",
- "To do so, we need to define a representative dataset, which is a generator that returns a list of images:"
+ "## Representative Dataset\n",
+ "For quantization with MCT, we need to define a representative dataset required by the Post-Training Quantization (PTQ) algorithm. This dataset is a generator that returns a list of images:"
 ]
 },
 {
@@ -299,12 +219,12 @@
 },
 "outputs": [],
 "source": [
- "image_data_loader = iter(train_loader)\n",
 "n_iter=10\n",
 "\n",
- "def representative_data_gen() -> list:\n",
+ "def representative_dataset_gen():\n",
+ "    dataloader_iter = iter(train_loader)\n",
 "    for _ in range(n_iter):\n",
- "        yield [next(image_data_loader)[0]]"
+ "        yield [next(dataloader_iter)[0]]"
 ]
 },
 {
@@ -314,7 +234,8 @@
 "id": "d0a92bee"
 },
 "source": [
- "Now for the fireworks. Lets run hardware-friendly post training quantization on the model. The output of MCT is a simulated quantized model in the input model's framework. That is, the model adds fake-quantization nodes after layers that need to be quantized. The output model's size on the disk does'nt change, but all the quantization parameters are available for deployment on target hardware."
+ "## Hardware-friendly quantization using MCT\n",
+ "Now for the exciting part! Let’s run hardware-friendly post-training quantization on the model."
 ]
 },
 {
@@ -326,10 +247,14 @@
 },
 "outputs": [],
 "source": [
+ "import model_compression_toolkit as mct\n",
+ "\n",
+ "# Define a `TargetPlatformCapability` object, representing the HW specifications on which we wish to eventually deploy our quantized model.\n",
 "target_platform_cap = mct.get_target_platform_capabilities('pytorch', 'default')\n",
+ "\n",
 "quantized_model, quantization_info = mct.ptq.pytorch_post_training_quantization(\n",
 "    in_module=model,\n",
- "    representative_data_gen=representative_data_gen,\n",
+ "    representative_data_gen=representative_dataset_gen,\n",
 "    target_platform_capabilities=target_platform_cap\n",
 ")"
 ]
@@ -341,7 +266,7 @@
 "id": "d3521637"
 },
 "source": [
- "The MCT prints the approximated model size after real quantization and the compression ratio. In this example, we used the default setting of MCT and compressed the model from 32 bits to 8 bits, hence the compression ratio is x4. Using the simulated quantized model, we can evaluate its performance using the original model's testing environment, and compare its performance to the original model."
+ "Our model is now quantized. MCT has created a simulated quantized model within the original PyTorch framework by inserting [quantization representation modules](https://github.com/sony/mct_quantizers). These modules, such as `PytorchQuantizationWrapper` and `PytorchActivationQuantizationHolder`, wrap PyTorch layers to simulate the quantization of weights and activations, respectively. While the size of the saved model remains unchanged, all the quantization parameters are stored within these modules and are ready for deployment on the target hardware. In this example, we used the default MCT settings, which compressed the model from 32 bits to 8 bits, resulting in a compression ratio of 4x. Let's print the quantized model and examine the quantization modules:"
 ]
 },
 {
@@ -353,55 +278,68 @@
 },
 "outputs": [],
 "source": [
- "print(quantization_info)\n",
- "test(quantized_model, device, test_loader)"
+ "print(quantized_model)"
 ]
 },
 {
 "cell_type": "markdown",
- "id": "fd09fa27",
- "metadata": {
- "id": "fd09fa27"
- },
+ "id": "c677bd61c3ab4649",
+ "metadata": {},
+ "source": [
+ "## Models evaluation\n",
+ "Using the simulated quantized model, we can evaluate its performance and compare the results to the floating-point model.\n",
+ "\n",
+ "Let's start with the floating-point model evaluation."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "fdd038f7aff8cde7",
+ "metadata": {},
+ "outputs": [],
 "source": [
- "In this scenario, we see that the compression almost didn't affect the accuracy of the model."
+ "test(model, device, test_loader)" ] }, { "cell_type": "markdown", + "id": "f4f564f31e253f5c", + "metadata": {}, "source": [ - "Now, we can export the quantized model to ONNX:" - ], - "metadata": { - "id": "9nQBVWFhbKXV" - }, - "id": "9nQBVWFhbKXV" + "Finally, let's evaluate the quantized model:" + ] }, { "cell_type": "code", + "execution_count": null, + "id": "c9da2134f0bde415", + "metadata": {}, + "outputs": [], "source": [ - "# Export quantized model to ONNX\n", - "import tempfile\n", - "_, onnx_file_path = tempfile.mkstemp('.onnx') # Path of exported model\n", - "mct.exporter.pytorch_export_model(model=quantized_model,\n", - " save_model_path=onnx_file_path,\n", - " repr_dataset=representative_data_gen)" - ], + "test(quantized_model, device, test_loader)" + ] + }, + { + "cell_type": "markdown", + "id": "fd09fa27", "metadata": { - "id": "oXMn6bFjbQad" + "id": "fd09fa27" }, - "id": "oXMn6bFjbQad", - "execution_count": null, - "outputs": [] + "source": [ + "Now, we can export the quantized model to ONNX:" + ] }, { - "cell_type": "markdown", - "id": "14877777", + "cell_type": "code", + "execution_count": null, + "id": "oXMn6bFjbQad", "metadata": { - "id": "14877777" + "id": "oXMn6bFjbQad" }, + "outputs": [], "source": [ - "## Conclusion" + "mct.exporter.pytorch_export_model(quantized_model, save_model_path='qmodel.onnx', repr_dataset=representative_dataset_gen)" ] }, { @@ -411,11 +349,13 @@ "id": "bb7e1572" }, "source": [ - "In this tutorial, we demonstrated how to quantize a classification model for MNIST in a hardware-friendly manner using MCT. We saw that we can achieve an x4 compression ratio with minimal performance degradation.\n", + "## Conclusion\n", + "\n", + "In this tutorial, we demonstrated how to quantize a classification model for MNIST in a hardware-friendly manner using MCT. We observed that a 4x compression ratio was achieved with minimal performance degradation.\n", "\n", - "The advantage of quantizing in a hardware-friendly manner is that this model can run more efficiently in the sense of run time, power consumption, and memory on designated hardware.\n", + "The key advantage of hardware-friendly quantization is that the model can run more efficiently in terms of runtime, power consumption, and memory usage on designated hardware.\n", "\n", - "This is a very simple model and a very simple task. MCT can demonstrate competitive results on a wide variety of tasks and network architectures. Check out the paper for more details: https://arxiv.org/abs/2109.09113\n", + "While this was a simple model and task, MCT can deliver competitive results across a wide range of tasks and network architectures. For more details, [check out the paper:](https://arxiv.org/abs/2109.09113).\n", "\n", "\n", "Copyright 2022 Sony Semiconductor Israel, Inc. All rights reserved.\n", @@ -435,6 +375,9 @@ } ], "metadata": { + "colab": { + "provenance": [] + }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", @@ -450,10 +393,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.9.7" - }, - "colab": { - "provenance": [] + "version": "3.10.4" } }, "nbformat": 4,