Add FAQ and bag-of-tricks (#1006)
Add FAQ and Quantization Troubleshooting pages
Add message at end of exporter referencing those pages
elad-c committed Mar 21, 2024
1 parent 90efbf3 commit 5e9d1ba
Showing 5 changed files with 337 additions and 7 deletions.
56 changes: 56 additions & 0 deletions FAQ.md
@@ -0,0 +1,56 @@
# FAQ

**Table of Contents:**

1. [Why does the size of the quantized model remain the same as the original model size?](#1-why-does-the-size-of-the-quantized-model-remain-the-same-as-the-original-model-size)
2. [Why does loading a quantized exported model from a file fail?](#2-why-does-loading-a-quantized-exported-model-from-a-file-fail)
3. [Why am I getting a torch.fx error?](#3-why-am-i-getting-a-torchfx-error)


### 1. Why does the size of the quantized model remain the same as the original model size?

MCT performs a process known as *fake quantization*, wherein the model's weights and activations are still stored in a floating-point
format, but they are quantized so that they take at most 2^N unique values (for the N-bit case).
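
To make this concrete, below is a minimal NumPy sketch of symmetric fake quantization (an illustration only, not MCT's actual implementation): the output stays in float32, so the file size is unchanged, but only 2^N distinct values appear.

```python
import numpy as np

def fake_quantize(x: np.ndarray, n_bits: int = 8) -> np.ndarray:
    """Quantize-dequantize: output is float32 but takes at most 2**n_bits distinct values."""
    levels = 2 ** (n_bits - 1)              # signed, symmetric grid
    scale = np.abs(x).max() / (levels - 1)  # map the largest magnitude onto the grid
    q = np.clip(np.round(x / scale), -levels, levels - 1)
    return (q * scale).astype(np.float32)   # de-quantize back to float ("fake" quantization)

x = np.random.randn(1000).astype(np.float32)
xq = fake_quantize(x, n_bits=4)
print(xq.dtype, np.unique(xq).size)  # float32, at most 2**4 = 16 unique values
```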

Exporting your model to INT8 format (currently supported only when exporting Keras models to TFLite) does reduce the model size on disk,
but this export path is limited to uniform 8-bit quantization.
Note that the IMX500 converter accepts the "fake quantization" model and supports all of MCT's features (e.g. weight bit-widths below 8 bits and non-uniform quantization).

For more information and an implementation example, check out the [INT8 TFLite export tutorial](https://github.com/sony/model_optimization/blob/main/tutorials/notebooks/keras/export/example_keras_export.ipynb).
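
For orientation, a minimal sketch of the INT8 TFLite export is shown below. It follows the linked tutorial, but the exact enum names and the signature of `mct.exporter.keras_export_model` are assumptions that should be checked against your MCT version:

```python
import model_compression_toolkit as mct

# `quantized_model` is the quantized Keras model returned by MCT
# (e.g. by keras_post_training_quantization) -- placeholder for this sketch.
mct.exporter.keras_export_model(
    model=quantized_model,
    save_model_path='qmodel.tflite',
    serialization_format=mct.exporter.KerasExportSerializationFormat.TFLITE,
    quantization_format=mct.exporter.QuantizationFormat.INT8)
```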


### 2. Why does loading a quantized exported model from a file fail?

The models MCT exports contain QuantizationWrapper and Quantizer objects that quantize the model at inference time.
These are custom layers and layer wrappers created by MCT (defined in an external repository: [MCTQ](https://github.com/sony/mct_quantizers)),
so the framework's default loader does not recognize them. MCT therefore offers a framework-specific API for loading these models from a file.

#### Keras

Keras models can be loaded with the following function:
```python
import model_compression_toolkit as mct

quantized_model = mct.keras_load_quantized_model('my_model.keras')
```

#### PyTorch

PyTorch models can be exported as ONNX models. An example of loading a saved ONNX model can be found [here](https://sony.github.io/model_optimization/api/api_docs/modules/exporter.html#use-exported-model-for-inference).

*Note:* Running inference on an exported ONNX model with the `onnxruntime` package incurs high latency.
Inference on the target platform (e.g. the IMX500) is not affected by this latency.
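
A minimal sketch of such inference is shown below; it assumes the `mct_quantizers` (MCTQ) package exposes `get_ort_session_options()` for registering MCT's custom quantizer operators with `onnxruntime`, as described in the linked exporter documentation:

```python
import numpy as np
import onnxruntime as ort
import mct_quantizers as mctq

# Session options that register MCTQ's custom quantizer operators with onnxruntime.
sess = ort.InferenceSession('my_model.onnx',
                            mctq.get_ort_session_options(),
                            providers=['CPUExecutionProvider'])

dummy_input = np.random.randn(1, 3, 224, 224).astype(np.float32)  # adjust to your model's input shape
outputs = sess.run(None, {sess.get_inputs()[0].name: dummy_input})
```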


### 3. Why am I getting a torch.fx error?

When quantizing a PyTorch model, MCT's initial step involves converting the model into a graph representation using `torch.fx`.
However, `torch.fx` comes with certain common limitations, with the primary one being its requirement for the computational graph to remain static.

Despite these limitations, some adjustments can be made to facilitate MCT quantization.

**Solution**: (assuming you have access to the model's code)

Check the `torch.fx` error and look for an equivalent, traceable replacement in the model's code. Some examples (illustrated in the sketch below):
* An `if` statement in a module's `forward` method can often simply be removed, or rewritten so that it does not depend on the input tensor.
* A call to Python's `list()` can be replaced with an explicit list literal, e.g. `[A, B, C]`.
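
As a hedged illustration, here is a hypothetical module (not MCT code) showing the kind of rewrite these suggestions lead to, with the problematic patterns kept as comments:

```python
import torch
import torch.nn as nn

class TraceableBlock(nn.Module):
    def __init__(self, use_residual: bool = True):
        super().__init__()
        self.use_residual = use_residual
        self.conv = nn.Conv2d(16, 16, kernel_size=3, padding=1)

    def forward(self, x):
        y = self.conv(x)
        # Before: `if x.shape[-1] > 8:` -- branching on the input breaks torch.fx symbolic tracing.
        # After: branch on a constant attribute fixed at construction time.
        if self.use_residual:
            y = y + x
        # Before: `return list(torch.chunk(y, 2, dim=1))` -- `list()` on a traced value fails.
        # After: index the result and build the list explicitly.
        chunks = torch.chunk(y, 2, dim=1)
        return [chunks[0], chunks[1]]

# The rewritten module traces cleanly:
traced = torch.fx.symbolic_trace(TraceableBlock())
```
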
7 changes: 7 additions & 0 deletions README.md
@@ -17,6 +17,7 @@ MCT is developed by researchers and engineers working at Sony Semiconductor Isra
- [Getting Started](#getting-started)
- [Supported features](#supported-features)
- [Results](#results)
- [Troubleshooting](#troubleshooting)
- [Contributions](#contributions)
- [License](#license)

@@ -160,6 +161,12 @@ Results for applying pruning to reduce the parameters of the following models by
| DenseNet121 [3] | 74.44 | 71.71 |


## Troubleshooting

If the accuracy degradation of the quantized model is too large for your application, check out the [Quantization Troubleshooting](https://github.com/sony/model_optimization/tree/main/quantization_troubleshooting.md) page
for common pitfalls and for tools that help improve the quantized model's accuracy.

Check out the [FAQ](https://github.com/sony/model_optimization/tree/main/FAQ.md) for solutions to common issues.


## Contributions
@@ -89,6 +89,11 @@ def get_exportable_keras_model(graph: Graph) -> Tuple[tf.keras.models.Model, Use
get_activation_quantizer_holder(n,
fw_impl=C.keras.keras_implementation.KerasImplementation())).build_model()
exportable_model.trainable = False

Logger.info("Please run your accuracy evaluation on the exported quantized model to verify it's accuracy.\n"
"Checkout the FAQ and Troubleshooting pages for resolving common issues and improving the quantized model accuracy:\n"
"FAQ: https://github.com/sony/model_optimization/tree/main/FAQ.md"
"Quantization Troubleshooting: https://github.com/sony/model_optimization/tree/main/quantization_troubleshooting.md")
return exportable_model, user_info
else:
def get_exportable_keras_model(*args, **kwargs): # pragma: no cover
@@ -74,13 +74,21 @@ def get_exportable_pytorch_model(graph: Graph):
Returns:
Fully quantized PyTorch model.
"""
return PyTorchModelBuilder(graph=graph,
wrapper=lambda n, m:
fully_quantized_wrapper(n, m,
fw_impl=C.pytorch.pytorch_implementation.PytorchImplementation()),
get_activation_quantizer_holder_fn=lambda n:
get_activation_quantizer_holder(n,
fw_impl=C.pytorch.pytorch_implementation.PytorchImplementation())).build_model()
exportable_model, user_info = PyTorchModelBuilder(graph=graph,
wrapper=lambda n, m:
fully_quantized_wrapper(n, m,
fw_impl=C.pytorch.pytorch_implementation.PytorchImplementation()),
get_activation_quantizer_holder_fn=lambda n:
get_activation_quantizer_holder(n,
fw_impl=C.pytorch.pytorch_implementation.PytorchImplementation())).build_model()

Logger.info("Please run your accuracy evaluation on the exported quantized model to verify it's accuracy.\n"
"Checkout the FAQ and Troubleshooting pages for resolving common issues and improving the quantized model accuracy:\n"
"FAQ: https://github.com/sony/model_optimization/tree/main/FAQ.md"
"Quantization Troubleshooting: https://github.com/sony/model_optimization/tree/main/quantization_troubleshooting.md")

return exportable_model, user_info


else:
def get_exportable_pytorch_model(*args, **kwargs):
