Add FAQ and bag-of-tricks (#1006)
Add FAQ and Quantization Troubleshooting pages
Add message at end of exporter referencing those pages
elad-c committed Mar 21, 2024
1 parent 90efbf3 commit 5e9d1ba
Showing 5 changed files with 337 additions and 7 deletions.
56 changes: 56 additions & 0 deletions FAQ.md
@@ -0,0 +1,56 @@
# FAQ

**Table of Contents:**

1. [Why does the size of the quantized model remain the same as the original model size?](#1-why-does-the-size-of-the-quantized-model-remain-the-same-as-the-original-model-size)
2. [Why does loading a quantized exported model from a file fail?](#2-why-does-loading-a-quantized-exported-model-from-a-file-fail)
3. [Why am I getting a torch.fx error?](#3-why-am-i-getting-a-torchfx-error)


### 1. Why does the size of the quantized model remain the same as the original model size?

MCT performs a process known as *fake quantization*, wherein the model's weights and activations are still stored in a floating-point
format, but they are quantized so that they take at most 2^N unique values (for the N-bit case).
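
To make this concrete, below is a minimal NumPy sketch of symmetric fake quantization (an illustration only, not MCT's actual implementation): the output stays in float32, so the file size is unchanged, but only 2^N distinct values appear.

```python
import numpy as np

def fake_quantize(x: np.ndarray, n_bits: int = 8) -> np.ndarray:
    """Quantize-dequantize: output is float32 but takes at most 2**n_bits distinct values."""
    levels = 2 ** (n_bits - 1)              # signed, symmetric grid
    scale = np.abs(x).max() / (levels - 1)  # map the largest magnitude onto the grid
    q = np.clip(np.round(x / scale), -levels, levels - 1)
    return (q * scale).astype(np.float32)   # de-quantize back to float ("fake" quantization)

x = np.random.randn(1000).astype(np.float32)
xq = fake_quantize(x, n_bits=4)
print(xq.dtype, np.unique(xq).size)  # float32, at most 2**4 = 16 unique values
```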

Exporting your model to INT8 format (currently supported only when exporting Keras models to TFLite) does reduce the model size on disk,
but this export path is limited to uniform 8-bit quantization.
Note that the IMX500 converter accepts the "fake quantization" model and supports all of MCT's features (e.g. weight bit-widths below 8 bits and non-uniform quantization).

For more information and an implementation example, check out the [INT8 TFLite export tutorial](https://github.com/sony/model_optimization/blob/main/tutorials/notebooks/keras/export/example_keras_export.ipynb).
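
For orientation, a minimal sketch of the INT8 TFLite export is shown below. It follows the linked tutorial, but the exact enum names and the signature of `mct.exporter.keras_export_model` are assumptions that should be checked against your MCT version:

```python
import model_compression_toolkit as mct

# `quantized_model` is the quantized Keras model returned by MCT
# (e.g. by keras_post_training_quantization) -- placeholder for this sketch.
mct.exporter.keras_export_model(
    model=quantized_model,
    save_model_path='qmodel.tflite',
    serialization_format=mct.exporter.KerasExportSerializationFormat.TFLITE,
    quantization_format=mct.exporter.QuantizationFormat.INT8)
```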


### 2. Why does loading a quantized exported model from a file fail?

The models MCT exports contain QuantizationWrapper and Quantizer objects that quantize the model at inference time.
These are custom layers and layer wrappers created by MCT (defined in an external repository: [MCTQ](https://github.com/sony/mct_quantizers)),
so the framework's default loader does not recognize them. MCT therefore offers a framework-specific API for loading these models from a file.

#### Keras

Keras models can be loaded with the following function:
```python
import model_compression_toolkit as mct

quantized_model = mct.keras_load_quantized_model('my_model.keras')
```

#### PyTorch

PyTorch models can be exported as ONNX models. An example of loading a saved ONNX model can be found [here](https://sony.github.io/model_optimization/api/api_docs/modules/exporter.html#use-exported-model-for-inference).

*Note:* Running inference on an exported ONNX model with the `onnxruntime` package incurs high latency.
Inference on the target platform (e.g. the IMX500) is not affected by this latency.
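
A minimal sketch of such inference is shown below; it assumes the `mct_quantizers` (MCTQ) package exposes `get_ort_session_options()` for registering MCT's custom quantizer operators with `onnxruntime`, as described in the linked exporter documentation:

```python
import numpy as np
import onnxruntime as ort
import mct_quantizers as mctq

# Session options that register MCTQ's custom quantizer operators with onnxruntime.
sess = ort.InferenceSession('my_model.onnx',
                            mctq.get_ort_session_options(),
                            providers=['CPUExecutionProvider'])

dummy_input = np.random.randn(1, 3, 224, 224).astype(np.float32)  # adjust to your model's input shape
outputs = sess.run(None, {sess.get_inputs()[0].name: dummy_input})
```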


### 3. Why am I getting a torch.fx error?

When quantizing a PyTorch model, MCT's initial step involves converting the model into a graph representation using `torch.fx`.
However, `torch.fx` comes with certain common limitations, with the primary one being its requirement for the computational graph to remain static.

Despite these limitations, some adjustments can be made to facilitate MCT quantization.

**Solution**: (assuming you have access to the model's code)

Check the `torch.fx` error and look for an equivalent, traceable replacement in the model's code. Some examples (illustrated in the sketch below):
* An `if` statement in a module's `forward` method can often simply be removed, or rewritten so that it does not depend on the input tensor.
* A call to Python's `list()` can be replaced with an explicit list literal, e.g. `[A, B, C]`.
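
As a hedged illustration, here is a hypothetical module (not MCT code) showing the kind of rewrite these suggestions lead to, with the problematic patterns kept as comments:

```python
import torch
import torch.nn as nn

class TraceableBlock(nn.Module):
    def __init__(self, use_residual: bool = True):
        super().__init__()
        self.use_residual = use_residual
        self.conv = nn.Conv2d(16, 16, kernel_size=3, padding=1)

    def forward(self, x):
        y = self.conv(x)
        # Before: `if x.shape[-1] > 8:` -- branching on the input breaks torch.fx symbolic tracing.
        # After: branch on a constant attribute fixed at construction time.
        if self.use_residual:
            y = y + x
        # Before: `return list(torch.chunk(y, 2, dim=1))` -- `list()` on a traced value fails.
        # After: index the result and build the list explicitly.
        chunks = torch.chunk(y, 2, dim=1)
        return [chunks[0], chunks[1]]

# The rewritten module traces cleanly:
traced = torch.fx.symbolic_trace(TraceableBlock())
```
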
7 changes: 7 additions & 0 deletions README.md
@@ -17,6 +17,7 @@ MCT is developed by researchers and engineers working at Sony Semiconductor Isra
- [Getting Started](#getting-started)
- [Supported features](#supported-features)
- [Results](#results)
- [Troubleshooting](#troubleshooting)
- [Contributions](#contributions)
- [License](#license)

@@ -160,6 +161,12 @@ Results for applying pruning to reduce the parameters of the following models by
| DenseNet121 [3] | 74.44 | 71.71 |


## Troubleshooting

If the accuracy degradation of the quantized model is too large for your application, check out the [Quantization Troubleshooting](https://github.com/sony/model_optimization/tree/main/quantization_troubleshooting.md) page
for common pitfalls and for tools that help improve the quantized model's accuracy.

Check out the [FAQ](https://github.com/sony/model_optimization/tree/main/FAQ.md) for solutions to common issues.


## Contributions
@@ -89,6 +89,11 @@ def get_exportable_keras_model(graph: Graph) -> Tuple[tf.keras.models.Model, Use
get_activation_quantizer_holder(n,
fw_impl=C.keras.keras_implementation.KerasImplementation())).build_model()
exportable_model.trainable = False

Logger.info("Please run your accuracy evaluation on the exported quantized model to verify it's accuracy.\n"
"Checkout the FAQ and Troubleshooting pages for resolving common issues and improving the quantized model accuracy:\n"
"FAQ: https://github.com/sony/model_optimization/tree/main/FAQ.md"
"Quantization Troubleshooting: https://github.com/sony/model_optimization/tree/main/quantization_troubleshooting.md")
return exportable_model, user_info
else:
def get_exportable_keras_model(*args, **kwargs): # pragma: no cover
@@ -74,13 +74,21 @@ def get_exportable_pytorch_model(graph: Graph):
Returns:
Fully quantized PyTorch model.
"""
return PyTorchModelBuilder(graph=graph,
wrapper=lambda n, m:
fully_quantized_wrapper(n, m,
fw_impl=C.pytorch.pytorch_implementation.PytorchImplementation()),
get_activation_quantizer_holder_fn=lambda n:
get_activation_quantizer_holder(n,
fw_impl=C.pytorch.pytorch_implementation.PytorchImplementation())).build_model()
exportable_model, user_info = PyTorchModelBuilder(graph=graph,
wrapper=lambda n, m:
fully_quantized_wrapper(n, m,
fw_impl=C.pytorch.pytorch_implementation.PytorchImplementation()),
get_activation_quantizer_holder_fn=lambda n:
get_activation_quantizer_holder(n,
fw_impl=C.pytorch.pytorch_implementation.PytorchImplementation())).build_model()

Logger.info("Please run your accuracy evaluation on the exported quantized model to verify it's accuracy.\n"
"Checkout the FAQ and Troubleshooting pages for resolving common issues and improving the quantized model accuracy:\n"
"FAQ: https://github.com/sony/model_optimization/tree/main/FAQ.md"
"Quantization Troubleshooting: https://github.com/sony/model_optimization/tree/main/quantization_troubleshooting.md")

return exportable_model, user_info


else:
def get_exportable_pytorch_model(*args, **kwargs):
