Compress TinyLlama model using synthetic data

This example demonstrates how to optimize Large Language Models (LLMs) with the NNCF weight compression API, using synthetic data as the calibration input for the advanced algorithms. The example applies 4/8-bit mixed-precision quantization and the Scale Estimation algorithm to the weights of the Linear (fully connected) layers of the TinyLlama/TinyLlama-1.1B-Chat-v1.0 model. This significantly reduces the model footprint and improves inference performance with OpenVINO.
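To make the footprint claim concrete, here is a rough back-of-the-envelope estimate. It is a sketch under stated assumptions, not a measurement: the parameter count is the nominal 1.1B from the model name, the baseline is assumed to be FP16, the 80% share of 4-bit weights is a hypothetical value, and the overhead of quantization scales and of layers that are not compressed is ignored.

```python
# Nominal parameter count taken from the model name (assumption: all of them
# are treated as compressible weights, which overstates the savings slightly).
NUM_PARAMS = 1_100_000_000


def estimated_weight_bytes(num_params: int, int4_ratio: float) -> float:
    """Estimate weight storage when `int4_ratio` of the weights use 4 bits
    and the remainder use 8 bits. Scales and zero points are ignored."""
    bits_per_weight = 4 * int4_ratio + 8 * (1 - int4_ratio)
    return num_params * bits_per_weight / 8


# Assumed FP16 baseline: 2 bytes per weight.
fp16_bytes = NUM_PARAMS * 2
# Hypothetical mix: 80% of the weights in 4 bits, 20% in 8 bits.
mixed_bytes = estimated_weight_bytes(NUM_PARAMS, int4_ratio=0.8)

print(f"FP16 baseline:   {fp16_bytes / 1e9:.2f} GB")
print(f"4/8-bit mixture: {mixed_bytes / 1e9:.2f} GB")
```

The real size of the compressed model depends on the ratio and group size actually used and on which layers are compressed, so treat these numbers only as an order-of-magnitude guide.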

The example includes the following steps:

Prepare the TinyLlama/TinyLlama-1.1B-Chat-v1.0 text-generation model in OpenVINO representation using Optimum-Intel.
Prepare a synthetic dataset using the nncf.data.generate_text_data method.
Compress the weights of the model with the NNCF weight compression algorithm, using Scale Estimation and the synthetic dataset.
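The three steps above can be sketched roughly as follows. This is an illustrative outline, not the contents of main.py: the dataset size, the 4-bit ratio, the INT4_SYM mode, the stateless export, and the helper names make_transform and compress_tinyllama are all assumptions made for this sketch. Heavy imports are kept inside the functions so that the definitions load without the dependencies installed.

```python
MODEL_ID = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
DATASET_SIZE = 100  # assumption: number of synthetic text samples
INT4_RATIO = 0.8    # assumption: share of weights compressed to 4 bits


def make_transform(tokenizer, ov_model, config):
    """Return a function mapping one generated string to OpenVINO model inputs.

    Hypothetical helper: assumes a stateless export, where the KV cache is
    exposed as inputs named `past_key_values...` and can be fed empty tensors.
    """
    import numpy as np
    import openvino as ov

    input_ports = {port.get_any_name(): port for port in ov_model.inputs}
    head_dim = config.hidden_size // config.num_attention_heads
    empty_cache = ov.Shape([1, config.num_key_value_heads, 0, head_dim])

    def transform(text):
        tokens = tokenizer(text)
        attention_mask = np.expand_dims(np.array(tokens["attention_mask"]), 0)
        candidates = {
            "input_ids": np.expand_dims(np.array(tokens["input_ids"]), 0),
            "attention_mask": attention_mask,
            "position_ids": np.cumsum(attention_mask, axis=1) - 1,
        }
        # Pass only the inputs that the exported model actually declares.
        inputs = {k: v for k, v in candidates.items() if k in input_ports}
        for name, port in input_ports.items():
            if name.startswith("past_key_values"):
                inputs[name] = ov.Tensor(port.get_element_type(), empty_cache)
        return inputs

    return transform


def compress_tinyllama():
    """Run the three README steps and return the compressed OpenVINO model."""
    import nncf
    from optimum.intel.openvino import OVModelForCausalLM
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    hf_model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    # Step 1: export the model to OpenVINO representation with Optimum-Intel.
    ov_wrapper = OVModelForCausalLM.from_pretrained(
        MODEL_ID, export=True, load_in_8bit=False, compile=False, stateful=False
    )
    ov_model = ov_wrapper.model

    # Step 2: let the original model generate its own calibration text.
    texts = nncf.data.generate_text_data(
        hf_model, tokenizer, dataset_size=DATASET_SIZE
    )
    dataset = nncf.Dataset(
        texts, make_transform(tokenizer, ov_model, hf_model.config)
    )

    # Step 3: data-aware mixed-precision weight compression.
    return nncf.compress_weights(
        ov_model,
        dataset=dataset,
        mode=nncf.CompressWeightsMode.INT4_SYM,
        ratio=INT4_RATIO,
        scale_estimation=True,
    )
```

Calling compress_tinyllama() downloads the model and may take several minutes. Refer to main.py for the settings this example actually uses.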

Install requirements

To use this example:

Create a separate Python* environment and activate it: python3 -m venv nncf_env && source nncf_env/bin/activate
Install dependencies:

pip install -U pip
pip install -r requirements.txt
pip install ../../../../

Run Example

The example is fully automated. Just run the following command in the prepared Python environment:

python main.py
