Update examples to only load required number of samples from dataset #1118

kylesayrs · 2025-01-31T19:50:00Z

Purpose

Speed up the examples and demonstrate how to load dataset slices, which is especially relevant for oneshot flows.

from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import oneshot

# Select model and load it.
MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    device_map="auto",
    torch_dtype="auto",
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Select calibration dataset.
DATASET_ID = "HuggingFaceH4/ultrachat_200k"
DATASET_SPLIT = "train_sft"

# Select number of samples. 512 samples is a good place to start.
# Increasing the number of samples can improve accuracy.
NUM_CALIBRATION_SAMPLES = 512
MAX_SEQUENCE_LENGTH = 2048

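# Load only the first NUM_CALIBRATION_SAMPLES rows of the split, then preprocess.
ds = load_dataset(DATASET_ID, split=f"{DATASET_SPLIT}[:{NUM_CALIBRATION_SAMPLES}]")
# select() is redundant after slicing; it is kept to demonstrate the Datasets API.
ds = ds.shuffle(seed=42).select(range(NUM_CALIBRATION_SAMPLES))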

...

Changes

Limit data loading to NUM_CALIBRATION_SAMPLES samples in all examples
Keep the select call for demonstration purposes, for those who are new to the Datasets library
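
The split-slicing pattern used in the examples can be sketched in isolation as follows. This is a minimal illustration, not code from the PR: `sliced_split` is a hypothetical helper name, and a plain list of row indices stands in for the loaded dataset so the snippet runs without network access. It also shows why the retained shuffle-and-select step only reorders rows once the split has been sliced.

```python
import random


def sliced_split(split: str, num_samples: int) -> str:
    """Build a split string such as "train_sft[:512]" (hypothetical helper)."""
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    return f"{split}[:{num_samples}]"


DATASET_SPLIT = "train_sft"
NUM_CALIBRATION_SAMPLES = 512

split_arg = sliced_split(DATASET_SPLIT, NUM_CALIBRATION_SAMPLES)
print(split_arg)

# In the examples this string is passed to the loader (needs network access):
# ds = load_dataset("HuggingFaceH4/ultrachat_200k", split=split_arg)

# Stand-in for the sliced rows: indices 0..N-1 of the split.
sliced_rows = list(range(NUM_CALIBRATION_SAMPLES))

# Analogue of ds.shuffle(seed=42).select(range(N)) on the stand-in list.
shuffled_rows = sliced_rows[:]
random.Random(42).shuffle(shuffled_rows)
selected_rows = shuffled_rows[:NUM_CALIBRATION_SAMPLES]

# The same rows remain; only their order can differ.
same_rows = sorted(selected_rows) == sliced_rows
print(same_rows)
```

Because the slice already caps the dataset at NUM_CALIBRATION_SAMPLES rows, the select step changes nothing about which rows are used, which matches the note above that it is kept for demonstration only.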

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

github-actions · 2025-01-31T19:50:13Z

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite, please only add the label once the PR is code complete and local testing has been performed.

update examples to only load required number of samples from dataset

57a2c38

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
