@anchen25 anchen25 commented Nov 22, 2025

Related issues

Description

This PR adds an example that implements the GPT-2-based model distilgpt2 on the wikitext-2-raw-v1 dataset using AIHWKit. The example demonstrates how to convert the model to analog and how to run fine-tuning, text generation, and inference.
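
For orientation, a minimal sketch of the analog conversion step (illustrative only; the noise model, g_max value, and the handling of GPT-2's Conv1D layers are assumptions, not necessarily what this example does):

# Minimal sketch, not the code in this PR: convert distilgpt2 to analog with AIHWKit.
# The InferenceRPUConfig / PCMLikeNoiseModel settings are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

from aihwkit.nn.conversion import convert_to_analog
from aihwkit.simulator.configs import InferenceRPUConfig
from aihwkit.inference import PCMLikeNoiseModel

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
model = AutoModelForCausalLM.from_pretrained("distilgpt2")

rpu_config = InferenceRPUConfig()
rpu_config.noise_model = PCMLikeNoiseModel(g_max=25.0)

# Swap supported layers for analog tiles; GPT-2 uses transformers' Conv1D layers,
# so the actual script may need a custom conversion_map here.
analog_model = convert_to_analog(model, rpu_config)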

Details

  1. Model and Dataset:
  • Implemented an example using the smallest GPT-2 model (distilgpt2).
  • Utilized the wikitext-2-raw-v1 dataset for training and validation, which is smaller and faster to process than openwebtext.
  2. Training and Inference Setup:
  • Configured the model to use analog inference with specified noise levels.
  • Added support for digital inference as an option.
  • Implemented preprocessing functions to handle dataset tokenization.
  • Provided functionality to train the model and save/load checkpoints.
  3. Logging and Monitoring:
  • Integrated TensorBoard for logging training and validation metrics.
  • Added TensorBoardCallback to the Trainer for seamless logging.
  • Configured the script to save logs in a specific directory and visualize them using TensorBoard.
  4. Performance Metrics:
  • Calculated validation loss and perplexity as the primary performance metrics.
  • Digital model loss varies with the learning rate: lowest training loss = 3.26, lowest inference loss = 3.55.
  • Loss of the HWA-finetuned analog model depends on the learning rate and optimizer: lowest HWA inference loss = 3.55, matching the digital model.
  5. How to use the command-line arguments:

For text generation (of both digital and analog models), use command line arguments: "gt", "L", "c", "pt"

Example 1: python 36_gpt2_on_wikitext_v3.py -gt -pt "Once upon a time" ---> text generation using the pre-trained DistilGPT2 model without fine-tuning
Example 2: python 36_gpt2_on_wikitext_v3.py -gt -pt "Once upon a time" -L -c "checkpoint_filename.pth" ---> text generation using a fine-tuned (digital or analog) model with a saved checkpoint file

For digital model fine-tuning and loss calculation, use command line arguments: "d", "c", "lr", "L"

Example 3: python 36_gpt2_on_wikitext_v3.py -d -lr 1e-5 -c "checkpoint_filename.pth" ---> fine-tune the digital model with the specified learning rate and save the checkpoint to the specified file name
Example 4: python 36_gpt2_on_wikitext_v3.py -d -L ---> inference (loss calculation) without fine-tuning on the pre-trained DistilGPT2 model
Example 5: python 36_gpt2_on_wikitext_v3.py -d -L -c "checkpoint_filename.pth" ---> inference (loss calculation) on a fine-tuned digital model with a saved checkpoint file

For analog model HWA fine-tuning and loss calculation, use command line arguments: "t", "c", "n", "lr", "L"

Example 6: python 36_gpt2_on_wikitext_v3.py -t -n 0.0 -lr 0.01 -c "checkpoint_filename.pth" ---> fine-tune the analog model with the specified noise and learning rate, and save the checkpoint file
Example 7: python 36_gpt2_on_wikitext_v3.py -L -c "checkpoint_filename.pth" ---> inference (loss calculation) on a fine-tuned analog model with a saved checkpoint file
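
A rough sketch of how these flags could be wired with argparse (the long option names, defaults, and the exact meaning of -L below are inferred assumptions, not taken from the script):

# Illustrative argparse wiring for the flags referenced above.
# Long option names and defaults are assumptions for this sketch.
from argparse import ArgumentParser

parser = ArgumentParser(description="distilgpt2 on wikitext-2-raw-v1 with AIHWKit")
parser.add_argument("-gt", "--generate-text", action="store_true",
                    help="run text generation")
parser.add_argument("-pt", "--prompt-text", type=str, default="Once upon a time",
                    help="prompt used for text generation")
parser.add_argument("-d", "--digital", action="store_true",
                    help="fine-tune / evaluate the digital model")
parser.add_argument("-t", "--train-analog", action="store_true",
                    help="HWA fine-tune the analog model")
parser.add_argument("-L", "--load", action="store_true",
                    help="run inference (loss calculation), optionally from a checkpoint")
parser.add_argument("-c", "--checkpoint", type=str, default=None,
                    help="checkpoint file name to save or load")
parser.add_argument("-n", "--noise", type=float, default=0.0,
                    help="analog noise level")
parser.add_argument("-lr", "--learning-rate", type=float, default=1e-5,
                    help="learning rate for fine-tuning")
args = parser.parse_args()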

@anchen25 anchen25 marked this pull request as draft November 23, 2025 00:12
@anchen25 anchen25 marked this pull request as ready for review November 23, 2025 00:20
@PabloCarmona (Collaborator)

Thanks @anchen25 for the PR! We will take a look and run the lint and test workflows and get back to you ASAP!

@anchen25 anchen25 marked this pull request as draft November 26, 2025 15:19
@anchen25 anchen25 marked this pull request as ready for review November 26, 2025 15:24
@anchen25 (Author)

Added "disable=invalid-name" to address pylint error

@PabloCarmona (Collaborator) left a comment

Thanks for the work @anchen25! Instead of adding disable comments for the linting, can you try to address the warnings and follow the guidance the linter gives you?
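
For example (a generic illustration, not tied to the actual variable in the script), pylint's invalid-name (C0103) at module level can usually be resolved by renaming or by moving the code into a function:

# Generic illustration of addressing pylint invalid-name (C0103) without disabling it.
# Module-level variables are expected to be UPPER_CASE constants:
LEARNING_RATE = 1e-5

# Or keep lowercase names local by moving script logic into a function:
def main():
    learning_rate = 1e-5
    print(learning_rate)

if __name__ == "__main__":
    main()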

@PabloCarmona (Collaborator)

Is this PR the most recent one, to be used instead of #664? In that case, please let us know @anchen25 @charles-mackin so we can close the other in favor of this one. Thanks!

@anchen25 (Author) commented Dec 4, 2025 via email


@charlesmackin (Collaborator)

@PabloCarmona Confirming that this is an improved duplicate of the previous #664 submission, which can now be safely removed


from datetime import datetime
from argparse import ArgumentParser
from transformers.integrations import TensorBoardCallback
A collaborator commented:

TensorBoardCallback requires adding tensorboard to the requirements-examples.txt file
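
For context, a minimal sketch of how the callback is typically used once tensorboard is installed (directory names and settings below are placeholder assumptions):

# Minimal sketch: enabling TensorBoard logging with the Hugging Face Trainer.
# Requires the tensorboard package (hence the requirements-examples.txt entry).
# Directory names and logging_steps are placeholder assumptions.
from transformers import TrainingArguments
from transformers.integrations import TensorBoardCallback

training_args = TrainingArguments(
    output_dir="./results",
    logging_dir="./runs",        # TensorBoard event files are written here
    logging_steps=50,
    report_to=["tensorboard"],
)
tb_callback = TensorBoardCallback()
# The callback is then passed to the Trainer via callbacks=[tb_callback],
# and the logs are viewed with: tensorboard --logdir ./runs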

@anchen25 anchen25 marked this pull request as draft December 7, 2025 06:18
@anchen25 anchen25 marked this pull request as ready for review December 7, 2025 07:02
DataCollatorForLanguageModeling,
)

from torch import save as torch_save, load as torch_load
@charlesmackin (Collaborator) commented Dec 16, 2025

Please change this line to import torch, since we are now using torch.device to ensure the model is on the correct device.

Since we've imported torch, there is no need for torch_load and torch_save anymore. Please make sure to delete this line and change torch_load to torch.load and torch_save to torch.save everywhere.
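
If helpful, a sketch of the requested change (the placeholder model and file name are for illustration only):

# Sketch of the requested change; the model and file name below are placeholders.
# before: from torch import save as torch_save, load as torch_load
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.nn.Linear(4, 4)                    # placeholder model for this sketch
model.to(device)

torch.save(model.state_dict(), "checkpoint.pth")            # was: torch_save(...)
state = torch.load("checkpoint.pth", map_location=device)   # was: torch_load(...)
model.load_state_dict(state)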

@anchen25 anchen25 marked this pull request as draft December 16, 2025 21:46
Correct errors related to "import torch"
@anchen25 anchen25 marked this pull request as ready for review December 16, 2025 21:56
@anchen25 anchen25 marked this pull request as draft December 18, 2025 06:37
Add Gyujun Jeong's name and email in the comment.
@anchen25 anchen25 marked this pull request as ready for review December 18, 2025 06:43
@anchen25 anchen25 marked this pull request as draft December 19, 2025 21:53
Signed-off-by An Chen
@anchen25 anchen25 marked this pull request as ready for review December 19, 2025 22:00