@anchen25 anchen25 commented Nov 22, 2025

Related issues

Description

This PR adds an example that implements the GPT-2-based model distilgpt2 on the wikitext-2-raw-v1 dataset using AIHWKit. The example demonstrates how to convert the model to analog and how to run fine-tuning, text generation, and inference.
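
For orientation, a minimal sketch of the analog conversion step (illustrative only; the noise model, g_max value, and the handling of GPT-2's Conv1D layers are assumptions, not necessarily what this example does):

# Minimal sketch, not the code in this PR: convert distilgpt2 to analog with AIHWKit.
# The InferenceRPUConfig / PCMLikeNoiseModel settings are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

from aihwkit.nn.conversion import convert_to_analog
from aihwkit.simulator.configs import InferenceRPUConfig
from aihwkit.inference import PCMLikeNoiseModel

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
model = AutoModelForCausalLM.from_pretrained("distilgpt2")

rpu_config = InferenceRPUConfig()
rpu_config.noise_model = PCMLikeNoiseModel(g_max=25.0)

# Swap supported layers for analog tiles; GPT-2 uses transformers' Conv1D layers,
# so the actual script may need a custom conversion_map here.
analog_model = convert_to_analog(model, rpu_config)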

Details

  1. Model and Dataset:
  • Implemented an example using the smallest GPT-2 model (distilgpt2).
  • Utilized the wikitext-2-raw-v1 dataset for training and validation, which is smaller and faster to process than openwebtext.
  2. Training and Inference Setup:
  • Configured the model to use analog inference with specified noise levels.
  • Added support for digital inference as an option.
  • Implemented preprocessing functions to handle dataset tokenization.
  • Provided functionality to train the model and save/load checkpoints.
  3. Logging and Monitoring:
  • Integrated TensorBoard for logging training and validation metrics.
  • Added TensorBoardCallback to the Trainer for seamless logging.
  • Configured the script to save logs in a specific directory and visualize them using TensorBoard.
  4. Performance Metrics:
  • Calculated validation loss and perplexity as the primary performance metrics.
  • Digital model loss varies with the learning rate: lowest training loss = 3.26, lowest inference loss = 3.55.
  • Loss of the HWA-finetuned analog model depends on the learning rate and optimizer: lowest HWA inference loss = 3.55, matching the digital model.
  5. How to use the command-line arguments:

For text generation (of both digital and analog models), use command line arguments: "gt", "L", "c", "pt"

Example 1: python 36_gpt2_on_wikitext_v3.py -gt -pt "Once upon a time" ---> text generation using the pre-trained DistilGPT2 model without fine-tuning
Example 2: python 36_gpt2_on_wikitext_v3.py -gt -pt "Once upon a time" -L -c "checkpoint_filename.pth" ---> text generation using a fine-tuned (digital or analog) model with a saved checkpoint file

For digital model fine-tuning and loss calculation, use command line arguments: "d", "c", "lr", "L"

Example 3: python 36_gpt2_on_wikitext_v3.py -d -lr 1e-5 -c "checkpoint_filename.pth" ---> fine-tune the digital model with the specified learning rate and save the checkpoint to the specified file name
Example 4: python 36_gpt2_on_wikitext_v3.py -d -L ---> inference (loss calculation) without fine-tuning on the pre-trained DistilGPT2 model
Example 5: python 36_gpt2_on_wikitext_v3.py -d -L -c "checkpoint_filename.pth" ---> inference (loss calculation) on a fine-tuned digital model with a saved checkpoint file

For analog model HWA fine-tuning and loss calculation, use command line arguments: "t", "c", "n", "lr", "L"

Example 6: python 36_gpt2_on_wikitext_v3.py -t -n 0.0 -lr 0.01 -c "checkpoint_filename.pth" ---> fine-tune the analog model with the specified noise and learning rate, and save the checkpoint file
Example 7: python 36_gpt2_on_wikitext_v3.py -L -c "checkpoint_filename.pth" ---> inference (loss calculation) on a fine-tuned analog model with a saved checkpoint file
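
A rough sketch of how these flags could be wired with argparse (the long option names, defaults, and the exact meaning of -L below are inferred assumptions, not taken from the script):

# Illustrative argparse wiring for the flags referenced above.
# Long option names and defaults are assumptions for this sketch.
from argparse import ArgumentParser

parser = ArgumentParser(description="distilgpt2 on wikitext-2-raw-v1 with AIHWKit")
parser.add_argument("-gt", "--generate-text", action="store_true",
                    help="run text generation")
parser.add_argument("-pt", "--prompt-text", type=str, default="Once upon a time",
                    help="prompt used for text generation")
parser.add_argument("-d", "--digital", action="store_true",
                    help="fine-tune / evaluate the digital model")
parser.add_argument("-t", "--train-analog", action="store_true",
                    help="HWA fine-tune the analog model")
parser.add_argument("-L", "--load", action="store_true",
                    help="run inference (loss calculation), optionally from a checkpoint")
parser.add_argument("-c", "--checkpoint", type=str, default=None,
                    help="checkpoint file name to save or load")
parser.add_argument("-n", "--noise", type=float, default=0.0,
                    help="analog noise level")
parser.add_argument("-lr", "--learning-rate", type=float, default=1e-5,
                    help="learning rate for fine-tuning")
args = parser.parse_args()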

@anchen25 anchen25 marked this pull request as draft November 23, 2025 00:12
@anchen25 anchen25 marked this pull request as ready for review November 23, 2025 00:20
@PabloCarmona (Collaborator)

Thanks @anchen25 for the PR! We will take a look and run the lint and test workflows and get back to you ASAP!

@anchen25 anchen25 marked this pull request as draft November 26, 2025 15:19
@anchen25 anchen25 marked this pull request as ready for review November 26, 2025 15:24
@anchen25 (Author)

Added "disable=invalid-name" to address pylint error

@PabloCarmona (Collaborator) left a comment

Thanks for the work @anchen25! Instead of adding disable comments for the linting, can you try to address the warnings and follow the guidance the linter gives you?
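
For example (a generic illustration, not tied to the actual variable in the script), pylint's invalid-name (C0103) at module level can usually be resolved by renaming or by moving the code into a function:

# Generic illustration of addressing pylint invalid-name (C0103) without disabling it.
# Module-level variables are expected to be UPPER_CASE constants:
LEARNING_RATE = 1e-5

# Or keep lowercase names local by moving script logic into a function:
def main():
    learning_rate = 1e-5
    print(learning_rate)

if __name__ == "__main__":
    main()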

@PabloCarmona (Collaborator)

Is this PR the most recent one, to be used instead of #664? In that case, please let us know @anchen25 @charles-mackin so we can close the other in favor of this one. Thanks!

@anchen25 (Author) commented Dec 4, 2025 via email


@charlesmackin (Collaborator)

@PabloCarmona Confirming that this is an improved duplicate of the previous #664 submission, which can now be safely removed


from datetime import datetime
from argparse import ArgumentParser
from transformers.integrations import TensorBoardCallback
A collaborator commented:

TensorBoardCallback requires adding tensorboard to the requirements-examples.txt file
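
For context, a minimal sketch of how the callback is typically used once tensorboard is installed (directory names and settings below are placeholder assumptions):

# Minimal sketch: enabling TensorBoard logging with the Hugging Face Trainer.
# Requires the tensorboard package (hence the requirements-examples.txt entry).
# Directory names and logging_steps are placeholder assumptions.
from transformers import TrainingArguments
from transformers.integrations import TensorBoardCallback

training_args = TrainingArguments(
    output_dir="./results",
    logging_dir="./runs",        # TensorBoard event files are written here
    logging_steps=50,
    report_to=["tensorboard"],
)
tb_callback = TensorBoardCallback()
# The callback is then passed to the Trainer via callbacks=[tb_callback],
# and the logs are viewed with: tensorboard --logdir ./runs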

@anchen25 anchen25 marked this pull request as draft December 7, 2025 06:18
@anchen25 anchen25 marked this pull request as ready for review December 7, 2025 07:02
DataCollatorForLanguageModeling,
)

from torch import save as torch_save, load as torch_load
@charlesmackin (Collaborator) commented Dec 16, 2025

Please change this line to import torch, since we are now using torch.device to ensure the model is on the correct device.

Since we've imported torch, there is no need for torch_load and torch_save anymore. Please make sure to delete this line and change torch_load to torch.load and torch_save to torch.save everywhere.
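
If helpful, a sketch of the requested change (the placeholder model and file name are for illustration only):

# Sketch of the requested change; the model and file name below are placeholders.
# before: from torch import save as torch_save, load as torch_load
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.nn.Linear(4, 4)                    # placeholder model for this sketch
model.to(device)

torch.save(model.state_dict(), "checkpoint.pth")            # was: torch_save(...)
state = torch.load("checkpoint.pth", map_location=device)   # was: torch_load(...)
model.load_state_dict(state)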

@anchen25 anchen25 marked this pull request as draft December 16, 2025 21:46
Correct errors related to "import torch"
@anchen25 anchen25 marked this pull request as ready for review December 16, 2025 21:56
@anchen25 anchen25 marked this pull request as draft December 18, 2025 06:37
Add Gyujun Jeong's name and email in the comment.
@anchen25 anchen25 marked this pull request as ready for review December 18, 2025 06:43
@anchen25 anchen25 marked this pull request as draft December 19, 2025 21:53
Signed-off-by An Chen
@anchen25 anchen25 marked this pull request as ready for review December 19, 2025 22:00