Esm2 on Sagemaker Hyperpod #387

awsankur · 2024-07-25T06:32:47Z

Issue #, if available:

Description of changes:

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Signed-off-by: Ankur Srivastava <awsankur@amazon.com>

KeitaW · 2024-07-25T08:01:38Z

Does this test case use any SMHP-specific features?
If not, we could organize the test case per scheduler:

23.esm
├── kubernetes
└── slurm

see also #381

KeitaW · 2024-07-30T23:22:56Z

3.test_cases/23.SMHP-esm2/README.md

+
+|  Model | device_batch_size | num_nodes | torch.compile |     Instance   |   Throughput   |
+|:------:|:-----------------:|:---------:|:-------------:| :------------: | :------------: |
+|  ESM2  |         8         |     2     |       No      |  g5.12xlarge   |  160 samples/s | 


The setup instructions advise using 24xl, but 12xl was actually used?
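
For context, here is a rough sketch of how the numbers in the table row relate to each other. It assumes 4 GPUs per g5.12xlarge node, one training process per GPU, and no gradient accumulation; none of these are stated in the table itself, and the helper names are hypothetical.

```python
def global_batch_size(device_batch_size, num_nodes, gpus_per_node, grad_accum_steps=1):
    """Samples consumed per optimizer step across the whole cluster."""
    return device_batch_size * num_nodes * gpus_per_node * grad_accum_steps


def steps_per_second(throughput_samples_per_s, global_batch):
    """Optimizer steps per second implied by a measured throughput."""
    return throughput_samples_per_s / global_batch


# Values from the table row: device_batch_size=8, num_nodes=2, 160 samples/s.
# gpus_per_node=4 is an assumption (g5.12xlarge), not a value from the table.
batch = global_batch_size(device_batch_size=8, num_nodes=2, gpus_per_node=4)
rate = steps_per_second(160, batch)
print(f"global batch: {batch}, steps/s: {rate}")
```

Under the same assumptions the GPU count per node would be identical on g5.24xlarge, so the instance mismatch mainly matters for host CPU, memory, and network bandwidth rather than for the batch arithmetic.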

KeitaW · 2024-07-30T23:37:18Z

3.test_cases/23.SMHP-esm2/README.md

+## What is ESM-2?
+[ESM-2](https://www.biorxiv.org/content/10.1101/2022.07.20.500902v1) is a pLM trained using unsupervised masked language modelling on 250 Million protein sequences by researchers at [Facebook AI Research (FAIR)](https://www.biorxiv.org/content/10.1101/2022.07.20.500902v1). It is available in several sizes, ranging from 8 Million to 15 Billion parameters. The smaller models are suitable for various sequence and token classification tasks. The FAIR team also adapted the 3 Billion parameter version into the ESMFold protein structure prediction algorithm. They have since used ESMFold to predict the structure of [more than 700 Million metagenomic proteins](https://esmatlas.com/about).
+
+ESM-2 is a powerful pLM. We will demonstrate how to use QLoRA to fine-tune ESM-2 on g5.24xlarge instances. We will use ESM-2 to predict [subcellular localization](https://academic.oup.com/nar/article/50/W1/W228/6576357?login=false). Understanding where proteins appear in cells can help us understand their role in disease and find new drug targets.


Is this test case demonstrating pretraining or fine-tuning? I believe the latter, but the title states the former.

3.test_cases/23.SMHP-esm2/README.md

3.test_cases/23.SMHP-esm2/3.train_fsdp.sh

3.test_cases/23.SMHP-esm2/2.train_ddp.sh

3.test_cases/23.SMHP-esm2/3.train_fsdp.sh

perifaws · 2024-09-12T16:16:14Z

@awsankur @KeitaW are we good on this?

Signed-off-by: Ankur Srivastava <awsankur@amazon.com>

awsankur added 4 commits July 3, 2024 18:04

Added files

8631a20

Signed-off-by: Ankur Srivastava <awsankur@amazon.com>

Updated with training example

251e015

Signed-off-by: Ankur Srivastava <awsankur@amazon.com>

Added ESM2 training on SMHP

82d2e4e

Signed-off-by: Ankur Srivastava <awsankur@amazon.com>

Added ESM2 training on SMHP

a2d7766

Signed-off-by: Ankur Srivastava <awsankur@amazon.com>

awsankur requested review from KeitaW and amanshanbhag July 25, 2024 06:32

KeitaW reviewed Jul 30, 2024

3.test_cases/23.SMHP-esm2/README.md (outdated)

KeitaW reviewed Jul 30, 2024

3.test_cases/23.SMHP-esm2/3.train_fsdp.sh (outdated)

KeitaW reviewed Jul 30, 2024

3.test_cases/23.SMHP-esm2/2.train_ddp.sh (outdated)

KeitaW reviewed Jul 30, 2024

3.test_cases/23.SMHP-esm2/3.train_fsdp.sh (outdated)

KeitaW assigned awsankur Jul 30, 2024

KeitaW and others added 5 commits November 5, 2024 11:07

Update 3.test_cases/23.SMHP-esm2/2.train_ddp.sh

d87b862

Update 3.test_cases/23.SMHP-esm2/README.md

f5b0543

Update 3.test_cases/23.SMHP-esm2/3.train_fsdp.sh

30cb879

Update 3.test_cases/23.SMHP-esm2/3.train_fsdp.sh

d63b2c6

updated

85cb318

Signed-off-by: Ankur Srivastava <awsankur@amazon.com>

nghtm requested a review from mvinci12 April 29, 2025 13:57
