Update gemma_backbone.py for sharding config. #1491
Conversation
PTAL again.
This LGTM, though we still might want to check with other folks to help decide between the two.
Ack, I will leave the PR here; feel free to merge it when ready.
looks good!
* Update gemma_backbone.py for sharding config.
* Update unit test and fix format.
* Update sharding spec for gemma based on gemma training.
This PR addresses #1464.
The new setting is based on the internal Gemma training script.
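For context on how this sharding config gets consumed: the layout map returned by `GemmaBackbone.get_layout_map` plugs into the Keras distribution API. The snippet below is a minimal sketch of that setup, assuming a 1x8 mesh on a TPU v3-8 and the `gemma_2b_en` preset; the exact `ModelParallel` signature may vary slightly across Keras 3 versions.

```python
import keras
import keras_nlp

# Build a 1x8 device mesh for a TPU v3-8: "batch" for data parallelism,
# "model" for the model-parallel sharding configured in gemma_backbone.py.
device_mesh = keras.distribution.DeviceMesh(
    shape=(1, 8),
    axis_names=("batch", "model"),
    devices=keras.distribution.list_devices(),
)

# The layout map carries the sharding spec that this PR updates.
layout_map = keras_nlp.models.GemmaBackbone.get_layout_map(device_mesh)

# Shard the model weights along the "model" axis; keep data on "batch".
distribution = keras.distribution.ModelParallel(
    device_mesh, layout_map, batch_dim_name="batch"
)
keras.distribution.set_distribution(distribution)

# Any Gemma preset loaded after this point picks up the sharded layout.
gemma_lm = keras_nlp.models.GemmaCausalLM.from_preset("gemma_2b_en")
```

With the distribution set globally, the benchmark below simply runs `generate` and LoRA finetuning on the sharded model; no further per-layer configuration is needed.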
Here is a perf benchmark on TPU v3-8 (smaller values are better):

| Setting | generate (ms per 100 tokens) | finetune with LoRA (ms/step) |
|---|---|---|
| Baseline (current setting) | 1342 | 125 |
| This PR | 1245 | 64 |

That is roughly a 7% speedup for generation and about 2x for LoRA finetuning.