Support NeMo NeVA Model #343

Open
athitten opened this issue May 1, 2024 · 6 comments
Labels
enhancement (New feature or request), high priority, nemo (Issues needed to support NVIDIA NeMo models), neva, operators

Comments

athitten commented May 1, 2024

🚀 Feature

NeMo's NeVA (LLaVA) is a multimodal language model.

Initial thunder.examine report:
Found 49 distinct operations, of which 39 (79.6%) are supported
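
For reference, a rough sketch of how a report like the one above can be produced with thunder's examine utility. The actual script used for these numbers is not attached to this issue; the module and input below are placeholders, not NeVA:

# Sketch only: in practice the module and inputs would be the NeVA model and a
# real multimodal batch from the NeMo pretraining script.
import torch
from thunder.examine import examine

model = torch.nn.Sequential(          # placeholder standing in for NeVA
    torch.nn.Linear(5120, 13824),
    torch.nn.GELU(),
    torch.nn.Linear(13824, 5120),
)
x = torch.randn(2, 5120)              # placeholder input batch

# Reports the distinct operations the model uses and how many thunder supports,
# printing a summary like the "Found N distinct operations, ..." line above.
examine(model, x)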

Work items

Running the model

Required data

First download the freely available data and place it in a data/ directory (the run command below expects it under ./data/multimodal/tiny-neva/).

NeMo installation

Dependencies
python3 -m pip install --no-deps \
  huggingface-hub==0.23.2
NeMo branch

To keep the whole thunder team on the same NeMo revision, and to avoid a long list of "modify this file to call thunder.jit()" instructions, we temporarily maintain our own NeMo branch for thunder. You can grab it by cloning https://github.com/tfogal/NeMo.git. Make sure you have checked out the tfogal/thunder-nemo branch.

To install NeMo, run python3 -m pip install -e . from the root of the checked-out directory.
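
Putting the branch and install steps together, the setup looks roughly like this:

# Clone the thunder-specific NeMo branch and install it in editable mode.
git clone https://github.com/tfogal/NeMo.git
cd NeMo
git checkout tfogal/thunder-nemo
python3 -m pip install -e .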

Running the network

rm -fr foo-neva-train; mkdir -p foo-neva-train
HYDRA_FULL_ERROR=1 \
THUNDER_ANNOTATE_TRACES=1 \
NEMO_THUNDER_NEVA=thunder \
python3 ./examples/multimodal/multimodal_llm/neva/neva_pretrain.py \
    trainer.precision=bf16-mixed \
    model.megatron_amp_O2=True \
    model.mcore_gpt=False \
    trainer.num_nodes=1 \
    trainer.devices=1 \
    trainer.val_check_interval=10 \
    trainer.limit_val_batches=5 \
    trainer.log_every_n_steps=1 \
    ++exp_manager.max_time_per_run=00:00:03:00 \
    trainer.max_steps=20 \
    model.micro_batch_size=2 \
    model.global_batch_size=4 \
    model.tensor_model_parallel_size=1 \
    model.pipeline_model_parallel_size=1 \
    exp_manager.create_checkpoint_callback=False \
    model.data.data_path=./data/multimodal/tiny-neva/dummy.json \
    model.data.image_folder=./data/multimodal/tiny-neva/images \
    model.tokenizer.library=sentencepiece \
    model.tokenizer.model=./data/multimodal/tiny-neva/tokenizer_add_special.model \
    model.num_layers=2 \
    model.hidden_size=5120 \
    model.ffn_hidden_size=13824 \
    model.num_attention_heads=40 \
    model.normalization=rmsnorm \
    model.data.num_workers=0 \
    model.data.conv_template=llama_2 \
    model.mm_cfg.vision_encoder.from_pretrained=openai/clip-vit-large-patch14 \
    model.mm_cfg.llm.from_pretrained=null \
    model.use_flash_attention=false \
    exp_manager.exp_dir=./foo-neva-train

Note that the latest version of the tfogal/thunder-nemo branch allows running with dynamo+thunder by setting NEMO_THUNDER_NEVA=dynamo.
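
For example, only the environment variable changes; every other argument stays as in the command above:

# dynamo+thunder instead of invoking thunder directly:
NEMO_THUNDER_NEVA=dynamo \
python3 ./examples/multimodal/multimodal_llm/neva/neva_pretrain.py ...  # remaining arguments as above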

cc @apaz-cli @tfogal

athitten added the enhancement (New feature or request) label on May 1, 2024
tfogal added the nemo (Issues needed to support NVIDIA NeMo models) label on May 1, 2024
tfogal changed the title from "Support NeMo NeVa Model" to "Support NeMo NeVA Model" on Jun 12, 2024
IvanYashchuk (Collaborator) commented:

Can you share the script for the examine call?

tfogal (Collaborator) commented Jul 10, 2024

Can you share the script for the examine call?

@athitten when you have a minute

athitten (Author) commented Aug 6, 2024

Adding the updated command, which uses megatron_amp_O2=True and model.mcore_gpt=True (NeMo models will default to using models from Megatron, hence this setting). With megatron_amp_O2=True, precision=bf16 alone should already do mixed-precision training with the main copy of weights in FP32, but just to be safe the command also specifies precision=bf16-mixed.

python3 ./examples/multimodal/multimodal_llm/neva/neva_pretrain.py \
    trainer.precision=bf16-mixed \
    model.megatron_amp_O2=True \
    model.mcore_gpt=True \
    trainer.num_nodes=1 \
    trainer.devices=1 \
    trainer.val_check_interval=10 \
    trainer.limit_val_batches=5 \
    trainer.log_every_n_steps=1 \
    ++exp_manager.max_time_per_run=00:00:03:00 \
    trainer.max_steps=20 \
    model.micro_batch_size=2 \
    model.global_batch_size=4 \
    model.tensor_model_parallel_size=1 \
    model.pipeline_model_parallel_size=1 \
    exp_manager.create_checkpoint_callback=False \
    model.data.data_path=./data/multimodal/tiny-neva/dummy.json \
    model.data.image_folder=./data/multimodal/tiny-neva/images \
    model.tokenizer.library=sentencepiece \
    model.tokenizer.model=./data/multimodal/tiny-neva/tokenizer_add_special.model \
    model.num_layers=2 \
    model.hidden_size=5120 \
    model.ffn_hidden_size=13824 \
    model.num_attention_heads=40 \
    model.normalization=rmsnorm \
    model.data.num_workers=0 \
    model.data.conv_template=llama_2 \
    model.mm_cfg.vision_encoder.from_pretrained=openai/clip-vit-large-patch14 \
    model.mm_cfg.llm.from_pretrained=null \
    model.use_flash_attention=false \
    exp_manager.exp_dir=./foo-neva-train

athitten (Author) commented Aug 6, 2024

This might be helpful: the full config, with default values for all parameters, can be found here. Only the parameters we specify in the run command are overwritten with the specified values; all others default to the values given in the config.

tfogal (Collaborator) commented Aug 9, 2024

Adding the updated command

Thanks, @athitten!
I have edited the original issue to mostly reflect the updated command. Unfortunately #753 blocks setting model.mcore_gpt=True, so for now that one's still False... but let's prioritize that one!

athitten (Author) commented Aug 9, 2024

Yes, it's important to prioritize getting thunder working with mcore_gpt=True, as it will be the default for NeMo models once we deprecate the legacy path.
