

@GirinMan

Overview

  • This PR originated from Saving pytorch_model.bin with QLORA #123
  • I also faced similar problems with it, but no one had submitted a fix for it.
  • I added a callback which saves not only adapter_model.bin but also trainable_params.bin and the configuration of the backbone model (config.json), so that the RoPE scaling settings can be reused.

New callback: SavePeftModelCallback

  • In a new file named save_callback.py, I added a callback named SavePeftModelCallback, which saves the trained weights and the model config in a new directory.
  • The directory is named f"{args.output_dir}/step-{state.global_step}". The callback automatically creates the directory if it doesn't exist, so it can be used to store separate checkpoints at specific step intervals (see the sketch below).
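
For reference, here is a minimal sketch of what such a callback could look like, assuming a Hugging Face transformers TrainerCallback wrapping a PEFT model. The file names match this PR, but the parameter filtering and method body are illustrative and may differ from the actual save_callback.py.

import os
import torch
from transformers import TrainerCallback

class SavePeftModelCallback(TrainerCallback):
    # Saves the LoRA adapter, extra trainable params, and the backbone config
    # into a per-step sub-directory at every checkpointing event.
    def on_save(self, args, state, control, **kwargs):
        model = kwargs["model"]

        # One sub-directory per checkpoint, created if it doesn't exist.
        output_dir = os.path.join(args.output_dir, f"step-{state.global_step}")
        os.makedirs(output_dir, exist_ok=True)

        # adapter_model.bin: LoRA weights, written by PEFT's save_pretrained.
        model.save_pretrained(output_dir)

        # trainable_params.bin: non-LoRA parameters that were also trained
        # (e.g. embeddings / norms in LongLoRA). The filter below is an assumption.
        trainable_params = {
            name: param.detach().cpu()
            for name, param in model.named_parameters()
            if param.requires_grad and "lora_" not in name
        }
        torch.save(trainable_params, os.path.join(output_dir, "trainable_params.bin"))

        # config.json of the backbone, including the rope_scaling used during training.
        model.config.save_pretrained(output_dir)
        return control

Registering it with trainer.add_callback(SavePeftModelCallback()) is enough for it to run at every save step.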

Changes in merge_lora_weights_and_save_hf_model.py

  • While loading the backbone model, the script did not use the model config from training, so the merged & saved checkpoint has no information about the RoPE scaling configuration.
  • I guess this is why the configs of the LongLoRA models on the Hugging Face Hub do not contain any rope_scaling information, even though it was changed during training. That's why I made SavePeftModelCallback save the model's config too.
  • With the changes in this PR, merge_lora_weights_and_save_hf_model.py will try to load and use the model config saved during training, which contains the RoPE scaling information (see the sketch after this list).
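
As a rough illustration (the paths and the fallback logic are hypothetical, not the exact code in merge_lora_weights_and_save_hf_model.py), the merge script can load the config saved during training and pass it to from_pretrained so the merged model keeps the rope_scaling entry:

import torch
from transformers import AutoConfig, AutoModelForCausalLM

base_model = "meta-llama/Llama-2-7b-hf"   # backbone used for training
checkpoint_dir = "output/step-1000"       # directory written by SavePeftModelCallback (hypothetical path)

# Prefer the config saved during training, which carries the rope_scaling settings;
# fall back to the backbone's own config if no saved config is found.
try:
    config = AutoConfig.from_pretrained(checkpoint_dir)
except OSError:
    config = AutoConfig.from_pretrained(base_model)

model = AutoModelForCausalLM.from_pretrained(
    base_model,
    config=config,
    torch_dtype=torch.float16,
)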

See Llama-2-7b-longlora-8k/main/config.json

{
  "_name_or_path": "meta-llama/Llama-2-7b-hf",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "max_position_embeddings": 4096,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 32,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.31.0.dev0",
  "use_cache": true,
  "vocab_size": 32001
}

Thank you so much for sharing and maintaining such great research!
If you have any feedback, please feel free to...
