Add callback for saving trainable parameters and model config #178
Overview
With this PR, training saves not only `adapter_model.bin` but also `trainable_params.bin` and the configuration of the backbone model (`config.json`), so that the rope scaling configuration can be reused later.

New callback: SavePeftModelCallback

In `save_callback.py`, I added a callback named `SavePeftModelCallback`, which saves the trained weights and the model config in a new directory, `f"{args.output_dir}/step-{state.global_step}"`. The callback automatically creates the directory if it doesn't exist, so it can be used to store separate checkpoints at specific step intervals.
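For readers who want a feel for the mechanics, here is a minimal sketch of such a callback. It is not the actual `save_callback.py` from this PR; it assumes the `Trainer` passes the PEFT-wrapped model via `kwargs` in `on_save`, and that the extra trainable parameters can be identified as the `requires_grad` parameters whose names do not contain `"lora_"`:

```python
import os

import torch
from transformers import TrainerCallback


class SavePeftModelCallback(TrainerCallback):
    """Sketch: save adapter weights, extra trainable params, and the backbone
    config into a per-step directory (illustrative, not the PR's exact code)."""

    def on_save(self, args, state, control, **kwargs):
        model = kwargs["model"]  # the PEFT-wrapped model being trained

        # Per-step output directory, created automatically if missing.
        save_dir = os.path.join(args.output_dir, f"step-{state.global_step}")
        os.makedirs(save_dir, exist_ok=True)

        # 1) LoRA adapter weights (adapter_model.bin / .safetensors for a PeftModel).
        model.save_pretrained(save_dir)

        # 2) Additional trainable parameters outside the adapters
        #    (assumption: requires_grad params without "lora_" in their name,
        #    e.g. embeddings and norm layers).
        trainable_params = {
            name: param.detach().cpu()
            for name, param in model.named_parameters()
            if param.requires_grad and "lora_" not in name
        }
        torch.save(trainable_params, os.path.join(save_dir, "trainable_params.bin"))

        # 3) Backbone model config (config.json), including any rope_scaling changes.
        model.config.save_pretrained(save_dir)

        return control
```

It would be registered like any other callback, e.g. `trainer.add_callback(SavePeftModelCallback())` or `Trainer(..., callbacks=[SavePeftModelCallback()])`.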
Changes in merge_lora_weights_and_save_hf_model.py

The base model's config does not contain the `rope_scaling` settings, even though they were changed during training. That's why I let `SavePeftModelCallback` save the model's config too. `merge_lora_weights_and_save_hf_model.py` will now try to load and use the model config saved during training, which contains the rope scaling information.

See Llama-2-7b-longlora-8k/main/config.json:
{ "_name_or_path": "meta-llama/Llama-2-7b-hf", "architectures": [ "LlamaForCausalLM" ], "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 11008, "max_position_embeddings": 4096, "model_type": "llama", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 32, "pretraining_tp": 1, "rms_norm_eps": 1e-05, "rope_scaling": null, "tie_word_embeddings": false, "torch_dtype": "float16", "transformers_version": "4.31.0.dev0", "use_cache": true, "vocab_size": 32001 }Thank you so much for sharing and maintaining such great research!
Thank you so much for sharing and maintaining such great research! If you have any feedback, please feel free to...