Docs: Improve SFT/RL user experience #2794
Conversation
```
python3 -m src.MaxText.rl.train_rl src/MaxText/configs/rl.yml \
  model_name=${MODEL} \
  tokenizer_path=${TOKENIZER} \
  load_parameters_path=${MAXTEXT_CKPT_PATH} \
```
This will be wrong if the user sets MAXTEXT_CKPT_PATH in the above section, since that path already includes 0/items. Maybe we can format this section like this: https://github.com/AI-Hypercomputer/maxtext/blob/8c3289731a28524b631ec62dfb226a357e6e72db/docs/tutorials/posttraining/sft.md#get-your-model-checkpoint to call this out explicitly?
If the user sets a path that already includes 0/items, then the checkpoint will be saved to 0/items/0/items (for example). Do we still need to use ${MAXTEXT_CKPT_PATH}/0/items?
Now I see your point and the issue. The previous doc provides two checkpoint-conversion examples, but their base_output_directory formats differ. We should stick to ${BASE_OUTPUT_DIRECTORY}/${RUN_NAME}; the actual checkpoint is then saved to MAXTEXT_CKPT_PATH=${BASE_OUTPUT_DIRECTORY}/${RUN_NAME}/0/items, and we use MAXTEXT_CKPT_PATH to load it for SFT later.
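To make the convention concrete, here is a minimal sketch of the path construction being agreed on above. The bucket and run name are hypothetical placeholders, not values from the docs:

```shell
# Hypothetical values for illustration only.
BASE_OUTPUT_DIRECTORY=gs://my-bucket/maxtext-output
RUN_NAME=llama3.1-8b-conversion

# The conversion job writes the checkpoint under <base>/<run_name>/0/items,
# so the path to load for SFT/RL is built without appending 0/items again:
MAXTEXT_CKPT_PATH=${BASE_OUTPUT_DIRECTORY}/${RUN_NAME}/0/items
echo "${MAXTEXT_CKPT_PATH}"
```

This keeps a single format across both conversion examples and avoids the 0/items/0/items duplication mentioned earlier in the thread.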
The overview of what this run will do is as follows:
1. We load a policy model and a reference model. Both are copies of `Llama3.1-8b-Instruct`.
If the user sets MODEL to a model name other than `Llama3.1-8B`, then the following overview section would be misleading. Please rephrase this overview section accordingly.
Done!
docs/install_maxtext.md
Outdated
3. **Run tests:** Run MaxText tests to ensure there are no regressions.

## Appendix: Install XPK for MaxText Multi-host Workloads
Is this relevant to this doc?
We already have a different document that provides steps to run MaxText with XPK: https://maxtext.readthedocs.io/en/latest/run_maxtext/run_maxtext_via_xpk.html.
Yes, then I will remove this part and instead point to the XPK doc.
Description
Reduce user friction in SFT/RL and fix broken links.
b/463394566
b/463409639
b/463409807
b/463396352
b/463393644
Tests
N/A
Checklist
Before submitting this PR, please make sure (put X in square brackets):
- [ ] `gemini-review` label.