Conversation

@hengtaoguo (Collaborator) commented Dec 6, 2025

Description

Reduce user friction in SFT/RL and fix broken links.

b/463394566
b/463409639
b/463409807
b/463396352
b/463393644

Tests

N/A

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have added necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

@hengtaoguo hengtaoguo force-pushed the hengtaoguo-grpo branch 2 times, most recently from 8629e8b to 5ae647f on December 10, 2025 06:32
@hengtaoguo hengtaoguo changed the title More UXR fixes Docs: Improve SFT/RL user experience Dec 10, 2025
@hengtaoguo hengtaoguo marked this pull request as ready for review December 10, 2025 18:11
```bash
python3 -m src.MaxText.rl.train_rl src/MaxText/configs/rl.yml \
  model_name=${MODEL} \
  tokenizer_path=${TOKENIZER} \
  load_parameters_path=${MAXTEXT_CKPT_PATH} \
```
Collaborator:

This will be wrong if the user sets MAXTEXT_CKPT_PATH as in the above section, since it includes 0/items. Maybe we can format this section like this: https://github.com/AI-Hypercomputer/maxtext/blob/8c3289731a28524b631ec62dfb226a357e6e72db/docs/tutorials/posttraining/sft.md#get-your-model-checkpoint to explicitly call this out?

Collaborator Author:

If the user sets a path that includes 0/items, then the checkpoint will be saved to 0/items/0/items (example). Would we still need to use ${MAXTEXT_CKPT_PATH}/0/items?

Collaborator Author:

Now I see your point and the issue. The previous doc provides two checkpoint-conversion examples, but their base_output_directory formats are different. We should stick to ${BASE_OUTPUT_DIRECTORY}/${RUN_NAME}; the actual checkpoint is then saved to MAXTEXT_CKPT_PATH=${BASE_OUTPUT_DIRECTORY}/${RUN_NAME}/0/items, which we use to load for SFT later.

https://screenshot.googleplex.com/AsY59zep5hZxNZL
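A minimal sketch of the convention settled on above, for reference; the bucket and run names are placeholder assumptions, not values from this PR:

```bash
# Hypothetical values -- substitute your own GCS bucket and run name.
export BASE_OUTPUT_DIRECTORY=gs://my-bucket/maxtext-ckpts
export RUN_NAME=llama3-conversion

# The conversion step writes the checkpoint under ${BASE_OUTPUT_DIRECTORY}/${RUN_NAME}/0/items,
# so point MAXTEXT_CKPT_PATH at that full path when loading it for SFT/RL.
export MAXTEXT_CKPT_PATH=${BASE_OUTPUT_DIRECTORY}/${RUN_NAME}/0/items
```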


The overview of what this run will do is as follows:

1. We load a policy model and a reference model. Both are copies of `Llama3.1-8b-Instruct`.
Collaborator:

If the user sets MODEL to a model name other than `Llama3.1-8B`, then the following overview section would be misleading. Please rephrase the overview accordingly.

Collaborator Author:

Done!
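To illustrate the rephrasing discussed above, a quick sketch of keeping the run setup model-agnostic; the model and tokenizer values below are placeholder assumptions:

```bash
# Hypothetical example values; any model supported by MaxText works here.
export MODEL=llama3.1-8b
export TOKENIZER=meta-llama/Llama-3.1-8B-Instruct

# The overview text can then refer to ${MODEL} generically, e.g.:
# "We load a policy model and a reference model. Both are copies of ${MODEL}."
echo "Policy and reference model: ${MODEL}"
```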

3. **Run tests:** Run MaxText tests to ensure there are no regressions.

## Appendix: Install XPK for MaxText Multi-host Workloads
Collaborator:

Is this relevant to this doc?
We already have a different document that provides steps to run MaxText with XPK: https://maxtext.readthedocs.io/en/latest/run_maxtext/run_maxtext_via_xpk.html.

Collaborator Author:

Yes, I will remove this part and instead point to the XPK doc.
