Docs: Improve SFT/RL user experience #2794
Conversation
```
python3 -m src.MaxText.rl.train_rl src/MaxText/configs/rl.yml \
  model_name=${MODEL} \
  tokenizer_path=${TOKENIZER} \
  load_parameters_path=${MAXTEXT_CKPT_PATH} \
```
This will be wrong if the user sets MAXTEXT_CKPT_PATH in the above section, since that path already includes 0/items. Maybe we can format this section like this: https://github.com/AI-Hypercomputer/maxtext/blob/8c3289731a28524b631ec62dfb226a357e6e72db/docs/tutorials/posttraining/sft.md#get-your-model-checkpoint to call this out explicitly?
If the user sets a path that already includes 0/items, then the checkpoint will be saved to 0/items/0/items (for example). Do we still need to use ${MAXTEXT_CKPT_PATH}/0/items?
Now I see your point and the issue. The previous doc provides two checkpoint-conversion examples, but their base_output_directory formats differ. We should stick to ${BASE_OUTPUT_DIRECTORY}/${RUN_NAME}; the actual checkpoint is then saved to MAXTEXT_CKPT_PATH=${BASE_OUTPUT_DIRECTORY}/${RUN_NAME}/0/items, and we use MAXTEXT_CKPT_PATH to load it for SFT later.
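To make the convention concrete, here is a minimal sketch of the path construction being agreed on above. The bucket and run name are hypothetical placeholders, not values from the docs:

```shell
# Hypothetical values for illustration only.
BASE_OUTPUT_DIRECTORY=gs://my-bucket/maxtext-output
RUN_NAME=llama3.1-8b-conversion

# The conversion job writes the checkpoint under <base>/<run_name>/0/items,
# so the path to load for SFT/RL is built without appending 0/items again:
MAXTEXT_CKPT_PATH=${BASE_OUTPUT_DIRECTORY}/${RUN_NAME}/0/items
echo "${MAXTEXT_CKPT_PATH}"
```

This keeps a single format across both conversion examples and avoids the 0/items/0/items duplication mentioned earlier in the thread.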
The overview of what this run will do is as follows:
1. We load a policy model and a reference model. Both are copies of `Llama3.1-8b-Instruct`.
If the user sets MODEL to a model name other than `Llama3.1-8B`, then the following overview section would be misleading. Please rephrase this overview section accordingly.
Done!
docs/install_maxtext.md
Outdated
3. **Run tests:** Run MaxText tests to ensure there are no regressions.

## Appendix: Install XPK for MaxText Multi-host Workloads
Is this relevant to this doc?
We already have a different document that provides steps to run MaxText with XPK: https://maxtext.readthedocs.io/en/latest/run_maxtext/run_maxtext_via_xpk.html.
Yes, then I will remove this part and instead point to the XPK doc.
Description
Reduce user friction in SFT/RL and fix broken links.
b/463394566
b/463409639
b/463409807
b/463396352
b/463393644
Tests
N/A
Checklist
Before submitting this PR, please make sure (put X in square brackets):
- [ ] `gemini-review` label.