
Conversation

@NicoGrande (Collaborator) commented Dec 2, 2025

Description

This PR finishes the work started by @gagika in #2767. Credits to @gagika for helping with this feature!

This PR adds the changes required in train_rl.py and the other modules involved in the Tunix integration so that the additional configuration needed for the MaxText-on-vLLM flow can be passed to Tunix.

More specifically, this PR adds vllm_additional_config and vllm_hf_config_path as new arguments so that these values can be passed through to Tunix for RL.
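For illustration, here is a minimal sketch of how a JSON-valued vllm_additional_config flag could be decoded before being handed to Tunix. The helper name and its placement are assumptions for this example, not the actual train_rl.py code:

```python
import json

def parse_vllm_additional_config(raw: str) -> dict:
  """Decode the vllm_additional_config flag from a JSON string into a dict.

  Hypothetical helper for illustration only; the PR may wire this up
  differently inside train_rl.py / pyconfig.
  """
  return json.loads(raw) if raw else {}

# Example mirroring the value used in the Tests section below.
cfg = parse_vllm_additional_config(
    '{"maxtext_config": {"model_name": "gemma3-4b", "ici_tensor_parallelism": 4}}'
)
print(cfg["maxtext_config"]["model_name"])  # -> gemma3-4b
```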

Additionally, this PR makes small modifications to tunix_adapter.py to allow no-op mappings when running RL with MaxText on vLLM.
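As a sketch of the idea, a no-op mapping is simply an identity pass-through; the function name and signature below are assumptions for illustration, not the actual tunix_adapter.py interface:

```python
def no_op_mapping(param_name: str, param_value):
  """Identity mapping: return the parameter name and value unchanged.

  Hypothetical example; when MaxText weights are already in the layout
  vLLM expects, the adapter can register a pass-through like this instead
  of a rename/reshape rule.
  """
  return param_name, param_value

# Usage: the parameter flows through untouched.
name, value = no_op_mapping("decoder.layers.0.mlp.wi", [0.1, 0.2])
assert (name, value) == ("decoder.layers.0.mlp.wi", [0.1, 0.2])
```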

Tests

Gemma3-4B:

Local (v6e-4 VM):

NEW_MODEL_DESIGN=True HF_TOKEN=$HF_TOKEN TPU_BACKEND_TYPE=jax python3 -m src.MaxText.rl.train_rl src/MaxText/configs/rl.yml \
  model_name=gemma3-4b \
  tokenizer_path=google/gemma-3-4b-it \
  run_name=$WORKLOAD \
  base_output_directory=$OUTPUT_PATH \
  hf_access_token=$HF_TOKEN \
  scan_layers=False \
  load_parameters_path="gs://maxtext-gemma/unified/gemma3/4b/unscanned/2025-08-09-01-17/0/items" \
  vllm_hf_config_path=src/MaxText/integration/vllm/maxtext_vllm_adapter \
  vllm_additional_config='{"maxtext_config": {"model_name": "gemma3-4b", "max_prefill_predict_length": 28, "max_target_length": 32, "ici_tensor_parallelism": 4}}'

Output: logs

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have added necessary comments to my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

@NicoGrande force-pushed the nicogrande/maxtext-vllm-rl-integration branch 2 times, most recently from ff0eb43 to bc49da1 on December 4, 2025 23:31
Fix formatting.

Refactor model creation and error handling in RL training

fix linting.

adding no-op mappings to tunix adapter.

removing kvcache init for vllm case.
@gagika (Collaborator) left a comment


thanks
