
Add Megatron-Bridge LoRA support for GRPO actor training #1865

Open
taivu1998 wants to merge 1 commit into THUDM:main from taivu1998:tdv/issue-1202-lora-grpo

Conversation

@taivu1998

Summary

Addresses #1202.

This PR adds an initial supported Megatron-Bridge LoRA path for dense GRPO actor training in slime. It introduces LoRA CLI flags, validates the initially supported configuration at startup, applies Megatron-Bridge PEFT LoRA to the actor model only, and exports effective actor weights to SGLang by temporarily merging the adapters into the live model during bridge-based HF weight conversion.

Why

Issue #1202 asks for LoRA support for GRPO training, along with examples. The discussion also calls out known Megatron-Bridge LoRA risks around MoE and checkpointing paths, so this implementation intentionally starts with a narrow, guarded dense-model path rather than silently enabling unsupported combinations.

Changes

  • Added --enable-lora, --lora-target-modules, --lora-rank, --lora-alpha, and --lora-dropout (see the flag sketch after this list).
  • Added validation for the first supported LoRA slice:
    • Megatron backend.
    • Megatron-Bridge HF export mode.
    • GRPO actor training.
    • Colocated rollout (required except in --debug-train-only runs).
    • Dense models only.
    • Default weight backuper enabled.
  • Added a Megatron LoRA helper module (sketched after this list) that:
    • lazily imports Megatron-Bridge PEFT LoRA.
    • builds the LoRA config from slime args.
    • applies LoRA only to the actor provider path.
    • logs local trainable and total parameter counts.
    • restores TensorBackuper-style weight backups around the temporary merge used for export.
  • Updated bridge weight export so that LoRA runs export the effective (base + adapter) actor weights to SGLang and then restore the unmerged training weights (see the merge/restore sketch below).
  • Added focused unit tests for validation, config mapping, actor-only application, merge traversal, and backup-restore safety (see the round-trip test sketch below).
  • Added an English advanced usage page with a concrete GRPO LoRA flag example and linked it from the docs index.
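
For concreteness, here is a minimal sketch of the new CLI surface, assuming slime registers its arguments through argparse. The flag names are the ones added by this PR; the defaults and the target-module names are illustrative assumptions, not the shipped values.

```python
import argparse


def add_lora_args(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    """Register the LoRA flags added by this PR (defaults are illustrative)."""
    group = parser.add_argument_group("lora")
    group.add_argument("--enable-lora", action="store_true",
                       help="Apply Megatron-Bridge PEFT LoRA to the actor model.")
    group.add_argument("--lora-target-modules", nargs="+",
                       default=["linear_qkv", "linear_proj"],  # assumed defaults
                       help="Module names to wrap with LoRA adapters.")
    group.add_argument("--lora-rank", type=int, default=16,
                       help="Low-rank dimension of each adapter.")
    group.add_argument("--lora-alpha", type=float, default=32.0,
                       help="Scaling numerator; the effective scale is alpha / rank.")
    group.add_argument("--lora-dropout", type=float, default=0.0,
                       help="Dropout applied on the adapter path.")
    return parser
```

A training invocation would then carry something like `--enable-lora --lora-rank 16 --lora-alpha 32` on top of the usual GRPO flags; the docs page added here shows the concrete example.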
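The helper module looks roughly like the sketch below. This is not the exact Megatron-Bridge API: the import path, the `LoRA` field names, and applying the transform by calling it on the model are assumptions about Megatron-Bridge's PEFT design.

```python
def apply_lora_to_actor(actor_model, args):
    """Sketch: build a LoRA config from slime args and apply it to the actor only."""
    # Lazy import so non-LoRA runs never touch Megatron-Bridge PEFT.
    from megatron.bridge.peft.lora import LoRA  # assumed import path

    lora = LoRA(  # field names are assumptions
        target_modules=list(args.lora_target_modules),
        dim=args.lora_rank,
        alpha=args.lora_alpha,
        dropout=args.lora_dropout,
    )
    actor_model = lora(actor_model)  # assumed: the PEFT transform is callable on the model

    # Log local (per-rank) trainable vs. total parameter counts.
    trainable = sum(p.numel() for p in actor_model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in actor_model.parameters())
    print(f"[lora] local trainable/total params: {trainable:,} / {total:,}")
    return actor_model
```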
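The merge-export-restore flow around bridge-based HF conversion can be pictured as the self-contained torch sketch below; the `lora_a` / `lora_b` / `lora_alpha` / `lora_rank` attribute names are illustrative, not slime's or Megatron-Bridge's real ones.

```python
import contextlib

import torch


def _merge_lora_weights(model: torch.nn.Module) -> None:
    """Fold adapter deltas into base weights in place (attribute names illustrative)."""
    with torch.no_grad():
        for m in model.modules():
            if hasattr(m, "lora_a") and hasattr(m, "lora_b"):
                scale = m.lora_alpha / m.lora_rank
                m.weight += scale * (m.lora_b.weight @ m.lora_a.weight)


@contextlib.contextmanager
def merged_for_export(model: torch.nn.Module):
    """Temporarily merge adapters so HF conversion sees effective actor weights."""
    # TensorBackuper-style snapshot of the unmerged training weights.
    backup = {name: p.detach().clone() for name, p in model.named_parameters()}
    try:
        _merge_lora_weights(model)
        yield model  # bridge-based HF weight conversion runs inside this block
    finally:
        # Restore the unmerged weights so training continues from the same state.
        with torch.no_grad():
            for name, p in model.named_parameters():
                p.copy_(backup[name])
```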
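The backup-restore safety check can then be as small as the following pytest-style sketch, reusing the hypothetical `merged_for_export` from the sketch above.

```python
import torch


def test_merge_restore_roundtrip():
    # Tiny adapted layer following the attribute convention assumed above.
    lin = torch.nn.Linear(4, 4, bias=False)
    lin.lora_a = torch.nn.Linear(4, 2, bias=False)
    lin.lora_b = torch.nn.Linear(2, 4, bias=False)
    lin.lora_alpha, lin.lora_rank = 32.0, 2

    before = lin.weight.detach().clone()
    with merged_for_export(lin):
        assert not torch.allclose(lin.weight, before)  # adapters folded in
    assert torch.allclose(lin.weight, before)  # unmerged weights restored
```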

Guardrails

The PR rejects combinations that need separate parity work before they can be supported (see the validation sketch after this list):

  • MoE models.
  • PPO or critic-based training.
  • Decoupled rollout mode outside --debug-train-only.
  • Custom model providers.
  • --only-train-params-name-list and --freeze-params-name-list.
  • On-policy distillation.
  • Reference model update intervals.
  • --disable-weights-backuper.
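
A sketch of how the startup check might read; the `args` attribute names are guesses derived from the flags above, not slime's actual internals.

```python
def validate_lora_args(args) -> None:
    """Reject LoRA combinations outside the first supported slice."""
    if not args.enable_lora:
        return
    if getattr(args, "num_experts", None):  # MoE marker; attribute name assumed
        raise ValueError("LoRA: MoE models are not supported yet.")
    if getattr(args, "use_critic", False):  # PPO/critic path; attribute name assumed
        raise ValueError("LoRA: PPO or critic-based training is not supported.")
    if not getattr(args, "colocate", True) and not args.debug_train_only:
        raise ValueError("LoRA: decoupled rollout is only allowed with --debug-train-only.")
    if getattr(args, "disable_weights_backuper", False):
        raise ValueError("LoRA export requires the default weight backuper.")
    # ...plus the remaining guardrails from the list above (custom providers,
    # param freeze lists, on-policy distillation, ref-model update intervals).
```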

Validation

  • `env UV_CACHE_DIR=/tmp/uv-cache PYTHONPATH=. uv run pytest tests/test_lora_support.py` → 22 passed
  • `python3 -m py_compile slime/backends/megatron_utils/peft.py slime/backends/megatron_utils/model_provider.py slime/backends/megatron_utils/model.py slime/backends/megatron_utils/update_weight/hf_weight_iterator_bridge.py slime/utils/arguments.py tests/test_lora_support.py`
  • `git diff --check`

`uv run ruff check ...` was attempted locally but could not run because this worktree environment does not have a ruff executable installed.

taivu1998 marked this pull request as ready for review April 26, 2026 23:05