Commit 5f963b0

Add post-training landing page
1 parent 2c0edb7 commit 5f963b0

5 files changed: 77 additions & 8 deletions

File tree

- docs/_static/grpo_diagram.png (80.1 KB)
- docs/_static/rl_workflow.png (92.7 KB)
docs/tutorials.md

Lines changed: 4 additions & 8 deletions
````diff
@@ -35,15 +35,11 @@ See the <a href="https://www.kaggle.com/code/shivajidutta/maxtext-on-kaggle" tar
 
 You can also find other examples in the [MaxText repository](https://github.com/AI-Hypercomputer/maxtext/tree/main/pedagogical_examples).
 
+
+
 ```{toctree}
 :maxdepth: 1
 
-tutorials/first_run.md
-tutorials/pretraining.md
-tutorials/full_finetuning.md
-tutorials/how_to_run_colabs.md
-tutorials/grpo.md
-tutorials/sft.md
-tutorials/sft_on_multi_host.md
-tutorials/grpo_with_pathways.md
+tutorials/pre_training_index.md
+tutorials/post_training_index.md
 ```
````
docs/tutorials/post_training_index.md (new file)

Lines changed: 65 additions & 0 deletions

# Post-training

## What is MaxText post-training?

MaxText provides performant, scalable LLM and VLM post-training across a variety of techniques, such as SFT and GRPO.

We're investing in performance, scale, algorithms, models, reliability, and ease of use to provide the most competitive OSS solution available.

## The MaxText stack

MaxText was co-designed with key Google-led innovations to provide a unified post-training experience:

- [MaxText model library](https://maxtext.readthedocs.io/en/latest/index.html#model-library) of JAX LLMs highly optimized for TPUs
- [Tunix](https://github.com/google/tunix) for the latest algorithms and post-training techniques
- [vLLM on TPU](https://github.com/vllm-project/tpu-inference) for high-performance sampling (inference) for Reinforcement Learning (RL)
- [Pathways](https://docs.cloud.google.com/ai-hypercomputer/docs/workloads/pathways-on-cloud/pathways-intro) for multi-host inference (sampling) and highly efficient weight transfer

![GRPO Diagram](../_static/grpo_diagram.png)

## Supported techniques & models

- **SFT (Supervised Fine-Tuning)** [(link)](https://maxtext.readthedocs.io/en/latest/tutorials/sft.html)
  - Supports all MaxText models
- **Multimodal SFT** [(link)](https://maxtext.readthedocs.io/en/latest/guides/multimodal.html)
- **GRPO (Group Relative Policy Optimization)** [(link)](https://maxtext.readthedocs.io/en/latest/tutorials/grpo.html)
  - Llama 3.1 8B
  - Llama 3.1 70B
- **GSPO-token**
  - Coming soon
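GRPO's defining trick is small enough to sketch. The following is an illustrative simplification, not MaxText's or Tunix's implementation: for each prompt, a group of completions is sampled and scored, and each completion's advantage is its reward normalized by the group's mean and standard deviation, which removes the need for a separately learned value function.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize one group's per-completion rewards to zero mean, unit std.

    GRPO uses these group-normalized scores as advantages; `eps` guards
    against a zero-variance group (all rewards identical).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Rewards for a group of G = 4 completions sampled for one prompt:
advs = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

Completions scoring above the group mean get a positive advantage, those below get a negative one, so the policy update needs only relative quality within each group.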
## Step-by-step RL

Making powerful RL accessible is at the core of the MaxText mission.

Here is an example of the steps you might go through to run a Reinforcement Learning (RL) job:

![RL Workflow](../_static/rl_workflow.png)

## What is Pathways and why is it key for RL?

Pathways is a single-controller JAX runtime that was [designed and pressure-tested internally at Google DeepMind](https://blog.google/technology/ai/introducing-pathways-next-generation-ai-architecture/) over many years. Now available on Google Cloud, it coordinates distributed computations across thousands of accelerators from a single Python program, and it efficiently transfers data between accelerators both within a slice over ICI (Inter-Chip Interconnect) and across slices over DCN (Data Center Network).

Pathways also allows for fine-grained resource allocation (a subslice of a physical slice) and scheduling, which lets JAX developers explore novel model architectures in an easy-to-develop, single-controller programming environment.
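As a purely illustrative picture (Pathways does this inside the runtime, and the names below are invented for the sketch), fine-grained allocation amounts to carving one accelerator pool into independently scheduled groups:

```python
def carve(pool, sizes):
    """Split one list of accelerators into named subslices.

    Illustrative only: Pathways performs real subslice allocation and
    scheduling at the runtime level, not via Python list slicing.
    """
    groups, start = {}, 0
    for name, n in sizes.items():
        groups[name] = pool[start:start + n]
        start += n
    return groups

pool = [f"tpu:{i}" for i in range(8)]                   # one physical slice
groups = carve(pool, {"trainer": 6, "sampler": 2})      # disaggregated split
```

The point is the shape of the capability, not the mechanism: one job can hold subslices of different sizes and assign each a different role.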
Pathways supercharges RL with:

1. **Multi-host model support:** Easily manages models that span multiple hosts.
1. **Unified orchestration:** Controls both trainers and samplers from a single Python process.
1. **Efficient data transfer:** Optimally moves data between training and inference devices, using ICI or DCN as needed. JAX reshard primitives simplify integration.
1. **Flexible resource allocation:** Dedicates different numbers of accelerators to inference and training within the same job, adapting to workload bottlenecks (a disaggregated setup).
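Put together, the single-controller style means the whole RL loop reads as ordinary Python. The sketch below uses stub functions (none of these are MaxText, Tunix, or Pathways APIs) purely to show the control flow that one driver process owns:

```python
def sample(params, prompts):
    # Stub: in a real job, vLLM on TPU generates completions on the sampler devices.
    return [f"completion at step {params['step']} for {p!r}" for p in prompts]

def score(completions):
    # Stub reward standing in for a reward model or programmatic verifier.
    return [float(len(c)) for c in completions]

def train_step(params, completions, rewards):
    # Stub: in a real job, a GRPO update runs on the trainer devices.
    return {"step": params["step"] + 1}

def push_weights(params):
    # Stub: Pathways transfers updated weights trainer -> sampler over ICI/DCN.
    return dict(params)

params = {"step": 0}
prompts = ["What is 2 + 2?"]
for _ in range(3):                     # one Python process drives everything
    completions = sample(push_weights(params), prompts)
    rewards = score(completions)
    params = train_step(params, completions, rewards)
```

Because sampling, scoring, weight transfer, and training are all invoked from one program, there is no cross-service orchestration layer to build or debug.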
## Getting started

Start your post-training journey with quick experimentation in our [Google Colabs](https://maxtext.readthedocs.io/en/latest/tutorials/how_to_run_colabs.html), or jump to our production-level tutorials for [SFT](https://maxtext.readthedocs.io/en/latest/tutorials/sft_on_multi_host.html) and [GRPO](https://maxtext.readthedocs.io/en/latest/tutorials/grpo_with_pathways.html).

## More tutorials

```{toctree}
:maxdepth: 1

full_finetuning.md
how_to_run_colabs.md
grpo.md
sft.md
sft_on_multi_host.md
grpo_with_pathways.md
```
docs/tutorials/pre_training_index.md (new file)

Lines changed: 8 additions & 0 deletions

# Pre-training

```{toctree}
:maxdepth: 1

first_run.md
pretraining.md
```
