Commit 60028c4

Merge pull request #2718 from melissawm:post-training
PiperOrigin-RevId: 839430301
2 parents 89325d4 + c2283e2 commit 60028c4

5 files changed

Lines changed: 75 additions & 27 deletions

File tree

docs/_static/grpo_diagram.png (123 KB)

docs/_static/rl_workflow.png (92.7 KB)

docs/tutorials.md

Lines changed: 2 additions & 27 deletions
@@ -16,34 +16,9 @@
 
 # Tutorials
 
-For your first time running MaxText, we provide specific [instructions](first-run).
-
-MaxText supports training and inference of various open models.
-
-Some extra helpful guides:
-* [Gemma](https://ai.google.dev/gemma): a family of open-weights Large Language Model (LLM) by [Google DeepMind](https://deepmind.google/), based on Gemini research and technology. You can run decode and finetuning using [these instructions](https://github.com/AI-Hypercomputer/maxtext/blob/main/end_to_end/tpu/gemma/Run_Gemma.md).
-* [Llama2](https://llama.meta.com/llama2/): a family of open-weights Large Language Model (LLM) by Meta. You can run decode and finetuning using [these instructions](https://github.com/AI-Hypercomputer/maxtext/blob/main/end_to_end/tpu/llama2/run_llama2.md).
-* [Mixtral](https://mistral.ai/news/mixtral-of-experts/): a family of open-weights sparse mixture-of-experts (MoE) model by Mistral AI. You can run decode and finetuning using [these instructions](https://github.com/AI-Hypercomputer/maxtext/blob/main/end_to_end/tpu/mixtral/Run_Mixtral.md)
-
-In addition to the getting started guides, there are always other MaxText capabilities that are being constantly being added! The full suite of end-to-end tests is in [end_to_end](https://github.com/AI-Hypercomputer/maxtext/blob/main/end_to_end). We run them with a nightly cadence. They can be a good source for understanding MaxText Alternatively you can see the continuous [unit tests](https://github.com/AI-Hypercomputer/maxtext/blob/main/.github/workflows/RunTests.yml) which are run almost continuously.
-
-## End-to-end example
-
-See the <a href="https://www.kaggle.com/code/shivajidutta/maxtext-on-kaggle" target="_blank">MaxText example Kaggle notebook</a>.
-
-## Other examples
-
-You can also find other examples in the [MaxText repository](https://github.com/AI-Hypercomputer/maxtext/tree/main/pedagogical_examples).
-
 ```{toctree}
 :maxdepth: 1
 
-tutorials/first_run.md
-tutorials/pretraining.md
-tutorials/full_finetuning.md
-tutorials/how_to_run_colabs.md
-tutorials/grpo.md
-tutorials/sft.md
-tutorials/sft_on_multi_host.md
-tutorials/grpo_with_pathways.md
+tutorials/pre_training_index.md
+tutorials/post_training_index.md
 ```
Lines changed: 65 additions & 0 deletions

@@ -0,0 +1,65 @@
# Post training

## What is MaxText post training?

MaxText provides performant and scalable LLM and VLM post-training across a variety of techniques, such as SFT and GRPO.

We’re investing in performance, scale, algorithms, models, reliability, and ease of use to provide the most competitive OSS solution available.

## The MaxText stack

MaxText was co-designed with key Google-led innovations to provide a unified post-training experience:

- [MaxText model library](https://maxtext.readthedocs.io/en/latest/index.html#model-library) for JAX LLMs highly optimized for TPUs
- [Tunix](https://github.com/google/tunix) for the latest algorithms and post-training techniques
- [vLLM on TPU](https://github.com/vllm-project/tpu-inference) for high-performance sampling (inference) for Reinforcement Learning (RL)
- [Pathways](https://docs.cloud.google.com/ai-hypercomputer/docs/workloads/pathways-on-cloud/pathways-intro) for multi-host inference (sampling) and highly efficient weight transfer

![GRPO Diagram](../_static/grpo_diagram.png)

## Supported techniques & models

- **SFT (Supervised Fine-Tuning)** [(link)](https://maxtext.readthedocs.io/en/latest/tutorials/sft.html)
  - Supports all MaxText models
- **Multimodal SFT** [(link)](https://maxtext.readthedocs.io/en/latest/guides/multimodal.html)
- **GRPO (Group Relative Policy Optimization)** [(link)](https://maxtext.readthedocs.io/en/latest/tutorials/grpo.html)
  - Llama 3.1 8B
  - Llama 3.1 70B
- **GSPO-token**
  - Coming soon

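The GRPO entry above rests on group-relative advantages: each prompt's sampled completions are scored, then normalized against their own group's reward statistics. A minimal NumPy sketch of that computation (the function name and the `1e-6` stabilizer are illustrative, not a MaxText or Tunix API):

```python
import numpy as np

def group_relative_advantages(rewards, group_size):
    """GRPO-style advantages: normalize each prompt's group of sampled
    completions by that group's mean and standard deviation of reward."""
    r = np.asarray(rewards, dtype=np.float64).reshape(-1, group_size)
    mean = r.mean(axis=1, keepdims=True)
    std = r.std(axis=1, keepdims=True)
    return ((r - mean) / (std + 1e-6)).reshape(-1)

# Two prompts, four sampled completions each.
adv = group_relative_advantages([1.0, 0.0, 0.0, 1.0, 2.0, 2.0, 2.0, 2.0], group_size=4)
```

Because each group is its own baseline, GRPO needs no learned value function; a uniformly rewarded group (the last four entries above) yields near-zero advantages.
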
## Step by step RL

Making powerful RL accessible is at the core of the MaxText mission.

Here is an example of the steps you might go through to run a Reinforcement Learning (RL) job:

![RL Workflow](../_static/rl_workflow.png)

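The workflow pictured above boils down to a loop of sample, score, and update. A toy, self-contained sketch under heavy simplification: the "policy" here is a single scalar mean rather than model weights, and a group-relative REINFORCE step stands in for the real policy-gradient update; none of these names are a MaxText or Tunix API.

```python
import random

def rl_training_loop(policy_mean, reward_fn, num_steps=200, group_size=8, lr=0.1):
    rng = random.Random(0)
    for _ in range(num_steps):
        # 1. Sample a group of "completions" from the current policy.
        group = [rng.gauss(policy_mean, 1.0) for _ in range(group_size)]
        # 2. Score every sample with the reward function.
        rewards = [reward_fn(c) for c in group]
        # 3. Group-relative update: move toward above-average samples.
        baseline = sum(rewards) / group_size
        grad = sum((r - baseline) * (c - policy_mean)
                   for r, c in zip(rewards, group)) / group_size
        policy_mean += lr * grad
    return policy_mean

# The reward peaks at 3.0, so the loop should pull the policy mean toward 3.
final = rl_training_loop(0.0, reward_fn=lambda c: -(c - 3.0) ** 2)
```

In a real run, steps 1 and 2 are served by the sampler (e.g. vLLM on TPU) while step 3 runs on the trainer, which is exactly the split the diagram shows.
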
## What is Pathways and why is it key for RL?

Pathways is a single-controller JAX runtime that was [designed and pressure-tested internally at Google DeepMind](https://blog.google/technology/ai/introducing-pathways-next-generation-ai-architecture/) over many years. Now available on Google Cloud, it is designed to coordinate distributed computations across thousands of accelerators from a single Python program. It efficiently performs data transfers between accelerators, both within a slice using ICI (Inter-chip Interconnect) and across slices over DCN (Data Center Network).

Pathways allows for fine-grained resource allocation (a subslice of a physical slice) and scheduling. This lets JAX developers explore novel model architectures in an easy-to-develop, single-controller programming environment.

Pathways supercharges RL with:

1. **Multi-host model support:** Easily manages models that span multiple hosts.
2. **Unified orchestration:** Controls both trainers and samplers from a single Python process.
3. **Efficient data transfer:** Optimally moves data between training and inference devices, using ICI or DCN as needed. JAX reshard primitives simplify integration.
4. **Flexible resource allocation:** Enables dedicating different numbers of accelerators to inference and training within the same job, adapting to workload bottlenecks (a disaggregated setup).

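Point 3, moving data between trainers and samplers, can be gestured at with stock JAX in a single process: `jax.device_put` places an array onto a chosen device, and on Pathways and multi-host deployments the same primitive can reshard arrays across hosts and slices. The device indices below are illustrative (on CPU both may resolve to the same device).

```python
import jax
import jax.numpy as jnp

# Single-process sketch of trainer -> sampler weight transfer.
devices = jax.devices()
train_dev, sample_dev = devices[0], devices[-1]

weights = jax.device_put(jnp.arange(8.0), train_dev)    # the "trainer" copy
sampler_weights = jax.device_put(weights, sample_dev)   # moved for sampling
```
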
## Getting started

Start your post-training journey with quick experimentation in our [Google Colabs](https://maxtext.readthedocs.io/en/latest/tutorials/how_to_run_colabs.html), or with our production-level tutorials for [SFT](https://maxtext.readthedocs.io/en/latest/tutorials/sft_on_multi_host.html) and [GRPO](https://maxtext.readthedocs.io/en/latest/tutorials/grpo_with_pathways.html).

## More tutorials

```{toctree}
:maxdepth: 1

full_finetuning.md
how_to_run_colabs.md
grpo.md
sft.md
sft_on_multi_host.md
grpo_with_pathways.md
```
Lines changed: 8 additions & 0 deletions

@@ -0,0 +1,8 @@
# Pre training

```{toctree}
:maxdepth: 1

first_run.md
pretraining.md
```
