# Post training

## What is MaxText post training?

MaxText provides performant, scalable LLM and VLM post-training across a variety of techniques, including SFT and GRPO.

We’re investing in performance, scale, algorithms, models, reliability, and ease of use to provide the most competitive OSS solution available.

## The MaxText stack

MaxText was co-designed with key Google-led innovations to provide a unified post-training experience:
- [MaxText model library](https://maxtext.readthedocs.io/en/latest/index.html#model-library) for JAX LLMs highly optimized for TPUs
- [Tunix](https://github.com/google/tunix) for the latest algorithms and post-training techniques
- [vLLM on TPU](https://github.com/vllm-project/tpu-inference) for high-performance sampling (inference) for Reinforcement Learning (RL)
- [Pathways](https://docs.cloud.google.com/ai-hypercomputer/docs/workloads/pathways-on-cloud/pathways-intro) for multi-host inference (sampling) and highly efficient weight transfer

## Supported techniques & models

- **SFT (Supervised Fine-Tuning)** [(link)](https://maxtext.readthedocs.io/en/latest/tutorials/sft.html)
  - Supports all MaxText models
- **Multimodal SFT** [(link)](https://maxtext.readthedocs.io/en/latest/guides/multimodal.html)
- **GRPO (Group Relative Policy Optimization)** [(link)](https://maxtext.readthedocs.io/en/latest/tutorials/grpo.html)
  - Llama 3.1 8B
  - Llama 3.1 70B
- **GSPO-token**
  - Coming soon
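GRPO's core idea, scoring each sampled completion against its own group of samples rather than a learned value function, can be sketched in a few lines of numpy. This is an illustrative sketch only, not MaxText's or Tunix's implementation; the names `grpo_advantages` and `eps` are chosen here for clarity.

```python
import numpy as np

def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages: normalize each prompt's group of
    sampled rewards by that group's own mean and standard deviation."""
    rewards = np.asarray(rewards, dtype=np.float64)  # shape: (num_prompts, group_size)
    mean = rewards.mean(axis=1, keepdims=True)
    std = rewards.std(axis=1, keepdims=True)
    return (rewards - mean) / (std + eps)  # eps avoids division by zero

# Two prompts, four sampled completions each.
adv = grpo_advantages([[1.0, 0.0, 1.0, 0.0],   # mixed rewards -> roughly +/-1 advantages
                       [2.0, 2.0, 2.0, 2.0]])  # identical rewards -> zero advantage
```

Because every completion is judged relative to its own group, a prompt where all samples score identically contributes no gradient signal, which is why GRPO samples several completions per prompt.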

## Step-by-step RL

Making powerful RL accessible is at the core of the MaxText mission.

Here is an example of the steps you might go through to run a Reinforcement Learning (RL) job:
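At a high level, those steps repeat in a loop: sample completions from the current policy (the vLLM sampler in MaxText's stack), score them with a reward function, and update the policy weights (the trainer side). The following is a schematic sketch of that data flow; all names here (`rl_step`, `sample`, `reward_fn`, `update`) are illustrative placeholders, not MaxText or Tunix APIs.

```python
def rl_step(policy, prompts, sample, reward_fn, update):
    """One schematic RL iteration: sample -> score -> update."""
    completions = [sample(policy, p) for p in prompts]                 # inference side
    rewards = [reward_fn(p, c) for p, c in zip(prompts, completions)]  # reward scoring
    return update(policy, prompts, completions, rewards)               # training side

# Toy stand-ins, just to show the data flow end to end.
policy = {"version": 0}
sample = lambda pol, prompt: prompt + "!"                      # "generate" a completion
reward_fn = lambda prompt, completion: float(len(completion))  # score it
update = lambda pol, ps, cs, rs: {"version": pol["version"] + 1}

policy = rl_step(policy, ["hello", "world"], sample, reward_fn, update)
```

In a real job, each of these callables is a distributed system of its own, which is exactly the coordination problem Pathways solves.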

## What is Pathways and why is it key for RL?

Pathways is a single-controller JAX runtime that was [designed and pressure-tested internally at Google DeepMind](https://blog.google/technology/ai/introducing-pathways-next-generation-ai-architecture/) over many years. Now available on Google Cloud, it is designed to coordinate distributed computations across thousands of accelerators from a single Python program. It efficiently performs data transfers between accelerators, both within a slice using ICI (Inter-Chip Interconnect) and across slices over DCN (Data Center Network).

Pathways allows for fine-grained resource allocation (a subslice of a physical slice) and scheduling. This lets JAX developers explore novel model architectures in an easy-to-develop, single-controller programming environment.

Pathways supercharges RL with:
1. **Multi-host model support:** Easily manages models that span multiple hosts.
2. **Unified orchestration:** Controls both trainers and samplers from a single Python process.
3. **Efficient data transfer:** Optimally moves data between training and inference devices, utilizing ICI or DCN as needed. JAX reshard primitives simplify integration.
4. **Flexible resource allocation:** Enables dedicating different numbers of accelerators to inference and training within the same job, adapting to workload bottlenecks (a disaggregated setup).
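The data-transfer point can be illustrated with JAX's standard sharding API on whatever devices are locally available; Pathways extends this same kind of resharding across hosts and slices. A minimal sketch, where the mesh axis name `data` is an arbitrary choice for this example:

```python
import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# A 1-D mesh over all local devices (Pathways would span many hosts).
mesh = Mesh(np.array(jax.devices()), axis_names=("data",))

x = jnp.arange(8.0)

# Shard the array along the mesh's "data" axis...
sharded = jax.device_put(x, NamedSharding(mesh, P("data")))

# ...then reshard it to fully replicated. In an RL job, this kind of
# transfer moves updated weights from trainer devices to sampler devices,
# over ICI or DCN as appropriate.
replicated = jax.device_put(sharded, NamedSharding(mesh, P()))
```

The single-controller model means both placements live in one Python program, so the trainer-to-sampler weight transfer is an ordinary array operation rather than a bespoke RPC protocol.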

## Getting started

Start your post-training journey with quick experimentation in our [Google Colabs](https://maxtext.readthedocs.io/en/latest/tutorials/how_to_run_colabs.html), or with our production-level tutorials for [SFT](https://maxtext.readthedocs.io/en/latest/tutorials/sft_on_multi_host.html) and [GRPO](https://maxtext.readthedocs.io/en/latest/tutorials/grpo_with_pathways.html).

## More tutorials

```{toctree}
:maxdepth: 1

full_finetuning.md
how_to_run_colabs.md
grpo.md
sft.md
sft_on_multi_host.md
grpo_with_pathways.md
```