Commit 60028c4

Merge pull request #2718 from melissawm:post-training
PiperOrigin-RevId: 839430301
2 parents 89325d4 + c2283e2 commit 60028c4

5 files changed

Lines changed: 75 additions & 27 deletions

File tree

docs/_static/grpo_diagram.png (123 KB)

docs/_static/rl_workflow.png (92.7 KB)

docs/tutorials.md

Lines changed: 2 additions & 27 deletions
@@ -16,34 +16,9 @@
 
 # Tutorials
 
-For your first time running MaxText, we provide specific [instructions](first-run).
-
-MaxText supports training and inference of various open models.
-
-Some extra helpful guides:
-* [Gemma](https://ai.google.dev/gemma): a family of open-weights Large Language Model (LLM) by [Google DeepMind](https://deepmind.google/), based on Gemini research and technology. You can run decode and finetuning using [these instructions](https://github.com/AI-Hypercomputer/maxtext/blob/main/end_to_end/tpu/gemma/Run_Gemma.md).
-* [Llama2](https://llama.meta.com/llama2/): a family of open-weights Large Language Model (LLM) by Meta. You can run decode and finetuning using [these instructions](https://github.com/AI-Hypercomputer/maxtext/blob/main/end_to_end/tpu/llama2/run_llama2.md).
-* [Mixtral](https://mistral.ai/news/mixtral-of-experts/): a family of open-weights sparse mixture-of-experts (MoE) model by Mistral AI. You can run decode and finetuning using [these instructions](https://github.com/AI-Hypercomputer/maxtext/blob/main/end_to_end/tpu/mixtral/Run_Mixtral.md)
-
-In addition to the getting started guides, there are always other MaxText capabilities that are being constantly being added! The full suite of end-to-end tests is in [end_to_end](https://github.com/AI-Hypercomputer/maxtext/blob/main/end_to_end). We run them with a nightly cadence. They can be a good source for understanding MaxText Alternatively you can see the continuous [unit tests](https://github.com/AI-Hypercomputer/maxtext/blob/main/.github/workflows/RunTests.yml) which are run almost continuously.
-
-## End-to-end example
-
-See the <a href="https://www.kaggle.com/code/shivajidutta/maxtext-on-kaggle" target="_blank">MaxText example Kaggle notebook</a>.
-
-## Other examples
-
-You can also find other examples in the [MaxText repository](https://github.com/AI-Hypercomputer/maxtext/tree/main/pedagogical_examples).
-
 ```{toctree}
 :maxdepth: 1
 
-tutorials/first_run.md
-tutorials/pretraining.md
-tutorials/full_finetuning.md
-tutorials/how_to_run_colabs.md
-tutorials/grpo.md
-tutorials/sft.md
-tutorials/sft_on_multi_host.md
-tutorials/grpo_with_pathways.md
+tutorials/pre_training_index.md
+tutorials/post_training_index.md
 ```
Lines changed: 65 additions & 0 deletions

@@ -0,0 +1,65 @@
# Post training

## What is MaxText post training?

MaxText provides performant and scalable LLM and VLM post-training across a variety of techniques, such as SFT and GRPO.

We’re investing in performance, scale, algorithms, models, reliability, and ease of use to provide the most competitive OSS solution available.

## The MaxText stack

MaxText was co-designed with key Google-led innovations to provide a unified post-training experience:

- [MaxText model library](https://maxtext.readthedocs.io/en/latest/index.html#model-library) for JAX LLMs highly optimized for TPUs
- [Tunix](https://github.com/google/tunix) for the latest algorithms and post-training techniques
- [vLLM on TPU](https://github.com/vllm-project/tpu-inference) for high-performance sampling (inference) for Reinforcement Learning (RL)
- [Pathways](https://docs.cloud.google.com/ai-hypercomputer/docs/workloads/pathways-on-cloud/pathways-intro) for multi-host inference (sampling) and highly efficient weight transfer

![GRPO Diagram](../_static/grpo_diagram.png)

## Supported techniques & models

- **SFT (Supervised Fine-Tuning)** [(link)](https://maxtext.readthedocs.io/en/latest/tutorials/sft.html)
  - Supports all MaxText models
- **Multimodal SFT** [(link)](https://maxtext.readthedocs.io/en/latest/guides/multimodal.html)
- **GRPO (Group Relative Policy Optimization)** [(link)](https://maxtext.readthedocs.io/en/latest/tutorials/grpo.html)
  - Llama 3.1 8B
  - Llama 3.1 70B
- **GSPO-token**
  - Coming soon

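The GRPO entry above rests on group-relative advantages: each prompt's sampled completions are scored, then normalized against their own group's reward statistics. A minimal NumPy sketch of that computation (the function name and the `1e-6` stabilizer are illustrative, not a MaxText or Tunix API):

```python
import numpy as np

def group_relative_advantages(rewards, group_size):
    """GRPO-style advantages: normalize each prompt's group of sampled
    completions by that group's mean and standard deviation of reward."""
    r = np.asarray(rewards, dtype=np.float64).reshape(-1, group_size)
    mean = r.mean(axis=1, keepdims=True)
    std = r.std(axis=1, keepdims=True)
    return ((r - mean) / (std + 1e-6)).reshape(-1)

# Two prompts, four sampled completions each.
adv = group_relative_advantages([1.0, 0.0, 0.0, 1.0, 2.0, 2.0, 2.0, 2.0], group_size=4)
```

Because each group is its own baseline, GRPO needs no learned value function; a uniformly rewarded group (the last four entries above) yields near-zero advantages.
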
## Step by step RL

Making powerful RL accessible is at the core of the MaxText mission.

Here is an example of the steps you might go through to run a Reinforcement Learning (RL) job:

![RL Workflow](../_static/rl_workflow.png)

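The workflow pictured above boils down to a loop of sample, score, and update. A toy, self-contained sketch under heavy simplification: the "policy" here is a single scalar mean rather than model weights, and a group-relative REINFORCE step stands in for the real policy-gradient update; none of these names are a MaxText or Tunix API.

```python
import random

def rl_training_loop(policy_mean, reward_fn, num_steps=200, group_size=8, lr=0.1):
    rng = random.Random(0)
    for _ in range(num_steps):
        # 1. Sample a group of "completions" from the current policy.
        group = [rng.gauss(policy_mean, 1.0) for _ in range(group_size)]
        # 2. Score every sample with the reward function.
        rewards = [reward_fn(c) for c in group]
        # 3. Group-relative update: move toward above-average samples.
        baseline = sum(rewards) / group_size
        grad = sum((r - baseline) * (c - policy_mean)
                   for r, c in zip(rewards, group)) / group_size
        policy_mean += lr * grad
    return policy_mean

# The reward peaks at 3.0, so the loop should pull the policy mean toward 3.
final = rl_training_loop(0.0, reward_fn=lambda c: -(c - 3.0) ** 2)
```

In a real run, steps 1 and 2 are served by the sampler (e.g. vLLM on TPU) while step 3 runs on the trainer, which is exactly the split the diagram shows.
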
## What is Pathways and why is it key for RL?

Pathways is a single-controller JAX runtime that was [designed and pressure-tested internally at Google DeepMind](https://blog.google/technology/ai/introducing-pathways-next-generation-ai-architecture/) over many years. Now available on Google Cloud, it is designed to coordinate distributed computations across thousands of accelerators from a single Python program. It efficiently performs data transfers between accelerators, both within a slice using ICI (Inter-chip Interconnect) and across slices over DCN (Data Center Network).

Pathways allows for fine-grained resource allocation (a subslice of a physical slice) and scheduling. This lets JAX developers explore novel model architectures in an easy-to-develop, single-controller programming environment.

Pathways supercharges RL with:

1. **Multi-host model support:** Easily manages models that span multiple hosts.
2. **Unified orchestration:** Controls both trainers and samplers from a single Python process.
3. **Efficient data transfer:** Optimally moves data between training and inference devices, using ICI or DCN as needed. JAX reshard primitives simplify integration.
4. **Flexible resource allocation:** Enables dedicating different numbers of accelerators to inference and training within the same job, adapting to workload bottlenecks (a disaggregated setup).

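Point 3, moving data between trainers and samplers, can be gestured at with stock JAX in a single process: `jax.device_put` places an array onto a chosen device, and on Pathways and multi-host deployments the same primitive can reshard arrays across hosts and slices. The device indices below are illustrative (on CPU both may resolve to the same device).

```python
import jax
import jax.numpy as jnp

# Single-process sketch of trainer -> sampler weight transfer.
devices = jax.devices()
train_dev, sample_dev = devices[0], devices[-1]

weights = jax.device_put(jnp.arange(8.0), train_dev)    # the "trainer" copy
sampler_weights = jax.device_put(weights, sample_dev)   # moved for sampling
```
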
## Getting started

Start your post-training journey with quick experimentation in our [Google Colabs](https://maxtext.readthedocs.io/en/latest/tutorials/how_to_run_colabs.html), or with our production-level tutorials for [SFT](https://maxtext.readthedocs.io/en/latest/tutorials/sft_on_multi_host.html) and [GRPO](https://maxtext.readthedocs.io/en/latest/tutorials/grpo_with_pathways.html).

## More tutorials

```{toctree}
:maxdepth: 1

full_finetuning.md
how_to_run_colabs.md
grpo.md
sft.md
sft_on_multi_host.md
grpo_with_pathways.md
```
Lines changed: 8 additions & 0 deletions

@@ -0,0 +1,8 @@
# Pre training

```{toctree}
:maxdepth: 1

first_run.md
pretraining.md
```
