# Commit 89325d4

Merge pull request #2517 from melissawm:landing-page
PiperOrigin-RevId: 839425232
2 parents: c7acd26 + 21a1a9d

6 files changed: 324 additions & 114 deletions

## README.md (43 additions, 35 deletions)
```diff
@@ -20,15 +20,15 @@
 > **_NOTE:_** We recommend running MaxText with Python 3.12, as it is our primary supported version. Other Python versions may encounter compatibility issues.

 MaxText is a high performance, highly scalable, open-source LLM library and reference implementation written in pure Python/[JAX](https://docs.jax.dev/en/latest/jax-101.html), targeting Google Cloud TPUs and GPUs for training.

 MaxText provides a library of high performance models to choose from, including Gemma, Llama, DeepSeek, Qwen, and Mistral. For each of these models, MaxText supports pre-training (up to tens of thousands of chips) and scalable post-training with popular techniques such as Supervised Fine-Tuning (SFT), Group Relative Policy Optimization (GRPO), and Group Sequence Policy Optimization (GSPO), the latter two being Reinforcement Learning methods.

 MaxText achieves high Model FLOPs Utilization (MFU) and tokens/second from single host to very large clusters while staying simple and largely "optimization-free" thanks to the power of JAX and the XLA compiler.

 MaxText is the launching point for ambitious LLM projects in both research and production. We encourage you to start by experimenting with MaxText out of the box, then fork and modify it to meet your needs.

 Check out our [Read The Docs site](https://maxtext.readthedocs.io/en/latest/) or directly [Get Started](https://maxtext.readthedocs.io/en/latest/tutorials/first_run.html) with your first MaxText run. If you’re interested in Diffusion models (Wan 2.1, Flux, etc.), see the [MaxDiffusion](https://github.com/AI-Hypercomputer/maxdiffusion) repository in our AI Hypercomputer GitHub organization.

 ## Installation
```
```diff
@@ -37,72 +37,80 @@ See our installation guide to [install MaxText with pip](https://maxtext.readthe
 ## Decoupled mode
 See our guide on running MaxText in decoupled mode, without any GCP dependencies, in the [Decoupled Mode Guide](https://maxtext.readthedocs.io/en/latest/guides/run_maxtext/decoupled_mode.html).

+<!-- NEWS START -->

 ## 🔥 Latest news 🔥

 * \[September 26, 2025\] Vocabulary tiling ([PR](https://github.com/AI-Hypercomputer/maxtext/pull/2242)) is now supported in MaxText! Adjust the config value `num_vocab_tiling` to unlock more efficient memory usage (see the sketch after this news list).
 * \[September 24, 2025\] The GPT-OSS family of models (20B, 120B) is now supported.
 * \[September 15, 2025\] MaxText is now available as a [PyPI package](https://pypi.org/project/maxtext). Users can now [install maxtext through pip](https://maxtext.readthedocs.io/en/latest/guides/install_maxtext.html).
-* \[September 5, 2025\] MaxText has moved to an `src` layout as part of [RESTRUCTURE.md](RESTRUCTURE.md). For existing environments, please run `pip install -e .` from the MaxText root.
+* \[September 5, 2025\] MaxText has moved to an `src` layout as part of [RESTRUCTURE.md](https://github.com/AI-Hypercomputer/maxtext/blob/aca5b24931ebcbadb55a82e56ebffe8024874028/RESTRUCTURE.md). For existing environments, please run `pip install -e .` from the MaxText root.
 * \[August 13, 2025\] The Qwen3 2507 MoE family of models is now supported: MoEs (235B Thinking & 480B Coder), joining the existing dense models (0.6B, 4B, 8B, 14B, and 32B).
-* \[July 27, 2025\] Updated the TFLOPS/s calculation ([PR](https://github.com/AI-Hypercomputer/maxtext/pull/1988)) to account for causal attention, dividing the attention flops in half. Accounted for sliding window and chunked attention reducing attention flops in [PR](https://github.com/AI-Hypercomputer/maxtext/pull/2009) and [PR](https://github.com/AI-Hypercomputer/maxtext/pull/2030). These changes impact large sequence configs, as explained in this [doc](https://github.com/AI-Hypercomputer/maxtext/blob/main/docs/guides/performance_metrics.md).
+* \[July 27, 2025\] Updated the TFLOPS/s calculation ([PR](https://github.com/AI-Hypercomputer/maxtext/pull/1988)) to account for causal attention, dividing the attention flops in half. Accounted for sliding window and chunked attention reducing attention flops in [PR](https://github.com/AI-Hypercomputer/maxtext/pull/2009) and [PR](https://github.com/AI-Hypercomputer/maxtext/pull/2030). These changes impact large sequence configs, as explained in this [doc](https://maxtext.readthedocs.io/en/latest/explanations/performance_metrics.html).
-* \[July 16, 2025\] We will be restructuring the MaxText repository for improved organization and clarity. Please review the [proposed structure](https://github.com/AI-Hypercomputer/maxtext/blob/main/RESTRUCTURE.md) and provide feedback.
+* \[July 16, 2025\] We will be restructuring the MaxText repository for improved organization and clarity. Please review the [proposed structure](https://github.com/AI-Hypercomputer/maxtext/blob/aca5b24931ebcbadb55a82e56ebffe8024874028/RESTRUCTURE.md) and provide feedback.
 * \[July 11, 2025\] Multi-Token Prediction (MTP) training is now supported! It adds an auxiliary loss based on predicting multiple future tokens, inspired by the [DeepSeek-V3 paper](https://arxiv.org/html/2412.19437v1), to enhance training efficiency.
 * \[June 25, 2025\] The DeepSeek R1-0528 variant is now supported.
 * \[April 24, 2025\] Llama 4 Maverick models are now supported.
+<!-- NEWS END -->
```
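A minimal sketch of the install paths and the vocabulary-tiling override called out in the news items above. The two `pip` commands come straight from the linked items; the training invocation is illustrative only (entry point, config path, and a tiling factor of 2 are assumptions), so defer to the PR and the install guide for verified usage.

```bash
# Install the released package from PyPI (September 15 item).
pip install maxtext

# Or refresh an existing source checkout after the move to the src layout
# (September 5 item); run from the MaxText root.
pip install -e .

# Illustrative only: MaxText config values are overridden as key=value
# arguments, so vocabulary tiling (September 26 item) would look roughly
# like this. The entry point and the tiling factor are assumptions, not
# commands verified against this commit.
python3 -m MaxText.train src/MaxText/configs/base.yml \
  run_name=vocab_tiling_demo \
  num_vocab_tiling=2
```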

```diff
 ## Use cases

 MaxText provides a library of models and demonstrates how to perform pre-training or post-training with high performance and scale.

 MaxText leverages [JAX AI libraries](https://docs.jaxstack.ai/en/latest/getting_started.html) and presents a cohesive and comprehensive demonstration of training at scale by using [Flax](https://flax.readthedocs.io/en/latest/) (neural networks), [Tunix](https://github.com/google/tunix) (post-training), [Orbax](https://orbax.readthedocs.io/en/latest/) (checkpointing), [Optax](https://optax.readthedocs.io/en/latest/) (optimization), and [Grain](https://google-grain.readthedocs.io/en/latest/) (dataloading).

 In addition to pure text-based LLMs, we also support multi-modal training with Gemma 3 and Llama 4 VLMs.

 ### Pre-training

 If you’re building models from scratch, MaxText can serve as a reference implementation for experimentation, ideation, and inspiration: just fork and modify MaxText to train your model, whether it’s a small dense model like Llama 8B or a large MoE like DeepSeek-V3. Experiment with configs and model design to build the most efficient model on TPU or GPU.

 MaxText provides opinionated implementations showing how to achieve optimal performance across a wide variety of dimensions like sharding, quantization, and checkpointing.
```
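As a concrete starting point for the out-of-the-box experimentation described above, here is a minimal smoke-test sketch. It assumes the standard `MaxText.train` entry point and common config overrides (`dataset_type=synthetic`, `steps`, `base_output_directory`); treat the exact flags as assumptions and see the Get Started guide for a verified first run.

```bash
# Minimal pre-training smoke test on synthetic data (assumed flags; the
# Get Started tutorial is the authoritative reference).
python3 -m MaxText.train src/MaxText/configs/base.yml \
  run_name=pretrain_smoke_test \
  base_output_directory=/tmp/maxtext_runs \
  dataset_type=synthetic \
  per_device_batch_size=1 \
  steps=10
```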
```diff
 ### Post-training

 If you are post-training a model, whether proprietary or open source, MaxText provides a scalable framework using Tunix. For RL (like GRPO), we leverage vLLM for sampling and Pathways (soon) for multi-host.

 Our goal is to provide a variety of models (dimension “a”) and techniques (dimension “b”), so you can easily explore (a) \* (b) combinations and efficiently train the perfect model for your use case.

 Check out these getting started guides:

-* [SFT](https://github.com/AI-Hypercomputer/maxtext/blob/main/end_to_end/tpu/llama3.1/8b/run_sft.sh) (Supervised Fine Tuning)
-* [GRPO / GSPO](https://maxtext.readthedocs.io/en/latest/tutorials/grpo.html) (Group Relative & Group Sequence Policy Optimization – pass `loss_algo=gspo-token` to run GSPO)
+* Supervised Fine Tuning (SFT)
+  * [SFT on Single-Host TPUs](https://maxtext.readthedocs.io/en/latest/tutorials/sft.html)
+  * [SFT on Multi-Host TPUs](https://maxtext.readthedocs.io/en/latest/tutorials/sft_on_multi_host.html)
+* Group Relative & Group Sequence Policy Optimization (GRPO & GSPO)
+  * [GRPO on Single-Host TPUs](https://maxtext.readthedocs.io/en/latest/tutorials/grpo.html)
+  * [GRPO on Multi-Host TPUs](https://maxtext.readthedocs.io/en/latest/tutorials/grpo_with_pathways.html)
+  * [GSPO](https://maxtext.readthedocs.io/en/latest/tutorials/grpo.html#run-gspo) (pass `loss_algo=gspo-token` to run GSPO; see the sketch below this diff)
```
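Per the GSPO link above, GSPO is selected by a single config override on top of the GRPO tutorial flow. The sketch below is a hypothetical illustration: only `loss_algo=gspo-token` is taken from the docs, while the entry point and remaining flags are assumptions, and the linked tutorial remains the authoritative recipe.

```bash
# Hypothetical sketch, not a verified recipe: start from the training
# command in the GRPO tutorial linked above, then switch the loss from
# GRPO to GSPO by appending the documented override. The entry point and
# all flags other than loss_algo are assumptions.
python3 -m MaxText.train src/MaxText/configs/base.yml \
  run_name=gspo_demo \
  loss_algo=gspo-token
```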
```diff
 ### Model library

 MaxText aims to provide you with the best OSS models, whether as a reference implementation or to post-train and then serve with vLLM.

 **Supported JAX models in MaxText**

 * Google
   * Gemma 3 (4B, 12B, 27B)
   * Gemma 2 (2B, 9B, 27B)
   * Gemma 1 (2B, 7B)
 * Alibaba
   * Qwen 3 MoE 2507 (235B, 480B)
   * Qwen 3 MoE (30B, 235B)
   * Qwen 3 Dense (0.6B, 1.7B, 4B, 8B, 14B, 32B)
 * DeepSeek
   * DeepSeek-V3 0324 (671B) & DeepSeek-R1 0528 (671B)
   * DeepSeek-V2 (16B, 236B)
 * Meta
   * Llama 4 Scout (109B) & Maverick (400B)
   * Llama 3.3 (70B), 3.1 (8B, 70B, 405B), 3.0 (8B, 70B, 405B)
   * Llama 2 (7B, 13B, 70B)
 * OpenAI
   * GPT-OSS (20B, 120B)
   * GPT3 (52K, 6B, 22B, 175B)
 * Mistral
   * Mixtral (8x7B, 8x22B)
   * Mistral (7B)
 * Diffusion Models
   * See [MaxDiffusion](https://github.com/AI-Hypercomputer/maxdiffusion) (LTXV, Wan 2.1, Flux, SDXL, etc.)

 ## Get involved
```

## docs/_static/css/custom.css (211 additions, new file)
```css
/* Base file modifications */

/* Pages that contain a .hero section (i.e. the landing page) hide the
   book theme's sidebars and prev/next footer and let the article span
   the full page width. */
body:has(.hero) .bd-sidebar-primary,
body:has(.hero) .sidebar-toggle,
body:has(.hero) .bd-sidebar-secondary {
  display: none !important;
}

body:has(.hero) .prev-next-footer {
  display: none;
}

body:has(.hero) .bd-article-container {
  max-width: unset !important;
}

body:has(.hero) .bd-page-width {
  max-width: unset !important;
}

body:has(.hero) .bd-article {
  display: flex;
  flex-direction: column;
  padding: 0;
}

body:has(.hero) .bd-container {
  flex-direction: column;
}

body:has(.hero) .bd-article > section > h1 {
  display: none;
}

@media (min-width: 960px) {
  body:has(.hero) .bd-header-article {
    justify-content: center;
  }

  body:has(.hero) .header-article-items,
  body:has(.hero) .doc-body > section {
    max-width: 80% !important;
    align-self: center;
    width: -moz-available;
    width: -webkit-fill-available;
    width: fill-available;
  }

  body:has(.hero) .doc-body > section.hero {
    max-width: 90rem !important;
  }

  body:has(.hero) .doc-body > section.banner {
    max-width: 80rem !important;
  }
}

/* -------------- Page styles ---------------- */
.doc-body {
  display: flex;
  flex-direction: column;
}

.hero {
  display: flex;
  flex-direction: column;
  justify-content: space-between;
  align-items: center;
  background: black;
  border-radius: 24px;
  max-width: 90%;
}

.hero-image {
  width: 100%;
  border-radius: 24px 24px 0 0;
}

.hero-text {
  display: flex;
  max-width: 70%;
  margin: 32px;
}

.hero-text h1 {
  font: 700 52px 'Google Sans', 'Roboto', sans-serif;
  color: white;
}

.hero-text h3 {
  font: 400 20px 'Roboto', sans-serif;
  color: #bdc1c6;
}

@media (max-width: 1240px) {
  .hero {
    max-width: 100%;
  }
  .hero-text h1 {
    font-size: 42px;
    margin: 24px;
  }
}

.hero-cta {
  display: flex;
  flex-direction: row;
  gap: 10px;
  flex-wrap: wrap;
  margin-top: 20px;
}

.button-primary {
  background: #1A73E8;
  border-radius: 4px;
  color: white;
  font: 400 14px 'Google Sans', 'Roboto', sans-serif;
  text-decoration: none;
  padding: 9px 26px;
  transition: background-color .2s, border .2s, box-shadow .2s;
  width: max-content;
}

.button-primary:visited:hover {
  color: white !important;
}

.button-primary:hover {
  background-color: #1765cc;
  color: white;
  transition: background-color .2s, border .2s, box-shadow .2s;
  box-shadow: 0 1px 2px 0 rgba(60,64,67,.3), 0 1px 3px 1px rgba(60,64,67,.15);
}

.button-primary:active {
  background-color: #185abc;
  color: white;
  box-shadow: 0 1px 2px 0 rgba(60,64,67,.3), 0 2px 6px 2px rgba(60,64,67,.15);
}

.button-primary:visited {
  color: white;
  text-decoration: none;
}

.banner {
  background: #E8F0FE;
  border-radius: 24px;
  margin-block: 80px 80px;
  padding-inline: 50px;
  padding-bottom: 24px;
}

.three-up {
  display: grid;
  grid: auto-flow / 1fr 1fr 1fr;
  gap: 24px;
  padding-inline: 60px;
  margin-top: 20px;
}

.text-body {
  display: flex;
  flex-direction: column;
  justify-content: center;
  padding-left: 10px;
  padding-right: 20px;
}

@media (max-width: 860px) {
  .three-up {
    grid: auto-flow / 1fr;
    padding-inline: 0;
  }

  .three-up h3 {
    margin-top: 8px;
  }
}

/* Dark-theme overrides for the landing components */
html[data-theme="dark"] .text-body {
  color: #bdc1c6 !important;
}

html[data-theme="dark"] .banner {
  background-color: #22252c;
}

html[data-theme="dark"] .button-primary {
  background-color: #8ab4f8;
  color: #121212;
}

html[data-theme="dark"] .button-primary:hover {
  background-color: #98bdf9;
  color: #121212;
}

html[data-theme="dark"] .button-primary:active {
  background-color: #aecbfa;
  color: #121212;
}

html[data-theme="dark"] .button-primary:visited:hover {
  color: #121212 !important;
}

.latest-news {
  max-width: 90rem !important;
  padding-inline: 50px;
  padding-bottom: 24px;
}
```
## docs/_static/maxtext.png (new file, 3.51 MB)

## docs/conf.py (15 additions, 2 deletions)
```diff
@@ -46,8 +46,9 @@
 # https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output

 html_theme = "sphinx_book_theme"
-html_static_path = []
-# html_logo = "_static/flax.png"
+html_static_path = ["_static"]
+html_css_files = ["css/custom.css"]
+html_logo = "_static/maxtext.png"

 # -- Options for myst ----------------------------------------------
 myst_heading_anchors = 3  # auto-generate 3 levels of heading anchors
@@ -58,6 +59,18 @@
 ]
 myst_linkify_fuzzy_links = False

+# Theme-specific options
+# https://sphinx-book-theme.readthedocs.io/en/stable/reference.html
+html_theme_options = {
+    "show_navbar_depth": 1,
+    "show_toc_level": 1,
+    "repository_url": "https://github.com/AI-Hypercomputer/maxtext",
+    "path_to_docs": "docs/",
+    "use_repository_button": True,
+    "navigation_with_keys": True,
+    "home_page_in_toc": True,
+}

 # Remove specific documents from ToC
 exclude_patterns = [
     "guides/run_maxtext/run_maxtext_via_multihost_job.md",
```
