
Commit 1d29ce1

Merge pull request #2989 from AI-Hypercomputer:docs_fix
PiperOrigin-RevId: 859721293
2 parents: 8204907 + f02444f

4 files changed: 11 additions & 3 deletions


docs/conf.py

Lines changed: 2 additions & 0 deletions
@@ -97,6 +97,7 @@
     "cloud_tpu_diagnostics",
     "google_cloud_mldiagnostics",
     "jetstream",
+    "librosa",
     "ml_goodput_measurement",
     "pathwaysutils",
     "safetensors",

@@ -178,6 +179,7 @@ def run_apidoc(_):
       os.path.join(MAXTEXT_REPO_ROOT, "src", "MaxText", "scratch_code"),
       os.path.join(MAXTEXT_REPO_ROOT, "src", "MaxText", "utils", "ckpt_conversion"),
       os.path.join(MAXTEXT_REPO_ROOT, "src", "MaxText", "rl"),
+      os.path.join(MAXTEXT_REPO_ROOT, "src", "MaxText", "multimodal_utils.py"),
   ]

   # Run the command and check for errors
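The first hunk adds "librosa" to a list of module names in docs/conf.py that, by all appearances, feeds Sphinx's import mocking for the docs build (the variable name is not shown in the hunk, so that is an assumption). A minimal sketch of what such mocking buys, using `unittest.mock` directly to illustrate the idea:

```python
# Sketch: register a stand-in so "import librosa" succeeds even when the
# real package is not installed in the docs-build environment. Sphinx's
# autodoc mocking performs an equivalent substitution internally.
import sys
from unittest import mock

sys.modules["librosa"] = mock.MagicMock()

import librosa  # resolves to the MagicMock registered above

# Attribute access and calls on the mock succeed, which is all a docs build
# needs to import modules that merely reference librosa at module level.
duration = librosa.get_duration(path="clip.wav")  # no real I/O happens
print(type(duration).__name__)  # prints "MagicMock"
```

The second hunk excludes `multimodal_utils.py` from apidoc generation in the same spirit: keep the docs build from touching modules it cannot (or should not) process.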

docs/tutorials/posttraining/sft.md

Lines changed: 6 additions & 0 deletions
@@ -15,6 +15,7 @@
 -->
 
 # SFT on single-host TPUs
+
 Supervised fine-tuning (SFT) is a process where a pre-trained large language model is fine-tuned on a labeled dataset to adapt the model to perform better on specific tasks.
 
 This tutorial demonstrates step-by-step instructions for setting up the environment and then training the model on a Hugging Face dataset using SFT.

@@ -64,16 +65,19 @@ export TRAIN_DATA_COLUMNS=<data columns to train on> # e.g., ['messages']
 ```
 
 ## Get your model checkpoint
+
 This section explains how to prepare your model checkpoint for use with MaxText. You have two options: using an existing MaxText checkpoint or converting a Hugging Face checkpoint.
 
 ### Option 1: Using an existing MaxText checkpoint
+
 If you already have a MaxText-compatible model checkpoint, simply set the following environment variable and move on to the next section.
 
 ```sh
 export PRE_TRAINED_MODEL_CKPT_PATH=<gcs path for MaxText checkpoint> # e.g., gs://my-bucket/my-model-checkpoint/0/items
 ```
 
 ### Option 2: Converting a Hugging Face checkpoint
+
 If your model checkpoint is from Hugging Face, you need to run a conversion script to make it MaxText-compatible.
 
 1. **Set the Output Path:** First, define where the converted MaxText checkpoint will be saved. For example:

@@ -101,6 +105,7 @@ export PRE_TRAINED_MODEL_CKPT_PATH=${PRE_TRAINED_MODEL_CKPT_DIRECTORY}/0/items
 ```
 
 ## Run SFT on Hugging Face Dataset
+
 Now you are ready to run SFT using the following command:
 
 ```sh

@@ -118,4 +123,5 @@ python3 -m MaxText.sft.sft_trainer src/MaxText/configs/sft.yml \
   train_data_columns=${TRAIN_DATA_COLUMNS} \
   profiler=xplane
 ```
+
 Your fine-tuned model checkpoints will be saved here: `$BASE_OUTPUT_DIRECTORY/$RUN_NAME/checkpoints`.

src/MaxText/maxengine.py

Lines changed: 2 additions & 2 deletions
@@ -15,7 +15,7 @@
 """Implementation of Engine API for MaxText."""
 
 from collections import defaultdict
-from typing import Any, Callable
+from typing import Any, Callable, Union
 import functools
 import os.path
 import uuid

@@ -102,7 +102,7 @@ class MaxEngine(engine_api.Engine):
   JetStream efficient serving infrastructure.
   """
 
-  def __init__(self, config: Any, devices: config_lib.Devices | None = None):
+  def __init__(self, config: Any, devices: Union[config_lib.Devices, None] = None):
     self.config = config
 
     # Mesh definition
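The maxengine.py hunk swaps the PEP 604 union syntax (`config_lib.Devices | None`) for `typing.Union`, which is also valid as a runtime annotation on Python versions before 3.10. A minimal sketch of the equivalence, with a hypothetical `Devices` stand-in since `config_lib` is not shown in the diff:

```python
from typing import Optional, Union


class Devices:  # hypothetical stand-in for config_lib.Devices
  pass


def init(devices: Union[Devices, None] = None) -> str:
  # Mirrors the __init__ signature pattern in the hunk above.
  return "default devices" if devices is None else "explicit devices"


# Union[X, None], Optional[X], and (on Python 3.10+) X | None all denote
# the same type.
assert Union[Devices, None] == Optional[Devices]
print(init())  # prints "default devices"
```

The behavior is identical; the rewrite only widens the range of Python versions on which the annotation can be evaluated.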

src/MaxText/multimodal/utils.py

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@
 class PreprocessorOutput:
   """Holds the output of an image preprocessor.
 
-  Attributes:
+  Args:
     pixel_values: A JAX array containing the processed image pixel data.
       The shape and format depend on the specific model and
       preprocessing steps (e.g., [H, W, C] for Gemma3 or
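This hunk renames the docstring section from "Attributes:" to "Args:". In Google docstring style, "Args:" documents parameters and "Attributes:" documents instance attributes; for a simple output container the two largely coincide. A hypothetical sketch of the pattern (the real `PreprocessorOutput` may have more fields than the hunk shows, and its use of `@dataclass` is an assumption):

```python
from dataclasses import dataclass, field


@dataclass
class PreprocessorOutput:
  """Holds the output of an image preprocessor.

  Args:
    pixel_values: Processed image pixel data, e.g. shaped [H, W, C].
  """

  # Plain list used here for a self-contained sketch; the real field holds
  # a JAX array.
  pixel_values: list = field(default_factory=list)


out = PreprocessorOutput(pixel_values=[[0.0, 0.5, 1.0]])
print(len(out.pixel_values))  # prints 1
```

With a dataclass, the constructor parameters are exactly the fields, which is why either section header can plausibly describe them.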
