
Commit 9e786c8

SurbhiJainUSC authored and Google-ML-Automation committed

Remove explicit base config file from MaxText commands in documentation

PiperOrigin-RevId: 887010680

1 parent cc0d3ae

8 files changed: 19 additions & 21 deletions
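Every hunk in this commit applies the same mechanical edit: the explicit `src/maxtext/configs/base.yml` argument is dropped from documented commands, which implies the trainers now resolve that base config on their own. A minimal sketch of the rewrite, assuming that default; the `run_name=demo` command is illustrative and not taken from the diff:

```shell
# Sketch of the documentation edit this commit applies, assuming the
# trainer now loads src/maxtext/configs/base.yml by default.
# The example command is hypothetical; only the stripped path is from the diff.
cmd='python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml run_name=demo'

# Drop the explicit base-config argument, as each hunk below does by hand.
new_cmd=$(printf '%s' "$cmd" | sed 's| src/maxtext/configs/base.yml||')
echo "$new_cmd"
```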


docs/guides/data_input_pipeline/data_input_grain.md

Lines changed: 1 addition & 1 deletion
@@ -112,7 +112,7 @@ Note that `FILE_PATH` is optional; when provided, the script runs `ls -R` for pr
 bash tools/setup/setup_gcsfuse.sh \
 DATASET_GCS_BUCKET=maxtext-dataset \
 MOUNT_PATH=/tmp/gcsfuse && \
-python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml \
+python3 -m maxtext.trainers.pre_train.train \
 run_name=<RUN_NAME> base_output_directory=gs://<MY_BUCKET> \
 dataset_type=grain \
 grain_file_type=arrayrecord # or parquet \

docs/guides/monitoring_and_debugging/features_and_diagnostics.md

Lines changed: 5 additions & 5 deletions
@@ -56,7 +56,7 @@ After installing the dependencies listed above, you are ready to compile ahead o
 
 ```sh
 # Run the below on a single machine, e.g. a CPU
-python3 -m maxtext.trainers.pre_train.train_compile src/maxtext/configs/base.yml compile_topology=v5e-256 compile_topology_num_slices=2 \
+python3 -m maxtext.trainers.pre_train.train_compile compile_topology=v5e-256 compile_topology_num_slices=2 \
 global_parameter_scale=16 per_device_batch_size=4
 ```
 
@@ -71,7 +71,7 @@ Here is an example that saves then loads the compiled `train_step`, starting wit
 ```sh
 # Run the below on a single machine, e.g. a CPU
 export LIBTPU_INIT_ARGS="--xla_enable_async_all_gather=true"
-python3 -m maxtext.trainers.pre_train.train_compile src/maxtext/configs/base.yml compile_topology=v5e-256 \
+python3 -m maxtext.trainers.pre_train.train_compile compile_topology=v5e-256 \
 compile_topology_num_slices=2 \
 compiled_trainstep_file=my_compiled_train.pickle global_parameter_scale=16 \
 per_device_batch_size=4 steps=10000 learning_rate=1e-3
@@ -84,7 +84,7 @@ To load the compiled train_step, you just need to pass `compiled_trainstep_file=
 ```sh
 # Run the below on each host of the target hardware, e.g. each host on 2 slices of v5e-256
 export LIBTPU_INIT_ARGS="--xla_enable_async_all_gather=true"
-python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml run_name=example_load_compile \
+python3 -m maxtext.trainers.pre_train.train run_name=example_load_compile \
 compiled_trainstep_file=my_compiled_train.pickle \
 global_parameter_scale=16 per_device_batch_size=4 steps=10000 learning_rate=1e-3 \
 base_output_directory=gs://my-output-bucket dataset_path=gs://my-dataset-bucket
@@ -109,7 +109,7 @@ This example illustrates the flags to use for a multihost GPU compilation target
 ```sh
 # Run the below on a single A3 machine
 export XLA_FLAGS="--xla_gpu_enable_async_collectives=true"
-python3 -m maxtext.trainers.pre_train.train_compile src/maxtext/configs/base.yml compile_topology=a3 \
+python3 -m maxtext.trainers.pre_train.train_compile compile_topology=a3 \
 compile_topology_num_slices=4 \
 compiled_trainstep_file=my_compiled_train.pickle global_parameter_scale=16 \
 attention=dot_product per_device_batch_size=4 steps=10000 learning_rate=1e-3
@@ -122,7 +122,7 @@ To load the compiled `train_step`, you just need to pass `compiled_trainstep_fil
 ```sh
 # Run the below on each of the 4 target A3 hosts.
 export XLA_FLAGS="--xla_gpu_enable_async_collectives=true"
-python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml run_name=example_load_compile \
+python3 -m maxtext.trainers.pre_train.train run_name=example_load_compile \
 compiled_trainstep_file=my_compiled_train.pickle \
 attention=dot_product global_parameter_scale=16 per_device_batch_size=4 steps=10000 learning_rate=1e-3 \
 base_output_directory=gs://my-output-bucket dataset_path=gs://my-dataset-bucket

docs/guides/monitoring_and_debugging/ml_workload_diagnostics.md

Lines changed: 3 additions & 3 deletions
@@ -35,7 +35,7 @@ MaxText has integrated the ML Diagnostics [SDK](https://github.com/AI-Hypercompu
 1. Enable ML Diagnostics to just capture Maxtext metrics and configs
 
 ```
-python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml \
+python3 -m maxtext.trainers.pre_train.train \
 run_name=${USER}-tpu-job \
 base_output_directory="gs://your-output-bucket/" \
 dataset_path="gs://your-dataset-bucket/" \
@@ -47,7 +47,7 @@ MaxText has integrated the ML Diagnostics [SDK](https://github.com/AI-Hypercompu
 2. Enable ML Diagnostics to capture Maxtext metrics, configs and singlehost profiles (on the first TPU device)
 
 ```
-python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml \
+python3 -m maxtext.trainers.pre_train.train \
 run_name=${USER}-tpu-job \
 base_output_directory="gs://your-output-bucket/" \
 dataset_path="gs://your-dataset-bucket/" \
@@ -60,7 +60,7 @@ MaxText has integrated the ML Diagnostics [SDK](https://github.com/AI-Hypercompu
 3. Enable ML Diagnostics to capture Maxtext metrics, configs and multihost profiles (on all TPU devices)
 
 ```
-python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml \
+python3 -m maxtext.trainers.pre_train.train \
 run_name=${USER}-tpu-job \
 base_output_directory="gs://your-output-bucket/" \
 dataset_path="gs://your-dataset-bucket/" \

docs/guides/monitoring_and_debugging/monitor_goodput.md

Lines changed: 4 additions & 4 deletions
@@ -89,7 +89,7 @@ Please use a unique workload name, unless you intend to monitor cumulative Goodp
 MaxText enables Goodput recording and monitoring by default with `enable_goodput_recording=True` and `monitor_goodput=True`. You can configure the goodput upload frequency by setting `goodput_upload_interval_seconds`.
 
 ```bash
-python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml base_output_directory=${OUTPUT_PATH?} \
+python3 -m maxtext.trainers.pre_train.train base_output_directory=${OUTPUT_PATH?} \
 dataset_path=${DATA_PATH?} run_name=goodput-test-run steps=200 goodput_upload_interval_seconds=30
 ```
 
@@ -98,7 +98,7 @@ python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml base_ou
 MaxText enables step time deviation monitoring by default with `monitor_step_time_deviation=True`. You can configure the upload frequency by setting `step_deviation_interval_seconds`.
 
 ```bash
-python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml base_output_directory=${OUTPUT_PATH?} \
+python3 -m maxtext.trainers.pre_train.train base_output_directory=${OUTPUT_PATH?} \
 dataset_path=${DATA_PATH?} run_name=goodput-test-run steps=200 step_deviation_interval_seconds=30
 ```
 
@@ -111,7 +111,7 @@ Enabling `enable_pathways_goodput` turns on Goodput measurement for Pathways wor
 ```
 
 ```bash
-python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml base_output_directory=${OUTPUT_PATH?} dataset_path=${DATA_PATH?} \
+python3 -m maxtext.trainers.pre_train.train base_output_directory=${OUTPUT_PATH?} dataset_path=${DATA_PATH?} \
 run_name=goodput-test-run steps=200 goodput_upload_interval_seconds=30 enable_pathways_goodput=True
 ```
 
@@ -168,7 +168,7 @@ and `enable_gcp_step_deviation_metrics` to `False` for disabling step deviation
 metrics.
 
 ```bash
-python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml base_output_directory=${OUTPUT_PATH?} dataset_path=${DATA_PATH?} \
+python3 -m maxtext.trainers.pre_train.train base_output_directory=${OUTPUT_PATH?} dataset_path=${DATA_PATH?} \
 run_name=goodput-test-run steps=200 goodput_upload_interval_seconds=30 enable_gcp_goodput_metrics=False \
 enable_gcp_step_deviation_metrics=False
 ```

docs/guides/monitoring_and_debugging/understand_logs_and_metrics.md

Lines changed: 2 additions & 2 deletions
@@ -23,7 +23,7 @@ When you run a training job, MaxText produces detailed output logs. This guide s
 To start, run a simple pretraining job on a single-host TPU. For instance, we can run the following command on TPU v5p-8. The resulting log is used as an example throughout this guide.
 
 ```bash
-python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml \
+python3 -m maxtext.trainers.pre_train.train \
 base_output_directory=gs://runner-maxtext-logs run_name=demo \
 model_name=deepseek2-16b \
 per_device_batch_size=24 max_target_length=2048 steps=10 dataset_type=synthetic enable_checkpointing=false
@@ -123,7 +123,7 @@ To generate all optional artifacts in one run, you can set the corresponding fla
 This command enables tensorboard, profiler, text metrics, config saving, and checkpointing:
 
 ```bash
-python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml \
+python3 -m maxtext.trainers.pre_train.train \
 base_output_directory=gs://runner-maxtext-logs run_name=demo2 \
 model_name=deepseek2-16b \
 per_device_batch_size=24 max_target_length=2048 steps=10 dataset_type=synthetic \

docs/reference/core_concepts/quantization.md

Lines changed: 2 additions & 2 deletions
@@ -87,7 +87,7 @@ Common options for the `quantization` flag when using Qwix include:
 Here is an example of how to run a training job with int8 quantization enabled via Qwix:
 
 ```bash
-python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml run_name=${YOUR_JOB_NAME?} base_output_directory=gs://<my-bucket> dataset_type=synthetic use_qwix_quantization=true quantization='int8'
+python3 -m maxtext.trainers.pre_train.train run_name=${YOUR_JOB_NAME?} base_output_directory=gs://<my-bucket> dataset_type=synthetic use_qwix_quantization=true quantization='int8'
 ```
 
 #### The Qwix Interception API
@@ -142,7 +142,7 @@ When using AQT, you can pass one of the following values to the `quantization` f
 #### Example command for AQT
 
 ```bash
-python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml run_name=${YOUR_JOB_NAME?} base_output_directory=gs://<my-bucket> dataset_type=synthetic use_qwix_quantization=false quantization='int8'
+python3 -m maxtext.trainers.pre_train.train run_name=${YOUR_JOB_NAME?} base_output_directory=gs://<my-bucket> dataset_type=synthetic use_qwix_quantization=false quantization='int8'
 ```
 
 Note that `use_qwix_quantization` is not set to `True`.

docs/tutorials/inference.md

Lines changed: 2 additions & 2 deletions
@@ -63,7 +63,7 @@ We include a script for convenient offline inference of MaxText models in `src/m
 An example of how to run this script can be found below:
 
 ```bash
-python3 -m maxtext.inference.vllm_decode src/maxtext/configs/base.yml \
+python3 -m maxtext.inference.vllm_decode \
 model_name=qwen3-30b-a3b \
 tokenizer_path=Qwen/Qwen3-30B-A3B \
 load_parameters_path=$CHECKPOINT_PATH \
@@ -133,7 +133,7 @@ curl http://localhost:8000/v1/completions \
 To use a MaxText model architecture for samplers in reinforcement learning algorithms like GRPO, we can override the vLLM model architecture and pass in MaxText specific config arguments similar to the [online inference](online-inference) use-case. An example of an RL command using the MaxText model for samplers can be found below:
 
 ```bash
-python3 -m src.maxtext.trainers.post_train.rl.train_rl src/maxtext/configs/post_train/rl.yml \
+python3 -m src.maxtext.trainers.post_train.rl.train_rl \
 model_name=qwen3-0.6b \
 tokenizer_path=Qwen/Qwen3-0.6B \
 run_name=$WORKLOAD \

docs/tutorials/posttraining/multimodal.md

Lines changed: 0 additions & 2 deletions
@@ -73,7 +73,6 @@ To run a forward pass and verify the model's output, use the following command:
 ```shell
 # Gemma3 decode
 python -m maxtext.inference.decode \
-maxtext/configs/base.yml \
 model_name=gemma3-4b \
 hf_access_token=${HF_ACCESS_TOKEN?} \
 tokenizer_path=src/maxtext/assets/tokenizers/tokenizer.gemma3 \
@@ -109,7 +108,6 @@ export TARGET_LENGTH=... # Adjust to fit expected output length
 export PREDICT_LENGTH=... # Adjust to fit image tokens + text prompt
 
 python -m maxtext.inference.decode \
-maxtext/configs/base.yml \
 model_name=gemma3-4b \
 ... \
 max_prefill_predict_length=${PREDICT_LENGTH?} # Adjust to fit image tokens + text prompt \
