
Commit 6863f9b

Merge pull request #3273 from AI-Hypercomputer:hengtaoguo-re2
PiperOrigin-RevId: 876399995
2 parents (a4f874d + 87d406f), commit 6863f9b

12 files changed: 12 additions & 12 deletions


docs/tutorials/posttraining/knowledge_distillation.md

Lines changed: 2 additions & 2 deletions
````diff
@@ -170,7 +170,7 @@ You can now fine-tune your smaller student model using supervised fine-tuning te
 Example command to run fine-tuning on a TPU v6e-8:
 
 ```bash
-python3 -m MaxText.sft_trainer src/maxtext/configs/post_train/sft.yml \
+python3 -m maxtext.trainers.post_train.sft.train_sft_deprecated src/maxtext/configs/post_train/sft.yml \
 run_name=${RUN_NAME} \
 base_output_directory=${BASE_DIRECTORY}/distillation/qwen3-32b-distill-llama3.1-8b \
 tokenizer_path=meta-llama/Llama-3.1-8B-Instruct tokenizer_type=huggingface \
@@ -209,7 +209,7 @@ largest_dir="${sorted_dirs[-1]}"
 FINE_TUNED_MODEL_CKPT_PATH=${CHECKPOINTS_PATH}/${largest_dir}/model_params
 
 # Fine-tune student model on original dataset
-python3 -m MaxText.sft.sft_trainer src/maxtext/configs/post_train/sft.yml \
+python3 -m maxtext.trainers.post_train.sft.train_sft src/maxtext/configs/post_train/sft.yml \
 run_name=${RUN_NAME}_stage2 \
 base_output_directory=${BASE_DIRECTORY}/distillation/qwen3-32b-distill-llama3.1-8b \
 tokenizer_path=meta-llama/Llama-3.1-8B-Instruct tokenizer_type=huggingface \
````

docs/tutorials/posttraining/multimodal.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -129,7 +129,7 @@ Here, we use [ChartQA](https://huggingface.co/datasets/HuggingFaceM4/ChartQA) as
 
 ```shell
 export UNSCANNED_CKPT_PATH=... # either set to an already available MaxText ckpt or to the one we just converted in the previous step
-python -m MaxText.sft_trainer \
+python -m maxtext.trainers.post_train.sft.train_sft_deprecated \
 src/maxtext/configs/post_train/sft-vision-chartqa.yml \
 run_name="chartqa-sft" \
 model_name=gemma3-4b \
````

src/maxtext/examples/multimodal_gemma3_demo.ipynb

Lines changed: 1 addition & 1 deletion
```diff
@@ -164,7 +164,7 @@
 "STEPS=10\n",
 "PER_DEVICE_BATCH_SIZE=1\n",
 "\n",
-"!python -m MaxText.sft_trainer \\\n",
+"!python -m maxtext.trainers.post_train.sft.train_sft_deprecated \\\n",
 " $MAXTEXT_CONFIGS_DIR/sft-vision-chartqa.yml \\\n",
 " run_name=$WORKLOAD_NAME \\\n",
 " model_name=$MODEL_NAME \\\n",
```

src/MaxText/sft_trainer.py renamed to src/maxtext/trainers/post_train/sft/train_sft_deprecated.py

File renamed without changes.
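
The rename changes only the module path passed to `python -m`; configs and flags are untouched. A minimal before/after sketch of the invocation, following the doc updates above (the config path and `run_name` flag here are illustrative):

```sh
# Before this commit: old module path under src/MaxText/
python3 -m MaxText.sft_trainer src/maxtext/configs/post_train/sft.yml run_name=${RUN_NAME}

# After this commit: new module path under src/maxtext/trainers/
python3 -m maxtext.trainers.post_train.sft.train_sft_deprecated src/maxtext/configs/post_train/sft.yml run_name=${RUN_NAME}
```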

tests/end_to_end/tpu/deepseek/Run_DeepSeek.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -114,7 +114,7 @@ python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml \
 One example command to run supervised finetuning with V3 on v5p-256. Supervised fine-tuning is only working with HuggingFace conversational datasets. And, you can customize the dataset path using the `hf_path` config and provide your access token with `hf_access_token` config.
 
 ```sh
-python3 -m MaxText.sft_trainer src/maxtext/configs/post_train/sft.yml \
+python3 -m maxtext.trainers.post_train.sft.train_sft_deprecated src/maxtext/configs/post_train/sft.yml \
 base_output_directory=${BASE_OUTPUT_DIRECTORY} \
 load_parameters_path=${CONVERTED_CHECKPOINT} \
 run_name=matmul_supervised_fine_tuning \
````

tests/end_to_end/tpu/gemma3/4b/test_gemma3_multimodal_sft.sh

Lines changed: 1 addition & 1 deletion
```diff
@@ -44,7 +44,7 @@ python3 -m maxtext.inference.decode "${MAXTEXT_CONFIGS_DIR:-${MAXTEXT_REPO_ROOT:
 
 # 3. SFT the MaxText converted checkpoint on ChartQA dataset
 export BASE_OUTPUT_DIRECTORY=${MODEL_BUCKET}/${MODEL_VARIATION}/unscanned/sft
-python -m MaxText.sft_trainer "${MAXTEXT_CONFIGS_DIR:-${MAXTEXT_REPO_ROOT:-$PWD}/src/maxtext/configs}"//sft-vision-chartqa.yml \
+python -m maxtext.trainers.post_train.sft.train_sft_deprecated "${MAXTEXT_CONFIGS_DIR:-${MAXTEXT_REPO_ROOT:-$PWD}/src/maxtext/configs}"//sft-vision-chartqa.yml \
 run_name=$idx \
 model_name=$MODEL_NAME tokenizer_path="google/gemma-3-4b-pt" \
 per_device_batch_size=1 \
```

tests/end_to_end/tpu/gpt_oss/120b/test_gpt_oss.sh

Lines changed: 1 addition & 1 deletion
```diff
@@ -60,7 +60,7 @@ python3 -m maxtext.trainers.pre_train.train "${MAXTEXT_CONFIGS_DIR:-${MAXTEXT_RE
 python3 -m maxtext.trainers.pre_train.train "${MAXTEXT_CONFIGS_DIR:-${MAXTEXT_REPO_ROOT:-$PWD}/src/maxtext/configs}"//base.yml base_output_directory=${BASE_OUTPUT_PATH} run_name=megablox_fine_tuning model_name=${MODEL_NAME} tokenizer_type=huggingface tokenizer_path=${TOKENIZER_PATH} dataset_path=${DATASET_PATH} enable_checkpointing=true async_checkpointing=false load_parameters_path=${SCANNED_CKPT_PATH} scan_layers=True attention=flash sparse_matmul=True megablox=True dtype=bfloat16 weight_dtype=bfloat16 per_device_batch_size=4 steps=5 max_target_length=1024 ici_fsdp_parallelism=1 ici_expert_parallelism=32
 
 # Run supervised fine-tuning - megablox implementation
-python3 -m MaxText.sft_trainer "${MAXTEXT_CONFIGS_DIR:-${MAXTEXT_REPO_ROOT:-$PWD}/src/maxtext/configs/post_train}"//sft.yml base_output_directory=${BASE_OUTPUT_PATH} run_name=megablox_supervised_fine_tuning model_name=${MODEL_NAME} tokenizer_type=huggingface tokenizer_path=${TOKENIZER_PATH} dataset_type=hf enable_checkpointing=true async_checkpointing=false load_parameters_path=${SCANNED_CKPT_PATH} scan_layers=True attention=flash sparse_matmul=True megablox=True dtype=bfloat16 weight_dtype=bfloat16 per_device_batch_size=4 steps=5 max_target_length=1024 ici_fsdp_parallelism=1 ici_expert_parallelism=32
+python3 -m maxtext.trainers.post_train.sft.train_sft_deprecated "${MAXTEXT_CONFIGS_DIR:-${MAXTEXT_REPO_ROOT:-$PWD}/src/maxtext/configs/post_train}"//sft.yml base_output_directory=${BASE_OUTPUT_PATH} run_name=megablox_supervised_fine_tuning model_name=${MODEL_NAME} tokenizer_type=huggingface tokenizer_path=${TOKENIZER_PATH} dataset_type=hf enable_checkpointing=true async_checkpointing=false load_parameters_path=${SCANNED_CKPT_PATH} scan_layers=True attention=flash sparse_matmul=True megablox=True dtype=bfloat16 weight_dtype=bfloat16 per_device_batch_size=4 steps=5 max_target_length=1024 ici_fsdp_parallelism=1 ici_expert_parallelism=32
 
 # Run decoding - megablox implementation
 # Note decode requires the access token for huggingface tokenizer even if the model is not gated
```

tests/end_to_end/tpu/gpt_oss/20b/test_gpt_oss.sh

Lines changed: 1 addition & 1 deletion
```diff
@@ -64,7 +64,7 @@ python3 -m maxtext.trainers.pre_train.train "${MAXTEXT_CONFIGS_DIR:-${MAXTEXT_RE
 python3 -m maxtext.trainers.pre_train.train "${MAXTEXT_CONFIGS_DIR:-${MAXTEXT_REPO_ROOT:-$PWD}/src/maxtext/configs}"//base.yml base_output_directory=${BASE_OUTPUT_PATH} run_name=megablox_fine_tuning model_name=${MODEL_NAME} tokenizer_type=huggingface tokenizer_path=${TOKENIZER_PATH} dataset_path=${DATASET_PATH} enable_checkpointing=true async_checkpointing=false load_parameters_path=${SCANNED_CKPT_PATH} scan_layers=True attention=flash sparse_matmul=True megablox=True dtype=bfloat16 weight_dtype=bfloat16 per_device_batch_size=4 steps=5 max_target_length=1024 ici_fsdp_parallelism=1 ici_expert_parallelism=4
 
 # Run supervised fine-tuning - megablox implementation
-python3 -m MaxText.sft_trainer "${MAXTEXT_CONFIGS_DIR:-${MAXTEXT_REPO_ROOT:-$PWD}/src/maxtext/configs/post_train}"//sft.yml base_output_directory=${BASE_OUTPUT_PATH} run_name=megablox_supervised_fine_tuning model_name=${MODEL_NAME} tokenizer_type=huggingface tokenizer_path=${TOKENIZER_PATH} dataset_type=hf enable_checkpointing=true async_checkpointing=false load_parameters_path=${SCANNED_CKPT_PATH} scan_layers=True attention=flash sparse_matmul=True megablox=True dtype=bfloat16 weight_dtype=bfloat16 per_device_batch_size=4 steps=5 max_target_length=1024 ici_fsdp_parallelism=1 ici_expert_parallelism=4
+python3 -m maxtext.trainers.post_train.sft.train_sft_deprecated "${MAXTEXT_CONFIGS_DIR:-${MAXTEXT_REPO_ROOT:-$PWD}/src/maxtext/configs/post_train}"//sft.yml base_output_directory=${BASE_OUTPUT_PATH} run_name=megablox_supervised_fine_tuning model_name=${MODEL_NAME} tokenizer_type=huggingface tokenizer_path=${TOKENIZER_PATH} dataset_type=hf enable_checkpointing=true async_checkpointing=false load_parameters_path=${SCANNED_CKPT_PATH} scan_layers=True attention=flash sparse_matmul=True megablox=True dtype=bfloat16 weight_dtype=bfloat16 per_device_batch_size=4 steps=5 max_target_length=1024 ici_fsdp_parallelism=1 ici_expert_parallelism=4
 
 # Run decoding - megablox implementation
 # Note decode requires the access token for huggingface tokenizer even if the model is not gated
```

tests/end_to_end/tpu/gpt_oss/run_gpt_oss.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -110,7 +110,7 @@ python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml \
 One example command to run supervised finetuning with gpt-oss-20b on v5p-8. Supervised finetuning is only working with HuggingFace conversational datasets. And, you can customize the dataset path using the `hf_path` config. If using [gated dataset](https://huggingface.co/docs/hub/en/datasets-gated) or [gated model](https://huggingface.co/docs/hub/en/models-gated), you need additionally provide the access token with `hf_access_token` config.
 
 ```sh
-python3 -m MaxText.sft_trainer src/maxtext/configs/post_train/sft.yml \
+python3 -m maxtext.trainers.post_train.sft.train_sft_deprecated src/maxtext/configs/post_train/sft.yml \
 base_output_directory=${BASE_OUTPUT_PATH} \
 run_name=megablox_supervised_fine_tuning \
 model_name=gpt-oss-20b \
````

tests/end_to_end/tpu/run_sft.sh

Lines changed: 1 addition & 1 deletion
```diff
@@ -59,7 +59,7 @@ fi
 echo "Running fine-tuning on checkpoint: ${PRE_TRAINED_MODEL_CKPT_PATH}"
 
 # Run Supervised Fine-Tuning on MaxText checkpoint using HuggingFaceH4/ultrachat_200k dataset
-python3 -m MaxText.sft_trainer "${MAXTEXT_CONFIGS_DIR:-${MAXTEXT_REPO_ROOT:-$PWD}/src/maxtext/configs/post_train}"//sft.yml \
+python3 -m maxtext.trainers.post_train.sft.train_sft_deprecated "${MAXTEXT_CONFIGS_DIR:-${MAXTEXT_REPO_ROOT:-$PWD}/src/maxtext/configs/post_train}"//sft.yml \
 run_name=${RUN_NAME} base_output_directory=${BASE_OUTPUT_DIRECTORY}/${PRE_TRAINED_MODEL} \
 model_name=${PRE_TRAINED_MODEL} load_parameters_path=${PRE_TRAINED_MODEL_CKPT_PATH} \
 hf_access_token=$HF_TOKEN tokenizer_path=${PRE_TRAINED_MODEL_TOKENIZER} \
```
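
For downstream scripts not touched by this commit, the same substitution can be applied mechanically. A hypothetical migration helper (not part of this commit; GNU grep and sed assumed), using the old and new module names from the diffs above:

```sh
# Find files still invoking the old module path and rewrite them in place.
# Note: the second knowledge_distillation.md hunk renames MaxText.sft.sft_trainer
# to maxtext.trainers.post_train.sft.train_sft (not the _deprecated entry point);
# that string does not match this pattern, so it is left alone.
grep -rl --include='*.sh' --include='*.md' 'MaxText\.sft_trainer' . \
  | xargs -r sed -i \
      's/MaxText\.sft_trainer/maxtext.trainers.post_train.sft.train_sft_deprecated/g'
```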
