
Commit 80c7e07

Author: Sharon Yu
Commit message: update doc
1 parent 50f0082 commit 80c7e07

1 file changed: docs/tutorials/posttraining/full_finetuning.md (23 additions & 17 deletions)

@@ -44,10 +44,12 @@ install_maxtext_github_deps
 ```sh
 # -- Model configuration --
 export MODEL_NAME=<model name> # e.g., 'llama2-7b'
+export MODEL_TOKENIZER=<tokenizer path> # e.g., 'meta-llama/Llama-3.1-8B-Instruct'
 export HF_TOKEN=<Hugging Face access token>
 
 # -- MaxText configuration --
 export BASE_OUTPUT_DIRECTORY=<output directory to store run logs> # e.g., gs://my-bucket/my-output-directory
+export RUN_NAME=<name for this run> # e.g., $(date +%Y-%m-%d-%H-%M-%S)
 ```
 
 ## Hugging Face checkpoint to Maxtext checkpoint
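For reference, a filled-in version of the updated configuration block might look like the following. The concrete values are illustrative only, lifted from the `e.g.` comments in the diff above; they are not part of the commit.

```sh
# Hypothetical values, following the e.g. comments in the diff above.
export MODEL_NAME='llama2-7b'
export MODEL_TOKENIZER='meta-llama/Llama-3.1-8B-Instruct'
export HF_TOKEN=<Hugging Face access token>
export BASE_OUTPUT_DIRECTORY=gs://my-bucket/my-output-directory
export RUN_NAME=$(date +%Y-%m-%d-%H-%M-%S)  # timestamped, so repeated runs do not collide
```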
@@ -80,20 +82,6 @@ python3 -m MaxText.utils.ckpt_conversion.to_maxtext src/MaxText/configs/base.yml
     base_output_directory=${MODEL_CKPT_DIRECTORY} \
     scan_layers=True skip_jax_distributed_system=True
 ```
-## MaxText checkpoint to Hugging Face checkpoint
-
-Use the `to_huggingface.py` script to convert a MaxText checkpoint into the Hugging Face format. This is useful for sharing your models or integrating them with the Hugging Face ecosystem.
-
-```sh
-python3 -m MaxText.utils.ckpt_conversion.to_huggingface src/MaxText/configs/base.yml \
-    model_name=${MODEL_NAME} \
-    load_parameters_path=${MODEL_CKPT_PATH}$ \
-    base_output_directory=${BASE_OUTPUT_DIRECTORY} \
-    scan_layers=false \
-    use_multimodal=false \
-    hf_access_token=${HF_TOKEN} \
-    weight_dtype=bfloat16
-```
 ## Dataset
 
 MaxText provides examples to work with [Common Crawl](https://commoncrawl.org/). The dataset is available in TFRecords format in a cloud bucket. MaxText provides scripts to copy the dataset to a Google Cloud Storage Bucket.
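The hunk above shows only the tail of the `to_maxtext` conversion command; the invocation line comes from the hunk header. A sketch of the full command, assuming it takes the same `model_name` and `hf_access_token` flags as the removed `to_huggingface` command (every flag other than the two visible in the diff is an assumption, not part of the commit):

```sh
# Sketch of the full conversion command. Only the last two flag lines are
# visible in the diff; model_name and hf_access_token are assumed by analogy
# with the removed to_huggingface invocation.
export MODEL_CKPT_DIRECTORY=${BASE_OUTPUT_DIRECTORY}/maxtext-ckpt  # hypothetical location
python3 -m MaxText.utils.ckpt_conversion.to_maxtext src/MaxText/configs/base.yml \
    model_name=${MODEL_NAME} \
    hf_access_token=${HF_TOKEN} \
    base_output_directory=${MODEL_CKPT_DIRECTORY} \
    scan_layers=True skip_jax_distributed_system=True
```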
@@ -118,17 +106,35 @@ The above will download the c4 dataset to the GCS BUCKET.
 
 ## Sample Full Fine tuning script
 
-Below is a sample training script for LLama2-7b on v6e-8 TPU VM.
+Below is a sample training script with an existing MaxText checkpoint (Option 1: Using an existing MaxText checkpoint).
 
 ```sh
 python3 -m MaxText.train \
     src/MaxText/configs/base.yml \
-    run_name="llama2-finetune-maxtext" \
+    run_name=${RUN_NAME} \
     base_output_directory=${BASE_OUTPUT_DIRECTORY} \
     load_parameters_path=${MODEL_CKPT_PATH} \
-    model_name='llama2-7b' \
+    model_name=${MODEL_NAME} \
     dataset_path=${DATASET_GCS_BUCKET} \
     async_checkpointing=False \
+    tokenizer_path=${MODEL_TOKENIZER} \
+    hf_access_token=${HF_TOKEN} \
+    steps=10 per_device_batch_size=1
+```
+
+Below is a sample training script with a converted Hugging Face checkpoint (Option 2: Converting a Hugging Face checkpoint).
+
+```sh
+python3 -m MaxText.train \
+    src/MaxText/configs/base.yml \
+    run_name=${RUN_NAME} \
+    base_output_directory=${BASE_OUTPUT_DIRECTORY} \
+    load_parameters_path=${MODEL_CKPT_DIRECTORY}/0/items \
+    model_name=${MODEL_NAME} \
+    dataset_path=${DATASET_GCS_BUCKET} \
+    async_checkpointing=False \
+    tokenizer_path=${MODEL_TOKENIZER} \
+    hf_access_token=${HF_TOKEN} \
     steps=10 per_device_batch_size=1
 ```
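The two added training commands differ only in `load_parameters_path`: Option 1 loads an existing MaxText checkpoint directly, while Option 2 loads step 0 of the converted checkpoint at `${MODEL_CKPT_DIRECTORY}/0/items` (the `<step>/items` layout used by Orbax checkpoints). A quick pre-flight check before launching Option 2 might look like this; the `gsutil` call is an illustrative addition that assumes the checkpoint lives in GCS, and is not part of the commit:

```sh
# Hypothetical sanity check: confirm the converted checkpoint exists at the
# path load_parameters_path will read, before starting the training run.
gsutil ls "${MODEL_CKPT_DIRECTORY}/0/items/" \
  || { echo "No converted checkpoint at ${MODEL_CKPT_DIRECTORY}/0/items"; exit 1; }
```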
