`docs/tutorials/posttraining/full_finetuning.md`
The high level steps involve:

a disk or GCS bucket. Alternatively, it can also ingest data directly from a Hugging Face dataset.
- Running the training script with the checkpoint (a sample invocation follows this list)
- Note: Training parameters may require adjustment to align the model with the specific TPU or GPU topology and achieve optimal performance.
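
For reference, a sample full fine-tuning invocation for Llama2-7b is shown below; `${output_directory}`, `${path_to_checkpoint}`, and `${dataset_path}` are placeholders for your own paths.

```bash
# Sample full fine-tuning run for Llama2-7b; adjust steps and
# per_device_batch_size to match your TPU or GPU topology.
python3 -m MaxText.train \
  src/MaxText/configs/base.yml \
  run_name="llama2-finetune-maxtext" \
  base_output_directory=${output_directory} \
  load_parameters_path=${path_to_checkpoint} \
  model_name='llama2-7b' \
  dataset_path=${dataset_path} \
  async_checkpointing=False \
  steps=10 per_device_batch_size=.25
```

More end-to-end reference scripts are available in the repo's [end_to_end/tpu](https://github.com/AI-Hypercomputer/maxtext/tree/main/end_to_end/tpu) directory.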
## MaxText checkpoints
MaxText checkpoints are stored in their own format. You can see the format in the Llama conversion script.
### Meta's PyTorch checkpoint to MaxText (Orbax) checkpoint
The conversion scripts for Llama work with Meta's original checkpoints, not with Hugging Face checkpoints.
#### Prerequisites
- Download the Meta-format checkpoints:
  Option 1: Download the checkpoint from Meta (https://llama.meta.com/llama-downloads/) to a local directory.
  Option 2: Download the checkpoint from a GCS bucket to a local directory with ```gcloud storage cp -r <GCS path for Meta format checkpoint> <local/path>```.
- Install the CPU-only build of PyTorch; a TPU or GPU is not required for this conversion script (see the sketch below).
The conversion scripts do not use accelerators but need large host memory to perform the conversion.
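
For example, a minimal sketch of the install, assuming a Linux host and the official PyTorch CPU wheel index:

```bash
# CPU-only PyTorch build; the conversion runs on the host CPU,
# so no CUDA or TPU runtime is required.
pip3 install torch --index-url https://download.pytorch.org/whl/cpu
```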
- For large models (e.g., the 70B model), this script requires a large-memory VM.
- The script loads and saves weights in a single pass.
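
As a sketch, the Meta-to-MaxText conversion for Llama2-7b might look like the following, assuming the `llama_or_mistral_ckpt` script from the MaxText repo (the module path and flag names may differ across MaxText versions):

```bash
# Convert a Meta-format Llama2-7B checkpoint into a MaxText (Orbax) checkpoint.
# <local/path/to/meta/ckpt> is the directory downloaded in the prerequisites above.
python3 -m MaxText.llama_or_mistral_ckpt \
  --base-model-path <local/path/to/meta/ckpt> \
  --maxtext-model-path <GCS path for saving converted checkpoint> \
  --model-size llama2-7b
```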
### MaxText checkpoint to Hugging Face
After fine-tuning or pre-training, MaxText also provides scripts to convert MaxText-format weights back to the [Hugging Face](https://github.com/AI-Hypercomputer/maxtext/blob/main/src/MaxText/utils/ckpt_scripts/llama_mistral_mixtral_orbax_to_hf.py) format.
#### Sample for converting MaxText-format weights to Hugging Face format
- Set up environment variables:
```bash
export BASE_OUTPUT_DIRECTORY=<output directory to store run logs> # e.g., gs://my-bucket/my-output-directory
export PATH_TO_CHECKPOINT=<GCS path for saving converted checkpoint>/0/items # e.g., ${CONVERTED_CHECKPOINT_PATH}/0/items
export HF_MODEL_PATH=<local path for hf> # e.g., /local/convert_ckp
```
- Run the conversion script:
The following example runs on a v6e-8 TPU VM with llama2-7b.
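
As a sketch, invoking the linked `llama_mistral_mixtral_orbax_to_hf` script with the variables defined above might look like this (the module path, `run_name` value, and flag names are assumptions that may differ by MaxText version):

```bash
# Convert the MaxText (Orbax) checkpoint back to Hugging Face format.
python3 -m MaxText.llama_mistral_mixtral_orbax_to_hf \
  src/MaxText/configs/base.yml \
  base_output_directory=${BASE_OUTPUT_DIRECTORY} \
  load_parameters_path=${PATH_TO_CHECKPOINT} \
  run_name=convert-llama2-7b-to-hf \
  model_name='llama2-7b' \
  hf_model_path=${HF_MODEL_PATH}
```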
MaxText provides examples to work with [Common Crawl](https://commoncrawl.org/). The dataset is available in TFRecords format in a cloud bucket. MaxText provides scripts to copy the dataset to a Google Cloud Storage Bucket.
##### Common Crawl (c4) dataset setup
Run these steps once per project prior to any local development or cluster experiments.
1. Create two GCS buckets in your project, one for downloading and retrieving the dataset and the other for storing the logs (see the sketch after this list).
2. Download the dataset into your GCS bucket.
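
A minimal sketch of these two steps with `gcloud` (bucket and project names are placeholders, and the `download_dataset.sh` helper is assumed to sit at the root of the MaxText repo; its arguments may differ by version):

```bash
# 1. Create the dataset bucket and the logs bucket.
gcloud storage buckets create gs://<your-dataset-bucket> --project=<your-project>
gcloud storage buckets create gs://<your-logs-bucket> --project=<your-project>

# 2. Copy the c4 TFRecords into the dataset bucket with the MaxText helper script.
bash download_dataset.sh <your-project> gs://<your-dataset-bucket>
```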
MaxText assumes these GCS buckets are created in the same project and that it has permissions to read and write from them: