<!--
 Copyright 2023-2026 Google LLC

 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
 You may obtain a copy of the License at

      https://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
 -->

# Build and Upload MaxText Docker Images

This guide covers setting up a MaxText development environment and building container images for TPU and GPU workloads. These images can be used to run MaxText on GKE clusters with TPUs or GPUs, and are also required for running MaxText through XPK.

## Prerequisites

Before starting, ensure you have the following tools installed and configured:

1. Environment Prep: Install and configure all [XPK prerequisites](https://github.com/AI-Hypercomputer/xpk/blob/main/docs/installation.md#1-prerequisites).

2. Docker Permissions: Follow the steps to [configure sudoless Docker](https://docs.docker.com/engine/install/linux-postinstall/) so that you can run Docker without `sudo`.

3. Artifact Registry Access: Set up authentication for [Google Artifact Registry](https://docs.cloud.google.com/artifact-registry/docs/docker/authentication#gcloud-helper) so that you can push and pull your images.

4. Authentication & Access: Run the following commands to authenticate your account and configure Docker:

```bash
# Authenticate your user account for gcloud CLI access
gcloud auth login

# Set up Application Default Credentials (ADC) for client libraries and tools that rely on them
gcloud auth application-default login

# Register gcloud as a Docker credential helper, then confirm that Docker runs without sudo
gcloud auth configure-docker
docker run hello-world
```

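Before moving on, it can help to confirm that the command-line tools used in the steps above are on your `PATH`. The sketch below is only an illustration: `missing_tools` is a hypothetical helper name, not part of MaxText or XPK, and the tool list (`docker`, `gcloud`) is taken from the commands in this section.

```bash
# Hypothetical helper (not part of MaxText or XPK): print every tool
# from the argument list that cannot be found on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
    fi
  done
}

# Example: check the tools used in the prerequisite steps.
missing="$(missing_tools docker gcloud)"
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names print on one line.
  echo "Missing tools:" $missing
else
  echo "All required tools found."
fi
```

An empty result only means the binaries were found; it does not confirm that you are authenticated.
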
## Installation Modes

We recommend installing MaxText inside a Python virtual environment created with `uv`, which provides fast installs and dependency management.

### Option 1: From PyPI (Recommended)

This is the easiest way to get started with the latest stable version.

```bash
# Install uv, a fast Python package installer
pip install uv

# Create virtual environment
export VENV_NAME=<your virtual env name> # e.g., docker_venv
uv venv --python 3.12 --seed ${VENV_NAME?}
source ${VENV_NAME?}/bin/activate

# Install MaxText with the [runner] extra
# This enables Docker image building and workload scheduling via XPK
# (the quotes keep the shell from treating the brackets as a glob pattern)
uv pip install "maxtext[runner]" --resolution=lowest
```

> **Note:** The `maxtext[runner]` extra includes all necessary dependencies for building MaxText Docker images and running workloads through XPK. It automatically installs XPK, so you do not need to install it separately to manage your clusters and workloads.

### Option 2: From Source

If you plan to contribute to MaxText or need the latest unreleased features, install from source.

```bash
# Install uv if it is not already available
pip install uv

# Clone the repository
git clone https://github.com/AI-Hypercomputer/maxtext.git
cd maxtext

# Create virtual environment
export VENV_NAME=<your virtual env name> # e.g., docker_venv
uv venv --python 3.12 --seed ${VENV_NAME?}
source ${VENV_NAME?}/bin/activate

# Install MaxText with the [runner] extra in editable mode
uv pip install -e ".[runner]" --resolution=lowest
```

> **Note:** The `maxtext[runner]` extra includes all necessary dependencies for building MaxText Docker images and running workloads through XPK. It automatically installs XPK, so you do not need to install it separately to manage your clusters and workloads.

## Build MaxText Docker Image

Choose the build command that matches your hardware (TPU or GPU) and your workflow (pre-training or post-training). Each command produces a local Docker image named `maxtext_base_image`.

### TPU Pre-Training Docker Image

```bash
# Option 1: Build with the stable versions of dependencies (default)
build_maxtext_docker_image

# Option 2: Build with latest nightly versions of jax/jaxlib
build_maxtext_docker_image MODE=nightly

# Option 3: Build with a specific jax/jaxlib version (set JAX_VERSION beforehand)
build_maxtext_docker_image MODE=nightly JAX_VERSION=${JAX_VERSION?}
```

### GPU Pre-Training Docker Image

```bash
# Option 1: Build with the stable versions of dependencies (default)
build_maxtext_docker_image DEVICE=gpu

# Option 2: Build with latest nightly versions of jax/jaxlib
build_maxtext_docker_image DEVICE=gpu MODE=nightly

# Option 3: Build on the pinned base image `ghcr.io/nvidia/jax:base-2024-12-04`
build_maxtext_docker_image DEVICE=gpu MODE=pinned

# Option 4: Build with a specific jax/jaxlib version (set JAX_VERSION beforehand)
build_maxtext_docker_image DEVICE=gpu MODE=nightly JAX_VERSION=${JAX_VERSION?}
```

### TPU Post-Training Docker Image

```bash
# This build process takes approximately 10 to 15 minutes.
build_maxtext_docker_image WORKFLOW=post-training
```

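Once a build finishes, you may want to confirm that the `maxtext_base_image` image exists locally. The following is a sketch, not part of the MaxText tooling: `image_exists` is a hypothetical helper that wraps the standard `docker image inspect` command, which exits with a non-zero status when the image is not found.

```bash
# Hypothetical helper (not part of MaxText): succeeds when the named
# image is present in the local Docker image store.
image_exists() {
  docker image inspect "$1" >/dev/null 2>&1
}

if image_exists maxtext_base_image; then
  echo "maxtext_base_image is available locally"
else
  echo "maxtext_base_image not found; rerun the build command" >&2
fi
```

Running `docker images maxtext_base_image` gives the same information in a human-readable table.
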
## Upload MaxText Docker Image to Artifact Registry

> **Note:** You need the [**Artifact Registry Writer**](https://docs.cloud.google.com/artifact-registry/docs/access-control#permissions) role to push Docker images to your project's Artifact Registry and to allow the cluster to pull them during workload execution. If you don't have this role, ask your project administrator to grant it through "Google Cloud Console -> IAM -> Grant access".

```bash
# Make sure to replace <Docker Image Name> with your desired image name.
export CLOUD_IMAGE_NAME=<Docker Image Name>
upload_maxtext_docker_image CLOUD_IMAGE_NAME=${CLOUD_IMAGE_NAME?}
```
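
Docker repository names are restricted to lowercase letters, digits, and a few separator characters, so a name such as `MyImage` is rejected at push time. The sketch below shows one way to catch this before uploading. It is an assumption-laden illustration: `valid_image_name` is a hypothetical helper, not part of MaxText, and it applies a simplified version of Docker's naming rules (for example, it does not reject repeated separators such as `..`).

```bash
# Hypothetical helper (not part of MaxText): a simplified check that a
# candidate image name uses only lowercase letters, digits, '.', '_' and '-',
# and starts and ends with a letter or digit.
valid_image_name() {
  local LC_ALL=C                          # make [a-z] match ASCII lowercase only
  case "$1" in
    "") return 1 ;;                       # empty name
    *[!a-z0-9._-]*) return 1 ;;           # contains a disallowed character
    [!a-z0-9]*|*[!a-z0-9]) return 1 ;;    # starts or ends with a separator
  esac
  return 0
}

# Example names chosen only for illustration.
for name in my_maxtext_runner MyImage "bad name"; do
  if valid_image_name "$name"; then
    echo "ok:       $name"
  else
    echo "rejected: $name"
  fi
done
```

A check like this could be run on `${CLOUD_IMAGE_NAME?}` before calling `upload_maxtext_docker_image`.
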