AI-Hypercomputer
diff --git a/‎docs/build_maxtext.md‎
Lines changed: 137 additions & 0 deletions b/‎docs/build_maxtext.md‎
diff --git a/‎docs/guides/data_input_pipeline/data_input_grain.md‎
Lines changed: 1 addition & 1 deletion b/‎docs/guides/data_input_pipeline/data_input_grain.md‎
diff --git a/‎docs/index.md‎
Lines changed: 12 additions & 6 deletions b/‎docs/index.md‎
diff --git a/‎docs/install_maxtext.md‎
Lines changed: 8 additions & 24 deletions b/‎docs/install_maxtext.md‎
diff --git a/‎docs/run_maxtext/run_maxtext_localhost.md‎
Lines changed: 1 addition & 16 deletions b/‎docs/run_maxtext/run_maxtext_localhost.md‎
diff --git a/‎docs/run_maxtext/run_maxtext_single_host_gpu.md‎
Lines changed: 1 addition & 31 deletions b/‎docs/run_maxtext/run_maxtext_single_host_gpu.md‎
diff --git a/‎docs/run_maxtext/run_maxtext_via_pathways.md‎
Lines changed: 2 additions & 22 deletions b/‎docs/run_maxtext/run_maxtext_via_pathways.md‎
@@ -0,0 +1,137 @@
+<!--
+ Copyright 2023-2026 Google LLC
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+      https://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ -->
+
+# Build and Upload MaxText Docker Images
+
+This guide covers setting up a MaxText development environment and building container images for TPU and GPU workloads. These images can be used to run MaxText on GKE clusters with TPUs or GPUs, and are also required for running MaxText through XPK.
+
+## Prerequisites
+
+Before starting, ensure you have the following tools installed and configured:
+
+1. Environment Prep: Install and configure all [XPK prerequisites](https://github.com/AI-Hypercomputer/xpk/blob/main/docs/installation.md#1-prerequisites).
+
+2. Docker Permissions: Follow the steps to [configure sudoless Docker](https://docs.docker.com/engine/install/linux-postinstall/) to run Docker without `sudo`.
+
+3. Artifact Registry Access: Authenticate with [Google Artifact Registry](https://docs.cloud.google.com/artifact-registry/docs/docker/authentication#gcloud-helper) so that Docker can push your images to, and pull them from, your project's repositories.
+
+4. Authentication & Access: Run the following commands to authenticate your account and configure Docker:
+
+```bash
+# Authenticate your user account for gcloud CLI access
+gcloud auth login
+
+# Set up Application Default Credentials for client libraries and other tools
+gcloud auth application-default login
+
+# Configure Docker credentials and test your access
+gcloud auth configure-docker
+docker run hello-world
+```
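Before running the authentication commands above, it can help to confirm that the CLIs they rely on are installed. The following is a minimal sketch, not part of MaxText or XPK; `missing_tools` is a hypothetical helper name introduced only for illustration.

```bash
# Hypothetical helper (not a MaxText or XPK command): print each named tool
# that cannot be found on PATH, one per line.
missing_tools() {
  local tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
    fi
  done
}

# Check the two CLIs used by the authentication steps in this guide.
missing="$(missing_tools gcloud docker)"
if [ -n "$missing" ]; then
  echo "Missing required tools: ${missing}" >&2
fi
```

If nothing is printed, both `gcloud` and `docker` were found and the authentication steps can proceed.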
+
+## Installation Modes
+
+We recommend building MaxText inside a Python virtual environment using `uv` for speed and dependency management.
+
+### Option 1: From PyPI (Recommended)
+
+This is the easiest way to get started with the latest stable version.
+
+```bash
+# Install uv, a fast Python package installer
+pip install uv
+
+# Create virtual environment
+export VENV_NAME=<your virtual env name> # e.g., docker_venv
+uv venv --python 3.12 --seed ${VENV_NAME?}
+source ${VENV_NAME?}/bin/activate
+
+# Install MaxText with the [runner] extra
+# This enables Docker image building and workload scheduling via XPK
+uv pip install maxtext[runner] --resolution=lowest
+```
+
+> **Note:** The `maxtext[runner]` extra includes all necessary dependencies for building MaxText Docker images and running workloads through XPK. It automatically installs XPK, so you do not need to install it separately to manage your clusters and workloads.
+
+### Option 2: From Source
+
+If you plan to contribute to MaxText or need the latest unreleased features, install from source.
+
+```bash
+# Clone the repository
+git clone https://github.com/AI-Hypercomputer/maxtext.git
+cd maxtext
+
+# Create virtual environment
+export VENV_NAME=<your virtual env name> # e.g., docker_venv
+uv venv --python 3.12 --seed ${VENV_NAME?}
+source ${VENV_NAME?}/bin/activate
+
+# Install MaxText with the [runner] extra in editable mode
+uv pip install -e .[runner] --resolution=lowest
+```
+
+> **Note:** The `maxtext[runner]` extra includes all necessary dependencies for building MaxText Docker images and running workloads through XPK. It automatically installs XPK, so you do not need to install it separately to manage your clusters and workloads.
+
+## Build MaxText Docker Image
+
+Select the appropriate build commands based on your hardware (`TPU` or `GPU`) and your specific workflow (`pre-training` or `post-training`). Each of these commands will generate a local Docker image named `maxtext_base_image`.
+
+### TPU Pre-Training Docker Image
+
+```bash
+# Option 1: Build with the stable versions of dependencies (default)
+build_maxtext_docker_image
+
+# Option 2: Build with latest nightly versions of jax/jaxlib
+build_maxtext_docker_image MODE=nightly
+
+# Option 3: Build with the specified jax/jaxlib version
+build_maxtext_docker_image MODE=nightly JAX_VERSION=$JAX_VERSION
+```
+
+### GPU Pre-Training Docker Image
+
+```bash
+# Option 1: Build with the stable versions of dependencies (default)
+build_maxtext_docker_image DEVICE=gpu
+
+# Option 2: Build with latest nightly versions of jax/jaxlib
+build_maxtext_docker_image DEVICE=gpu MODE=nightly
+
+# Option 3: Build on the pinned base image `ghcr.io/nvidia/jax:base-2024-12-04`
+build_maxtext_docker_image DEVICE=gpu MODE=pinned
+
+# Option 4: Build with the specified jax/jaxlib version
+build_maxtext_docker_image DEVICE=gpu MODE=nightly JAX_VERSION=$JAX_VERSION
+```
+
+### TPU Post-Training Docker Image
+
+```bash
+# This build process takes approximately 10 to 15 minutes.
+build_maxtext_docker_image WORKFLOW=post-training
+```
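The device and mode combinations listed above can be summarized as a small dry-run helper that only prints the command it would run. This is a sketch under stated assumptions: `maxtext_build_cmd` is a hypothetical function, not part of MaxText, and it assumes that omitting `DEVICE` targets TPU and omitting `MODE` selects the stable build, as the examples above suggest.

```bash
# Hypothetical dry-run helper (not part of MaxText): print the build command
# for a device/mode pair instead of executing it.
maxtext_build_cmd() {
  local device="${1:-tpu}"   # tpu (assumed default) or gpu
  local mode="${2:-stable}"  # stable (default), nightly, or pinned
  local cmd="build_maxtext_docker_image"

  # This guide lists MODE=pinned only in the GPU section.
  if [ "$mode" = "pinned" ] && [ "$device" != "gpu" ]; then
    echo "MODE=pinned is only shown for DEVICE=gpu in this guide" >&2
    return 1
  fi

  # Append only the non-default arguments, matching the examples above.
  if [ "$device" = "gpu" ]; then
    cmd="$cmd DEVICE=gpu"
  fi
  if [ "$mode" != "stable" ]; then
    cmd="$cmd MODE=$mode"
  fi
  printf '%s\n' "$cmd"
}

# Print the command for a GPU nightly build without running it.
maxtext_build_cmd gpu nightly
```

Printing the command rather than executing it makes it easy to review the arguments before starting a build that may take several minutes.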
+
+## Upload MaxText Docker Image to Artifact Registry
+
+> **Note:** You need the [**Artifact Registry Writer**](https://docs.cloud.google.com/artifact-registry/docs/access-control#permissions) role to push Docker images to your project's Artifact Registry and to allow the cluster to pull them during workload execution. If you don't have this role, ask your project administrator to grant it through "Google Cloud Console -> IAM -> Grant access".
+
+```bash
+# Make sure to replace <Docker Image Name> with your desired image name.
+export CLOUD_IMAGE_NAME=<Docker Image Name>
+upload_maxtext_docker_image CLOUD_IMAGE_NAME=${CLOUD_IMAGE_NAME?}
+```
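Later steps, such as the Pathways guide changed in this PR, refer to the uploaded image as `gcr.io/${PROJECT?}/${CLOUD_IMAGE_NAME}`. A minimal sketch of composing that reference follows; `maxtext_image_uri` is a hypothetical helper, and the `gcr.io/<project>/<image name>` layout is an assumption taken from that guide rather than from the upload script itself.

```bash
# Hypothetical helper (not part of MaxText): compose the image reference from
# a project ID and an image name.
# Assumption: the upload places the image at gcr.io/<project>/<image name>.
maxtext_image_uri() {
  local project="${1:?project ID required}"
  local image="${2:?image name required}"
  printf 'gcr.io/%s/%s\n' "$project" "$image"
}

# Example usage once gcloud is configured (left commented out so the snippet
# runs without gcloud installed):
# export DOCKER_IMAGE="$(maxtext_image_uri "$(gcloud config get-value project)" "${CLOUD_IMAGE_NAME?}")"
```

Keeping the composition in one place avoids mismatches between the name used at upload time and the name passed to the workload.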
@@ -34,7 +34,7 @@ Grain ensures determinism in data input pipelines by saving the pipeline's state
 
 1. Grain currently supports two data formats: [ArrayRecord](https://github.com/google/array_record) (random access) and [Parquet](https://arrow.apache.org/docs/python/parquet.html) (partial random-access through row groups). Only the ArrayRecord format supports the global shuffle mentioned above. For converting a dataset into ArrayRecord, see [Apache Beam Integration for ArrayRecord](https://github.com/google/array_record/tree/main/beam). Additionally, other random access data sources can be supported via a custom [data source](https://google-grain.readthedocs.io/en/latest/data_sources.html) class.
    - **Community Resource**: The MaxText community has created a [ArrayRecord Documentation](https://array-record.readthedocs.io/). Note: we appreciate the contribution from the community, but as of now it has not been verified by the MaxText or ArrayRecord developers yet.
-2. When the dataset is hosted on a Cloud Storage bucket, Grain can read it through [Cloud Storage FUSE](https://cloud.google.com/storage/docs/gcs-fuse). The installation of Cloud Storage FUSE is included in [setup.sh](https://github.com/google/maxtext/blob/main/tools/setup/setup.sh). The user then needs to mount the Cloud Storage bucket to a local path for each worker, using the script [setup_gcsfuse.sh](https://github.com/google/maxtext/blob/main/tools/setup/setup_gcsfuse.sh). The script configures some parameters for the mount.
+2. When the dataset is hosted on a Cloud Storage bucket, Grain can read it through [Cloud Storage FUSE](https://cloud.google.com/storage/docs/gcs-fuse). The installation of Cloud Storage FUSE is included in [setup.sh](https://github.com/google/maxtext/blob/main/src/dependencies/scripts/setup.sh). The user then needs to mount the Cloud Storage bucket to a local path for each worker, using the script [setup_gcsfuse.sh](https://github.com/google/maxtext/blob/main/tools/setup/setup_gcsfuse.sh). The script configures some parameters for the mount.
 
 ```sh
 bash tools/setup/setup_gcsfuse.sh \
 
@@ -17,7 +17,9 @@
 # MaxText
 
 ```{raw} html
-:file: index.html
+---
+file: index.html
+---
 ```
 
 :link: reference/api
@@ -26,18 +28,22 @@
 <section class="latest-news">
 
 ```{include} ../README.md
-:start-after: <!-- NEWS START -->
-:end-before: <!-- NEWS END -->
+---
+start-after: <!-- NEWS START -->
+end-before: <!-- NEWS END -->
+---
 ```
 
 </section>
 </div>
 
 ```{toctree}
-:maxdepth: 2
-:hidden:
-
+---
+maxdepth: 2
+hidden:
+---
 install_maxtext
+build_maxtext
 tutorials
 run_maxtext
 guides
 
@@ -17,7 +17,7 @@
 # Install MaxText
 
 This document discusses how to install MaxText. We recommend installing MaxText inside a Python virtual environment.
-MaxText offers three installation modes:
+MaxText offers the following installation modes:
 
 1. maxtext[tpu]. Used for pre-training and decode on TPUs.
 2. maxtext[cuda12]. Used for pre-training and decode on GPUs.
@@ -37,18 +37,18 @@ uv venv --python 3.12 --seed maxtext_venv
 source maxtext_venv/bin/activate
 
 # 3. Install MaxText and its dependencies. Choose a single
-#      installation option from this list to fit your use case.
+# installation option from this list to fit your use case.
 
 # Option 1: Installing maxtext[tpu]
-uv pip install "maxtext[tpu]>=0.2.0" --resolution=lowest
+uv pip install maxtext[tpu] --resolution=lowest
 install_maxtext_tpu_github_deps
 
 # Option 2: Installing maxtext[cuda12]
-uv pip install "maxtext[cuda12]>=0.2.0" --resolution=lowest
+uv pip install maxtext[cuda12] --resolution=lowest
 install_maxtext_cuda12_github_dep
 
 # Option 3: Installing maxtext[tpu-post-train]
-uv pip install "maxtext[tpu-post-train]>=0.2.0" --resolution=lowest
+uv pip install maxtext[tpu-post-train] --resolution=lowest
 install_maxtext_tpu_post_train_extra_deps
 
 # Option 4: Installing maxtext[runner]
@@ -91,7 +91,7 @@ uv pip install -e .[tpu-post-train] --resolution=lowest
 install_maxtext_tpu_post_train_extra_deps
 
 # Option 4: Installing maxtext[runner]
-uv pip install .[runner] --resolution=lowest
+uv pip install -e .[runner] --resolution=lowest
 ```
 
 After installation, you can verify the package is available with `python3 -c "import maxtext"` and run training jobs with `python3 -m maxtext.trainers.pre_train.train ...`.
@@ -176,22 +176,6 @@ After generating the new requirements, you need to update the files in the MaxTe
 
 Finally, test that the new dependencies install correctly and that MaxText runs as expected.
 
-1. **Create a clean environment:** It's best to start with a fresh Python virtual environment.
-
-```bash
-uv venv --python 3.12 --seed maxtext_venv
-source maxtext_venv/bin/activate
-```
-
-2. **Run the setup script:** Execute `bash setup.sh` to install the new dependencies.
-
-```bash
-pip install uv
-# install the tpu package
-uv pip install -e .[tpu] --resolution=lowest
-# or install the gpu package by running the following line:
-# uv pip install -e .[cuda12] --resolution=lowest
-install_maxtext_github_deps
-```
+1. **Install MaxText and dependencies**: For instructions on installing MaxText on your VM, please refer to the [official documentation](https://maxtext.readthedocs.io/en/maxtext-v0.2.0/install_maxtext.html#from-source).
 
-3. **Run tests:** Run MaxText tests to ensure there are no regressions.
+2. **Verify the installation**: Run MaxText tests to ensure everything is working as expected with the newly installed dependencies and there are no regressions.
@@ -36,22 +36,7 @@ Local development on a single host TPU/GPU VM is a convenient way to run MaxText
 
 1. Create and SSH to the single host VM of your choice. You can use any available single host TPU, such as `v5litepod-8`, `v5p-8`, or `v4-8`. For GPUs, you can use `nvidia-h100-mega-80gb`, `nvidia-h200-141gb`, or `nvidia-b200`. For setting up a TPU VM, use the Cloud TPU documentation available at https://cloud.google.com/tpu/docs/managing-tpus-tpu-vm. For a GPU setup, refer to the guide at https://cloud.google.com/compute/docs/gpus/create-vm-with-gpus.
 
-2. Clone MaxText onto that VM.
-
-   ```bash
-   git clone https://github.com/google/maxtext.git
-   cd maxtext
-   ```
-
-3. Once you have cloned the repository, you have two primary options for setting up the necessary dependencies on your VM: Installing in a Python Environment, or building a Docker container. For single host workloads, we recommend to install dependencies in a python environment, and for multihost workloads we recommend the containerized approach.
-
-Within the root directory of the cloned repo, create a virtual environment and install dependencies and the pre-commit hook by running:
-
-```bash
-python3.12 -m venv ~/venv-maxtext
-source ~/venv-maxtext/bin/activate
-bash tools/setup/setup.sh DEVICE={tpu|gpu}
-```
+2. For instructions on installing MaxText on your VM, please refer to the [official documentation](https://maxtext.readthedocs.io/en/latest/install_maxtext.html).
 
 #### Run a Test Training Job
 
 
@@ -60,39 +60,9 @@ If you get the NVML Error: Please follow these instructions.
 
 https://stackoverflow.com/questions/72932940/failed-to-initialize-nvml-unknown-error-in-docker-after-few-hours
 
-## Install MaxText
-
-Clone MaxText:
-
-```bash
-git clone https://github.com/AI-Hypercomputer/maxtext.git
-```
-
 ## Build MaxText Docker image
 
-This builds a docker image called `maxtext_base_image`. You can retag to a different name.
-
-1. Check out the code changes:
-
-```bash
-cd maxtext
-```
-
-2. Run the following commands to build and push the docker image:
-
-```bash
-export LOCAL_IMAGE_NAME=<docker_image_name>
-sudo bash docker_build_dependency_image.sh DEVICE=gpu
-docker tag maxtext_base_image ${LOCAL_IMAGE_NAME?}
-docker push ${LOCAL_IMAGE_NAME?}
-```
-
-Note that when running `bash docker_build_dependency_image.sh DEVICE=gpu`, it
-uses `MODE=stable` by default. If you want to use other modes, you need to
-specify it explicitly:
-
-- using nightly mode: `bash docker_build_dependency_image.sh DEVICE=gpu MODE=nightly`
-- using pinned mode: `bash docker_build_dependency_image.sh DEVICE=gpu MODE=pinned`
+For instructions on building the MaxText Docker image, please refer to the [official documentation](https://maxtext.readthedocs.io/en/latest/build_maxtext.html).
 
 ## Test
 
 
@@ -35,27 +35,7 @@ Before you can run a MaxText workload, you must complete the following setup ste
 
 2. **Create a GKE cluster** configured for Pathways.
 
-3. **Build and upload a MaxText Docker image** to your project's Artifact Registry.
-
-   [Follow the steps to configure sudoless Docker](https://docs.docker.com/engine/install/linux-postinstall/) before running the commands below.
-
-   Step 1: Build the Docker image for a TPU device. This image contains MaxText and its dependencies.
-
-   ```shell
-   bash src/dependencies/scripts/docker_build_dependency_image.sh DEVICE=tpu MODE=stable
-   ```
-
-   Step 2: Configure Docker to authenticate with Google Cloud
-
-   ```shell
-   gcloud auth configure-docker
-   ```
-
-   Step 3: Upload the image to your project's registry. Replace `$USER_runner` with your desired image name.
-
-   ```shell
-   bash src/dependencies/scripts/docker_upload_runner.sh CLOUD_IMAGE_NAME=$USER_runner
-   ```
+3. **Build and upload a MaxText Docker image** to your project's Artifact Registry. For instructions on building and uploading the MaxText Docker image, please refer to the [official documentation](https://maxtext.readthedocs.io/en/latest/build_maxtext.html).
 
 ## 2. Environment configuration
 
@@ -76,7 +56,7 @@ export WORKLOAD_NODEPOOL_COUNT=1 # Number of TPU slices for your job
 export BUCKET_NAME="your-gcs-bucket-name"
 export RUN_NAME="maxtext-run-1"
 # The Docker image you pushed in the prerequisite step
-export DOCKER_IMAGE="gcr.io/${PROJECT?}/${USER}_runner"
+export DOCKER_IMAGE="gcr.io/${PROJECT?}/${CLOUD_IMAGE_NAME?}"
 ```
 
 ## 3. Running a batch workload