AI-Hypercomputer
diff --git a/README.md b/README.md
Lines changed: 11 additions & 1 deletion
diff --git a/docs/profiling.md b/docs/profiling.md
Lines changed: 34 additions & 0 deletions
diff --git a/src/maxdiffusion/configs/base14.yml b/src/maxdiffusion/configs/base14.yml
Lines changed: 5 additions & 0 deletions
diff --git a/src/maxdiffusion/configs/base21.yml b/src/maxdiffusion/configs/base21.yml
Lines changed: 6 additions & 1 deletion
diff --git a/src/maxdiffusion/configs/base_2_base.yml b/src/maxdiffusion/configs/base_2_base.yml
Lines changed: 5 additions & 0 deletions
diff --git a/src/maxdiffusion/configs/base_flux_dev.yml b/src/maxdiffusion/configs/base_flux_dev.yml
Lines changed: 4 additions & 0 deletions
diff --git a/src/maxdiffusion/configs/base_flux_dev_multi_res.yml b/src/maxdiffusion/configs/base_flux_dev_multi_res.yml
Lines changed: 4 additions & 0 deletions
diff --git a/src/maxdiffusion/configs/base_flux_schnell.yml b/src/maxdiffusion/configs/base_flux_schnell.yml
Lines changed: 6 additions & 1 deletion
diff --git a/src/maxdiffusion/configs/base_wan_14b.yml b/src/maxdiffusion/configs/base_wan_14b.yml
Lines changed: 6 additions & 1 deletion
diff --git a/src/maxdiffusion/configs/base_wan_1_3b.yml b/src/maxdiffusion/configs/base_wan_1_3b.yml
Lines changed: 5 additions & 0 deletions
@@ -17,6 +17,7 @@
 [![Unit Tests](https://github.com/AI-Hypercomputer/maxdiffusion/actions/workflows/UnitTests.yml/badge.svg)](https://github.com/AI-Hypercomputer/maxdiffusion/actions/workflows/UnitTests.yml)
 
 # What's new?
+- **`2026/04/16`**: The Tokamax Ring Attention kernel is now supported.
 - **`2026/03/31`**: Wan2.2 SenCache inference is now supported for T2V and I2V (up to 1.4x speedup)
 - **`2026/03/25`**: Wan2.1 and Wan2.2 Magcache inference is now supported
 - **`2026/03/25`**: LTX-2 Video Inference is now supported
@@ -535,6 +536,12 @@ To generate images, run the following command:
 
   Supports both Text2Vid and Img2Vid pipelines.
 
+  **Note**: The product of per_device_batch_size and the number of devices must be a whole number; it is the number of videos generated.
+
+  The command below uses 4 devices and per_device_batch_size=0.25, so 4 * 0.25 = 1 and a single video is generated. Setting per_device_batch_size=0.5 generates 2 videos, and so on.
+
+  With 8 devices, per_device_batch_size=0.125 generates 1 video and per_device_batch_size=0.25 generates 2 videos.
+
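  As a quick sanity check before launching, the arithmetic above can be sketched as a small shell function. The helper name `num_videos` is hypothetical and is not part of MaxDiffusion. It multiplies the device count by per_device_batch_size and fails if the result is not a positive whole number.

  ```bash
  # Hypothetical helper (not part of MaxDiffusion): prints the number of videos
  # a run will generate, or exits non-zero if num_devices * per_device_batch_size
  # is not a positive whole number.
  num_videos() {
    num_devices="$1"
    per_device_batch_size="$2"
    awk -v n="$num_devices" -v b="$per_device_batch_size" 'BEGIN {
      total = n * b
      rounded = int(total + 0.5)
      diff = total - rounded
      if (diff < 0) diff = -diff
      # Reject fractional or zero totals (small tolerance for float rounding).
      if (rounded < 1 || diff > 1e-9) exit 1
      print rounded
    }'
  }

  num_videos 4 0.25   # 1 video (the command below)
  num_videos 4 0.5    # 2 videos
  num_videos 8 0.125  # 1 video
  num_videos 8 0.25   # 2 videos
  ```

  A value such as `num_videos 4 0.3` (4 * 0.3 = 1.2) exits non-zero, which flags a batch size that does not divide evenly across devices.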
   The following command will run Wan2.1 T2V:
 
   ```bash
@@ -553,7 +560,7 @@ To generate images, run the following command:
   width=1280 \
   height=720 \
   jax_cache_dir=gs://jfacevedo-maxdiffusion/jax_cache/ \
-  per_device_batch_size=.125 \
+  per_device_batch_size=0.25 \
   ici_data_parallelism=2 \
   ici_context_parallelism=2 \
   flow_shift=5.0 \
@@ -790,3 +797,6 @@ This script will automatically format your code with `pyink` and help you identi
 
 
 The full suite of end-to-end tests is in `tests` and `src/maxdiffusion/tests`. We run them on a nightly cadence.
+
+## Profiling
+To learn how to enable ML Diagnostics and XProf profiling for your runs, please see our [ML Diagnostics Guide](docs/profiling.md).
@@ -0,0 +1,34 @@
+# ML Diagnostics and Profiling
+
+MaxDiffusion supports automated profiling and performance tracking via [Google Cloud ML Diagnostics](https://docs.cloud.google.com/tpu/docs/ml-diagnostics/sdk).
+
+## 1. Manual Installation
+To keep MaxDiffusion lightweight and free of extra dependencies for users who don't need profiling, the ML Diagnostics package is **not** installed by default.
+
+To use this feature, you must manually install the required package in your environment:
+```bash
+pip install google-cloud-mldiagnostics
+```
+
+## 2. Configuration Settings
+To enable ML Diagnostics for your training or generation jobs, set the following options, either directly in your `.yml` config file or as command-line arguments:
+
+```yaml
+# ML Diagnostics settings
+enable_ml_diagnostics: True
+profiler_gcs_path: "gs://<your-bucket-name>/profiler/ml_diagnostics"
+enable_ondemand_xprof: True
+```
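For the command-line route, the settings use the same `key=value` override form the README uses for options such as `per_device_batch_size`. The sketch below only prints the command (a dry run). The script `src/maxdiffusion/generate_wan.py`, the config `base_wan_14b.yml`, and the bucket name are assumptions, so substitute your own entry point, config, and bucket, then drop `echo` to actually launch the job.

```bash
# Placeholder bucket: replace with your own.
BUCKET="gs://your-bucket-name"

# ML Diagnostics overrides, passed as key=value arguments after the config file.
set -- \
  enable_ml_diagnostics=True \
  "profiler_gcs_path=${BUCKET}/profiler/ml_diagnostics" \
  enable_ondemand_xprof=True

# Dry run: print the full command. The script and config paths are assumptions.
echo python src/maxdiffusion/generate_wan.py \
  src/maxdiffusion/configs/base_wan_14b.yml "$@"
```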
+
+## 3. GCS Bucket Permissions (Troubleshooting)
+The GCS bucket you provide in `profiler_gcs_path` **must** have the correct IAM permissions to allow the Hypercompute Cluster service account to write data.
+
+If permissions are not configured correctly, your job will fail with an error similar to this:
+> `message: 'service-32478767326@gcp-sa-hypercomputecluster.iam.gserviceaccount.com does not have storage.buckets.get access to the GCS bucket <your-bucket>: permission denied'`
+
+**Fix:** On your GCS bucket, grant the required Storage roles (e.g., `Storage Object Admin`) to the service account named in the error message.
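One way to apply this fix from the command line is sketched below. It extracts the service-account email from the error text with `grep`, then prints a `gcloud storage buckets add-iam-policy-binding` command that grants `roles/storage.objectAdmin` (Storage Object Admin) on the bucket. The bucket name is a placeholder, and the error string is the example above. The command is printed rather than run, so drop `echo` to apply it. The example error names `storage.buckets.get`, which is a bucket-level permission, so check the Cloud Storage IAM roles reference to confirm that the role you grant includes it.

```bash
# Example error text from above; paste your own.
ERR="message: 'service-32478767326@gcp-sa-hypercomputecluster.iam.gserviceaccount.com does not have storage.buckets.get access to the GCS bucket <your-bucket>: permission denied'"

# Placeholder bucket: replace with the bucket from profiler_gcs_path.
BUCKET="gs://your-bucket-name"

# Extract the first service-account email that appears in the error message.
SA=$(printf '%s\n' "$ERR" \
  | grep -oE '[A-Za-z0-9._-]+@[A-Za-z0-9.-]+\.iam\.gserviceaccount\.com' \
  | head -n 1)

# Dry run: print the IAM binding command instead of executing it.
echo gcloud storage buckets add-iam-policy-binding "$BUCKET" \
  --member="serviceAccount:${SA}" \
  --role="roles/storage.objectAdmin"
```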
+
+## 4. Viewing Your Runs
+Once your job is running with diagnostics enabled, you can monitor the profiles, execution times, and metrics in the Cluster Director console here:
+
+🔗 **https://pantheon.corp.google.com/cluster-director/diagnostics**
@@ -247,3 +247,8 @@ quantization: ''
 quantization_local_shard_count: -1
 use_qwix_quantization: False 
 compile_topology_num_slices: -1 # Number of target slices, set to a positive integer.
+
+# ML Diagnostics settings
+enable_ml_diagnostics: False
+profiler_gcs_path: ""
+enable_ondemand_xprof: False
@@ -247,4 +247,9 @@ quantization: ''
 # Shard the range finding operation for quantization. By default this is set to number of slices.
 quantization_local_shard_count: -1
 compile_topology_num_slices: -1 # Number of target slices, set to a positive integer.
-use_qwix_quantization: False 
+use_qwix_quantization: False 
+
+# ML Diagnostics settings
+enable_ml_diagnostics: False
+profiler_gcs_path: ""
+enable_ondemand_xprof: False
@@ -263,3 +263,8 @@ quantization: ''
 quantization_local_shard_count: -1
 use_qwix_quantization: False 
 compile_topology_num_slices: -1 # Number of target slices, set to a positive integer.
+
+# ML Diagnostics settings
+enable_ml_diagnostics: False
+profiler_gcs_path: ""
+enable_ondemand_xprof: False
@@ -306,3 +306,7 @@ quantization_local_shard_count: -1
 use_qwix_quantization: False 
 compile_topology_num_slices: -1 # Number of target slices, set to a positive integer.
 
+# ML Diagnostics settings
+enable_ml_diagnostics: False
+profiler_gcs_path: ""
+enable_ondemand_xprof: False
@@ -291,3 +291,7 @@ quantization_local_shard_count: -1
 use_qwix_quantization: False 
 compile_topology_num_slices: -1 # Number of target slices, set to a positive integer.
 
+# ML Diagnostics settings
+enable_ml_diagnostics: False
+profiler_gcs_path: ""
+enable_ondemand_xprof: False
@@ -300,4 +300,9 @@ quantization_local_shard_count: -1
 use_qwix_quantization: False 
 compile_topology_num_slices: -1 # Number of target slices, set to a positive integer.
 
-save_final_checkpoint: False
+save_final_checkpoint: False
+
+# ML Diagnostics settings
+enable_ml_diagnostics: False
+profiler_gcs_path: ""
+enable_ondemand_xprof: False
@@ -409,4 +409,9 @@ eval_data_dir: ""
 enable_generate_video_for_eval: False # This will increase the used TPU memory.
 eval_max_number_of_samples_in_bucket: 60 # The number of samples per bucket for evaluation. This is calculated by num_eval_samples / len(timesteps_list).
 
-enable_ssim: False
+enable_ssim: False
+
+# ML Diagnostics settings
+enable_ml_diagnostics: False
+profiler_gcs_path: ""
+enable_ondemand_xprof: False
@@ -350,3 +350,8 @@ enable_generate_video_for_eval: False # This will increase the used TPU memory.
 eval_max_number_of_samples_in_bucket: 60 # The number of samples per bucket for evaluation. This is calculated by num_eval_samples / len(timesteps_list).
 
 enable_ssim: False
+
+# ML Diagnostics settings
+enable_ml_diagnostics: False
+profiler_gcs_path: ""
+enable_ondemand_xprof: False