Commit 25a1558

Document ulysses attention for wan inference
1 parent 10451b0 commit 25a1558

1 file changed

Lines changed: 20 additions & 1 deletion

README.md

@@ -572,6 +572,26 @@ To generate images, run the following command:
* For Wan2.2 T2V, use `base_wan_27b.yml`.
* For Wan2.2 I2V, use `base_wan_i2v_27b.yml`.

+### Ulysses Attention
+
+MaxDiffusion supports Ulysses attention for Wan inference on TPUs. Enable it by setting `attention="ulysses"`.
+
+Internally, this follows the Ulysses sequence-parallel attention pattern: an all-to-all exchange trades sequence shards for head shards before the local TPU splash kernel runs, and a second exchange restores sequence sharding afterward. For background, see [DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models](https://arxiv.org/abs/2309.14509).
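The sequence-to-head exchange described above can be illustrated without any accelerator. What follows is a minimal pure-Python sketch, not MaxDiffusion's implementation: devices are simulated as list entries, plain softmax attention stands in for the TPU splash kernel, and every function name is hypothetical. It only aims to show that each device ends up attending over the full sequence for its own subset of heads.

```python
import math
import random


def attention_one_head(q, k, v):
    """Plain softmax attention for a single head; q, k, v are [S][D] lists."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for q_vec in q:
        scores = [scale * sum(a * b for a, b in zip(q_vec, k_vec)) for k_vec in k]
        peak = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append([sum(w * v_vec[d] for w, v_vec in zip(weights, v)) / total
                    for d in range(len(v[0]))])
    return out


def local_attention(q, k, v):
    """Attention over the whole sequence for locally held heads; inputs are [S][h][D]."""
    seq_len, heads = len(q), len(q[0])
    per_head = [
        attention_one_head([q[t][h] for t in range(seq_len)],
                           [k[t][h] for t in range(seq_len)],
                           [v[t][h] for t in range(seq_len)])
        for h in range(heads)
    ]
    return [[per_head[h][t] for h in range(heads)] for t in range(seq_len)]


def seq_to_head_shards(shards):
    """First exchange: P shards of [S/P][H][D] become P shards of [S][H/P][D]."""
    num_devices = len(shards)
    heads_per_device = len(shards[0][0]) // num_devices
    full_seq = [token for shard in shards for token in shard]
    return [[token[dev * heads_per_device:(dev + 1) * heads_per_device]
             for token in full_seq]
            for dev in range(num_devices)]


def head_to_seq_shards(shards):
    """Second exchange: P shards of [S][H/P][D] go back to P shards of [S/P][H][D]."""
    num_devices = len(shards)
    seq_len = len(shards[0])
    tokens_per_device = seq_len // num_devices
    full_heads = [[head for shard in shards for head in shard[t]]
                  for t in range(seq_len)]
    return [full_heads[dev * tokens_per_device:(dev + 1) * tokens_per_device]
            for dev in range(num_devices)]


def ulysses_attention(q_shards, k_shards, v_shards):
    """Exchange, attend locally per device, then exchange back."""
    q, k, v = (seq_to_head_shards(x) for x in (q_shards, k_shards, v_shards))
    out = [local_attention(qd, kd, vd) for qd, kd, vd in zip(q, k, v)]
    return head_to_seq_shards(out)


def random_tensor(seq_len, heads, dim, rng):
    """Random [S][H][D] nested list used only for the demonstration."""
    return [[[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(heads)]
            for _ in range(seq_len)]


def shard_sequence(x, num_devices):
    """Split an [S][H][D] tensor into equal contiguous sequence shards."""
    step = len(x) // num_devices
    return [x[i * step:(i + 1) * step] for i in range(num_devices)]
```

Sharding a random tensor with `shard_sequence`, running `ulysses_attention`, and concatenating the returned shards should reproduce `local_attention` on the unsharded inputs, because every head still sees the whole sequence. The slicing in `seq_to_head_shards` is also where the head-count divisibility requirement comes from.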
+
+Set `attention` in your config YAML, or pass it as a command-line override:
+
+```bash
+python src/maxdiffusion/generate_wan.py \
+  src/maxdiffusion/configs/base_wan_i2v_27b.yml \
+  attention="ulysses" \
+  ici_context_parallelism=4 \
+  ...
+```
+
+Ulysses requires `ici_context_parallelism` greater than 1, and the number of attention heads must be divisible by the context shard count. Tuning `flash_block_sizes` is optional and remains available for hardware-specific optimization.
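The two constraints above can be checked before launching a run. This is a small sketch of such a check, under stated assumptions: `check_ulysses_config` is a hypothetical helper rather than a MaxDiffusion function, the context shard count is assumed to equal `ici_context_parallelism`, and the head counts in the usage note are illustrative values, not read from any Wan config.

```python
def check_ulysses_config(num_heads: int, context_parallelism: int) -> int:
    """Return the number of heads per context shard, or raise ValueError.

    Hypothetical helper: assumes the context shard count equals
    ici_context_parallelism.
    """
    if context_parallelism <= 1:
        raise ValueError(
            "Ulysses attention needs ici_context_parallelism > 1, "
            f"got {context_parallelism}"
        )
    if num_heads % context_parallelism != 0:
        raise ValueError(
            f"{num_heads} attention heads cannot be split evenly across "
            f"{context_parallelism} context shards"
        )
    return num_heads // context_parallelism
```

For example, `check_ulysses_config(40, 4)` returns `10` heads per shard, while `check_ulysses_config(40, 3)` raises because 40 is not divisible by 3.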
+
+In our Wan2.2 I2V benchmarks (40 inference steps, 81 frames, `720x1280` resolution), Ulysses reduced inference time by roughly 10% compared with flash attention, which is about 20s lower latency, on the v6e-8 and v7x-8 TPU setups.
+
### Caching Mechanisms

Wan 2.x pipelines support several caching strategies to accelerate inference by skipping redundant transformer forward passes. These are **mutually exclusive**; enable only one at a time.
@@ -772,4 +792,3 @@ This script will automatically format your code with `pyink` and help you identi
The full suite of end-to-end tests is in `tests` and `src/maxdiffusion/tests`. We run them on a nightly cadence.
-
