Integrate torchax custom attention kernel into ulysses #392
Conversation
bq = 2048
bkv = 2048
bkv_compute = 1024
bkv_compute_in = 256
heads_per_tile = 1
Updating this to

bq = 4864
bkv = 1024
bkv_compute = 1024
bkv_compute_in = 1024
heads_per_tile = 1

and running this command gave me the following latencies:

Load (checkpoint): 297.0s
Compile: 219.8s
───────────────────────────────
Inference: 147.4s
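For context, knobs like these typically describe the tiling of the attention kernel grid, and the inner tiles must evenly divide their enclosing blocks. A minimal sketch of grouping and sanity-checking such values; note that `KernelBlockConfig` and its field names mirror the knobs in this thread but are a hypothetical container, not the actual torchax API:

```python
from dataclasses import dataclass

@dataclass
class KernelBlockConfig:
    """Hypothetical holder for the tuning knobs discussed above."""
    bq: int              # query block size per grid step
    bkv: int             # key/value block fetched per grid step
    bkv_compute: int     # key/value sub-block processed per inner step
    bkv_compute_in: int  # innermost key/value compute tile
    heads_per_tile: int  # attention heads handled per kernel tile

    def validate(self) -> None:
        # Inner tiles must evenly divide their enclosing blocks,
        # otherwise the kernel grid cannot be formed.
        if self.bkv % self.bkv_compute != 0:
            raise ValueError("bkv must be a multiple of bkv_compute")
        if self.bkv_compute % self.bkv_compute_in != 0:
            raise ValueError("bkv_compute must be a multiple of bkv_compute_in")

# The updated values from the comment above.
cfg = KernelBlockConfig(bq=4864, bkv=1024, bkv_compute=1024,
                        bkv_compute_in=1024, heads_per_tile=1)
cfg.validate()  # passes: both divisibility checks hold
```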
🤖 Hi @Perseus14, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.
🤖 I'm sorry @Perseus14, but I was unable to process your request. Please see the logs for more details.
Force-pushed from 56c76b8 to daf4a31.
Adds the torchax path's custom attention kernel into ulysses (triggered when attention=ulysses_custom).

Inference time: