From 65fe07943a0434e9c02b5a8012ba679ced9f17af Mon Sep 17 00:00:00 2001
From: "serenagu@google.com"
Date: Mon, 28 Jul 2025 23:55:49 +0000
Subject: [PATCH 1/7] ltx instruction update

---
 README.md | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 68b887f9f..e2992431c 100644
--- a/README.md
+++ b/README.md
@@ -171,7 +171,16 @@ To generate images, run the following command:
 ```bash
 python -m src.maxdiffusion.generate src/maxdiffusion/configs/base21.yml run_name="my_run"
 ```
-
+- **LTX Video**
+  1. In the folder src/maxdiffusion/models/ltx_video/utils, run:
+  ```bash
+  python convert_torch_weights_to_jax.py --ckpt_path [LOCAL DIRECTORY FOR WEIGHTS] --transformer_config_path ../xora_v1.2-13B-balanced-128.json
+  ```
+  2. In the repo folder, run:
+  ```bash
+  python src/maxdiffusion/generate_ltx_video.py src/maxdiffusion/configs/ltx_video.yml output_dir="[SAME DIRECTORY]" config_path="src/maxdiffusion/models/ltx_video/xora_v1.2-13B-balanced-128.json"
+  ```
+  3. Other generation parameters can be set in ltx_video.yml file.
 ## Flux
 First make sure you have permissions to access the Flux repos in Huggingface.

From 8af7225b4b78a99373ab4c26a63de4df414faab1 Mon Sep 17 00:00:00 2001
From: "serenagu@google.com"
Date: Tue, 29 Jul 2025 18:59:35 +0000
Subject: [PATCH 2/7] updated whatsnew

---
 README.md | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index e2992431c..64b043410 100644
--- a/README.md
+++ b/README.md
@@ -24,6 +24,7 @@
 - **`2024/10/22`**: LoRA support for Hyper SDXL.
 - **`2024/8/1`**: Orbax is the new default checkpointer. You can still use `pipeline.save_pretrained` after training to save in diffusers format.
 - **`2024/7/20`**: Dreambooth training for Stable Diffusion 1.x,2.x is now supported.
+- **`2025/7/29`**: LTX-Video text2video generation is now supported.
 
 # Overview
 
@@ -41,6 +42,7 @@ MaxDiffusion supports
 * Load Multiple LoRA (SDXL inference).
 * ControlNet inference (Stable Diffusion 1.4 & SDXL).
 * Dreambooth training support for Stable Diffusion 1.x,2.x.
+* LTX-Video (inference).
 
 # Table of Contents
 
@@ -172,15 +174,15 @@ To generate images, run the following command:
 python -m src.maxdiffusion.generate src/maxdiffusion/configs/base21.yml run_name="my_run"
 ```
 - **LTX Video**
-  1. In the folder src/maxdiffusion/models/ltx_video/utils, run:
+  - In the folder src/maxdiffusion/models/ltx_video/utils, run:
   ```bash
   python convert_torch_weights_to_jax.py --ckpt_path [LOCAL DIRECTORY FOR WEIGHTS] --transformer_config_path ../xora_v1.2-13B-balanced-128.json
   ```
-  2. In the repo folder, run:
+  - In the repo folder, run:
   ```bash
   python src/maxdiffusion/generate_ltx_video.py src/maxdiffusion/configs/ltx_video.yml output_dir="[SAME DIRECTORY]" config_path="src/maxdiffusion/models/ltx_video/xora_v1.2-13B-balanced-128.json"
   ```
-  3. Other generation parameters can be set in ltx_video.yml file.
+  - Other generation parameters can be set in ltx_video.yml file.
 ## Flux
 First make sure you have permissions to access the Flux repos in Huggingface.

From 14b8c1f1a56043ee9601d23c7d8ae0c6b6047145 Mon Sep 17 00:00:00 2001
From: "serenagu@google.com"
Date: Tue, 29 Jul 2025 22:16:02 +0000
Subject: [PATCH 3/7] updated table of contents

---
 README.md | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 64b043410..929ea2146 100644
--- a/README.md
+++ b/README.md
@@ -24,7 +24,7 @@
 - **`2024/10/22`**: LoRA support for Hyper SDXL.
 - **`2024/8/1`**: Orbax is the new default checkpointer. You can still use `pipeline.save_pretrained` after training to save in diffusers format.
 - **`2024/7/20`**: Dreambooth training for Stable Diffusion 1.x,2.x is now supported.
-- **`2025/7/29`**: LTX-Video text2video generation is now supported.
+- **`2025/7/29`**: LTX-Video text2vid generation is now supported.
 
 # Overview
 
@@ -42,7 +42,7 @@ MaxDiffusion supports
 * Load Multiple LoRA (SDXL inference).
 * ControlNet inference (Stable Diffusion 1.4 & SDXL).
 * Dreambooth training support for Stable Diffusion 1.x,2.x.
-* LTX-Video (inference).
+* LTX-Video text2vid (inference).
 
 # Table of Contents
 
@@ -55,6 +55,7 @@ MaxDiffusion supports
 - [Training](#training)
   - [Dreambooth](#dreambooth)
 - [Inference](#inference)
+  - [LTX-Video](#ltx-video)
   - [Flux](#flux)
   - [Fused Attention for GPU:](#fused-attention-for-gpu)
   - [Hyper SDXL LoRA](#hyper-sdxl-lora)
@@ -173,7 +174,7 @@ To generate images, run the following command:
 ```bash
 python -m src.maxdiffusion.generate src/maxdiffusion/configs/base21.yml run_name="my_run"
 ```
-- **LTX Video**
+  ## LTX-Video
   - In the folder src/maxdiffusion/models/ltx_video/utils, run:
   ```bash
   python convert_torch_weights_to_jax.py --ckpt_path [LOCAL DIRECTORY FOR WEIGHTS] --transformer_config_path ../xora_v1.2-13B-balanced-128.json
   ```
@@ -216,7 +217,6 @@ To generate images, run the following command:
 ```bash
 python src/maxdiffusion/generate_flux.py src/maxdiffusion/configs/base_flux_schnell.yml jax_cache_dir=/tmp/cache_dir run_name=flux_test output_dir=/tmp/ prompt="photograph of an electronics chip in the shape of a race car with trillium written on its side" per_device_batch_size=1 ici_data_parallelism=1 ici_fsdp_parallelism=-1 offload_encoders=False
 ```
-
 ## Fused Attention for GPU:
 
 Fused Attention for GPU is supported via TransformerEngine. Installation instructions:
@@ -333,3 +333,5 @@ This script will automatically format your code with `pyink` and help you identi
 
 The full suite of -end-to end tests is in `tests` and `src/maxdiffusion/tests`. We run them with a nightly cadance.
+
+

From 06a8f55596c87f487ca9d60f3c8ad60260bf87af Mon Sep 17 00:00:00 2001
From: "serenagu@google.com"
Date: Tue, 29 Jul 2025 22:27:23 +0000
Subject: [PATCH 4/7] changed order

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 929ea2146..e3d7ab075 100644
--- a/README.md
+++ b/README.md
@@ -17,6 +17,7 @@
 [![Unit Tests](https://github.com/google/maxtext/actions/workflows/UnitTests.yml/badge.svg)](https://github.com/google/maxdiffusion/actions/workflows/UnitTests.yml)
 
 # What's new?
+- **`2025/7/29`**: LTX-Video text2vid generation is now supported.
 - **`2025/04/17`**: Flux Finetuning.
 - **`2025/02/12`**: Flux LoRA for inference.
 - **`2025/02/08`**: Flux schnell & dev inference.
@@ -24,7 +25,6 @@
 - **`2024/10/22`**: LoRA support for Hyper SDXL.
 - **`2024/8/1`**: Orbax is the new default checkpointer. You can still use `pipeline.save_pretrained` after training to save in diffusers format.
 - **`2024/7/20`**: Dreambooth training for Stable Diffusion 1.x,2.x is now supported.
-- **`2025/7/29`**: LTX-Video text2vid generation is now supported.
 
 # Overview
 

From 16f6b1f66627183a5b61832bbccf698ed45e0d98 Mon Sep 17 00:00:00 2001
From: Serenagu525 <41308432+Serenagu525@users.noreply.github.com>
Date: Tue, 5 Aug 2025 15:28:12 -0700
Subject: [PATCH 5/7] Fix logging error

---
 .../pipelines/ltx_video/ltx_video_pipeline.py | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/src/maxdiffusion/pipelines/ltx_video/ltx_video_pipeline.py b/src/maxdiffusion/pipelines/ltx_video/ltx_video_pipeline.py
index 0ca816f9e..1e0abe698 100644
--- a/src/maxdiffusion/pipelines/ltx_video/ltx_video_pipeline.py
+++ b/src/maxdiffusion/pipelines/ltx_video/ltx_video_pipeline.py
@@ -60,12 +60,10 @@ def validate_transformer_inputs(prompt_embeds, fractional_coords, latents, encoder_attention_segment_ids):
   # Note: reference shape annotated for first pass default inference parameters
-  max_logging.log("prompts_embeds.shape: ", prompt_embeds.shape, prompt_embeds.dtype)  # (3, 256, 4096) float32
-  max_logging.log("fractional_coords.shape: ", fractional_coords.shape, fractional_coords.dtype)  # (3, 3, 3072) float32
-  max_logging.log("latents.shape: ", latents.shape, latents.dtype)  # (1, 3072, 128) float 32
-  max_logging.log(
-      "encoder_attention_segment_ids.shape: ", encoder_attention_segment_ids.shape, encoder_attention_segment_ids.dtype
-  )  # (3, 256) int32
+  max_logging.log(f"prompts_embeds.shape: {prompt_embeds.shape}")  # (3, 256, 4096) float32
+  max_logging.log(f"fractional_coords.shape: {fractional_coords.shape}")  # (3, 3, 3072) float32
+  max_logging.log(f"latents.shape: {latents.shape}")  # (1, 3072, 128) float 32
+  max_logging.log(f"encoder_attention_segment_ids.shape: {encoder_attention_segment_ids.shape}")  # (3, 256) int32
 
 
 class LTXVideoPipeline:

From bf4a64635d87cf6ecae6cfa288526f370b9ddb4a Mon Sep 17 00:00:00 2001
From: "serenagu@google.com"
Date: Wed, 6 Aug 2025 17:00:47 +0000
Subject: [PATCH 6/7] renamed json

---
 .../ltx_video/{xora_v1.2-13B-balanced-128.json => ltxv-13B.json} | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename src/maxdiffusion/models/ltx_video/{xora_v1.2-13B-balanced-128.json => ltxv-13B.json} (100%)

diff --git a/src/maxdiffusion/models/ltx_video/xora_v1.2-13B-balanced-128.json b/src/maxdiffusion/models/ltx_video/ltxv-13B.json
similarity index 100%
rename from src/maxdiffusion/models/ltx_video/xora_v1.2-13B-balanced-128.json
rename to src/maxdiffusion/models/ltx_video/ltxv-13B.json

From 738afda0a617d839e9860690d32fea95f6320aba Mon Sep 17 00:00:00 2001
From: "serenagu@google.com"
Date: Wed, 6 Aug 2025 18:56:35 +0000
Subject: [PATCH 7/7] renamed files

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index e3d7ab075..4d776ca21 100644
--- a/README.md
+++ b/README.md
@@ -177,11 +177,11 @@ To generate images, run the following command:
   ## LTX-Video
   - In the folder src/maxdiffusion/models/ltx_video/utils, run:
   ```bash
-  python convert_torch_weights_to_jax.py --ckpt_path [LOCAL DIRECTORY FOR WEIGHTS] --transformer_config_path ../xora_v1.2-13B-balanced-128.json
+  python convert_torch_weights_to_jax.py --ckpt_path [LOCAL DIRECTORY FOR WEIGHTS] --transformer_config_path ../ltxv-13B.json
   ```
   - In the repo folder, run:
   ```bash
-  python src/maxdiffusion/generate_ltx_video.py src/maxdiffusion/configs/ltx_video.yml output_dir="[SAME DIRECTORY]" config_path="src/maxdiffusion/models/ltx_video/xora_v1.2-13B-balanced-128.json"
+  python src/maxdiffusion/generate_ltx_video.py src/maxdiffusion/configs/ltx_video.yml output_dir="[SAME DIRECTORY]" config_path="src/maxdiffusion/models/ltx_video/ltxv-13B.json"
   ```
   - Other generation parameters can be set in ltx_video.yml file.
 ## Flux
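A note on the `Fix logging error` change in [PATCH 5/7]: assuming `max_logging.log` is a thin wrapper over Python's stdlib `logging` (an assumption — the wrapper itself is not shown in this series), extra positional arguments are interpreted as %-style format arguments for the message. A message like `"latents.shape: "` has no placeholders, so passing the shape and dtype as separate arguments breaks message formatting; pre-formatting with an f-string, as the patch does, avoids this. A minimal sketch with a hypothetical `log` stand-in:

```python
import logging

logging.basicConfig(level=logging.INFO)
_logger = logging.getLogger("max_logging")


def log(msg, *args):
    # Hypothetical stand-in for max_logging.log: forwards to stdlib
    # logging, where *args become %-format arguments for msg.
    _logger.info(msg, *args)


shape, dtype = (1, 3072, 128), "float32"

# The underlying failure mode of the old calls, shown directly: a message
# with no %-placeholders cannot consume extra format arguments.
try:
    "latents.shape: " % (shape, dtype)
except TypeError as e:
    print(e)  # not all arguments converted during string formatting

# Fixed pattern from the patch: format eagerly with an f-string and pass
# a single, already-complete message.
log(f"latents.shape: {shape} {dtype}")
```

Inside `logging` itself the same `TypeError` is raised while building the record's message and is reported to stderr by the handler's error hook, so the intended log line never appears — which is why the patch switches every call to a single f-string argument.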