hi, I got it working on a 5090 on Windows, but problems still appear after a few generations. I researched it and found out it is a known limitation of torch.compile + CUDA Graphs when the inference runs inside Gradio's background worker thread (anyio.to_thread.run_sync).
The first 1–2 generations sometimes succeed (fresh compile), but subsequent ones hit the thread-local storage assertion. Is there any way I can get this working with optimize=True (torch.compile + CUDA Graphs) at maximum speed on an RTX 5090 on Windows?
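One workaround I'm experimenting with (just a sketch; the helper name and executor are mine, not part of Gradio or torch) is to serialize every model call onto a single dedicated thread, so CUDA Graph capture and replay always happen on the same thread instead of whichever worker anyio picks:

```python
from concurrent.futures import ThreadPoolExecutor

# Single-worker executor: all submitted calls run on ONE stable thread,
# so a CUDA Graph captured there is always replayed on the same thread.
_inference_executor = ThreadPoolExecutor(max_workers=1)

def run_on_inference_thread(fn, *args, **kwargs):
    # Call this from the Gradio handler instead of invoking the compiled
    # model directly; .result() blocks until the dedicated thread finishes.
    return _inference_executor.submit(fn, *args, **kwargs).result()
```

In the Gradio event handler I would then do something like `run_on_inference_thread(pipe, prompt)` rather than calling the pipeline directly. I haven't confirmed this fully avoids the thread-local storage assertion, so corrections welcome.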