hi, I got it working on a 5090 on Windows, but problems still appear after a few generations. I researched it and found out it is a known limitation of torch.compile + CUDA Graphs when the inference runs inside Gradio's background worker thread (anyio.to_thread.run_sync).
The first 1–2 generations sometimes succeed (fresh compile), but subsequent ones hit the thread-local storage assertion. Is there any way I can get this working with optimize=True (torch.compile + CUDA Graphs) at maximum speed on an RTX 5090 on Windows?
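One workaround I'm experimenting with (just a sketch; the helper name and executor are mine, not part of Gradio or torch) is to serialize every model call onto a single dedicated thread, so CUDA Graph capture and replay always happen on the same thread instead of whichever worker anyio picks:

```python
from concurrent.futures import ThreadPoolExecutor

# Single-worker executor: all submitted calls run on ONE stable thread,
# so a CUDA Graph captured there is always replayed on the same thread.
_inference_executor = ThreadPoolExecutor(max_workers=1)

def run_on_inference_thread(fn, *args, **kwargs):
    # Call this from the Gradio handler instead of invoking the compiled
    # model directly; .result() blocks until the dedicated thread finishes.
    return _inference_executor.submit(fn, *args, **kwargs).result()
```

In the Gradio event handler I would then do something like `run_on_inference_thread(pipe, prompt)` rather than calling the pipeline directly. I haven't confirmed this fully avoids the thread-local storage assertion, so corrections welcome.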