Is there a more stable way to use this with a 5090 on Windows? #258

@chef0531

Description

Hi, I got it working for the 5090 on Windows, but problems still appear after a few generations. I researched it and found that it is a known limitation of torch.compile + CUDA Graphs when the inference runs inside Gradio's background worker thread (anyio.to_thread.run_sync).
The first 1–2 generations sometimes succeed (fresh compile), but subsequent ones hit the thread-local storage assertion. Is there any way I can get this working with optimize=True (torch.compile + CUDA Graphs) at maximum speed on an RTX 5090 on Windows?
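One workaround that is sometimes suggested for this class of failure (a sketch, not verified on a 5090/Windows setup) is to pin all inference calls to a single dedicated thread, since CUDA Graphs captured by torch.compile hold thread-local state and break when later calls arrive on a different pool thread. The names `run_pinned` and `fake_infer` below are hypothetical; `fake_infer` stands in for the real compiled-model call so the pattern itself is runnable:

```python
import concurrent.futures
import threading

# CUDA Graphs captured by torch.compile keep thread-local state, so every
# inference call must land on the SAME thread that performed the first
# (capture) call. Gradio dispatches handlers via anyio.to_thread.run_sync,
# which uses a pool of worker threads and breaks that invariant.
# A single-worker executor guarantees one long-lived inference thread.
_inference_executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)

def run_pinned(fn, *args, **kwargs):
    """Submit fn to the dedicated inference thread and block for the result."""
    return _inference_executor.submit(fn, *args, **kwargs).result()

# Stand-in for the real call; in practice this would invoke something like
# compiled_model = torch.compile(model, mode="reduce-overhead") and return
# its output. Returning the thread id lets us verify the pinning behavior.
def fake_infer(x):
    return threading.get_ident(), x * 2
```

Inside the Gradio handler you would then call `run_pinned(compiled_model, inputs)` instead of calling the model directly, so every generation (including the first capture) runs on the same thread regardless of which pool thread Gradio happens to use.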
