Fix object mask ASan races and scratch overflows #20774

Open
sjuxax wants to merge 1 commit into darktable-org:master from sjuxax:sjuxax/fix-object-mask-races
@sjuxax commented Apr 8, 2026

This resolves 3-4 crashes I get with the AI masking feature. Some of them happen only when running with threads=18 and go away with OMP_NUM_THREADS=12 darktable --threads 12. Others happen on mask click regardless of the thread count.

@wpferguson
Member

Please raise an issue before submitting a PR to fix something. That way, if another user encounters the error and goes to raise an issue, they can search and find the existing one with the PR linked to it.

@andriiryzhkov
Contributor

Please describe the problem you are facing. What are the steps to reproduce it?

@TurboGit
Member

Given the title, I suppose this issue arises when compiling Darktable with --asan, the address sanitizer.

@andriiryzhkov
Contributor

andriiryzhkov commented Apr 14, 2026

Hey @sjuxax – there's been no movement on this PR for a while, so I opened #20815 to port the parts I could confidently review.

#20815 takes the _run_decoder thread-join portion. That one fixes a genuine race: ENCODE_READY is published before warmup finishes, so a fast click can race against warmup on the shared dt_seg_context_t. The join is the right primitive there.
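
To illustrate the join described above, here is a minimal sketch of the pattern. It is not the code from #20815: seg_context_t, start_warmup, and run_decoder are hypothetical stand-ins, the warmup work is a dummy counter, and the decoder is assumed to be called from a single thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

enum { WARMUP_STEPS = 1000 };

/* Hypothetical stand-in for the shared segmentation context. */
typedef struct seg_context_t
{
  pthread_t warmup_thread;
  bool warmup_started;   /* true while the warmup thread is still joinable */
  int warmup_steps_done; /* written only by the warmup thread */
} seg_context_t;

/* Dummy warmup work that mutates the shared context. */
static void *warmup_worker(void *data)
{
  seg_context_t *ctx = data;
  for(int i = 0; i < WARMUP_STEPS; i++) ctx->warmup_steps_done++;
  return NULL;
}

/* Starts warmup in the background; returns 0 on success. */
static int start_warmup(seg_context_t *ctx)
{
  ctx->warmup_steps_done = 0;
  const int err = pthread_create(&ctx->warmup_thread, NULL, warmup_worker, ctx);
  ctx->warmup_started = (err == 0);
  return err;
}

/* Decoder entry point: join the warmup thread before reading the context,
   so a fast click cannot observe a half-warmed state. */
static int run_decoder(seg_context_t *ctx)
{
  if(ctx->warmup_started)
  {
    pthread_join(ctx->warmup_thread, NULL);
    ctx->warmup_started = false; /* a thread may be joined only once */
  }
  return ctx->warmup_steps_done;
}
```

The join both waits for warmup to finish and orders its writes before the decoder's reads, which a "ready" flag published too early does not guarantee.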

I left out the box_filters.cc and guided_filter.c changes. My concern is that they swap a pre-allocated per-thread buffer for per-iteration malloc/free in hot loops that run across many modules, not just object mask. That would hit performance for every user. If OpenMP thread IDs actually exceed the pool size, I'd expect the fix to be in the allocation helper rather than in each consumer.
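
As a rough sketch of what "fix it in the allocation helper" could look like: the pool is allocated once, and the bounds check lives in the accessor rather than in every consumer. All names here (scratch_pool_t, scratch_pool_init, scratch_pool_get, scratch_pool_free) are hypothetical and are not darktable's actual helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-thread scratch pool: one contiguous allocation,
   one fixed-size slice per thread. */
typedef struct scratch_pool_t
{
  float *base;
  size_t floats_per_thread;
  size_t num_threads;
} scratch_pool_t;

/* Allocates the whole pool up front; returns 0 on success, -1 on failure. */
static int scratch_pool_init(scratch_pool_t *pool, size_t num_threads,
                             size_t floats_per_thread)
{
  pool->base = NULL;
  pool->num_threads = 0;
  pool->floats_per_thread = 0;
  if(num_threads == 0 || floats_per_thread == 0) return -1;
  if(num_threads > SIZE_MAX / floats_per_thread) return -1; /* overflow guard */
  pool->base = calloc(num_threads * floats_per_thread, sizeof(float));
  if(!pool->base) return -1;
  pool->num_threads = num_threads;
  pool->floats_per_thread = floats_per_thread;
  return 0;
}

/* Returns the slice for thread_id, or NULL if the ID is outside the pool.
   The check is done here once instead of in each hot loop. */
static float *scratch_pool_get(const scratch_pool_t *pool, size_t thread_id)
{
  if(thread_id >= pool->num_threads) return NULL;
  return pool->base + thread_id * pool->floats_per_thread;
}

/* Releases the pool and resets it to an empty state. */
static void scratch_pool_free(scratch_pool_t *pool)
{
  free(pool->base);
  pool->base = NULL;
  pool->num_threads = 0;
  pool->floats_per_thread = 0;
}
```

With this shape, a consumer that receives NULL can fail loudly or fall back to a one-off allocation, while the common path stays free of malloc/free inside the loop.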

A few questions that would help me understand:

  • Does the overrun cause real crashes or data corruption in non-ASan release builds, or is it only flagged under ASan?
  • You noted --threads 18 triggers it while --threads 12 does not. Do you know what thread ID OpenMP is using when it overruns?
  • Is there an upstream libomp issue for the ASan/thread-id interaction you're seeing?
