feat: Bounded Stream Supervisor #19372

Open
aho135 wants to merge 44 commits into apache:master from aho135:bounded-stream-supervisor

Conversation

Contributor

@aho135 aho135 commented Apr 24, 2026

Description

Introduces a new property to the Stream Supervisor spec IOConfig called boundedStreamConfig which allows operators to specify start and end offset ranges for short-lived supervised ingestion. This property modifies the main Supervisor run loop to only ingest from and monitor partitions specified in the boundedStreamConfig. After the offset range has been consumed the Supervisor will transition into a terminal state (COMPLETED). The motivation for this PR came out of #19191 which submits backfill tasks that are unsupervised. Once this change is merged, 19191 can be enhanced to use the boundedStreamConfig so that the backfill tasks are supervised.
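As a sketch of what this could look like, a bounded Kafka supervisor spec might carry the new property roughly as below. The `startSequenceNumbers`/`endSequenceNumbers` names come from the classes changed in this PR, but the exact JSON shape (partition-to-offset maps, surrounding fields) is an assumption for illustration, not a confirmed schema:

```json
{
  "type": "kafka",
  "spec": {
    "ioConfig": {
      "topic": "events",
      "boundedStreamConfig": {
        "startSequenceNumbers": { "0": 100, "1": 250 },
        "endSequenceNumbers": { "0": 500, "1": 900 }
      }
    }
  }
}
```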

Release note

Adds a property called boundedStreamConfig to the SeekableStreamSupervisorIOConfig which allows operators to spin up a Supervisor that consumes only a specified offset range.


Key changed/added classes in this PR
  • SeekableStreamSupervisorIOConfig
  • BoundedStreamConfig
  • SeekableStreamSupervisor

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@aho135 aho135 changed the title from "Bounded stream supervisor" to "feat: Bounded stream supervisor", then to "feat: Bounded Stream Supervisor" Apr 24, 2026
@aho135 aho135 requested a review from abhishekrb19 April 24, 2026 17:48
exclusiveStartSequenceNumberPartitions,
generateSequenceName(
unfilteredStartingSequencesForSequenceName == null
? startingSequences
Contributor

@abhishekrb19 abhishekrb19 left a comment


Thanks @aho135 - took a quick glance and the approach looks good to me overall!
I’m still going through some of the main files and just checkpointing my review. Do you think it would be possible to add a simple embedded test with the new config for some end-to-end coverage?

this.startSequenceNumbers = Preconditions.checkNotNull(startSequenceNumbers, "startSequenceNumbers");
this.endSequenceNumbers = Preconditions.checkNotNull(endSequenceNumbers, "endSequenceNumbers");

// Validation
Contributor


As a guard rail, I think it'll be good to have stricter checks for each partition so there's no unintended behavior with incorrect ranges specified:
startSequenceNumbers < endSequenceNumbers

Contributor Author


This validation is a bit difficult to do within BoundedStreamConfig because of the generic typing. But I do see there is already validation for this in the Kafka task IOConfig.

Contributor Author


The validation there isn't strict enough, though, because startOffset can equal endOffset. In that scenario the Supervisor spins up a task that consumes nothing and then shuts down. But since no data was consumed there is no metadata update, so it gets stuck in a loop where it keeps spinning up tasks. I added additional validation to handle this scenario in this commit.
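A minimal sketch of the strict per-partition check this describes, assuming comparable offsets (e.g. Kafka's Long offsets); the class and method names are illustrative, not the PR's actual code:

```java
import java.util.Map;

public class BoundedRangeValidator
{
  /**
   * Rejects ranges where start >= end. A start equal to end would create a
   * task that consumes nothing and commits no metadata, so the supervisor
   * would loop forever spinning up empty tasks.
   */
  public static <P, S extends Comparable<S>> void validate(
      Map<P, S> startOffsets,
      Map<P, S> endOffsets
  )
  {
    for (Map.Entry<P, S> entry : startOffsets.entrySet()) {
      S end = endOffsets.get(entry.getKey());
      if (end == null) {
        throw new IllegalArgumentException("Missing end offset for partition " + entry.getKey());
      }
      if (entry.getValue().compareTo(end) >= 0) {
        throw new IllegalArgumentException(
            "Start offset must be strictly less than end offset for partition " + entry.getKey()
        );
      }
    }
  }
}
```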

@@ -4255,6 +4418,23 @@ private OrderedSequenceNumber<SequenceOffsetType> getOffsetFromStorageForPartition
}
Member


[P1] Bounded starts can be ignored when metadata exists

getOffsetFromStorageForPartition only falls back to boundedStreamConfig.startSequenceNumbers when no metadata/checkpoint offset exists. If a supervisor is reset or reconfigured with a requested bounded start while metadata storage still has an older offset, the stored metadata wins and the task starts from the stale position instead of the user-supplied bounded start. That can skip the requested backfill range or process a different interval than configured. Bounded mode should either clear/namespace old metadata for the run or explicitly prefer the configured start when initializing the bounded task group.

Contributor Author


Thanks for the review @FrankChen021!

> Bounded mode should either clear/namespace old metadata for the run

I'm a bit wary of automatic cleanup of metadata. Consider the scenario where a cluster operator has a running Supervisor and wants to re-ingest some older data, so they resubmit the exact same spec (forgetting to update id): the bounded Supervisor succeeds, but the previously committed offset gets lost.

> explicitly prefer the configured start when initializing the bounded task group.

This falls into the same issue as above: if the operator forgets to set the id to something different from the running Supervisor, the previously committed offset is lost forever.

I'm leaning towards adding validation so that if metadata already exists for the id, we just throw an exception and suggest that the operator resubmit with a different id or reset the Supervisor. Curious to hear your thoughts on this approach. Thanks again!

Member


That validation approach sounds right to me. The important part is preventing a bounded supervisor from accidentally mixing explicit bounded start offsets with existing committed metadata for the same id; failing fast with guidance to use a different id or reset the supervisor would avoid the silent stale-offset behavior without deleting or overwriting a running supervisor's offsets.

Contributor Author


@FrankChen021 Thinking through this one a bit more, the validation approach is a bit tricky. It's not straightforward to tell if the existing metadata is from the bounded Supervisor itself (in which case starting from the metadata would be the correct behavior) or if it's from a previous Supervisor.

One approach we can take is that if the metadata offset falls within the configured start/end offsets then use that, otherwise fall back to startOffset. This does run the risk of partial ingestion of the specified range though.

Member


I agree the source of the metadata is the hard part here, but I would avoid using "metadata is within the configured start/end range" as the deciding rule. If the stored offset is inside [start, end), starting there can still silently skip the prefix [start, storedOffset), which is the same class of surprise as the original issue. I think the safer behavior is still to fail fast when bounded mode finds existing metadata for the configured bounded partitions unless there is an explicit signal that this is a resume of the same bounded run.

Contributor Author

@aho135 aho135 May 2, 2026


Thanks @FrankChen021! I took a stab at this in the most recent commit. Please let me know your thoughts when you get the chance.

Member


Thanks, this looks like the right direction to me. Persisting the bounded config in the datasource metadata and rejecting existing metadata whose bounded config is missing or different addresses the silent stale-offset case, while still allowing a supervisor to resume metadata from the same bounded run.
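The resume check described above could look roughly like the following sketch: existing metadata is only accepted when the bounded config it was persisted with matches the configured one. Class and method names are illustrative assumptions, not Druid's actual API:

```java
import java.util.Objects;

public class BoundedResumeCheck
{
  /**
   * Called only when datasource metadata already exists for the supervisor id.
   * A stored bounded config that is missing (metadata from a non-bounded run)
   * or different (metadata from a different bounded run) is rejected, so the
   * task never silently starts from a stale offset.
   */
  public static void checkMetadataCompatible(Object storedBoundedConfig, Object configuredBoundedConfig)
  {
    if (storedBoundedConfig == null || !Objects.equals(storedBoundedConfig, configuredBoundedConfig)) {
      throw new IllegalStateException(
          "Existing metadata was not written by this bounded run; "
          + "resubmit with a different supervisor id or reset the supervisor."
      );
    }
  }
}
```

A resume of the same bounded run (identical stored config) passes the check, which is the explicit signal the reviewer asked for.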

Comment on lines +4643 to +4651
/**
* Handle bounded processing completion by shutting down the supervisor.
* At this point, all task groups are already empty (verified by isBoundedWorkComplete),
* so we just need to mark the supervisor as completed.
*/
private void handleBoundedCompletion()
{
log.info("Bounded processing complete for supervisor[%s]. Marking as COMPLETED.", supervisorId);
stateManager.maybeSetState(SupervisorStateManager.BasicState.COMPLETED);
}
Contributor


Should this call stop() with this COMPLETED state so things get unregistered and the executor is removed?

Contributor Author

@aho135 aho135 Apr 30, 2026


One workflow I was testing out was to submit a bounded Supervisor and have it run to completion. Then I adjusted the start/end offsets and re-submitted the spec, and did a hard reset to clear the metadata so it could ingest the new offset range. For this kind of workflow, we would need the executor to continue running even after the initial completion.

Contributor Author


The latest commit handles this workflow.

Member

@FrankChen021 FrankChen021 left a comment


| Severity | Findings |
|----------|----------|
| P0       | 0        |
| P1       | 1        |
| P2       | 0        |
| P3       | 0        |
| Total    | 1        |

This is an automated review by Codex GPT-5

for (PartitionIdType partition : partitionsInGroup) {
SequenceOffsetType start = startOffsets.get(partition);
SequenceOffsetType end = endOffsets.get(partition);
if (!isOffsetAtOrBeyond(start, end)) {
Member


[P1] Kinesis bounded ranges with start == end are skipped

The new empty-range check treats start >= end as completed for all bounded supervisors before creating any task. That is valid for Kafka's exclusive end offsets, but Kinesis declares bounded end offsets as inclusive and its task runner returns isEndOffsetExclusive() == false, so a Kinesis bounded ingestion for a single record where startSequenceNumbers equals endSequenceNumbers is marked COMPLETED without reading that record. This should be provider-aware, for example only treating equality as empty when the end offset is exclusive, while still rejecting/handling start > end appropriately.
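A sketch of the provider-aware check this finding suggests, with the exclusivity flag modeled after the `isEndOffsetExclusive()` distinction mentioned above; the class and method names are illustrative, not the PR's code:

```java
public class BoundedRangeCheck
{
  /**
   * Decides whether a bounded [start, end] range has nothing left to read.
   * Kafka end offsets are exclusive, so start == end is an empty range;
   * Kinesis end sequence numbers are inclusive, so start == end still
   * covers one record and must not be treated as complete.
   */
  public static <S extends Comparable<S>> boolean isRangeEmpty(S start, S end, boolean endOffsetExclusive)
  {
    int cmp = start.compareTo(end);
    // A start beyond the end is empty (or invalid) for every provider.
    if (cmp > 0) {
      return true;
    }
    // Equality only means "empty" when the end offset is exclusive.
    return cmp == 0 && endOffsetExclusive;
  }
}
```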

Member

@FrankChen021 FrankChen021 left a comment


I have reviewed the code for correctness, edge cases, concurrency, and integration risks; no issues found.


This is an automated review by Codex GPT-5



4 participants