Skip to content

[VQueues] Provisions to add user limits to datafusion#4615

Open
AhmedSoliman wants to merge 5 commits intomainfrom
pr4615
Open

[VQueues] Provisions to add user limits to datafusion#4615
AhmedSoliman wants to merge 5 commits intomainfrom
pr4615

Conversation

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1d7a6a97ce

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +49 to +50
sys_vqueues_sort_order(),
remote_scanner_manager.create_distributed_scanner(NAME, local_scanner),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Remove misleading sort-order metadata from sys_vqueues

The provider declares sys_vqueues_sort_order() as a guaranteed physical ordering, but VQueuesScanner::for_each_row scans per-stage batches and runs multiple stage scans concurrently (stages.chunks(...) + try_join_all(...)), so rows from different stages can interleave and are not globally ordered by partition_key,id,stage,.... Because DataFusion uses declared ordering to optimize/suppress sorts, this can produce incorrectly ordered results for plans that rely on that metadata (for example ORDER BY and merge-based plans).

Useful? React with 👍 / 👎.


use datafusion::arrow::datatypes::DataType;

define_sort_order!(sys_vqueue_stats(partition_key, scope, lock_name));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Align sys_vqueue_stats sort order with actual scan order

sys_vqueue_stats advertises (partition_key, scope, lock_name) as its physical sort order, but rows are produced by for_each_vqueue_meta over metadata keys keyed by VQueueId, not by scope/lock_name. This means the optimizer can assume an ordering that the stream does not satisfy, which risks incorrect sort-elision decisions; the declared order should match the real key traversal (or be omitted).

Useful? React with 👍 / 👎.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 20, 2026

Test Results

  7 files  ±0    7 suites  ±0   2m 40s ⏱️ -13s
 47 tests ±0   47 ✅ ±0  0 💤 ±0  0 ❌ ±0 
200 runs  ±0  200 ✅ ±0  0 💤 ±0  0 ❌ ±0 

Results for commit fc569e9. ± Comparison against base commit 4e30658.

♻️ This comment has been updated with latest results.

Expose vqueue entries as a single DataFusion table with stage-aware scanning.
When the query filters `stage`, only matching stage key kinds are scanned;
without a stage filter, all inbox stages are scanned and merged.

Also project the latest entry metadata for observability (status plus
EntryStatistics timestamps and counters), and add targeted tests for stage
predicate extraction and sys_vqueues stage filtering behavior.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant