fix: Validate spill read schema #21738

Open
2010YOUY01 wants to merge 1 commit into apache:main from 2010YOUY01:validate-spill-read

@2010YOUY01
Contributor

Which issue does this PR close?

  • Closes #.

Rationale for this change

Follow-up to a review comment in #21713 (comment).
This is not a bug fix; the PR tries to be more defensive and catch potential bugs early.

Previously, writing a spill file with one SpillManager and then reading it with another SpillManager that has a different schema would succeed. This is not an expected usage pattern: the mismatch only surfaces later as an error propagated to the caller, which makes it harder to debug.

This PR validates the schema when reading the first batch, and fails fast if the schema does not match.
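The fail-fast check can be sketched roughly as follows. This is a minimal, standalone illustration under stated assumptions, not the code in this PR: `Field`, `Schema`, `Batch`, `SpillReader`, and the `schema` helper are hypothetical stand-ins for the Arrow and DataFusion types, and a plain `String` stands in for the DataFusion error type.

```rust
/// Hypothetical stand-in for an Arrow field: a column name and a type name.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: String,
}

/// Hypothetical stand-in for an Arrow schema.
#[derive(Debug, Clone, PartialEq)]
struct Schema {
    fields: Vec<Field>,
}

/// Hypothetical stand-in for a record batch read back from a spill file.
#[derive(Debug)]
struct Batch {
    schema: Schema,
}

/// Hypothetical reader that compares schemas only for the first batch it sees.
struct SpillReader {
    expected: Schema,
    validated: bool,
}

impl SpillReader {
    fn new(expected: Schema) -> Self {
        SpillReader { expected, validated: false }
    }

    /// Passes the batch through, validating its schema on the first call only.
    fn next_batch(&mut self, batch: Batch) -> Result<Batch, String> {
        if !self.validated {
            if batch.schema != self.expected {
                // Fail fast instead of letting a mismatched batch flow downstream.
                return Err(format!(
                    "Spill file schema mismatch: expected {:?}, got {:?}",
                    self.expected, batch.schema
                ));
            }
            self.validated = true;
        }
        Ok(batch)
    }
}

/// Small helper for building a schema from (name, type) pairs.
fn schema(cols: &[(&str, &str)]) -> Schema {
    Schema {
        fields: cols
            .iter()
            .map(|(n, t)| Field { name: n.to_string(), data_type: t.to_string() })
            .collect(),
    }
}
```

Checking only the first batch keeps the cost of the validation constant per file, on the assumption that every batch in a single spill file shares one schema.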

Note that this only validates the schema. If two SpillManagers with the same schema write and read the same file, the read is still allowed, even though this is not an expected usage pattern either. Validating that case would require assigning each SpillManager a UID and adding it to the Arrow IPC file metadata, which can be tricky, so it is left as a TODO for simplicity.
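For illustration only, the UID idea left as a TODO might look something like the sketch below. Everything here is an assumption: `Manager`, `MANAGER_ID_KEY`, `file_metadata`, and `check_owner` are hypothetical names, and a plain `HashMap` stands in for the custom key/value metadata of an Arrow IPC file. It is not a proposal for how DataFusion should implement it.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical metadata key; not defined by DataFusion or the Arrow IPC format.
const MANAGER_ID_KEY: &str = "spill_manager_id";

/// Process-wide counter used to hand out unique manager ids.
static NEXT_ID: AtomicU64 = AtomicU64::new(1);

/// Hypothetical stand-in for a SpillManager that owns a unique id.
struct Manager {
    id: u64,
}

impl Manager {
    fn new() -> Self {
        Manager { id: NEXT_ID.fetch_add(1, Ordering::Relaxed) }
    }

    /// Metadata the writer side would attach to each file it creates.
    fn file_metadata(&self) -> HashMap<String, String> {
        let mut metadata = HashMap::new();
        metadata.insert(MANAGER_ID_KEY.to_string(), self.id.to_string());
        metadata
    }

    /// Reader-side check: reject files stamped by a different manager.
    fn check_owner(&self, metadata: &HashMap<String, String>) -> Result<(), String> {
        match metadata.get(MANAGER_ID_KEY) {
            Some(id) if id == &self.id.to_string() => Ok(()),
            Some(id) => Err(format!(
                "spill file was written by manager {}, not {}",
                id, self.id
            )),
            None => Err("spill file carries no manager id".to_string()),
        }
    }
}
```

Unlike the schema comparison, a check along these lines would also reject a file written by a different manager that happens to share the same schema.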

What changes are included in this PR?

Are these changes tested?

Yes, by unit tests.

Are there any user-facing changes?

No

@github-actions bot added the physical-plan label (Changes to the physical-plan crate) on Apr 20, 2026
Comment on lines +139 to +144
return Err(exec_datafusion_err!(
"Spill file schema mismatch: expected {}, got {}. \
The caller must use the same SpillManager that created the spill file to read it.",
expected_schema,
actual_schema
));
Contributor


Suggested change
- return Err(exec_datafusion_err!(
-     "Spill file schema mismatch: expected {}, got {}. \
-     The caller must use the same SpillManager that created the spill file to read it.",
-     expected_schema,
-     actual_schema
- ));
+ return exec_err!(
+     "Spill file schema mismatch: expected {}, got {}. \
+     The caller must use the same SpillManager that created the spill file to read it.",
+     expected_schema,
+     actual_schema
+ );


@comphead left a comment


Thanks @2010YOUY01, makes sense.
