Follow-ups to the cherry-picked refactor that landed the file_index
keying:
* Reject `TABLESAMPLE` without an explicit method instead of silently
treating it as `SYSTEM`
(#22000 (comment)).
PostgreSQL requires an explicit method and Spark defaults to
block-level; picking one here in core would commit to semantics
callers may not want. Added an slt case to lock the new error.
* Rephrase the `SamplePushdown` planning error from "TABLESAMPLE is
not supported for this source" to "TABLESAMPLE could not be pushed
down" since the failure may originate at any node along the
passthrough chain, not just the leaf source
(#22000 (comment)).
Updated the slt assertion to match.
* Dedupe the SYSTEM-mode adaptive split comments in the parquet
opener; the outer block now covers determinism and the inner block
covers the row-group-vs-row split math without overlap
(#22000 (comment)).
* Update the `select.md` and `relation_planner/table_sample.rs`
REPEATABLE wording to reflect that sampling now keys on the
execution `partition_index`, not the on-disk file path
(#22000 (comment)
and #discussion_r3187445171).
* Replace the opener-level "REPEATABLE ignores file name" test with a
"sampling keys on partition_index" test that verifies that the same
partition_index yields the same selection regardless of file name, and
that different partition_index values yield uncorrelated samples. Added
`with_partition_index` to the test builder.
* Refresh the `run_examples-7` snapshot to match the new seed mix
(the per-row-group hash now folds in the optional REPEATABLE seed
alongside `file_index`; deterministic but a different draw).
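
For readers unfamiliar with the keying described in the bullets above, here is a minimal sketch of a per-row-group keep decision that depends only on the partition index, the row-group index, and the optional REPEATABLE seed. It is not the opener's actual code: the function name `keep_row_group`, its parameters, and the use of `DefaultHasher` are assumptions made for illustration only.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical helper (illustration only, not the real opener code).
/// Decides whether a row group is kept, keyed on the execution
/// partition index, the row-group index, and the optional seed.
/// The file name is deliberately not an input.
fn keep_row_group(
    partition_index: usize,
    row_group: usize,
    seed: Option<u64>,
    fraction: f64,
) -> bool {
    // DefaultHasher::new() uses fixed keys, so the result is repeatable
    // within one Rust release; its algorithm is not guaranteed stable
    // across releases, so real code would likely pin a specific hash.
    let mut hasher = DefaultHasher::new();
    partition_index.hash(&mut hasher);
    row_group.hash(&mut hasher);
    seed.hash(&mut hasher);

    // Map the top 53 bits of the hash onto [0, 1) and compare against
    // the requested sampling fraction.
    let unit = (hasher.finish() >> 11) as f64 / (1u64 << 53) as f64;
    unit < fraction
}
```

Because the file name never enters the hash, renaming a file cannot change the draw, while changing `partition_index` or the seed produces a different one.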
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>