Commit bdfe987

Document the relationship between FileFormat::projection / FileFormat::filter and FileScanConfig::output_ordering (#20196)
## Which issue does this PR close?

- closes #20173
- Similar to #20188

## Rationale for this change

I spent a long time trying to figure out what was going on in our fork after a DataFusion 52 upgrade, and the root cause was that the `output_ordering` in DataFusion 52 does *NOT* include the projection (more details in #20173 (comment)).

This was not clear to me from the DataFusion code or documentation, and I think it would be helpful to clarify this in the documentation.

## What changes are included in this PR?

1. Document `FileScanConfig::output_ordering` better

## Are these changes tested?

## Are there any user-facing changes?
1 parent cc670e8 commit bdfe987

1 file changed

Lines changed: 17 additions & 1 deletion

datafusion/datasource/src/file_scan_config.rs

@@ -168,7 +168,13 @@ pub struct FileScanConfig {
     /// correct results (e.g., for `ORDER BY ... LIMIT` queries). When `false`,
     /// DataFusion may reorder file processing for optimization without affecting correctness.
     pub preserve_order: bool,
-    /// All equivalent lexicographical orderings that describe the schema.
+    /// All equivalent lexicographical output orderings of this file scan, in terms of
+    /// [`FileSource::table_schema`]. See [`FileScanConfigBuilder::with_output_ordering`] for more
+    /// details.
+    ///
+    /// [`Self::eq_properties`] uses this information along with projection
+    /// and filtering information to compute the effective
+    /// [`EquivalenceProperties`]
     pub output_ordering: Vec<LexOrdering>,
     /// File compression type
     pub file_compression_type: FileCompressionType,
@@ -441,6 +447,13 @@ impl FileScanConfigBuilder {
     }

     /// Set the output ordering of the files
+    ///
+    /// The expressions are in terms of the entire table schema (file schema +
+    /// partition columns), before any projection or filtering from the file
+    /// scan is applied.
+    ///
+    /// This is used for optimization purposes, e.g. to determine if a file scan
+    /// can satisfy an `ORDER BY` without an additional sort.
     pub fn with_output_ordering(mut self, output_ordering: Vec<LexOrdering>) -> Self {
         self.output_ordering = output_ordering;
         self
@@ -716,6 +729,9 @@ impl DataSource for FileScanConfig {
         Partitioning::UnknownPartitioning(self.file_groups.len())
     }

+    /// Computes the effective equivalence properties of this file scan, taking
+    /// into account the file schema, any projections or filters applied by the
+    /// file source, and the output ordering.
     fn eq_properties(&self) -> EquivalenceProperties {
         let schema = self.file_source.table_schema().table_schema();
         let mut eq_properties = EquivalenceProperties::new_with_orderings(
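
The distinction these doc comments draw (an ordering declared against the full table schema versus the ordering a projected scan actually emits) can be illustrated with a toy model. The sketch below is not DataFusion code: `SortKey` and `project_ordering` are hypothetical names invented for illustration, columns are plain indices, and the rule used (keep the longest prefix of sort keys whose columns survive the projection) is a deliberately conservative assumption. The real `eq_properties` may derive more than this from filters and equivalences.

```rust
/// Toy stand-in for one sort key: a column index plus a direction.
/// (Hypothetical type for illustration; DataFusion uses `LexOrdering`.)
#[derive(Debug, Clone, PartialEq)]
struct SortKey {
    col: usize,
    descending: bool,
}

/// Rewrite an ordering declared against the table schema into one expressed
/// against the projected output schema.
///
/// `projection[i]` is the table-schema index of output column `i`.
/// Assumption of this sketch: a lexicographic ordering is only trusted up to
/// the first key whose column is projected away, so the result is the
/// longest surviving prefix, re-indexed to output positions.
fn project_ordering(ordering: &[SortKey], projection: &[usize]) -> Vec<SortKey> {
    let mut projected = Vec::new();
    for key in ordering {
        match projection.iter().position(|&table_col| table_col == key.col) {
            Some(output_col) => projected.push(SortKey {
                col: output_col,
                descending: key.descending,
            }),
            // The column is not in the output: stop at this key.
            None => break,
        }
    }
    projected
}

fn main() {
    // Table schema: 0 = a, 1 = b, 2 = c, 3 = date (partition column).
    // Declared ordering, in table-schema terms: date ASC, a ASC.
    let declared = vec![
        SortKey { col: 3, descending: false },
        SortKey { col: 0, descending: false },
    ];

    // Scan outputs (a, date): both keys survive with new indices.
    let kept = project_ordering(&declared, &[0, 3]);
    assert_eq!(
        kept,
        vec![
            SortKey { col: 1, descending: false },
            SortKey { col: 0, descending: false },
        ]
    );

    // Scan outputs (date, b): only the leading key survives.
    let prefix = project_ordering(&declared, &[3, 1]);
    assert_eq!(prefix, vec![SortKey { col: 0, descending: false }]);

    // Scan outputs (a, b): the leading key is gone, so nothing is kept.
    assert!(project_ordering(&declared, &[0, 1]).is_empty());

    println!("{kept:?}");
}
```

In this model the declared ordering plays the role of `output_ordering` (table-schema indices, no projection applied), while the return value of `project_ordering` plays the role of what a consumer would read from the effective equivalence properties. Reading the declared ordering as if its indices referred to the projected output is the confusion described in the commit message.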