fix: Preserve quoted mixed-case identifiers in the `pivot_unpivot` example by niebayes · Pull Request #21432 · apache/datafusion

niebayes · 2026-04-07T07:24:20Z

Summary

This PR fixes a bug in the datafusion-examples/examples/relation_planner/pivot_unpivot.rs
example implementation.

The example planner rewrites PIVOT into a GROUP BY plus CASE expressions. During that
rewrite, it rebuilt column references using unquoted col("...") expressions. That loses the
original identifier quoting and case-sensitivity, which breaks queries that pivot on quoted
mixed-case columns such as "pointNumber".

This PR preserves the original parsed column expression instead of reconstructing it from a plain
string.

Problem

Consider a query like:

SELECT *
FROM point_stats
PIVOT (
  MAX(max_value)
  FOR "pointNumber" IN ('16951' AS p16951, '16952' AS p16952)
)
ORDER BY ts

Before this change, the example planner:

Parsed "pointNumber" correctly as a quoted, case-sensitive identifier.
Extracted its name as pointNumber.
Reconstructed new expressions with col(&pivot_col_name) and col(...) for GROUP BY.

That reconstruction treated the identifier as an unquoted column reference, which could be
normalized differently from the original schema field. In practice, this means the planner could
end up looking for pointnumber while the schema still contained "pointNumber".

Root Cause

Two places in the example rewrite logic rebuilt column references from bare strings:

The generated CASE expression for the pivot column
The inferred GROUP BY expressions

That is fine for simple lowercase identifiers, but it is not correct for quoted identifiers or
qualified fields because the reconstructed expression no longer carries the original identifier
semantics.

Fix

The fix is minimal:

Reuse the already planned pivot_col expression when building the CASE expression.
Build GROUP BY expressions directly from the input schema via Expr::from(...) rather than
re-creating them with col(field_name).

This preserves:

quoted mixed-case identifiers
qualifiers
original field resolution semantics

Code Changes

File changed:

datafusion-examples/examples/relation_planner/pivot_unpivot.rs

Main changes:

Replace:

case(col(&pivot_col_name))

with:

case(pivot_col.clone())

Replace string-based GROUP BY reconstruction:

schema
    .fields()
    .iter()
    .map(|f| f.name().as_str())
    ...
    .map(col)

with schema-derived expressions:

schema
    .iter()
    .filter(...)
    .map(Expr::from)

Example Coverage

This PR also adds an additional example scenario to the same file:

a point_stats input table
a PIVOT query using quoted mixed-case column "pointNumber"
a snapshot asserting the expected output

This makes the bug and the fix visible directly in the example itself.

alamb

Thanks @niebayes -- I had just one question about one of the changes, but I don't think it blocks this PR

alamb · 2026-04-07T12:45:17Z

-        .map(|f| f.name().as_str())
-        .filter(|name| *name != pivot_col_name.as_str() && !agg_input_cols.contains(name))
-        .map(col)
+        .filter(|(_, field)| {


is this change necessary? It seems like it is more verbose but does the same thing

alamb · 2026-04-07T12:45:33Z

        for (value_alias, pivot_value) in &pivot_values {
            // CASE pivot_col WHEN pivot_value THEN agg_input END
-            let case_expr = case(col(&pivot_col_name))
+            let case_expr = case(pivot_col.clone())


this is the real fix, right?

alamb · 2026-04-11T09:09:46Z

Thanks again @niebayes

…ample (apache#21432) ## Summary This PR fixes a bug in the `datafusion-examples/examples/relation_planner/pivot_unpivot.rs` example implementation. The example planner rewrites `PIVOT` into a `GROUP BY` plus `CASE` expressions. During that rewrite, it rebuilt column references using unquoted `col("...")` expressions. That loses the original identifier quoting and case-sensitivity, which breaks queries that pivot on quoted mixed-case columns such as `"pointNumber"`. This PR preserves the original parsed column expression instead of reconstructing it from a plain string. ## Problem Consider a query like: ```sql SELECT * FROM point_stats PIVOT ( MAX(max_value) FOR "pointNumber" IN ('16951' AS p16951, '16952' AS p16952) ) ORDER BY ts ``` Before this change, the example planner: 1. Parsed `"pointNumber"` correctly as a quoted, case-sensitive identifier. 2. Extracted its name as `pointNumber`. 3. Reconstructed new expressions with `col(&pivot_col_name)` and `col(...)` for `GROUP BY`. That reconstruction treated the identifier as an unquoted column reference, which could be normalized differently from the original schema field. In practice, this means the planner could end up looking for `pointnumber` while the schema still contained `"pointNumber"`. ## Root Cause Two places in the example rewrite logic rebuilt column references from bare strings: 1. The generated `CASE` expression for the pivot column 2. The inferred `GROUP BY` expressions That is fine for simple lowercase identifiers, but it is not correct for quoted identifiers or qualified fields because the reconstructed expression no longer carries the original identifier semantics. ## Fix The fix is minimal: 1. Reuse the already planned `pivot_col` expression when building the `CASE` expression. 2. Build `GROUP BY` expressions directly from the input schema via `Expr::from(...)` rather than re-creating them with `col(field_name)`. This preserves: - quoted mixed-case identifiers - qualifiers - original field resolution semantics ## Code Changes File changed: - `datafusion-examples/examples/relation_planner/pivot_unpivot.rs` Main changes: - Replace: ```rust case(col(&pivot_col_name)) ``` with: ```rust case(pivot_col.clone()) ``` - Replace string-based `GROUP BY` reconstruction: ```rust schema .fields() .iter() .map(|f| f.name().as_str()) ... .map(col) ``` with schema-derived expressions: ```rust schema .iter() .filter(...) .map(Expr::from) ``` ## Example Coverage This PR also adds an additional example scenario to the same file: - a `point_stats` input table - a `PIVOT` query using quoted mixed-case column `"pointNumber"` - a snapshot asserting the expected output This makes the bug and the fix visible directly in the example itself.

niebayes added 2 commits April 7, 2026 07:21

fix pivot

f8dd481

fix

7b88380

alamb approved these changes Apr 7, 2026

View reviewed changes

alamb added this pull request to the merge queue Apr 11, 2026

Merged via the queue into apache:main with commit d4e629f Apr 11, 2026
31 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: Preserve quoted mixed-case identifiers in the `pivot_unpivot` example#21432

fix: Preserve quoted mixed-case identifiers in the `pivot_unpivot` example#21432
alamb merged 2 commits intoapache:mainfrom
niebayes:fix/pivot

niebayes commented Apr 7, 2026

Uh oh!

alamb left a comment

Uh oh!

alamb Apr 7, 2026

Uh oh!

alamb Apr 7, 2026

Uh oh!

alamb commented Apr 11, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

niebayes commented Apr 7, 2026

Summary

Problem

Root Cause

Fix

Code Changes

Example Coverage

Uh oh!

alamb left a comment

Choose a reason for hiding this comment

Uh oh!

alamb Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

alamb Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

alamb commented Apr 11, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants