Commit 192ceb6

Snowflake Unparser dialect and UNNEST support (#21593)
## Which issue does this PR close?

- Closes #21592.

## Rationale for this change

The SQL unparser needs a Snowflake dialect. Basic dialect settings (identifier quoting, `NULLS FIRST`/`NULLS LAST`, timestamp types) are straightforward, but `UNNEST` support required more than configuration.

Snowflake has no `UNNEST` keyword. Its equivalent, `LATERAL FLATTEN(INPUT => expr)`, is a table function in the `FROM` clause with output accessed via `alias."VALUE"`. This differs structurally from standard SQL: the unparser must emit a `FROM`-clause table factor with a `CROSS JOIN` instead of a `SELECT`-clause expression. It must also rewrite column references to point at the FLATTEN output, and handle several optimizer-produced plan shapes (intermediate `Limit`/`Sort` nodes, `SubqueryAlias` wrappers, composed expressions wrapping the unnest output, multi-expression projections). None of this can be expressed through `CustomDialectBuilder`.

## What changes are included in this PR?

**`dialect.rs`**
- New `SnowflakeDialect` with double-quote identifiers, `NULLS FIRST`/`NULLS LAST`, no empty select lists, no column aliases in table aliases, Snowflake timestamp types, and `unnest_as_lateral_flatten()`. Also wired into `CustomDialect`/`CustomDialectBuilder`.

**`ast.rs`**
- New `FlattenRelationBuilder` that produces `LATERAL FLATTEN(INPUT => expr, OUTER => bool)` table factors, parallel to the existing `UnnestRelationBuilder`.

**`utils.rs`**
- New `unproject_unnest_expr_as_flatten_value` transform that rewrites unnest placeholder columns to `_unnest.VALUE` references.

**`plan.rs`**

Changes to `select_to_sql_recursively`:
- The `Projection` handler scans all expressions for unnest placeholders (not just single-expression projections), then branches into the FLATTEN path or the existing table-factor path.
- `peel_to_unnest_with_modifiers` walks through `Limit`/`Sort` nodes between `Projection` and `Unnest`, applying their SQL modifiers to the query builder. This handles an optimizer behavior where these nodes are inserted between the two.
- `peel_to_inner_projection` walks through `SubqueryAlias` to find the inner `Projection` that feeds an `Unnest`.
- `reconstruct_select_statement` gained FLATTEN-aware expression rewriting and a `has_internal_unnest_alias` predicate to strip internal `UNNEST(...)` display names.
- The `Unnest` handler rejects struct columns for the FLATTEN dialect with a clear error.

## Are these changes tested?

Yes. 18 new tests covering:
- Simple inline arrays, string arrays, cross joins
- Implicit `FROM` (UNNEST in SELECT clause)
- User aliases, table aliases, literal + unnest
- Subselect source with filters and limit
- UDF result as FLATTEN input
- `Limit` between `Projection` and `Unnest`
- `Sort` between `Projection` and `Unnest`
- `Limit` + `SubqueryAlias` combined
- Composed expressions wrapping unnest output (e.g. `CAST`)
- Composed expressions with `Limit`
- Multi-expression projections
- Multi-expression projections with `Limit`
- `SubqueryAlias` between `Unnest` and inner `Projection`

## Are there any user-facing changes?

Yes. New public API surface:
- `SnowflakeDialect` struct and its constructor
- `Dialect::unnest_as_lateral_flatten()` method (default `false`)
- `CustomDialectBuilder::with_unnest_as_lateral_flatten()`
- `FlattenRelationBuilder` and `FLATTEN_DEFAULT_ALIAS` in the AST module

None of these are breaking changes, and all previous APIs should continue to work. New trait methods have default implementations to ease migration.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 5f8b131 commit 192ceb6
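
To make the structural difference described in the commit message concrete, here is a minimal standalone sketch of the SQL shape the FLATTEN path aims for: a `FROM`-clause table factor joined with `CROSS JOIN`, with the element read through `alias."VALUE"`. The helpers `lateral_flatten` and `flatten_query` are hypothetical and use plain strings rather than the unparser's AST types; the exact text the real unparser emits may differ.

```rust
/// Hypothetical helper (not part of DataFusion): renders a Snowflake-style
/// `LATERAL FLATTEN(...)` table factor as text.
/// Assumption: `OUTER` is only written when it is true.
fn lateral_flatten(input_expr: &str, outer: bool, alias: &str) -> String {
    let mut args = format!("INPUT => {input_expr}");
    if outer {
        args.push_str(", OUTER => true");
    }
    format!("LATERAL FLATTEN({args}) AS \"{alias}\"")
}

/// Hypothetical helper: a full query that unnests `table.array_col`.
/// The unnested element is read from the FLATTEN output column `VALUE`,
/// and the FLATTEN factor is attached to the source table with CROSS JOIN.
fn flatten_query(table: &str, array_col: &str, alias: &str) -> String {
    let input = format!("\"{table}\".\"{array_col}\"");
    let factor = lateral_flatten(&input, false, alias);
    format!("SELECT \"{alias}\".\"VALUE\" FROM \"{table}\" CROSS JOIN {factor}")
}
```

Calling `flatten_query("t", "tags", "_unnest_1")` produces a query whose select list references `"_unnest_1"."VALUE"` instead of an `UNNEST(...)` expression, which is why the column references have to be rewritten as well as the `FROM` clause.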

7 files changed

Lines changed: 1418 additions & 43 deletions

datafusion/sql/src/unparser/ast.rs

Lines changed: 126 additions & 1 deletion
@@ -162,9 +162,52 @@ pub struct SelectBuilder {
     qualify: Option<ast::Expr>,
     value_table_mode: Option<ast::ValueTableMode>,
     flavor: Option<SelectFlavor>,
+    /// Counter for generating unique LATERAL FLATTEN aliases within this SELECT.
+    flatten_alias_counter: usize,
+    /// Table aliases that correspond to LATERAL FLATTEN relations.
+    /// Column references into these aliases must use `VALUE` as the column name.
+    flatten_table_aliases: Vec<String>,
 }

+/// Prefix used for auto-generated LATERAL FLATTEN table aliases.
+const FLATTEN_ALIAS_PREFIX: &str = "_unnest";
+
 impl SelectBuilder {
+    /// Generate a unique alias for a LATERAL FLATTEN relation
+    /// (`_unnest_1`, `_unnest_2`, …). Each call returns a fresh name.
+    pub fn next_flatten_alias(&mut self) -> String {
+        self.flatten_alias_counter += 1;
+        format!("{FLATTEN_ALIAS_PREFIX}_{}", self.flatten_alias_counter)
+    }
+
+    /// Register a table alias as pointing to a LATERAL FLATTEN relation.
+    pub fn add_flatten_table_alias(&mut self, alias: String) {
+        self.flatten_table_aliases.push(alias);
+    }
+
+    /// Returns true if no FLATTEN table aliases have been registered.
+    pub fn flatten_table_aliases_empty(&self) -> bool {
+        self.flatten_table_aliases.is_empty()
+    }
+
+    /// Returns true if the given table alias refers to a FLATTEN relation.
+    pub fn is_flatten_table_alias(&self, alias: &str) -> bool {
+        self.flatten_table_aliases.iter().any(|a| a == alias)
+    }
+
+    /// Returns the most recently generated flatten alias, or `None` if
+    /// `next_flatten_alias` has not been called yet.
+    pub fn current_flatten_alias(&self) -> Option<String> {
+        if self.flatten_alias_counter > 0 {
+            Some(format!(
+                "{FLATTEN_ALIAS_PREFIX}_{}",
+                self.flatten_alias_counter
+            ))
+        } else {
+            None
+        }
+    }
+
     pub fn distinct(&mut self, value: Option<ast::Distinct>) -> &mut Self {
         self.distinct = value;
         self
@@ -371,6 +414,8 @@ impl SelectBuilder {
             qualify: Default::default(),
             value_table_mode: Default::default(),
             flavor: Some(SelectFlavor::Standard),
+            flatten_alias_counter: 0,
+            flatten_table_aliases: Vec::new(),
         }
     }
 }
@@ -432,11 +477,11 @@ pub struct RelationBuilder {
 }

 #[derive(Clone)]
-#[expect(clippy::large_enum_variant)]
 enum TableFactorBuilder {
     Table(TableRelationBuilder),
     Derived(DerivedRelationBuilder),
     Unnest(UnnestRelationBuilder),
+    Flatten(FlattenRelationBuilder),
     Empty,
 }

@@ -458,6 +503,11 @@ impl RelationBuilder {
         self
     }

+    pub fn flatten(&mut self, value: FlattenRelationBuilder) -> &mut Self {
+        self.relation = Some(TableFactorBuilder::Flatten(value));
+        self
+    }
+
     pub fn empty(&mut self) -> &mut Self {
         self.relation = Some(TableFactorBuilder::Empty);
         self
@@ -474,6 +524,9 @@ impl RelationBuilder {
             Some(TableFactorBuilder::Unnest(ref mut rel_builder)) => {
                 rel_builder.alias = value;
             }
+            Some(TableFactorBuilder::Flatten(ref mut rel_builder)) => {
+                rel_builder.alias = value;
+            }
             Some(TableFactorBuilder::Empty) => (),
             None => (),
         }
@@ -484,6 +537,7 @@ impl RelationBuilder {
             Some(TableFactorBuilder::Table(ref value)) => Some(value.build()?),
             Some(TableFactorBuilder::Derived(ref value)) => Some(value.build()?),
             Some(TableFactorBuilder::Unnest(ref value)) => Some(value.build()?),
+            Some(TableFactorBuilder::Flatten(ref value)) => Some(value.build()?),
             Some(TableFactorBuilder::Empty) => None,
             None => return Err(Into::into(UninitializedFieldError::from("relation"))),
         })
@@ -688,6 +742,77 @@ impl Default for UnnestRelationBuilder {
     }
 }

+/// Builds a `LATERAL FLATTEN(INPUT => expr, OUTER => bool)` table factor
+/// for Snowflake-style unnesting.
+#[derive(Clone)]
+pub struct FlattenRelationBuilder {
+    pub alias: Option<ast::TableAlias>,
+    /// The input expression to flatten (e.g. a column reference).
+    pub input_expr: Option<ast::Expr>,
+    /// Whether to preserve rows for NULL/empty inputs (Snowflake `OUTER` param).
+    pub outer: bool,
+}
+
+impl FlattenRelationBuilder {
+    pub fn alias(&mut self, value: Option<ast::TableAlias>) -> &mut Self {
+        self.alias = value;
+        self
+    }
+
+    pub fn input_expr(&mut self, value: ast::Expr) -> &mut Self {
+        self.input_expr = Some(value);
+        self
+    }
+
+    pub fn outer(&mut self, value: bool) -> &mut Self {
+        self.outer = value;
+        self
+    }
+
+    pub fn build(&self) -> Result<ast::TableFactor, BuilderError> {
+        let input = self.input_expr.clone().ok_or_else(|| {
+            BuilderError::from(UninitializedFieldError::from("input_expr"))
+        })?;
+
+        let mut args = vec![ast::FunctionArg::Named {
+            name: ast::Ident::new("INPUT"),
+            arg: ast::FunctionArgExpr::Expr(input),
+            operator: ast::FunctionArgOperator::RightArrow,
+        }];
+
+        if self.outer {
+            args.push(ast::FunctionArg::Named {
+                name: ast::Ident::new("OUTER"),
+                arg: ast::FunctionArgExpr::Expr(ast::Expr::Value(
+                    ast::Value::Boolean(true).into(),
+                )),
+                operator: ast::FunctionArgOperator::RightArrow,
+            });
+        }
+
+        Ok(ast::TableFactor::Function {
+            lateral: true,
+            name: ast::ObjectName::from(vec![ast::Ident::new("FLATTEN")]),
+            args,
+            alias: self.alias.clone(),
+        })
+    }
+
+    fn create_empty() -> Self {
+        Self {
+            alias: None,
+            input_expr: None,
+            outer: false,
+        }
+    }
+}
+
+impl Default for FlattenRelationBuilder {
+    fn default() -> Self {
+        Self::create_empty()
+    }
+}
+
 /// Runtime error when a `build()` method is called and one or more required fields
 /// do not have a value.
 #[derive(Debug, Clone)]
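
The alias bookkeeping added to `SelectBuilder` above can be read in isolation. The following is a minimal standalone sketch, assuming nothing beyond the standard library: `FlattenAliases` is a hypothetical stand-in that mirrors the counter and the alias registry, and `column_name` is an illustrative helper (not in the diff) showing the rule from the field's doc comment that references into a FLATTEN alias use `VALUE` as the column name.

```rust
/// Prefix mirrored from the diff; everything else here is a simplified,
/// hypothetical stand-in for the fields added to `SelectBuilder`.
const FLATTEN_ALIAS_PREFIX: &str = "_unnest";

#[derive(Default)]
struct FlattenAliases {
    counter: usize,
    registered: Vec<String>,
}

impl FlattenAliases {
    /// Returns a fresh alias on every call: `_unnest_1`, `_unnest_2`, ...
    fn next_alias(&mut self) -> String {
        self.counter += 1;
        format!("{FLATTEN_ALIAS_PREFIX}_{}", self.counter)
    }

    /// Most recently generated alias, or `None` before the first call.
    fn current(&self) -> Option<String> {
        (self.counter > 0).then(|| format!("{FLATTEN_ALIAS_PREFIX}_{}", self.counter))
    }

    /// Marks a table alias as referring to a FLATTEN relation.
    fn register(&mut self, alias: String) {
        self.registered.push(alias);
    }

    fn is_flatten(&self, alias: &str) -> bool {
        self.registered.iter().any(|a| a == alias)
    }

    /// Illustrative assumption: a column read through a FLATTEN alias is
    /// renamed to `VALUE`; other columns keep their own name.
    fn column_name<'a>(&self, table_alias: &str, column: &'a str) -> &'a str {
        if self.is_flatten(table_alias) { "VALUE" } else { column }
    }
}
```

Note that generating an alias and registering it are separate steps in this sketch, as they are in the diff, so an alias only triggers the `VALUE` rewrite once it has been registered.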

datafusion/sql/src/unparser/dialect.rs

Lines changed: 79 additions & 0 deletions
@@ -211,6 +211,15 @@ pub trait Dialect: Send + Sync {
         false
     }

+    /// Unparse the unnest plan as `LATERAL FLATTEN(INPUT => expr, ...)`.
+    ///
+    /// Snowflake uses FLATTEN as a table function instead of the SQL-standard UNNEST.
+    /// When this returns `true`, the unparser emits
+    /// `LATERAL FLATTEN(INPUT => <col>, OUTER => <bool>)` in the FROM clause.
+    fn unnest_as_lateral_flatten(&self) -> bool {
+        false
+    }
+
     /// Allows the dialect to override column alias unparsing if the dialect has specific rules.
     /// Returns None if the default unparsing should be used, or Some(String) if there is
     /// a custom implementation for the alias.
@@ -718,6 +727,59 @@ impl BigQueryDialect {
     }
 }

+/// Dialect for Snowflake SQL.
+///
+/// Key differences from the default dialect:
+/// - Uses double-quote identifier quoting
+/// - Supports `NULLS FIRST`/`NULLS LAST` in `ORDER BY`
+/// - Does not support empty select lists (`SELECT FROM t`)
+/// - Does not support column aliases in table alias definitions
+///   (Snowflake accepts the syntax but silently ignores the renames in join contexts)
+/// - Unparses `UNNEST` plans as `LATERAL FLATTEN(INPUT => expr, ...)`
+pub struct SnowflakeDialect {}
+
+#[expect(clippy::new_without_default)]
+impl SnowflakeDialect {
+    #[must_use]
+    pub fn new() -> Self {
+        Self {}
+    }
+}
+
+impl Dialect for SnowflakeDialect {
+    fn identifier_quote_style(&self, _: &str) -> Option<char> {
+        Some('"')
+    }
+
+    fn supports_nulls_first_in_sort(&self) -> bool {
+        true
+    }
+
+    fn supports_empty_select_list(&self) -> bool {
+        false
+    }
+
+    fn supports_column_alias_in_table_alias(&self) -> bool {
+        false
+    }
+
+    fn timestamp_cast_dtype(
+        &self,
+        _time_unit: &TimeUnit,
+        tz: &Option<Arc<str>>,
+    ) -> ast::DataType {
+        if tz.is_some() {
+            ast::DataType::Timestamp(None, TimezoneInfo::WithTimeZone)
+        } else {
+            ast::DataType::Timestamp(None, TimezoneInfo::None)
+        }
+    }
+
+    fn unnest_as_lateral_flatten(&self) -> bool {
+        true
+    }
+}
+
 pub struct CustomDialect {
     identifier_quote_style: Option<char>,
     supports_nulls_first_in_sort: bool,
@@ -740,6 +802,7 @@ pub struct CustomDialect {
     window_func_support_window_frame: bool,
     full_qualified_col: bool,
     unnest_as_table_factor: bool,
+    unnest_as_lateral_flatten: bool,
 }

 impl Default for CustomDialect {
@@ -769,6 +832,7 @@ impl Default for CustomDialect {
             window_func_support_window_frame: true,
             full_qualified_col: false,
             unnest_as_table_factor: false,
+            unnest_as_lateral_flatten: false,
         }
     }
 }
@@ -883,6 +947,10 @@ impl Dialect for CustomDialect {
     fn unnest_as_table_factor(&self) -> bool {
         self.unnest_as_table_factor
     }
+
+    fn unnest_as_lateral_flatten(&self) -> bool {
+        self.unnest_as_lateral_flatten
+    }
 }

 /// `CustomDialectBuilder` to build `CustomDialect` using builder pattern
@@ -921,6 +989,7 @@ pub struct CustomDialectBuilder {
     window_func_support_window_frame: bool,
     full_qualified_col: bool,
     unnest_as_table_factor: bool,
+    unnest_as_lateral_flatten: bool,
 }

 impl Default for CustomDialectBuilder {
@@ -956,6 +1025,7 @@ impl CustomDialectBuilder {
             window_func_support_window_frame: true,
             full_qualified_col: false,
             unnest_as_table_factor: false,
+            unnest_as_lateral_flatten: false,
         }
     }

@@ -983,6 +1053,7 @@ impl CustomDialectBuilder {
             window_func_support_window_frame: self.window_func_support_window_frame,
             full_qualified_col: self.full_qualified_col,
             unnest_as_table_factor: self.unnest_as_table_factor,
+            unnest_as_lateral_flatten: self.unnest_as_lateral_flatten,
         }
     }

@@ -1129,4 +1200,12 @@ impl CustomDialectBuilder {
         self.unnest_as_table_factor = unnest_as_table_factor;
         self
     }
+
+    pub fn with_unnest_as_lateral_flatten(
+        mut self,
+        unnest_as_lateral_flatten: bool,
+    ) -> Self {
+        self.unnest_as_lateral_flatten = unnest_as_lateral_flatten;
+        self
+    }
 }
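
The dialect changes above follow a pattern that is easy to miss in a diff: a new trait method with a default of `false`, one dialect that overrides it, and a builder flag for custom dialects. Below is a rough standalone sketch of that pattern. All names ending in `Sketch` and the `UnnestStrategy` enum are hypothetical stand-ins, not the real DataFusion types, and the order in which the flags are checked is an assumption, since `plan.rs` is not shown in this excerpt.

```rust
/// Hypothetical summary of how an unnest node could be rendered.
#[derive(Debug, PartialEq)]
enum UnnestStrategy {
    SelectExpr,
    TableFactor,
    LateralFlatten,
}

/// Simplified stand-in for the `Dialect` trait: both flags default to
/// `false`, so existing implementors keep compiling unchanged.
trait DialectSketch {
    fn unnest_as_table_factor(&self) -> bool { false }
    fn unnest_as_lateral_flatten(&self) -> bool { false }
}

struct DefaultSketch;
impl DialectSketch for DefaultSketch {}

struct SnowflakeSketch;
impl DialectSketch for SnowflakeSketch {
    fn unnest_as_lateral_flatten(&self) -> bool { true }
}

/// Builder-configured dialect, mirroring `with_unnest_as_lateral_flatten`.
#[derive(Default)]
struct CustomSketchBuilder { lateral_flatten: bool }
struct CustomSketch { lateral_flatten: bool }

impl CustomSketchBuilder {
    fn with_unnest_as_lateral_flatten(mut self, value: bool) -> Self {
        self.lateral_flatten = value;
        self
    }
    fn build(self) -> CustomSketch {
        CustomSketch { lateral_flatten: self.lateral_flatten }
    }
}

impl DialectSketch for CustomSketch {
    fn unnest_as_lateral_flatten(&self) -> bool { self.lateral_flatten }
}

/// Assumption: the FLATTEN flag is consulted before the table-factor flag.
fn unnest_strategy(dialect: &dyn DialectSketch) -> UnnestStrategy {
    if dialect.unnest_as_lateral_flatten() {
        UnnestStrategy::LateralFlatten
    } else if dialect.unnest_as_table_factor() {
        UnnestStrategy::TableFactor
    } else {
        UnnestStrategy::SelectExpr
    }
}
```

Because the new method has a default body, adding it does not break downstream `Dialect` implementations, which matches the commit message's claim that the change is non-breaking.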
