Commit 7e1a710

fix(unparser): make BigQueryDialect more robust (#21296)
## Which issue does this PR close?

This PR improves `BigQueryDialect` so that the generated SQL is BigQuery-compatible (fixes execution errors).

## What changes are included in this PR?

Eight `Dialect` trait overrides added to `BigQueryDialect`:

https://docs.cloud.google.com/bigquery/docs/reference/standard-sql/data-types

1. `date_field_extract_style` → `Extract` + `scalar_function_to_sql_overrides`

   BigQuery does not support `date_part()`. TPC-H Q7, Q8, Q9 fail with `Function not found: date_part`.

   | Before (error) | After |
   |---|---|
   | `date_part('YEAR', l_shipdate)` | `EXTRACT(YEAR FROM l_shipdate)` |

2. `interval_style` → `SQLStandard`

   BigQuery does not support PostgreSQL-style interval abbreviations. TPC-H Q4, Q20 fail with `Syntax error: Unexpected ")"`.

   | Before (error) | After |
   |---|---|
   | `INTERVAL '3 MONS'` | `INTERVAL '3' MONTH` |

3. `float64_ast_dtype` → `Float64`

   BigQuery does not support `DOUBLE`. Fails with `Type not found: DOUBLE`.

   | Before (error) | After |
   |---|---|
   | `CAST(a AS DOUBLE)` | `CAST(a AS FLOAT64)` |

4. `supports_column_alias_in_table_alias` → `false`

   BigQuery does not support column aliases in table alias definitions. Fails with `Expected ")" but got "("`.

   | Before (error) | After |
   |---|---|
   | `SELECT c.key FROM (...) AS c(key)` | `SELECT c.key FROM (SELECT o_orderkey AS key FROM orders) AS c` |

5. `utf8_cast_dtype` + `large_utf8_cast_dtype` → `String`

   BigQuery does not support `VARCHAR`/`TEXT`. Fails with `Type not found: VARCHAR`, `Type not found: Text`.

   | Before (error) | After |
   |---|---|
   | `CAST(a AS VARCHAR)` | `CAST(a AS STRING)` |
   | `CAST(a AS TEXT)` | `CAST(a AS STRING)` |

6. ~`int64_cast_dtype` → `Int64`~

7. `timestamp_cast_dtype` → `Timestamp` (no timezone qualifier)

   https://docs.cloud.google.com/bigquery/docs/reference/standard-sql/data-types#timestamp_type

   BigQuery does not support `TIMESTAMP WITH TIME ZONE`. Fails with `Syntax error: Expected ')' or keyword FORMAT but got keyword WITH`. `TIMESTAMP` should be used instead (it preserves time zone information).

   | Before (error) | After |
   |---|---|
   | `CAST(a AS TIMESTAMP WITH TIME ZONE)` | `CAST(a AS TIMESTAMP)` |

## Are these changes tested?

Yes. Added the `test_bigquery_dialect_overrides` unit test covering all eight overrides, verified against BigQuery before and after.

## Are there any user-facing changes?

No API changes. `BigQueryDialect` now generates valid BigQuery SQL for the affected expressions.
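
The cast-type overrides (items 3, 5 and 7) all follow one pattern: a trait method with a generic default that a dialect replaces. The sketch below models only that pattern. `CastTypeNames`, `GenericNames`, `BigQueryNames`, `Target` and `render_cast` are hypothetical names invented for illustration and are not DataFusion's `Dialect` API; the defaults simply mirror the "Before" columns above.

```rust
// Hypothetical, simplified model of the override pattern; NOT DataFusion's `Dialect` trait.
trait CastTypeNames {
    // Defaults stand in for the generic output shown in the "Before" columns.
    fn float64_type(&self) -> &'static str {
        "DOUBLE"
    }
    fn utf8_type(&self) -> &'static str {
        "VARCHAR"
    }
    fn large_utf8_type(&self) -> &'static str {
        "TEXT"
    }
    fn timestamp_type(&self, has_tz: bool) -> &'static str {
        if has_tz {
            "TIMESTAMP WITH TIME ZONE"
        } else {
            "TIMESTAMP"
        }
    }
}

// Keeps every default.
struct GenericNames;
impl CastTypeNames for GenericNames {}

// Overrides each default with the type name BigQuery accepts.
struct BigQueryNames;
impl CastTypeNames for BigQueryNames {
    fn float64_type(&self) -> &'static str {
        "FLOAT64"
    }
    fn utf8_type(&self) -> &'static str {
        "STRING"
    }
    fn large_utf8_type(&self) -> &'static str {
        "STRING"
    }
    // The time zone qualifier is dropped regardless of the input type.
    fn timestamp_type(&self, _has_tz: bool) -> &'static str {
        "TIMESTAMP"
    }
}

// Cast targets used by this sketch (assumed, illustrative subset).
#[derive(Clone, Copy)]
enum Target {
    Float64,
    Utf8,
    LargeUtf8,
    Timestamp { has_tz: bool },
}

// Renders `CAST(<column> AS <type>)` using the dialect's type names.
fn render_cast<N: CastTypeNames>(names: &N, column: &str, target: Target) -> String {
    let ty = match target {
        Target::Float64 => names.float64_type(),
        Target::Utf8 => names.utf8_type(),
        Target::LargeUtf8 => names.large_utf8_type(),
        Target::Timestamp { has_tz } => names.timestamp_type(has_tz),
    };
    format!("CAST({} AS {})", column, ty)
}
```

In this model, `render_cast(&BigQueryNames, "a", Target::Float64)` yields `CAST(a AS FLOAT64)`, while the same call with `GenericNames` yields `CAST(a AS DOUBLE)`.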
1 parent fd882fb commit 7e1a710

2 files changed

Lines changed: 94 additions & 0 deletions

datafusion/sql/src/unparser/dialect.rs

Lines changed: 45 additions & 0 deletions
@@ -664,6 +664,51 @@ impl Dialect for BigQueryDialect {
     fn unnest_as_table_factor(&self) -> bool {
         true
     }
+
+    fn supports_column_alias_in_table_alias(&self) -> bool {
+        false
+    }
+
+    fn float64_ast_dtype(&self) -> ast::DataType {
+        ast::DataType::Float64
+    }
+
+    fn utf8_cast_dtype(&self) -> ast::DataType {
+        ast::DataType::String(None)
+    }
+
+    fn large_utf8_cast_dtype(&self) -> ast::DataType {
+        ast::DataType::String(None)
+    }
+
+    fn timestamp_cast_dtype(
+        &self,
+        _time_unit: &TimeUnit,
+        _tz: &Option<Arc<str>>,
+    ) -> ast::DataType {
+        ast::DataType::Timestamp(None, TimezoneInfo::None)
+    }
+
+    fn date_field_extract_style(&self) -> DateFieldExtractStyle {
+        DateFieldExtractStyle::Extract
+    }
+
+    fn interval_style(&self) -> IntervalStyle {
+        IntervalStyle::SQLStandard
+    }
+
+    fn scalar_function_to_sql_overrides(
+        &self,
+        unparser: &Unparser,
+        func_name: &str,
+        args: &[Expr],
+    ) -> Result<Option<ast::Expr>> {
+        if func_name == "date_part" {
+            return date_part_to_sql(unparser, self.date_field_extract_style(), args);
+        }
+
+        Ok(None)
+    }
 }
 
 impl BigQueryDialect {
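
The `scalar_function_to_sql_overrides` hook above returns `Ok(None)` for any function it does not rewrite, so the unparser falls back to its default rendering. A minimal, string-based sketch of that "override or fall through" shape follows. `ExtractStyle`, `default_call`, `override_call` and `unparse_call` are hypothetical stand-ins; the real hook operates on `Expr` and `ast::Expr` values and delegates to `date_part_to_sql`, whose internals are not shown in this diff.

```rust
// Hypothetical stand-ins: the real hook takes an `Unparser` and `&[Expr]`
// and returns `Result<Option<ast::Expr>>`; plain strings are used here instead.
#[derive(Clone, Copy, PartialEq)]
enum ExtractStyle {
    DatePart,
    Extract,
}

// Default rendering used when no override applies: `name(arg1, arg2, ...)`.
fn default_call(func_name: &str, args: &[&str]) -> String {
    format!("{}({})", func_name, args.join(", "))
}

// Returns `Some(sql)` only for the calls this dialect rewrites, `None` otherwise.
fn override_call(style: ExtractStyle, func_name: &str, args: &[&str]) -> Option<String> {
    if func_name != "date_part" || style != ExtractStyle::Extract {
        return None;
    }
    match args {
        // Expect exactly (field, source); strip the quotes around the field literal.
        [field, source] => Some(format!(
            "EXTRACT({} FROM {})",
            field.trim_matches('\'').to_uppercase(),
            source
        )),
        // Unexpected arity: decline the rewrite and let the default path handle it.
        _ => None,
    }
}

// Mirrors the fallback behaviour: try the override first, then the default.
fn unparse_call(style: ExtractStyle, func_name: &str, args: &[&str]) -> String {
    override_call(style, func_name, args).unwrap_or_else(|| default_call(func_name, args))
}
```

With `ExtractStyle::Extract`, `unparse_call` turns `date_part` with arguments `'YEAR'` and `l_shipdate` into `EXTRACT(YEAR FROM l_shipdate)`; any other function name passes through unchanged.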

datafusion/sql/src/unparser/expr.rs

Lines changed: 49 additions & 0 deletions
@@ -3526,4 +3526,53 @@ mod tests {
         }
         Ok(())
     }
+
+    #[test]
+    fn test_bigquery_dialect_overrides() -> Result<()> {
+        let bigquery_dialect: Arc<dyn Dialect> = Arc::new(BigQueryDialect::new());
+        let unparser = Unparser::new(bigquery_dialect.as_ref());
+
+        // date_field_extract_style: EXTRACT instead of date_part
+        let expr = Expr::ScalarFunction(ScalarFunction {
+            func: Arc::new(ScalarUDF::new_from_impl(
+                datafusion_functions::datetime::date_part::DatePartFunc::new(),
+            )),
+            args: vec![lit("YEAR"), col("date_col")],
+        });
+        let actual = format!("{}", unparser.expr_to_sql(&expr)?);
+        assert_eq!(actual, "EXTRACT(YEAR FROM `date_col`)");
+
+        // interval_style: SQL standard instead of PostgresVerbose
+        let expr = interval_year_month_lit("3 months");
+        let actual = format!("{}", unparser.expr_to_sql(&expr)?);
+        assert_eq!(actual, "INTERVAL '3' MONTH");
+
+        // float64_ast_dtype: FLOAT64 instead of DOUBLE
+        let expr = cast(col("a"), DataType::Float64);
+        let actual = format!("{}", unparser.expr_to_sql(&expr)?);
+        assert_eq!(actual, "CAST(`a` AS FLOAT64)");
+
+        // supports_column_alias_in_table_alias: false
+        assert!(!bigquery_dialect.supports_column_alias_in_table_alias());
+
+        // utf8_cast_dtype: STRING instead of VARCHAR
+        let expr = cast(col("a"), DataType::Utf8);
+        let actual = format!("{}", unparser.expr_to_sql(&expr)?);
+        assert_eq!(actual, "CAST(`a` AS STRING)");
+
+        // large_utf8_cast_dtype: STRING instead of TEXT
+        let expr = cast(col("a"), DataType::LargeUtf8);
+        let actual = format!("{}", unparser.expr_to_sql(&expr)?);
+        assert_eq!(actual, "CAST(`a` AS STRING)");
+
+        // timestamp_cast_dtype: TIMESTAMP (no WITH TIME ZONE)
+        let expr = cast(
+            col("a"),
+            DataType::Timestamp(TimeUnit::Microsecond, Some("+00:00".into())),
+        );
+        let actual = format!("{}", unparser.expr_to_sql(&expr)?);
+        assert_eq!(actual, "CAST(`a` AS TIMESTAMP)");
+
+        Ok(())
+    }
 }
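
The interval assertion in the test expects `INTERVAL '3' MONTH`, where the PR description reports `INTERVAL '3 MONS'` before the change. The toy formatter below only contrasts those two output shapes for a whole number of months. `IntervalFormat` and `month_interval` are hypothetical names, and the code does not reproduce the unparser's actual interval logic.

```rust
// Hypothetical illustration of the two output shapes named in the PR description;
// this is not the unparser's interval implementation.
#[derive(Clone, Copy)]
enum IntervalFormat {
    // Unit abbreviation inside the quoted literal, e.g. INTERVAL '3 MONS'.
    Abbreviated,
    // Quoted value followed by a unit keyword, e.g. INTERVAL '3' MONTH.
    SqlStandard,
}

// Formats a whole number of months in the requested style.
fn month_interval(months: i32, format: IntervalFormat) -> String {
    match format {
        IntervalFormat::Abbreviated => format!("INTERVAL '{} MONS'", months),
        IntervalFormat::SqlStandard => format!("INTERVAL '{}' MONTH", months),
    }
}
```

The only difference is where the unit sits: inside the quoted literal for the abbreviated form, or as a keyword after it for the SQL-standard form that BigQuery accepts.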
