Skip to content

Commit 2cf923d

Browse files
adriangbclaude
andcommitted
fix(sql): disambiguate EXPLAIN (...) option list from grouped query
DataFusion has long accepted parentheses wrapping the query after `EXPLAIN` (e.g. `EXPLAIN (SELECT ...)` and `EXPLAIN (q1 EXCEPT q2) UNION ALL (q3 EXCEPT q4)`). The initial cut of the Postgres-style option-list parser treated every leading `(` as an option list, breaking those cases. Disambiguate by peeking one token past the `(`: if it starts a query (`SELECT`, `WITH`, `VALUES`, `TABLE`, `INSERT`, `UPDATE`, `DELETE`, `MERGE`, or another `(`), fall through to the legacy parser. Adds `explain_paren_grouping_query_is_not_mistaken_for_options` to cover the regression set that CI surfaced (`references.slt`, `union.slt`). Also reformats `docs/source/user-guide/explain-usage.md` to satisfy `prettier 2.7.1` (column-width alignment only). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 46af363 commit 2cf923d

2 files changed

Lines changed: 61 additions & 11 deletions

File tree

datafusion/sql/src/parser.rs

Lines changed: 52 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -482,6 +482,27 @@ impl<'a, 'b> DFParserBuilder<'a, 'b> {
482482
}
483483
}
484484

485+
/// Returns true when `tok` is the start of a query / parenthesized query
486+
/// group. Used to disambiguate `EXPLAIN (SELECT ...)` (a parenthesized query)
487+
/// from `EXPLAIN (ANALYZE) SELECT ...` (a Postgres-style option list).
488+
fn token_starts_query(tok: &Token) -> bool {
489+
match tok {
490+
Token::LParen => true,
491+
Token::Word(Word { keyword, .. }) => matches!(
492+
keyword,
493+
Keyword::SELECT
494+
| Keyword::WITH
495+
| Keyword::VALUES
496+
| Keyword::TABLE
497+
| Keyword::INSERT
498+
| Keyword::UPDATE
499+
| Keyword::DELETE
500+
| Keyword::MERGE
501+
),
502+
_ => false,
503+
}
504+
}
505+
485506
impl<'a> DFParser<'a> {
486507
#[deprecated(since = "46.0.0", note = "DFParserBuilder")]
487508
pub fn new(sql: &'a str) -> Result<Self, DataFusionError> {
@@ -799,11 +820,16 @@ impl<'a> DFParser<'a> {
799820
}
800821

801822
/// Parse a SQL `EXPLAIN`
823+
///
824+
/// After the `EXPLAIN` keyword, if the dialect supports the Postgres-style
825+
/// option list and the next non-whitespace token is `(`, we must
826+
/// disambiguate between an option list (`EXPLAIN (ANALYZE) SELECT ...`)
827+
/// and a parenthesized query (`EXPLAIN (SELECT ...)` or
828+
/// `EXPLAIN (q1 EXCEPT q2) UNION ALL ...`). See [`token_starts_query`].
802829
pub fn parse_explain(&mut self) -> Result<Statement, DataFusionError> {
803-
// Dialects that support Postgres-style `EXPLAIN (opt, ...)` may use a
804-
// leading left-paren instead of the keyword form.
805830
if self.supports_explain_with_utility_options
806831
&& self.parser.peek_token().token == Token::LParen
832+
&& !token_starts_query(&self.parser.peek_nth_token(1).token)
807833
{
808834
let raw = self.parser.parse_utility_options()?;
809835
let options = ExplainStatementOptions::from_utility_options(&raw)?;
@@ -2330,6 +2356,30 @@ mod tests {
23302356
);
23312357
}
23322358

2359+
#[test]
2360+
fn explain_paren_grouping_query_is_not_mistaken_for_options() {
2361+
// Historic DataFusion behavior allows parentheses around the
2362+
// query after EXPLAIN (e.g. `EXPLAIN (SELECT ...)` or
2363+
// `EXPLAIN (q1 EXCEPT q2) UNION ALL (q3 EXCEPT q4)`). The dialect
2364+
// gate for Postgres-style options must not swallow these.
2365+
for sql in [
2366+
"EXPLAIN (SELECT 1)",
2367+
"EXPLAIN (WITH t AS (SELECT 1) SELECT * FROM t)",
2368+
"EXPLAIN (VALUES (1), (2))",
2369+
"EXPLAIN ((SELECT 1))",
2370+
] {
2371+
let stmt = parse_with_pg(sql).unwrap_or_else(|e| {
2372+
panic!("{sql} failed under PG dialect: {e}");
2373+
});
2374+
let Statement::Explain(ExplainStatement { options, .. }) = stmt else {
2375+
panic!("Expected Statement::Explain for {sql}");
2376+
};
2377+
assert!(!options.analyze, "{sql} should not be ANALYZE");
2378+
assert!(!options.verbose, "{sql} should not be VERBOSE");
2379+
assert!(options.format.is_none(), "{sql} should have no FORMAT");
2380+
}
2381+
}
2382+
23332383
#[test]
23342384
fn explain_paren_form_analyze_verbose() {
23352385
let stmt = parse_with_pg("EXPLAIN (ANALYZE, VERBOSE) SELECT 1").unwrap();

docs/source/user-guide/explain-usage.md

Lines changed: 9 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -255,16 +255,16 @@ SELECT ... ;
255255

256256
The recognized options are:
257257

258-
| Option | Argument | Effect |
259-
| --------- | ----------------- | -------------------------------------------------------------------------------------------------------------------------- |
260-
| `ANALYZE` | boolean, optional | Execute the plan and collect metrics. Defaults to `TRUE` when bare. Equivalent to the `ANALYZE` keyword. |
261-
| `VERBOSE` | boolean, optional | Show per-partition metrics and additional detail. Equivalent to the `VERBOSE` keyword. |
262-
| `FORMAT` | identifier/string | One of `indent`, `tree`, `pgjson`, `graphviz`. Equivalent to the `FORMAT <format>` clause. |
258+
| Option | Argument | Effect |
259+
| --------- | ----------------- | ------------------------------------------------------------------------------------------------------------------------------------ |
260+
| `ANALYZE` | boolean, optional | Execute the plan and collect metrics. Defaults to `TRUE` when bare. Equivalent to the `ANALYZE` keyword. |
261+
| `VERBOSE` | boolean, optional | Show per-partition metrics and additional detail. Equivalent to the `VERBOSE` keyword. |
262+
| `FORMAT` | identifier/string | One of `indent`, `tree`, `pgjson`, `graphviz`. Equivalent to the `FORMAT <format>` clause. |
263263
| `METRICS` | string | Filter `ANALYZE` metrics by category. Accepts `'all'`, `'none'`, or any comma-separated subset of `rows,bytes,timing,uncategorized`. |
264-
| `LEVEL` | identifier/string | `summary` or `dev`. Controls metric verbosity for `ANALYZE`. |
265-
| `TIMING` | boolean | Sugar over `METRICS`: toggles inclusion of the `timing` category. |
266-
| `SUMMARY` | boolean | Sugar over `LEVEL`: `TRUE``summary`, `FALSE``dev`. |
267-
| `COSTS` | boolean | Include statistics in plain `EXPLAIN` output (equivalent to `SET datafusion.explain.show_statistics`). Not valid with `ANALYZE`. |
264+
| `LEVEL` | identifier/string | `summary` or `dev`. Controls metric verbosity for `ANALYZE`. |
265+
| `TIMING` | boolean | Sugar over `METRICS`: toggles inclusion of the `timing` category. |
266+
| `SUMMARY` | boolean | Sugar over `LEVEL`: `TRUE``summary`, `FALSE``dev`. |
267+
| `COSTS` | boolean | Include statistics in plain `EXPLAIN` output (equivalent to `SET datafusion.explain.show_statistics`). Not valid with `ANALYZE`. |
268268

269269
Boolean arguments can be written bare (`ANALYZE``true`), as `TRUE`/`FALSE`,
270270
`ON`/`OFF`, or `0`/`1`.

0 commit comments

Comments
 (0)