Fix big performance issue in string serialization by lovasoa · Pull Request #1848 · apache/datafusion-sqlparser-rs

lovasoa · 2025-05-11T22:43:21Z

The old code handled string escaping character by character: every string literal in the AST was pushed to the underlying writer one character at a time, resulting in thousands of write calls for long strings.

In most cases, the new code calls the write function only once, with the entire string.

Only when the string is stored unescaped do we really need to call write multiple times, and even then the number of calls never exceeds the number of characters to escape plus one.
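A minimal sketch of the chunked approach described above, assuming the string is stored unescaped and that a quote is escaped by doubling it. The type name DoubledQuotes is hypothetical, and this is not the code from the PR, which, going by the description, also has to handle strings that are not stored unescaped.

```rust
use std::fmt;

/// Hypothetical wrapper (not the crate's type): displays `string`
/// with every occurrence of `quote` doubled.
struct DoubledQuotes<'a> {
    string: &'a str,
    quote: char,
}

impl fmt::Display for DoubledQuotes<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Byte offset where the next chunk to write begins.
        let mut start = 0;
        for (idx, ch) in self.string.char_indices() {
            if ch == self.quote {
                // Write everything up to and including the quote...
                f.write_str(&self.string[start..idx + ch.len_utf8()])?;
                // ...and start the next chunk at the same quote, so the
                // quote is emitted a second time without a separate write.
                start = idx;
            }
        }
        // One final write for the remainder: (quotes to escape) + 1 calls in total.
        f.write_str(&self.string[start..])
    }
}
```

With this sketch, formatting DoubledQuotes { string: "it's", quote: '\'' } yields it''s using two write_str calls, and a string without quotes is written in a single call.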

Here are the benchmark results for serializing the SQL statement "SELECT 'xxx...(x 10000)' as long_string" to an in-memory string:

sqlparser-rs parsing benchmark/format_long_string
                        time:   [10.544 µs 10.645 µs 10.743 µs]
                        change: [-84.871% -84.195% -83.553%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe

The previous implementation wrote each character individually using the write! macro, which is inefficient for string formatting. The new implementation uses write_str to write larger chunks of the string at once, significantly reducing the number of write operations and the formatting overhead. The escaping behavior is unchanged; only the character-by-character writes are gone.
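To make the difference in write operations concrete, here is a small sketch that counts write_str calls for both strategies. CountingWriter, escape_per_char and escape_chunked are hypothetical names introduced for illustration; they only approximate the old and new behavior and assume an unescaped input where quotes are escaped by doubling.

```rust
use std::fmt::{self, Write};

/// Hypothetical writer that records how many times `write_str` is called.
#[derive(Default)]
struct CountingWriter {
    out: String,
    calls: usize,
}

impl Write for CountingWriter {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        self.calls += 1;
        self.out.push_str(s);
        Ok(())
    }
}

/// Approximation of the old strategy: one `write!` per character.
fn escape_per_char(w: &mut impl Write, s: &str, quote: char) -> fmt::Result {
    for ch in s.chars() {
        if ch == quote {
            write!(w, "{}{}", quote, quote)?;
        } else {
            write!(w, "{}", ch)?;
        }
    }
    Ok(())
}

/// Approximation of the new strategy: one `write_str` per chunk.
fn escape_chunked(w: &mut impl Write, s: &str, quote: char) -> fmt::Result {
    let mut start = 0;
    for (idx, ch) in s.char_indices() {
        if ch == quote {
            // The chunk ends with the quote and the next chunk starts
            // with it again, which doubles the quote in the output.
            w.write_str(&s[start..idx + ch.len_utf8()])?;
            start = idx;
        }
    }
    w.write_str(&s[start..])
}
```

Feeding both functions a string of 10000 x characters into separate CountingWriter values produces identical output, but the chunked version records a single call while the per-character version records at least one call per character.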


alamb · 2025-05-13T15:31:11Z

When you say "old" code, do you know which PR introduced this regression?

alamb

looks good to me -- thanks @lovasoa

lovasoa · 2025-05-13T15:35:32Z

@alamb : I meant that the code before this PR handled the string char by char. I don't think this was a regression.

alamb · 2025-05-13T15:36:39Z

Thanks @lovasoa and @jayzhan211 for the review

cc @iffyio

lovasoa · 2025-05-13T15:37:24Z

thanks for merging !

lovasoa added 3 commits May 12, 2025 00:36

add comments

f7a2f77

Update comment for clarity on quote handling in EscapeQuotedString display implementation

6b38948

jayzhan211 approved these changes May 12, 2025


lovasoa mentioned this pull request May 13, 2025

[EPIC] Improve sqlparser performance #1557

Open

alamb approved these changes May 13, 2025


alamb merged commit 178a351 into apache:main May 13, 2025
9 checks passed

lovasoa deleted the string-literal-display-perf branch May 13, 2025 15:37
