What happens?
I'm still in the process of paring down my query into a minimal repro (it involves hundreds of lines of DuckDB SQL -- mostly the same column names repeated in every subquery -- and 1M-row parquet files), but I wanted to flag this in case there's a known issue I'm missing: when running a complicated query that joins two parquet tables, the query hangs indefinitely in versions 1.1 onward; in 1.0 it finishes in seconds. When I enable trace logging, the last output is
[LOG] 2025-09-10 04:38:12.669, FileSystem, TRACE, {"fs":"LocalFileSystem","path":"hanging_2/orderlog_X1t_20250901.current.parquet","op":"READ","bytes":"22","pos":"5782417"}, CONNECTION, 2, 3, NULL
I've attempted to bisect the parquet file (by trimming large sections off it), but the hang seems to be non-deterministic. Is it possible to see whether there's a problematic block / value associated with the pos field of 5782417?
What I do know is that this doesn't seem specific to the query plan; if I trim off enough of the file, it finishes instantly.
To Reproduce
Still working on getting a repro example
OS:
RHEL 8.6
DuckDB Package Version:
1.1+ (tested in 1.3.2)
Python Version:
3.12
Full Name:
Zach Silversmith
Affiliation:
None
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a stable release
Did you include all relevant data sets for reproducing the issue?
No - Other reason (please specify in the issue body)
Did you include all code required to reproduce the issue?
Did you include all relevant configuration to reproduce the issue?