What happens?
When I use read_csv with the arguments "header=True, delim=';', sample_size=10241" (which runs sniff_csv implicitly), it fails with an "unable to detect csv format" error.
- I can place several different lines at line 20241 and they all trigger it. The lines look fine and no different from any others.
- When I reduce sample_size to 10240, the import works again.
- When I then remove lines 10236-10240 from the original file and keep sample_size at 10240, it also works.
This suggests to me that 10240 is an upper limit on the sample size before detection fails, even though the default is 20xxx.
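One way to double-check that the lines near the failing boundary really have the same shape as the rest is to count the semicolon-separated fields per row. The sketch below uses only Python's standard csv module; irregular_rows is a hypothetical helper, not part of DuckDB. It checks field counts only, so it will not reveal quoting or value-type changes that may also influence the sniffer.

```python
import csv
from typing import Iterator, List, Tuple


def irregular_rows(path: str, delim: str = ";") -> Iterator[Tuple[int, int, List[str]]]:
    """Yield (line_number, field_count, fields) for every row whose field
    count differs from the header's. line_number is the physical line on
    which the row ends, as tracked by csv.reader.line_num (header = line 1)."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter=delim)
        header = next(reader, None)
        if header is None:  # empty file: nothing to compare against
            return
        expected = len(header)
        for row in reader:
            if len(row) != expected:
                yield reader.line_num, len(row), row
```

Usage (the file name is a placeholder): `for line_no, n, row in irregular_rows("data.csv"): print(line_no, n)`. If it prints nothing, every row has the same number of fields as the header.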
Unfortunately, I can't share the sample file.
The DuckDB version is v1.2.0.
To Reproduce
self._db.execute(f"CREATE OR REPLACE TABLE {all_table_name} AS SELECT * FROM read_csv('{self.event.tmp_file_path}', header=True, delim=';', sample_size=10241)")
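The manual search for the threshold (10241 fails, 10240 works) can be automated with a binary search over sample_size. This is a hypothetical sketch: largest_passing and sniff_succeeds are illustrative names, not DuckDB APIs. sniff_succeeds wraps the same CREATE TABLE ... read_csv call as the reproduction above in an in-memory DuckDB connection and treats any duckdb.Error as a failure. The search assumes success is monotonic in sample_size (works up to some threshold, fails above it), which the observations in this report do not fully establish.

```python
from typing import Callable


def largest_passing(lo: int, hi: int, ok: Callable[[int], bool]) -> int:
    """Binary-search the largest n in [lo, hi] for which ok(n) is True.

    Assumes ok is monotonic (True up to a threshold, False above it).
    Returns lo - 1 if ok(lo) is already False.
    """
    if not ok(lo):
        return lo - 1
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if ok(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


def sniff_succeeds(path: str, sample_size: int) -> bool:
    """Hypothetical predicate: True if read_csv loads the file with the given
    sample_size using the same options as the report. Needs the duckdb package."""
    import duckdb  # imported lazily so largest_passing works without duckdb

    con = duckdb.connect()  # in-memory database
    quoted = path.replace("'", "''")  # escape single quotes for the SQL literal
    try:
        con.execute(
            "CREATE OR REPLACE TABLE t AS SELECT * FROM "
            f"read_csv('{quoted}', header=True, delim=';', sample_size={int(sample_size)})"
        )
        return True
    except duckdb.Error:
        return False
    finally:
        con.close()
```

Usage (path and upper bound are placeholders): `largest_passing(1, 20000, lambda n: sniff_succeeds("data.csv", n))`. Each probe loads the file once, so the search costs roughly log2(20000), about 15, imports.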
OS:
Linux
DuckDB Package Version:
1.2.0
Python Version:
3.13.7
Full Name:
Michel
Affiliation:
Acme
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a stable release
Did you include all relevant data sets for reproducing the issue?
No - I cannot share the data sets because they are confidential
Did you include all code required to reproduce the issue?
Did you include all relevant configuration to reproduce the issue?