Describe the bug
See apache/datafusion#21450
Root cause: there is one spawn_blocking call for every 8 KiB read from the file, which adds significant context-switch overhead.
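To give a rough sense of scale, here is a back-of-the-envelope sketch of how many blocking dispatches a full sequential read would need. It assumes exactly one dispatch per chunk and that the 51G test file is 51 GiB. `dispatch_count` is a hypothetical helper written for this illustration, not part of DataFusion or tokio.

```rust
const KIB: u64 = 1024;
const MIB: u64 = 1024 * KIB;
const GIB: u64 = 1024 * MIB;

/// Number of blocking dispatches needed to read `file_bytes` when each
/// dispatch reads at most `chunk_bytes` (ceiling division).
/// Hypothetical helper for illustration only.
fn dispatch_count(file_bytes: u64, chunk_bytes: u64) -> u64 {
    assert!(chunk_bytes > 0, "chunk size must be non-zero");
    (file_bytes + chunk_bytes - 1) / chunk_bytes
}
```

Under those assumptions, `dispatch_count(51 * GIB, 8 * KIB)` is 6,684,672 while `dispatch_count(51 * GIB, 2 * MIB)` is 26,112, i.e. 256 times fewer round trips to the blocking pool.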
To Reproduce
See apache/datafusion#21446
For the tests I used a c7a.16xlarge EC2 instance with hits.json trimmed down to 51G (the original is 217 GiB) and a warm cache (warmed by running cat hits_50.json > /dev/null).
Expected behavior
A more efficient implementation (e.g. tokio uses a buffer size of 2 MiB when reading files).
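A minimal, std-only sketch of the effect of a larger buffer. This is not DataFusion's or tokio's code: `CountingReader` and `underlying_reads` are hypothetical names, and each read that reaches the wrapped reader stands in for one blocking dispatch. The consumer keeps asking for small chunks, and only the buffer capacity changes.

```rust
use std::io::{self, BufReader, Cursor, Read};

/// Wraps a reader and counts how many non-empty `read` calls reach it.
/// Each counted call stands in for one blocking dispatch (an assumption
/// made for this illustration).
struct CountingReader<R> {
    inner: R,
    reads: usize,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.inner.read(buf)?;
        if n > 0 {
            self.reads += 1;
        }
        Ok(n)
    }
}

/// Drains `data` through a `BufReader` of `capacity` bytes, consuming it in
/// `consumer_chunk`-byte pieces. Returns (bytes read, underlying reads).
fn underlying_reads(
    data: &[u8],
    capacity: usize,
    consumer_chunk: usize,
) -> io::Result<(usize, usize)> {
    let counting = CountingReader { inner: Cursor::new(data), reads: 0 };
    let mut reader = BufReader::with_capacity(capacity, counting);
    let mut out = vec![0u8; consumer_chunk];
    let mut total = 0;
    loop {
        let n = reader.read(&mut out)?;
        if n == 0 {
            break; // end of input
        }
        total += n;
    }
    Ok((total, reader.get_ref().reads))
}
```

For 4 MiB of in-memory data consumed 8 KiB at a time, an 8 KiB buffer results in 512 underlying reads, while a 2 MiB buffer results in 2. The real fix would apply the same idea to the amount of data fetched per spawn_blocking call.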
Additional context
apache/datafusion#21478 (comment)