Commit b7e1940

Enhance DataFrame.collect to utilize Rayon for parallel RecordBatch conversion
- Added Rayon as a dependency in Cargo.toml.
- Implemented a benchmark script (collect_gil_bench.py) to measure performance of serial vs parallel conversions.
- Updated documentation (collect-gil.md) to explain the impact of the GIL on performance and how to run the benchmark.
- Modified the collect method in PyDataFrame to release the GIL and convert RecordBatches to PyArrow in parallel, improving CPU utilization.
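
To illustrate the kind of serial vs parallel comparison the message describes, here is a minimal sketch using only the Python standard library. It is not the contents of collect_gil_bench.py, which is not shown here. The names `convert_batch`, `run_serial`, `run_threaded`, and `timed` are hypothetical, and the per-batch work is a pure-Python stand-in rather than a real RecordBatch conversion.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def convert_batch(batch):
    # Hypothetical stand-in for converting one RecordBatch to PyArrow.
    # Pure-Python CPU work like this holds the GIL while it runs.
    return sum(x * x for x in batch)


def run_serial(batches):
    # Convert batches one after another on the calling thread.
    return [convert_batch(b) for b in batches]


def run_threaded(batches, workers=4):
    # Convert batches on a thread pool; map() preserves input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(convert_batch, batches))


def timed(fn, *args):
    # Return (result, elapsed seconds) for a single call.
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    batches = [list(range(200_000)) for _ in range(8)]
    serial, t_serial = timed(run_serial, batches)
    threaded, t_threaded = timed(run_threaded, batches)
    assert serial == threaded  # both paths must produce the same results
    print(f"serial:   {t_serial:.3f}s")
    print(f"threaded: {t_threaded:.3f}s")
```

On a standard CPython build, the threaded run of GIL-holding work like this typically shows little or no speedup. That contention is what the change targets by releasing the GIL and doing the conversion in parallel on the Rust side.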
1 parent a047e92 commit b7e1940

5 files changed

Lines changed: 363 additions & 297 deletions
