Commit b7e1940
Enhance DataFrame.collect to utilize Rayon for parallel RecordBatch conversion
- Added Rayon as a dependency in Cargo.toml.
- Implemented a benchmark script (collect_gil_bench.py) to measure performance of serial vs parallel conversions.
- Updated documentation (collect-gil.md) to explain the impact of GIL on performance and how to run the benchmark.
- Modified the collect method in PyDataFrame to release the GIL and convert RecordBatches to PyArrow in parallel, improving CPU utilization.

1 parent a047e92 commit b7e1940
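
The parallel conversion described in the last bullet can be sketched roughly as follows. This is not the code from the commit: it uses only the standard library (`std::thread::scope`) in place of Rayon so that it compiles without extra dependencies, it does not show the GIL release at all, and `convert` and `convert_parallel` are hypothetical names standing in for the real RecordBatch-to-PyArrow step. It only illustrates the shape of the idea: fan batches out to workers, then reassemble the results in the original order.

```rust
use std::thread;

/// Hypothetical stand-in for the per-batch conversion.
/// In the commit this would be the RecordBatch -> PyArrow step.
fn convert(batch: &[i64]) -> i64 {
    batch.iter().sum()
}

/// Convert all batches on up to `workers` threads, keeping input order.
/// Assumption: each conversion is independent of the others.
fn convert_parallel(batches: &[Vec<i64>], workers: usize) -> Vec<i64> {
    let workers = workers.max(1);
    // Ceiling division so every batch lands in exactly one chunk.
    let chunk_len = ((batches.len() + workers - 1) / workers).max(1);

    thread::scope(|scope| {
        // Spawn one scoped thread per contiguous chunk of batches.
        let handles: Vec<_> = batches
            .chunks(chunk_len)
            .map(|chunk| {
                scope.spawn(move || chunk.iter().map(|b| convert(b)).collect::<Vec<i64>>())
            })
            .collect();

        // Join in spawn order, so the output order matches the input order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Calling `convert_parallel(&batches, 4)` should return the same vector as mapping `convert` over `batches` serially. With Rayon, the chunking and joining would typically collapse into a single `par_iter().map(...).collect()` call, which also preserves order.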
5 files changed
Lines changed: 363 additions & 297 deletions