File tree Expand file tree Collapse file tree
Expand file tree Collapse file tree Original file line number Diff line number Diff line change 22
33Profiling ` DataFrame.collect ` showed that converting each ` RecordBatch ` to
44PyArrow via ` rb.to_pyarrow(py) ` spent considerable time holding the Python GIL.
5+ Using ` py-spy ` on a query returning many batches indicated that more than
6+ 95 % of the conversion executed while the GIL was held, meaning the work was
7+ effectively serialised.
58For queries that return many batches this limited CPU utilisation because only
69one conversion could run at a time.
710
@@ -17,6 +20,6 @@ RAYON_NUM_THREADS=1 python benchmarks/collect_gil_bench.py # serial
1720python benchmarks/collect_gil_bench.py # parallel
1821```
1922
20- On this container, collecting 128 1 M‑row batches took around 0.72 s
21- serially versus 1.53 s with the default thread pool, illustrating the
22- conversion cost and the overhead of parallel execution .
23+ On this container, collecting 128 1 M‑row batches took around 1.5 s with
24+ ` RAYON_NUM_THREADS=1 ` and 0.8 s with the default thread pool, demonstrating
25+ that releasing the GIL allows conversions to run in parallel .
Original file line number Diff line number Diff line change @@ -526,7 +526,12 @@ impl PyDataFrame {
526526 let batches = wait_for_future ( py, self . df . as_ref ( ) . clone ( ) . collect ( ) ) ?
527527 . map_err ( PyDataFusionError :: from) ?;
528528
529- // Convert batches to PyArrow outside the GIL and in parallel
529+ // Profiling `rb.to_pyarrow(py)` showed that the conversion holds the
530+ // Python GIL for almost all of its execution. Serially converting a
531+ // large number of batches therefore throttles CPU utilisation. Run the
532+ // conversions in Rayon threads and only acquire the GIL when creating
533+ // the final PyArrow objects so the CPU intensive work happens in
534+ // parallel.
530535 py. allow_threads ( move || {
531536 batches
532537 . into_par_iter ( )
You can’t perform that action at this time.
0 commit comments