Commit a047e92

Enhance documentation for target partitions in SessionConfig to clarify parallelism control and provide usage example.
1 parent 6abfc52 commit a047e92

2 files changed

Lines changed: 28 additions & 1 deletion

docs/source/user-guide/configuration.rst

Lines changed: 21 additions & 0 deletions
@@ -47,5 +47,26 @@ a :py:class:`~datafusion.context.SessionConfig` and :py:class:`~datafusion.conte
     print(ctx)
 
 
+.. _target_partitions:
+
+Target partitions and threads
+-----------------------------
+
+The :py:meth:`~datafusion.context.SessionConfig.with_target_partitions` method
+controls how many partitions DataFusion uses when executing a query. Each
+partition is processed on its own thread, so this setting effectively limits
+the number of threads that will be scheduled.
+
+For most workloads a good starting value is the number of logical CPU cores on
+your machine. You can use :func:`os.cpu_count` to automatically configure this::
+
+    import os
+    config = SessionConfig().with_target_partitions(os.cpu_count())
+
+Choosing a value significantly higher than the available cores can lead to
+excessive context switching without performance gains, while a much lower value
+may underutilize the machine.
+
+
 You can read more about available :py:class:`~datafusion.context.SessionConfig` options in the `rust DataFusion Configuration guide <https://arrow.apache.org/datafusion/user-guide/configs.html>`_,
 and about :code:`RuntimeEnvBuilder` options in the rust `online API documentation <https://docs.rs/datafusion/latest/datafusion/execution/runtime_env/struct.RuntimeEnvBuilder.html>`_.
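
A note on the usage example added in this hunk: :func:`os.cpu_count` is documented to return ``None`` when the number of cores cannot be determined, so callers may want a fallback before passing the value on. The sketch below shows one possible way to do that. The helper ``pick_target_partitions`` and its ``max_partitions`` parameter are hypothetical names used for illustration only; they are not part of this commit or of the datafusion API. The datafusion import is guarded so the helper still runs where the package is not installed.

```python
import os


def pick_target_partitions(max_partitions=None):
    """Hypothetical helper: choose a partition count from the logical core count.

    Falls back to 1 when os.cpu_count() returns None, and optionally caps the
    result at max_partitions. The result is never lower than 1.
    """
    cores = os.cpu_count() or 1  # os.cpu_count() may return None
    if max_partitions is not None:
        cores = min(cores, max_partitions)
    return max(1, cores)


try:
    # Assumes the datafusion Python package is installed.
    from datafusion import SessionConfig, SessionContext
except ImportError:
    SessionConfig = None

if SessionConfig is not None:
    # Same call as in the documentation example, with the guarded value.
    config = SessionConfig().with_target_partitions(pick_target_partitions())
    ctx = SessionContext(config)
```

The cap is optional; it may be useful on shared machines where using every logical core for a single query is not desirable.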

python/datafusion/context.py

Lines changed: 7 additions & 1 deletion
@@ -161,7 +161,13 @@ def with_batch_size(self, batch_size: int) -> SessionConfig:
     def with_target_partitions(self, target_partitions: int) -> SessionConfig:
         """Customize the number of target partitions for query execution.
 
-        Increasing partitions can increase concurrency.
+        Each partition is processed on its own thread, so this value controls
+        the degree of parallelism. A good starting point is the number of
+        logical CPU cores on your machine, for example
+        ``SessionConfig().with_target_partitions(os.cpu_count())``.
+
+        See the :ref:`configuration guide <target_partitions>` for more
+        discussion on choosing a value.
 
         Args:
             target_partitions: Number of target partitions.
