
Commit b5a52b2

docs: clarify SessionContext retention in DataFrame and stream consumption
1 parent 687c226 commit b5a52b2

2 files changed

Lines changed: 8 additions & 0 deletions

docs/source/user-guide/dataframe/index.rst

Lines changed: 3 additions & 0 deletions
@@ -168,6 +168,9 @@ out-of-memory errors.
     for batch in reader:
         ...  # process each batch as it is produced
 
+Note that streams retain the originating ``SessionContext`` internally, so the
+context can be safely dropped once the stream has been obtained.
+
 DataFrames are also iterable, yielding :class:`datafusion.RecordBatch` objects
 that implement the Arrow C data interface. These batches can be consumed by
 libraries like PyArrow without copying:

python/datafusion/dataframe.py

Lines changed: 5 additions & 0 deletions
@@ -1116,6 +1116,11 @@ def __arrow_c_stream__(self, requested_schema: object | None = None) -> object:
 provided, only straightforward projections such as column selection or
 reordering are applied.
 
+The returned capsule holds a reference to the originating
+:class:`SessionContext`, keeping it alive until the stream is fully
+consumed. This makes it safe to drop the original context after obtaining
+the stream.
+
 Args:
     requested_schema: Attempt to provide the DataFrame using this schema.
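
The lifetime behaviour described in the added docstring can be modelled in plain Python. The sketch below is only an analogy: `Context` and `Stream` are hypothetical stand-ins, not DataFusion classes. The model also keeps the context alive until the stream object is released, which simplifies the "until the stream is fully consumed" wording. It relies on CPython's reference counting for the final check.

```python
import gc
import weakref


class Context:
    """Hypothetical stand-in for a session context."""

    def __init__(self):
        self.rows = [1, 2, 3]


class Stream:
    """Hypothetical stand-in for a stream holding a strong reference to its context."""

    def __init__(self, ctx):
        self._ctx = ctx  # this reference is what keeps the context alive

    def __iter__(self):
        yield from self._ctx.rows


ctx = Context()
ctx_ref = weakref.ref(ctx)  # lets us observe the context without owning it
stream = Stream(ctx)

del ctx  # drop the caller's only direct reference
gc.collect()
alive_while_streaming = ctx_ref() is not None

rows = list(stream)  # consuming the stream still works

del stream  # releasing the stream releases the context
gc.collect()
alive_after_release = ctx_ref() is not None
```

The design point is that ownership flows from the stream to the context, so callers only need to keep the stream itself alive.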
