Commit 2941710

Add documentation for DataFrame string column handling and expressions
1 parent 8c30fad commit 2941710

1 file changed

Lines changed: 38 additions & 0 deletions

docs/source/user-guide/dataframe/index.rst

@@ -126,6 +126,44 @@ DataFusion's DataFrame API offers a wide range of operations:
    # Drop columns
    df = df.drop("temporary_column")

String Columns and Expressions
------------------------------

Some ``DataFrame`` methods accept plain strings when an argument refers to an
existing column. These include:

* :py:meth:`~datafusion.DataFrame.select`
* :py:meth:`~datafusion.DataFrame.sort`
* :py:meth:`~datafusion.DataFrame.drop`
* :py:meth:`~datafusion.DataFrame.join` (``on`` argument)
* :py:meth:`~datafusion.DataFrame.aggregate` (grouping columns)

For such methods, you can pass column names directly:

.. code-block:: python

    df.sort('id')

The same operation can also be written with an explicit column expression:

.. code-block:: python

    from datafusion import col
    df.sort(col('id'))

Whenever an argument represents an expression, such as in
:py:meth:`~datafusion.DataFrame.filter` or
:py:meth:`~datafusion.DataFrame.with_column`, use ``col()`` to reference columns
and wrap constant values with ``lit()`` (also available as ``literal()``):

.. code-block:: python

    from datafusion import col, lit
    df.filter(col('age') > lit(21))

Without ``lit()``, DataFusion would treat ``21`` as a column name rather than a
constant value.

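The split between column-name arguments and expression arguments can be
illustrated with a small stand-alone model. The sketch below does not use or
reproduce DataFusion's implementation: ``Column``, ``Literal``, ``as_column``
and ``as_operand`` are hypothetical names invented for this illustration, and
the rules they follow are assumptions of the toy, not documented library
behaviour. It only shows the idea that a bare string is read as a column
reference, so a constant has to be marked explicitly.

```python
from dataclasses import dataclass
from typing import Any, Union


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for a column reference such as col('id')."""
    name: str


@dataclass(frozen=True)
class Literal:
    """Hypothetical stand-in for a constant value such as lit(21)."""
    value: Any


Expr = Union[Column, Literal]


def as_column(arg: Union[str, Column]) -> Column:
    """Model a column-name argument: a plain string names an existing column."""
    if isinstance(arg, Column):
        return arg
    if isinstance(arg, str):
        return Column(arg)
    raise TypeError(f"expected a column name or Column, got {type(arg).__name__}")


def as_operand(arg: Union[str, Expr]) -> Expr:
    """Model an expression argument in this toy: strings are column names,
    so constants must arrive already wrapped in Literal."""
    if isinstance(arg, (Column, Literal)):
        return arg
    if isinstance(arg, str):
        return Column(arg)
    raise TypeError("wrap constant values in Literal(...)")


# A string and an explicit column reference resolve to the same column.
same = as_column("id") == as_column(Column("id"))

# In this toy, the string "21" is read as a column name,
# while Literal(21) is kept as a constant.
ambiguous = as_operand("21")
constant = as_operand(Literal(21))
```

In this model the string form is only a shorthand for a column reference, which
is why the toy requires constants to be wrapped before they reach an
expression argument.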

Terminal Operations
-------------------
