This repository contains Python bindings for Rust's DataFusion.
## Development workflow
- Ensure git submodules are initialized: `git submodule update --init`.
- Build the Rust extension before running tests:
  - `uv run --no-project maturin develop --uv`
- Run tests with pytest:
  - `uv run --no-project pytest .`

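The workflow steps can be chained in a small script. The following is a minimal sketch, not part of the repository: `WORKFLOW_STEPS` and `run_workflow` are hypothetical names, and the test step is written with `uv run` on the assumption that it should mirror the build step.

```python
import shlex
import subprocess

# Commands taken from the workflow list, in the order given there.
# Assumption: the pytest step uses `uv run`, mirroring the build step.
WORKFLOW_STEPS = [
    "git submodule update --init",
    "uv run --no-project maturin develop --uv",
    "uv run --no-project pytest .",
]


def run_workflow(steps=WORKFLOW_STEPS, dry_run=True):
    """Run each step in order, stopping at the first failure.

    With dry_run=True nothing is executed; the parsed commands are
    only returned so they can be inspected.
    """
    parsed = [shlex.split(step) for step in steps]
    if not dry_run:
        for argv in parsed:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(argv, check=True)
    return parsed


if __name__ == "__main__":
    for argv in run_workflow(dry_run=True):
        print(" ".join(argv))
```

Calling `run_workflow(dry_run=False)` from the repository root would execute the steps for real. The dry-run default is a deliberate choice so that importing or trying the sketch has no side effects.
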
## Linting and formatting
- Use pre-commit for linting/formatting.
- Run hooks for changed files before committing:
  - `pre-commit run --files <files>`
  - or `pre-commit run --all-files`
- Hooks enforce:
  - Python linting/formatting via Ruff
  - Rust formatting via `cargo fmt`
  - Rust linting via `cargo clippy`

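The choice between the two pre-commit invocations can be sketched as a small helper. This is an illustrative sketch only: `precommit_command` and `run_hooks` are hypothetical names, and the only commands assumed are the two listed in this section.

```python
import subprocess


def precommit_command(files=None):
    """Build the pre-commit invocation for the given files.

    An empty or missing file list falls back to checking every file.
    """
    if files:
        return ["pre-commit", "run", "--files", *files]
    return ["pre-commit", "run", "--all-files"]


def run_hooks(files=None):
    """Run the hooks and return the exit code (non-zero means a hook failed)."""
    return subprocess.run(precommit_command(files)).returncode
```

For example, `precommit_command(["python/datafusion/io.py"])` yields the `--files` form, while `precommit_command()` yields the `--all-files` form.
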
## Notes
- The repository mixes Python and Rust; ensure changes build for both languages.
- If adding new dependencies, update `pyproject.toml` and run `uv sync --dev --no-install-package datafusion`.

## Helper Functions
- `python/datafusion/io.py` offers global context readers:
  - `read_parquet`
  - `read_json`
  - `read_csv`
  - `read_avro`
- `python/datafusion/user_defined.py` exports convenience creators for user-defined functions:
  - `udf` (scalar)
  - `udaf` (aggregate)
  - `udwf` (window)
  - `udtf` (table)
- `python/datafusion/col.py` exposes the `Col` helper with `col` and `column` instances for building column expressions using attribute access.
- `python/datafusion/catalog.py` provides Python-based catalog and schema providers.
- `python/datafusion/object_store.py` exposes object store connectors: `AmazonS3`, `GoogleCloud`, `MicrosoftAzure`, `LocalFileSystem`, and `Http`.
- `python/datafusion/unparser.py` converts logical plans back to SQL via the `Dialect` and `Unparser` classes.
- `python/datafusion/dataframe_formatter.py` offers configurable HTML and string formatting for DataFrames (replaces the deprecated `html_formatter.py`).
- `python/tests/generic.py` includes utilities for test data generation:
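
Returning to the `Col` helper listed earlier in this section: the attribute-access idea can be illustrated with a toy stand-in. This is not the library's implementation. `ColumnRef` is a hypothetical placeholder for a real column expression, and the sketch only shows how `__getattr__` can make `col.price` equivalent to `col("price")`.

```python
class ColumnRef:
    """Toy stand-in for a column expression; only records the column name."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, ColumnRef) and self.name == other.name

    def __repr__(self):
        return f"ColumnRef({self.name!r})"


class Col:
    """Builds column references by call or by attribute access."""

    def __call__(self, name):
        return ColumnRef(name)

    def __getattr__(self, name):
        # __getattr__ only runs when normal lookup fails, so real
        # attributes and methods are unaffected. Dunder lookups are
        # rejected so that copy/pickle probing does not yield columns.
        if name.startswith("__") and name.endswith("__"):
            raise AttributeError(name)
        return ColumnRef(name)


# Two module-level instances, mirroring the `col` and `column` names above.
col = Col()
column = Col()
```

With this stand-in, `col.price`, `col("price")`, and `column.price` all refer to the same column name. The call form remains necessary for names that are not valid Python identifiers.
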
Choosing a value significantly higher than the available cores can lead to
excessive context switching without performance gains, while a much lower value
may underutilize the machine.

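One way to keep such a value within the machine's core count is sketched below. This is an illustration under stated assumptions: :code:`bounded_parallelism` is a hypothetical helper, and the text does not say which setting the value feeds into.

```python
import os
from typing import Optional


def bounded_parallelism(requested: int, cores: Optional[int] = None) -> int:
    """Clamp a requested parallelism value to the range [1, cores]."""
    if cores is None:
        # os.cpu_count() may return None when the count is undetermined.
        cores = os.cpu_count() or 1
    return max(1, min(requested, cores))
```

Passing the core count explicitly, as in :code:`bounded_parallelism(64, cores=8)`, makes the result reproducible across machines. The result could then be handed to a parallelism setting such as :code:`SessionConfig().with_target_partitions(n)`, assuming that is the setting meant here.
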
You can read more about available :py:class:`~datafusion.context.SessionConfig` options in the `Rust DataFusion Configuration guide <https://arrow.apache.org/datafusion/user-guide/configs.html>`_,
and about :code:`RuntimeEnvBuilder` options in the Rust `online API documentation <https://docs.rs/datafusion/latest/datafusion/execution/runtime_env/struct.RuntimeEnvBuilder.html>`_.