| 1 | +# CLAUDE.md |
| 2 | + |
| 3 | +This file provides guidance to Claude Code when working with code in this repository. |
| 4 | + |
| 5 | +## Project overview |
| 6 | + |
| 7 | +This is the **production** duckdb-python client — the `duckdb` package on PyPI. It provides Python bindings for [DuckDB](https://duckdb.org), an in-process OLAP database engine, via pybind11 and a custom scikit-build-core build backend. |
| 8 | + |
| 9 | +- **Repository**: https://github.com/duckdb/duckdb-python |
| 10 | +- **Package name**: `duckdb` |
| 11 | +- **Bindings**: pybind11 |
| 12 | +- **Build backend**: `duckdb_packaging.build_backend` (custom wrapper around scikit-build-core) |
| 13 | +- **Supported Python**: 3.10, 3.11, 3.12, 3.13, 3.14 |
| 14 | +- **Free-threaded Python**: not supported in this client. A separate prototype client based on DuckDB's C API targets free-threading, Stable ABI, and multi-interpreter support. |
| 15 | + |
| 16 | +## IMPORTANT: build before running anything |
| 17 | + |
| 18 | +**You MUST complete a full build before running tests, scripts, or `uv run` in a fresh worktree or after a clean slate.** `uv run pytest` triggers scikit-build-core's editable rebuild on import, which compiles 2000+ C++ files from scratch — this takes 5–10 minutes and will exceed the Bash tool's default timeout. Do not attempt to run tests, scripts, or `uv run python` until the two-step build below has completed successfully. |
| 19 | + |
| 20 | +```bash |
| 21 | +# Step 1: install build deps (~5 seconds) |
| 22 | +uv sync --only-group build --no-install-project -p 3.13 |
| 23 | + |
| 24 | +# Step 2: build the extension (3–10 min cold, ~30s with sccache; set the Bash tool timeout to 600000 ms) |
| 25 | +uv sync --no-build-isolation -v --reinstall -p 3.13 |
| 26 | +``` |
| 27 | + |
| 28 | +After step 2 completes, `uv run pytest`, `uv run python`, and `.venv/bin/python` all work immediately. Subsequent C++ changes trigger fast incremental rebuilds (seconds with sccache), not full cold builds. |
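If the two steps are driven from a script rather than typed by hand, they might be wrapped as in the sketch below. The `run_build` helper and its `timeout_seconds` parameter are hypothetical names used for illustration, not part of this repository; the 600-second default mirrors the 600000 ms timeout suggested above.

```python
import subprocess

# The two uv commands from the section above, in order.
BUILD_STEPS = [
    ["uv", "sync", "--only-group", "build", "--no-install-project", "-p", "3.13"],
    ["uv", "sync", "--no-build-isolation", "-v", "--reinstall", "-p", "3.13"],
]


def run_build(run=subprocess.run, timeout_seconds=600):
    """Run both build steps in order, stopping at the first failure.

    `run` is injectable so the sequence can be exercised without invoking uv.
    """
    for command in BUILD_STEPS:
        # check=True raises CalledProcessError on a non-zero exit code;
        # timeout raises TimeoutExpired if a step runs too long.
        run(command, check=True, timeout=timeout_seconds)


if __name__ == "__main__":
    run_build()
```

Because `check=True` is set, a failure in step 1 prevents step 2 from starting against a venv that lacks the build dependencies.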
| 29 | + |
| 30 | +## Build system |
| 31 | + |
| 32 | +### Editable install (standard development workflow) |
| 33 | + |
| 34 | +The build uses `--no-build-isolation` for two reasons: (1) it makes incremental rebuilds fast by reusing the target venv, and (2) **it keeps pybind11 headers in the venv so CLion / other C++ IDEs can resolve them** — with build isolation, pybind11 is installed into a temp env that gets destroyed after the build, leaving `compile_commands.json` with stale include paths and CLion unable to navigate the bindings code. |
| 35 | + |
| 36 | +`--no-build-isolation` requires that build dependencies are already installed. **This is a two-step process on a fresh venv:** |
| 37 | + |
| 38 | +**Step 1 — install build deps** (fast, ~5 seconds): |
| 39 | + |
| 40 | +```bash |
| 41 | +uv sync --only-group build --no-install-project -p 3.13 |
| 42 | +``` |
| 43 | + |
| 44 | +**Step 2 — build the extension** (3–10 min cold, ~30 seconds with warm sccache): |
| 45 | + |
| 46 | +```bash |
| 47 | +uv sync --no-build-isolation -v --reinstall -p 3.13 |
| 48 | +``` |
| 49 | + |
| 50 | +This produces a debug editable install in `build/debug/` with `editable.mode = "redirect"`. Python code changes are picked up immediately; C++ changes require a rebuild. |
| 51 | + |
| 52 | +### sccache (strongly recommended) |
| 53 | + |
| 54 | +sccache caches compiled object files across builds and worktrees. Without it, cold builds take 5–10 minutes (2000+ C++ compilation units). With a warm cache, they take ~30 seconds. |
| 55 | + |
| 56 | +```bash |
| 57 | +# Export before any uv sync |
| 58 | +export CMAKE_C_COMPILER_LAUNCHER="$(command -v sccache)" |
| 59 | +export CMAKE_CXX_COMPILER_LAUNCHER="$(command -v sccache)" |
| 60 | +``` |
| 61 | + |
| 62 | +Install with `brew install sccache` (macOS). Check cache state with `sccache -s`. |
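When builds are launched from Python tooling instead of an interactive shell, the same two variables can be set on the child process environment. This is a minimal sketch; `sccache_env` is a hypothetical helper, and it assumes the build reads `CMAKE_C_COMPILER_LAUNCHER` and `CMAKE_CXX_COMPILER_LAUNCHER` from the environment exactly as the exports above imply.

```python
import os
import shutil


def sccache_env(base_env=None, which=shutil.which):
    """Return a copy of the environment with sccache as the compiler launcher.

    If sccache is not found on PATH, the copy is returned unchanged, so the
    build still works, just without caching.
    """
    env = dict(os.environ if base_env is None else base_env)
    sccache = which("sccache")
    if sccache:
        env["CMAKE_C_COMPILER_LAUNCHER"] = sccache
        env["CMAKE_CXX_COMPILER_LAUNCHER"] = sccache
    return env
```

The result can be passed as `env=sccache_env()` to `subprocess.run` when invoking `uv sync`.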
| 63 | + |
| 64 | +### Non-editable (release-style) install |
| 65 | + |
| 66 | +```bash |
| 67 | +uv sync --no-build-isolation --no-editable -v --reinstall -p 3.13 |
| 68 | +``` |
| 69 | + |
| 70 | +Produces a release build (no debug symbols, optimized). |
| 71 | + |
| 72 | +### Wheel build |
| 73 | + |
| 74 | +```bash |
| 75 | +uv build --wheel |
| 76 | +``` |
| 77 | + |
| 78 | +Produces a wheel in `dist/`. CI builds wheels with cibuildwheel; see `[tool.cibuildwheel]` in `pyproject.toml` for the CI wheel matrix (macOS arm64/x86_64, Linux x86_64/aarch64, Windows AMD64/ARM64). |
| 79 | + |
| 80 | +### sdist build |
| 81 | + |
| 82 | +```bash |
| 83 | +uv build --sdist |
| 84 | +``` |
| 85 | + |
| 86 | +The sdist includes the DuckDB submodule source via the `[tool.scikit-build.sdist]` include list in `pyproject.toml`. |
| 87 | + |
| 88 | +### Incremental rebuild shortcuts |
| 89 | + |
| 90 | +After editing C++ source: |
| 91 | + |
| 92 | +```bash |
| 93 | +# Fastest: touch the __init__.py to trigger scikit-build-core's rebuild detection |
| 94 | +touch duckdb/__init__.py && uv sync --no-build-isolation -v --reinstall |
| 95 | +``` |
| 96 | + |
| 97 | +Rebuild only the duckdb package (useful when the dependency lock hasn't changed): |
| 98 | + |
| 99 | +```bash |
| 100 | +uv sync --reinstall-package duckdb |
| 101 | +``` |
| 102 | + |
| 103 | +### Clean slate |
| 104 | + |
| 105 | +```bash |
| 106 | +rm -rf build .venv uv.lock && uv cache clean --force |
| 107 | +``` |
| 108 | + |
| 109 | +### Python version selection |
| 110 | + |
| 111 | +Pass `-p <version>` to any `uv sync` command: |
| 112 | + |
| 113 | +```bash |
| 114 | +uv sync --no-build-isolation -v --reinstall -p 3.11 |
| 115 | +uv sync --no-build-isolation -v --reinstall -p 3.14 |
| 116 | +``` |
| 117 | + |
| 118 | +Supported: `3.10`, `3.11`, `3.12`, `3.13`, `3.14`. Do **not** use free-threaded variants (`3.13t`, `3.14t`) — the production client does not support them. |
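A wrapper script could guard against an unsupported or free-threaded version before handing it to `uv sync -p`. The sketch below is illustrative only: `check_python_version` is a hypothetical helper, and the version tuple simply restates the list above.

```python
SUPPORTED_VERSIONS = ("3.10", "3.11", "3.12", "3.13", "3.14")


def check_python_version(version: str) -> str:
    """Return `version` if it is supported, otherwise raise ValueError."""
    # Free-threaded builds are requested with a trailing "t", e.g. "3.13t".
    if version.endswith("t"):
        raise ValueError(f"free-threaded Python {version!r} is not supported")
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported Python version {version!r}")
    return version
```

For example, `check_python_version("3.11")` returns `"3.11"`, while `check_python_version("3.13t")` raises `ValueError`.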
| 119 | + |
| 120 | +### Build configuration reference |
| 121 | + |
| 122 | +Key `pyproject.toml` settings: |
| 123 | + |
| 124 | +- `BUILD_EXTENSIONS = "core_functions;json;parquet;icu;jemalloc"` — extensions built into the wheel. |
| 125 | +- Editable overrides: `build-dir = "build/debug/"`, `editable.rebuild = true`, `editable.mode = "redirect"`, `cmake.build-type = "Debug"`, `DISABLE_UNITY = "1"` (unity disabled for better debugging). |
| 126 | +- Coverage overrides: `build-dir = "build/coverage/"`, `RelWithDebInfo`, `--coverage` flags. Activate with `COVERAGE=true uv sync ...`. |
| 127 | + |
| 128 | +## Testing |
| 129 | + |
| 130 | +### Test layout |
| 131 | + |
| 132 | +``` |
| 133 | +tests/ |
| 134 | +├── fast/ # quick per-subsystem tests (seconds each) |
| 135 | +│ ├── arrow/ # pyarrow integration |
| 136 | +│ ├── pandas/ # pandas integration |
| 137 | +│ ├── polars/ # polars integration |
| 138 | +│ ├── spark/ # pyspark compatibility |
| 139 | +│ ├── adbc/ # ADBC driver |
| 140 | +│ ├── api/ # relational API |
| 141 | +│ ├── dbapi/ # PEP 249 DB-API 2.0 |
| 142 | +│ ├── capi/ # C API bindings |
| 143 | +│ ├── udf/ # Python UDFs |
| 144 | +│ └── ... |
| 145 | +└── slow/ # heavy integration/performance tests |
| 146 | +``` |
| 147 | + |
| 148 | +### Common test commands |
| 149 | + |
| 150 | +```bash |
| 151 | +# Run one subsystem's fast tests |
| 152 | +uv run pytest tests/fast/arrow/ -v |
| 153 | + |
| 154 | +# Run a specific test by name |
| 155 | +uv run pytest tests/fast/api/test_relation.py -k 'test_value_relation' -v |
| 156 | + |
| 157 | +# Parallel execution (pytest-xdist) |
| 158 | +uv run pytest tests/fast/ -n auto -r skip -vv |
| 159 | + |
| 160 | +# Filter by markers |
| 161 | +uv run pytest tests/fast/ -r skip -vv -m "not (performance and threading)" |
| 162 | + |
| 163 | +# Slow tests (be patient) |
| 164 | +uv run pytest -n 4 -r skip -vv tests/slow/test_h2oai_arrow.py |
| 165 | +``` |
| 166 | + |
| 167 | +### Markers |
| 168 | + |
| 169 | +Available markers include `capi`, `dbapi`, `threading`, `performance`, `types_map`, and `type_union`. See `pyproject.toml` `[tool.pytest.ini_options]` for the canonical list and default addopts (`-ra --verbose`). |
| 170 | + |
| 171 | +### Test dependencies |
| 172 | + |
| 173 | +The test dependency group includes heavy packages with complex platform constraints (tensorflow, torch, pyspark, numpy version splits). Install test deps without building the project: |
| 174 | + |
| 175 | +```bash |
| 176 | +uv sync --only-group test --no-install-project -p 3.13 |
| 177 | +``` |
| 178 | + |
| 179 | +See `pyproject.toml` `[dependency-groups]` for the full dependency matrix and its platform-specific constraints. |
| 180 | + |
| 181 | +## Linting and formatting |
| 182 | + |
| 183 | +```bash |
| 184 | +# Python (ruff — configured in pyproject.toml, 120-char line length, google docstrings) |
| 185 | +uv run ruff check src/ tests/ |
| 186 | +uv run ruff format src/ tests/ |
| 187 | + |
| 188 | +# Type checking (mypy — strict mode, see [tool.mypy] in pyproject.toml) |
| 189 | +uv run mypy |
| 190 | + |
| 191 | +# Pre-commit hooks (configured in .pre-commit-config.yaml) |
| 192 | +uvx pre-commit run --all-files |
| 193 | +``` |
| 194 | + |
| 195 | +## Debugging |
| 196 | + |
| 197 | +### Quick interpreter check |
| 198 | + |
| 199 | +```bash |
| 200 | +.venv/bin/python -c 'import duckdb; print(duckdb.__version__); duckdb.sql("SELECT 42").show()' |
| 201 | +``` |
| 202 | + |
| 203 | +### AddressSanitizer (macOS) |
| 204 | + |
| 205 | +```bash |
| 206 | +DYLD_INSERT_LIBRARIES=/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/17/lib/darwin/libclang_rt.asan_osx_dynamic.dylib \ |
| 207 | + .venv/bin/python repro.py |
| 208 | +``` |
| 209 | + |
| 210 | +### Memory profiling with memray |
| 211 | + |
| 212 | +```bash |
| 213 | +.venv/bin/python -m memray run -o profile.bin repro.py |
| 214 | +.venv/bin/python -m memray flamegraph profile.bin |
| 215 | +open memray-flamegraph-profile.html |
| 216 | +``` |
| 217 | + |
| 218 | +### Python memory debug allocator |
| 219 | + |
| 220 | +```bash |
| 221 | +PYTHONMALLOC=malloc_debug .venv/bin/python repro.py |
| 222 | +``` |
| 223 | + |
| 224 | +### macOS deployment target inspection |
| 225 | + |
| 226 | +```bash |
| 227 | +.venv/bin/python -c "import sysconfig; print(sysconfig.get_config_var('MACOSX_DEPLOYMENT_TARGET'))" |
| 228 | +``` |
| 229 | + |
| 230 | +### Timezone isolation |
| 231 | + |
| 232 | +```bash |
| 233 | +TZ=not/existing .venv/bin/python -c 'import duckdb; ...' |
| 234 | +``` |
| 235 | + |
| 236 | +### Bug investigation repro convention |
| 237 | + |
| 238 | +When investigating a specific issue, write scripts into: |
| 239 | + |
| 240 | +``` |
| 241 | +debugscripts/<issue_number>_repro/repro.py |
| 242 | +debugscripts/<issue_number>_repro/generate_data.py # if synthetic data needed |
| 243 | +``` |
| 244 | + |
| 245 | +This convention keeps repro scripts organized by issue and makes them easy to find, share, or reference in bug reports. |
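The layout above could be scaffolded with a few lines of Python. This is a sketch under assumptions: `scaffold_repro` and its parameters are hypothetical, and the placeholder comment written into each file is made up for illustration.

```python
from pathlib import Path


def scaffold_repro(issue_number, root=Path("debugscripts"), with_data=False):
    """Create debugscripts/<issue_number>_repro/ and its script files.

    Existing files are left untouched. Returns the script paths in order.
    """
    directory = Path(root) / f"{issue_number}_repro"
    directory.mkdir(parents=True, exist_ok=True)

    names = ["repro.py"]
    if with_data:
        names.append("generate_data.py")  # only when synthetic data is needed

    paths = []
    for name in names:
        path = directory / name
        if not path.exists():
            path.write_text(f"# Repro for issue {issue_number}\n")
        paths.append(path)
    return paths
```

Calling `scaffold_repro(1234, with_data=True)` from the repository root would create `debugscripts/1234_repro/repro.py` and `debugscripts/1234_repro/generate_data.py`; the issue number here is only an example.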
| 246 | + |
| 247 | +## Project structure |
| 248 | + |
| 249 | +``` |
| 250 | +├── duckdb/ # Python package (pure Python + extension module) |
| 251 | +│ ├── __init__.py |
| 252 | +│ ├── experimental/ # pyspark compatibility layer |
| 253 | +│ ├── filesystem.py # fsspec integration |
| 254 | +│ └── query_graph/ # (old, unmaintained) |
| 255 | +├── adbc_driver_duckdb/ # ADBC driver package |
| 256 | +├── _duckdb-stubs/ # type stubs (*.pyi) |
| 257 | +├── src/ # C++ extension source (pybind11) |
| 258 | +├── external/duckdb/ # DuckDB submodule |
| 259 | +├── duckdb_packaging/ # custom build backend |
| 260 | +├── tests/ # test suite (fast/ + slow/) |
| 261 | +├── scripts/ # maintenance scripts |
| 262 | +├── debugscripts/ # issue repro scripts (convention: <issue>_repro/) |
| 263 | +├── pyproject.toml # build config, deps, linting, CI |
| 264 | +└── CMakeLists.txt # CMake build system |
| 265 | +``` |
| 266 | + |
| 267 | +### DuckDB submodule |
| 268 | + |
| 269 | +The DuckDB engine is included as a git submodule at `external/duckdb/`. After creating a worktree or switching branches: |
| 270 | + |
| 271 | +```bash |
| 272 | +git submodule update --init --recursive |
| 273 | +``` |
| 274 | + |
| 275 | +Do not pass `--depth=1`: the build backend detects the version from git history, which a shallow clone truncates. |
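A shallow checkout can be detected before starting a long build. The sketch below relies on `git rev-parse --is-shallow-repository`, which prints `true` or `false`; the `is_shallow_checkout` helper is hypothetical, and checking the `external/duckdb` path specifically is an assumption about which history the build backend reads.

```python
import subprocess


def is_shallow_checkout(path="external/duckdb", run=subprocess.run):
    """Return True if the git repository at `path` is a shallow clone."""
    result = run(
        ["git", "-C", path, "rev-parse", "--is-shallow-repository"],
        capture_output=True,
        text=True,
        check=True,
    )
    # git prints the literal string "true" or "false" followed by a newline.
    return result.stdout.strip() == "true"
```

If this returns `True`, running `git fetch --unshallow` inside that checkout restores the full history.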
| 276 | + |
| 277 | +## Development workflows |
| 278 | + |
| 279 | +### Worktrees |
| 280 | + |
| 281 | +This repository supports git worktrees for parallel development. The recommended pattern is to keep worktrees as siblings of the main checkout: |
| 282 | + |
| 283 | +```bash |
| 284 | +# Create a feature branch worktree |
| 285 | +git worktree add -b feature/my-feature ../feature_my_feature v1.5-variegata |
| 286 | +cd ../feature_my_feature |
| 287 | +git submodule update --init --recursive |
| 288 | +uv sync --only-group build --no-install-project -p 3.13 |
| 289 | +uv sync --no-build-isolation -v --reinstall -p 3.13 |
| 290 | +``` |
| 291 | + |
| 292 | +### GitHub CLI patterns |
| 293 | + |
| 294 | +```bash |
| 295 | +# CI status |
| 296 | +gh run list --repo duckdb/duckdb-python --workflow="Packaging" --limit 5 |
| 297 | +gh run list --workflow pypi_packaging.yml --json status,url -s failure |
| 298 | + |
| 299 | +# Download CI artifacts |
| 300 | +gh run download <run-id> -D ./artifacts/ |
| 301 | + |
| 302 | +# PR workflows |
| 303 | +gh pr checkout 167 |
| 304 | +gh pr create -B v1.5-variegata |
| 305 | +``` |
| 306 | + |
| 307 | +## Scope |
| 308 | + |
| 309 | +This file covers the **Python extension layer** — the pybind11 bindings, the build system, the test suite, and the Python packaging. The DuckDB core engine source is included as a submodule at `external/duckdb/` and is compiled from source as part of the build; debugging may require navigating into the submodule's C++ code. |
| 310 | + |
| 311 | +**Free-threaded Python** is not supported in this client. A separate prototype client based on DuckDB's C API exists for free-threading, Stable ABI, and multi-interpreter support. |