Commit 5c2a7f7

Add CLAUDE.md
1 parent f68a338 commit 5c2a7f7

1 file changed, +311 -0 lines changed

CLAUDE.md

Lines changed: 311 additions & 0 deletions
@@ -0,0 +1,311 @@
# CLAUDE.md

This file provides guidance to Claude Code when working with code in this repository.

## Project overview

This is the **production** duckdb-python client, the `duckdb` package on PyPI. It provides Python bindings for [DuckDB](https://duckdb.org), an in-process OLAP database engine, via pybind11 and a custom scikit-build-core build backend.

- **Repository**: https://github.com/duckdb/duckdb-python
- **Package name**: `duckdb`
- **Bindings**: pybind11
- **Build backend**: `duckdb_packaging.build_backend` (custom wrapper around scikit-build-core)
- **Supported Python**: 3.10, 3.11, 3.12, 3.13, 3.14
- **Free-threaded Python**: not supported in this client. A separate prototype client based on DuckDB's C API targets free-threading, Stable ABI, and multi-interpreter support.

## IMPORTANT: build before running anything

**You MUST complete a full build before running tests, scripts, or `uv run` in a fresh worktree or after a clean slate.** `uv run pytest` triggers scikit-build-core's editable rebuild on import, which compiles 2000+ C++ files from scratch. That takes 5–10 minutes and will exceed the Bash tool's default timeout. Do not attempt to run tests, scripts, or `uv run python` until the two-step build below has completed successfully.

```bash
# Step 1: install build deps (~5 seconds)
uv sync --only-group build --no-install-project -p 3.13

# Step 2: build the extension (3–10 min cold, ~30s with sccache, use timeout: 600000)
uv sync --no-build-isolation -v --reinstall -p 3.13
```

After step 2 completes, `uv run pytest`, `uv run python`, and `.venv/bin/python` all work immediately. Subsequent C++ changes trigger fast incremental rebuilds (seconds with sccache), not full cold builds.

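For scripted setups, the two-step sequence above could be wrapped roughly as follows. This is a minimal sketch and not part of the repository: `build_commands` and `run_build` are hypothetical names, and the 600-second default simply mirrors the `timeout: 600000` (milliseconds) hint in the comment above.

```python
import subprocess


def build_commands(python_version: str = "3.13") -> list[list[str]]:
    """Return the two uv invocations from this section, in order."""
    return [
        # Step 1: install build dependencies only, without building the project.
        ["uv", "sync", "--only-group", "build", "--no-install-project", "-p", python_version],
        # Step 2: build the extension against the already-populated venv.
        ["uv", "sync", "--no-build-isolation", "-v", "--reinstall", "-p", python_version],
    ]


def run_build(python_version: str = "3.13", timeout_s: int = 600) -> None:
    """Run both steps in order, raising on the first failure or timeout."""
    for cmd in build_commands(python_version):
        subprocess.run(cmd, check=True, timeout=timeout_s)
```

Calling `run_build()` from the repository root would run both steps and raise `subprocess.CalledProcessError` or `subprocess.TimeoutExpired` if either one fails.
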
## Build system

### Editable install (standard development workflow)

The build uses `--no-build-isolation` for two reasons: (1) it makes incremental rebuilds fast by reusing the target venv, and (2) **it keeps pybind11 headers in the venv so CLion / other C++ IDEs can resolve them**. With build isolation, pybind11 is installed into a temporary environment that is destroyed after the build, leaving `compile_commands.json` with stale include paths and CLion unable to navigate the bindings code.

`--no-build-isolation` requires that build dependencies are already installed. **This is a two-step process on a fresh venv:**

**Step 1: install build deps** (fast, ~5 seconds):

```bash
uv sync --only-group build --no-install-project -p 3.13
```

**Step 2: build the extension** (3–10 min cold, ~30 seconds with warm sccache):

```bash
uv sync --no-build-isolation -v --reinstall -p 3.13
```

This produces a debug editable install in `build/debug/` with `editable.mode = "redirect"`. Python code changes are picked up immediately; C++ changes require a rebuild.

### sccache (strongly recommended)

sccache caches compiled object files across builds and worktrees. Without it, cold builds take 5–10 minutes (2000+ C++ compilation units). With a warm cache, they take ~30 seconds.

```bash
# Export before any uv sync
export CMAKE_C_COMPILER_LAUNCHER="$(command -v sccache)"
export CMAKE_CXX_COMPILER_LAUNCHER="$(command -v sccache)"
```

Install with `brew install sccache` (macOS). Check cache state with `sccache -s`.

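When the build is driven from a Python script rather than an interactive shell, the same two variables can be set on the child process environment. The sketch below is an illustration only: `with_sccache` is a hypothetical helper, and the `which` parameter exists so the lookup can be substituted in tests.

```python
import shutil
from collections.abc import Callable, Mapping
from typing import Optional


def with_sccache(
    env: Mapping[str, str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> dict[str, str]:
    """Return a copy of env with the CMake launcher variables pointing at sccache.

    If sccache is not found on PATH, the copy is returned unchanged, so the
    build falls back to invoking the compiler directly.
    """
    result = dict(env)
    sccache = which("sccache")
    if sccache:
        result["CMAKE_C_COMPILER_LAUNCHER"] = sccache
        result["CMAKE_CXX_COMPILER_LAUNCHER"] = sccache
    return result
```

A caller might pass the result as `env=with_sccache(os.environ)` to `subprocess.run` when invoking `uv sync`.
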
### Non-editable (release-style) install

```bash
uv sync --no-build-isolation --no-editable -v --reinstall -p 3.13
```

Produces a release build (no debug symbols, optimized).

### Wheel build

```bash
uv build --wheel
```

Produces a wheel in `dist/`. CI builds wheels with cibuildwheel; see `pyproject.toml` `[tool.cibuildwheel]` for the CI wheel matrix (macOS arm64/x86_64, Linux x86_64/aarch64, Windows AMD64/ARM64).

### sdist build

```bash
uv build --sdist
```

The sdist includes the DuckDB submodule source via the `[tool.scikit-build.sdist]` include list in `pyproject.toml`.

### Incremental rebuild shortcuts

After editing C++ source:

```bash
# Fastest: touch the __init__.py to trigger scikit-build-core's rebuild detection
touch duckdb/__init__.py && uv sync --no-build-isolation -v --reinstall
```

Rebuild only the duckdb package (useful when the dependency lock hasn't changed):

```bash
uv sync --reinstall-package duckdb
```

### Clean slate

```bash
rm -rf build .venv uv.lock && uv cache clean --force
```

### Python version selection

Pass `-p <version>` to any `uv sync` command:

```bash
uv sync --no-build-isolation -v --reinstall -p 3.11
uv sync --no-build-isolation -v --reinstall -p 3.14
```

Supported: `3.10`, `3.11`, `3.12`, `3.13`, `3.14`. Do **not** use free-threaded variants (`3.13t`, `3.14t`): the production client does not support them.

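A wrapper script could guard against an unsupported `-p` value before starting a long build. The following is a small sketch under that assumption; `check_python_version` and `SUPPORTED_PYTHONS` are hypothetical names, and the version list is copied from the paragraph above.

```python
SUPPORTED_PYTHONS = ("3.10", "3.11", "3.12", "3.13", "3.14")


def check_python_version(version: str) -> str:
    """Return version unchanged if this client supports it, else raise ValueError."""
    # Free-threaded interpreters are requested with a trailing "t", e.g. "3.13t".
    if version.endswith("t"):
        raise ValueError(f"free-threaded build {version!r} is not supported by this client")
    if version not in SUPPORTED_PYTHONS:
        raise ValueError(f"unsupported Python {version!r}; expected one of {SUPPORTED_PYTHONS}")
    return version
```
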
### Build configuration reference

Key `pyproject.toml` settings:

- `BUILD_EXTENSIONS = "core_functions;json;parquet;icu;jemalloc"`: extensions built into the wheel.
- Editable overrides: `build-dir = "build/debug/"`, `editable.rebuild = true`, `editable.mode = "redirect"`, `cmake.build-type = "Debug"`, `DISABLE_UNITY = "1"` (unity disabled for better debugging).
- Coverage overrides: `build-dir = "build/coverage/"`, `RelWithDebInfo`, `--coverage` flags. Activate with `COVERAGE=true uv sync ...`.

## Testing

### Test layout

```
tests/
├── fast/           # quick per-subsystem tests (seconds each)
│   ├── arrow/      # pyarrow integration
│   ├── pandas/     # pandas integration
│   ├── polars/     # polars integration
│   ├── spark/      # pyspark compatibility
│   ├── adbc/       # ADBC driver
│   ├── api/        # relational API
│   ├── dbapi/      # PEP 249 DB-API 2.0
│   ├── capi/       # C API bindings
│   ├── udf/        # Python UDFs
│   └── ...
└── slow/           # heavy integration/performance tests
```

### Common test commands

```bash
# Run one subsystem's fast tests
uv run pytest tests/fast/arrow/ -v

# Run a specific test by name
uv run pytest tests/fast/api/test_relation.py -k 'test_value_relation' -v

# Parallel execution (pytest-xdist)
uv run pytest tests/fast/ -n auto -r skip -vv

# Filter by markers
uv run pytest tests/fast/ -r skip -vv -m "not (performance and threading)"

# Slow tests (be patient)
uv run pytest -n 4 -r skip -vv tests/slow/test_h2oai_arrow.py
```

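pytest evaluates a `-m` expression as a boolean formula over the markers attached to each test. The sketch below models the marker filter shown above in plain Python to make its effect explicit; `is_selected` is a hypothetical helper for illustration, not a pytest API.

```python
def is_selected(markers: set[str]) -> bool:
    """Model of -m "not (performance and threading)" for one test's marker set."""
    # Only tests carrying BOTH markers are deselected; a test marked with just
    # one of them (or neither) still runs.
    return not ("performance" in markers and "threading" in markers)
```

Under this expression a test marked only `performance` is still collected. Excluding every test that carries either marker would require `not (performance or threading)` instead.
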
### Markers

`capi`, `dbapi`, `threading`, `performance`, `types_map`, `type_union`, and more. See `pyproject.toml` `[tool.pytest.ini_options]` for the canonical list and default addopts (`-ra --verbose`).

### Test dependencies

The test dependency group includes heavy packages with complex platform constraints (tensorflow, torch, pyspark, numpy version splits). Install test deps without building the project:

```bash
uv sync --only-group test --no-install-project -p 3.13
```

See `pyproject.toml` `[dependency-groups]` for the full dependency matrix and its platform-specific constraints.

## Linting and formatting

```bash
# Python (ruff, configured in pyproject.toml: 120-char line length, google docstrings)
uv run ruff check src/ tests/
uv run ruff format src/ tests/

# Type checking (mypy in strict mode, see [tool.mypy] in pyproject.toml)
uv run mypy

# Pre-commit hooks (configured in .pre-commit-config.yaml)
uvx pre-commit run --all-files
```

## Debugging

### Quick interpreter check

```bash
.venv/bin/python -c 'import duckdb; print(duckdb.__version__); duckdb.sql("SELECT 42").show()'
```

### AddressSanitizer (macOS)

```bash
DYLD_INSERT_LIBRARIES=/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/17/lib/darwin/libclang_rt.asan_osx_dynamic.dylib \
.venv/bin/python repro.py
```

### Memory profiling with memray

```bash
.venv/bin/python -m memray run -o profile.bin repro.py
.venv/bin/python -m memray flamegraph profile.bin
open memray-flamegraph-profile.html
```

### Python memory debug allocator

```bash
PYTHONMALLOC=malloc_debug .venv/bin/python repro.py
```

### macOS deployment target inspection

```bash
.venv/bin/python -c "import sysconfig; print(sysconfig.get_config_var('MACOSX_DEPLOYMENT_TARGET'))"
```

### Timezone isolation

```bash
TZ=not/existing .venv/bin/python -c 'import duckdb; ...'
```

### Bug investigation repro convention

When investigating a specific issue, write scripts into:

```
debugscripts/<issue_number>_repro/repro.py
debugscripts/<issue_number>_repro/generate_data.py   # if synthetic data needed
```

This convention keeps repro scripts organized by issue and makes them easy to find, share, or reference in bug reports.

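The directory layout above is easy to generate programmatically. The following is a minimal sketch of one way to do so; `repro_paths` and `scaffold_repro` are hypothetical helpers that do not exist in the repository.

```python
from pathlib import Path


def repro_paths(issue_number: int, root: Path = Path("debugscripts")) -> dict[str, Path]:
    """Return the conventional script paths for an issue without touching disk."""
    base = root / f"{issue_number}_repro"
    return {"repro": base / "repro.py", "generate_data": base / "generate_data.py"}


def scaffold_repro(issue_number: int, root: Path = Path("debugscripts")) -> Path:
    """Create the repro directory and an empty repro.py if they do not exist yet."""
    repro = repro_paths(issue_number, root)["repro"]
    repro.parent.mkdir(parents=True, exist_ok=True)
    repro.touch(exist_ok=True)  # leaves an existing script untouched
    return repro
```

`generate_data.py` is deliberately not created by `scaffold_repro`, since the convention only calls for it when synthetic data is needed.
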
## Project structure

```
├── duckdb/                    # Python package (pure Python + extension module)
│   ├── __init__.py
│   ├── experimental/          # pyspark compatibility layer
│   ├── filesystem.py          # fsspec integration
│   └── query_graph/           # (old, unmaintained)
├── adbc_driver_duckdb/        # ADBC driver package
├── _duckdb-stubs/             # type stubs (*.pyi)
├── src/                       # C++ extension source (pybind11)
├── external/duckdb/           # DuckDB submodule
├── duckdb_packaging/          # custom build backend
├── tests/                     # test suite (fast/ + slow/)
├── scripts/                   # maintenance scripts
├── debugscripts/              # issue repro scripts (convention: <issue>_repro/)
├── pyproject.toml             # build config, deps, linting, CI
└── CMakeLists.txt             # CMake build system
```

### DuckDB submodule

The DuckDB engine is included as a git submodule at `external/duckdb/`. After creating a worktree or switching branches:

```bash
git submodule update --init --recursive
```

Do not pass `--depth=1`: the build backend needs the git history for version detection.

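Before starting a build it can be useful to confirm that the submodule is actually initialized. `git submodule status` prefixes each uninitialized submodule with `-`, which the sketch below relies on. The helper names are hypothetical, and paths containing spaces are not handled.

```python
import subprocess


def uninitialized_submodules(status_output: str) -> list[str]:
    """Return the paths that `git submodule status` reports with a leading '-'."""
    paths = []
    for line in status_output.splitlines():
        if line.startswith("-"):
            # Line format is "-<sha> <path>"; take the path field.
            paths.append(line[1:].split()[1])
    return paths


def check_submodules() -> list[str]:
    """Run git in the current directory and list uninitialized submodules."""
    result = subprocess.run(
        ["git", "submodule", "status", "--recursive"],
        check=True,
        capture_output=True,
        text=True,
    )
    return uninitialized_submodules(result.stdout)
```

An empty list from `check_submodules()` suggests that `git submodule update --init --recursive` has already been run in the current checkout.
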
## Development workflows

### Worktrees

This repository supports git worktrees for parallel development. The recommended pattern is to keep worktrees as siblings of the main checkout:

```bash
# Create a feature branch worktree
git worktree add -b feature/my-feature ../feature_my_feature v1.5-variegata
cd ../feature_my_feature
git submodule update --init --recursive
uv sync --only-group build --no-install-project -p 3.13
uv sync --no-build-isolation -v --reinstall -p 3.13
```

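The sibling-directory pattern can be derived from the branch name. The sketch below assumes a naming rule inferred from the single example above (`feature/my-feature` becomes `../feature_my_feature`); the rule and both helper names are assumptions, not documented repository policy.

```python
def worktree_dir(branch: str) -> str:
    """Map a branch name to a sibling directory, replacing '/' and '-' with '_'."""
    return "../" + branch.replace("/", "_").replace("-", "_")


def worktree_add_command(branch: str, base: str) -> list[str]:
    """Build the `git worktree add` invocation for a new branch based on `base`."""
    return ["git", "worktree", "add", "-b", branch, worktree_dir(branch), base]
```

For the example in this section, `worktree_add_command("feature/my-feature", "v1.5-variegata")` reproduces the first command in the block above as an argument list.
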
### GitHub CLI patterns

```bash
# CI status
gh run list --repo duckdb/duckdb-python --workflow="Packaging" --limit 5
gh run list --workflow pypi_packaging.yml --json status,url -s failure

# Download CI artifacts
gh run download <run-id> -D ./artifacts/

# PR workflows
gh pr checkout 167
gh pr create -B v1.5-variegata
```

## Scope

This file covers the **Python extension layer**: the pybind11 bindings, the build system, the test suite, and the Python packaging. The DuckDB core engine source is included as a submodule at `external/duckdb/` and is compiled from source as part of the build; debugging may require navigating into the submodule's C++ code.

**Free-threaded Python** is not supported in this client. A separate prototype client based on DuckDB's C API exists for free-threading, Stable ABI, and multi-interpreter support.
