Commit d2720eb

Merge branch 'master' into GH648-fix-h12-calibration-axis-flip
2 parents 24f58b5 + b1ab7fb commit d2720eb

52 files changed

Lines changed: 4225 additions & 1139 deletions

.github/actions/setup-python/action.yaml

Lines changed: 1 addition & 1 deletion
@@ -19,4 +19,4 @@ runs:
       shell: bash
       run: |
         poetry env use ${{ inputs.python-version }}
-        poetry install --extras dev
+        poetry install --with dev,test,docs

.github/workflows/tests.yml

Lines changed: 24 additions & 2 deletions
@@ -12,6 +12,10 @@ jobs:
       fail-fast: true
       matrix:
         python-version: ["3.10", "3.11", "3.12"]
+        numpy-spec:
+          # Keep this aligned with pyproject.toml: numpy = ">=2.0.2,<2.1"
+          - "==2.0.2" # locked baseline
+          - ">=2.0.2,<2.1" # latest allowed in declared range
     runs-on: ubuntu-latest
 
     steps:
@@ -23,8 +27,26 @@ jobs:
         with:
           python-version: ${{ matrix.python-version }}
 
-      - name: Verify NumPy version
-        run: poetry run python -c "import numpy; print('NumPy version:', numpy.__version__)"
+      - name: Install matrix NumPy version
+        run: poetry run pip install --upgrade --no-deps "numpy${{ matrix.numpy-spec }}"
+
+      - name: Verify NumPy version and spec
+        env:
+          NUMPY_SPEC: ${{ matrix.numpy-spec }}
+        run: |
+          poetry run python - <<'PY'
+          import os
+          import numpy
+          from packaging.specifiers import SpecifierSet
+
+          spec = SpecifierSet(os.environ["NUMPY_SPEC"])
+          version = numpy.__version__
+          if version not in spec:
+              raise RuntimeError(
+                  f"NumPy version {version} does not satisfy matrix spec {spec}"
+              )
+          print("NumPy version:", version, "| spec:", spec)
+          PY
 
       - name: Run unit tests
         run: poetry run pytest -v tests --ignore tests/integration --typeguard-packages=malariagen_data,malariagen_data.anoph
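
The `matrix` block in this workflow crosses three Python versions with two NumPy specifiers, and GitHub Actions runs one job per combination of matrix values. The sketch below enumerates those combinations, assuming there are no `include` or `exclude` entries outside the lines shown in the hunk. The `matrix` values are copied from the diff; the variable names are illustrative only.

```python
from itertools import product

# Values copied from the workflow's strategy.matrix block.
matrix = {
    "python-version": ["3.10", "3.11", "3.12"],
    "numpy-spec": ["==2.0.2", ">=2.0.2,<2.1"],
}

# One job per element of the Cartesian product of the matrix axes.
jobs = [dict(zip(matrix, combo)) for combo in product(*matrix.values())]

for job in jobs:
    # Mirrors the requirement string built by the "Install matrix NumPy version" step.
    requirement = f"numpy{job['numpy-spec']}"
    print(f"python {job['python-version']}: pip install {requirement}")
```

Under these assumptions the matrix yields six jobs, so each listed Python version is tested against both the locked baseline and the newest NumPy release in the declared range.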

CONTRIBUTING.md

Lines changed: 50 additions & 12 deletions
@@ -12,9 +12,10 @@ This package provides Python tools for accessing and analyzing genomic data from
 
 You'll need:
 
-- Python 3.10.x (CI-tested version)
-- [Poetry](https://python-poetry.org/) for dependency management
-- [Git](https://git-scm.com/) for version control
+- [pipx](https://pipx.pypa.io/) for installing Python tools
+- [git](https://git-scm.com/) for version control
+
+Both of these can be installed using your distribution's package manager or [Homebrew](https://brew.sh/) on Mac.
 
 ### Initial setup
 
@@ -33,18 +34,31 @@ You'll need:
    git remote add upstream https://github.com/malariagen/malariagen-data-python.git
    ```
 
-3. **Install Poetry** (if not already installed)
+3. **Install Poetry**
 
    ```bash
    pipx install poetry
    ```
 
-4. **Install the project and its dependencies**
+4. **Install Python 3.12**
+
+   Python 3.12 is tested in the CI-system and is the recommended version to use.
+
+   ```bash
+   poetry python install 3.12
+   ```
+
+5. **Install the project and its dependencies**
 
    ```bash
-   poetry install
+   poetry env use 3.12
+   poetry install --with dev,test,docs
    ```
 
+   This installs the runtime dependencies along with the `dev`, `test`, and `docs`
+   [dependency groups](https://python-poetry.org/docs/managing-dependencies/#dependency-groups).
+   If you only need to run tests, `poetry install --with test` is sufficient.
+
    **Recommended**: Use `poetry run` to run commands inside the virtual environment:
 
    ```bash
@@ -71,7 +85,7 @@ You'll need:
    python script.py
    ```
 
-5. **Install pre-commit hooks**
+6. **Install pre-commit hooks**
 
    ```bash
    pipx install pre-commit
@@ -107,16 +121,40 @@ You'll need:
 
 4. **Run tests locally**
 
-   Fast unit tests (no external data access):
+   Fast unit tests using simulated data (no external data access):
 
    ```bash
-   poetry run pytest -v tests/anoph
+   poetry run pytest -v tests --ignore tests/integration
    ```
 
-   All unit tests (requires setting up credentials for legacy tests):
+   To run integration tests which read data from GCS, you'll need to [request access to MalariaGEN data on GCS](https://malariagen.github.io/vector-data/vobs/vobs-data-access.html).
+
+   Once access has been granted, [install the Google Cloud CLI](https://cloud.google.com/sdk/docs/install). E.g., if on Linux:
 
    ```bash
-   poetry run pytest -v tests --ignore tests/integration
+   ./install_gcloud.sh
+   ```
+
+   You'll then need to obtain application-default credentials, e.g.:
+
+   ```bash
+   ./google-cloud-sdk/bin/gcloud auth application-default login
+   ```
+
+   Once this is done, you can run integration tests:
+
+   ```bash
+   poetry run pytest -v tests/integration
+   ```
+
+   Tests will run slowly the first time, as data required for testing will be read from GCS. Subsequent runs will be faster as data will be cached locally in the "gcs_cache" folder.
+
+6. **Run typechecking**
+
+   Run static typechecking with mypy:
+
+   ```bash
+   poetry run mypy malariagen_data tests --ignore-missing-imports
    ```
 
 5. **Check code quality**
@@ -150,7 +188,7 @@ ruff format .
 - **Fast tests**: Unit tests should use simulated data when possible (see `tests/anoph/`)
 - **Integration tests**: Tests requiring GCS data access are slower and run separately
 
-Run type checking with:
+Run dynamic type checking with:
 
 ```bash
 poetry run pytest -v tests --typeguard-packages=malariagen_data,malariagen_data.anoph
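
The CONTRIBUTING.md changes distinguish static typechecking (mypy, which analyses the source without running it) from dynamic type checking (the `--typeguard-packages` option, which checks annotations while the tests run). The sketch below illustrates the dynamic idea only. The decorator `check_types` and the function `allele_count` are hypothetical stand-ins written for this example; they are not part of typeguard or malariagen_data, and the check handles plain classes such as `int` only.

```python
import functools
import inspect
from typing import get_type_hints


def check_types(func):
    """Hypothetical, simplified runtime check of argument annotations."""
    hints = get_type_hints(func)
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            # Only plain classes are checked; generics are skipped in this sketch.
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(f"{name}={value!r} is not {expected.__name__}")
        return func(*args, **kwargs)

    return wrapper


@check_types
def allele_count(n_samples: int, ploidy: int = 2) -> int:
    # Illustrative function: number of alleles across all samples.
    return n_samples * ploidy
```

Without the decorator, `allele_count("3")` would silently return the string `"33"`. With the runtime check the call raises `TypeError`, which is the kind of mistake a dynamic checker surfaces during a test run and a static checker reports before any code executes.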

LINUX_SETUP.md

Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
+# Developer setup (Linux)
+
+To get setup for development, see [this video if you prefer VS Code](https://youtu.be/zddl3n1DCFM), or [this older video if you prefer PyCharm](https://youtu.be/QniQi-Hoo9A), and the instructions below.
+
+## 1. Fork and clone this repo
+```bash
+git clone git@github.com:[username]/malariagen-data-python.git
+cd malariagen-data-python
+```
+
+## 2. Install Python
+```bash
+sudo add-apt-repository ppa:deadsnakes/ppa
+sudo apt install python3.10 python3.10-venv
+```
+
+## 3. Install pipx and poetry
+```bash
+python3.10 -m pip install --user pipx
+python3.10 -m pipx ensurepath
+pipx install poetry
+```
+
+## 4. Create and activate development environment
+```bash
+poetry install
+poetry shell
+```
+
+## 5. Install pre-commit hooks
+```bash
+pipx install pre-commit
+pre-commit install
+```
+
+Run pre-commit checks manually:
+```bash
+pre-commit run --all-files
+```
+
+## 6. Run tests
+
+Run fast unit tests using simulated data:
+```bash
+poetry run pytest -v tests/anoph
+```
+
+## 7. Google Cloud authentication (for legacy tests)
+
+To run legacy tests which read data from GCS, you'll need to [request access to MalariaGEN data on GCS](https://malariagen.github.io/vector-data/vobs/vobs-data-access.html).
+
+Once access has been granted, [install the Google Cloud CLI](https://cloud.google.com/sdk/docs/install):
+```bash
+./install_gcloud.sh
+```
+
+Then obtain application-default credentials:
+```bash
+./google-cloud-sdk/bin/gcloud auth application-default login
+```
+
+Once authenticated, run legacy tests:
+```bash
+poetry run pytest --ignore=tests/anoph -v tests
+```
+
+Tests will run slowly the first time, as data will be read from GCS and cached locally in the `gcs_cache` folder.

MACOS_SETUP.md

Lines changed: 77 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,77 @@
1+
# Developer setup (macOS)
2+
3+
The Linux setup guide is available in [LINUX_SETUP.md](LINUX_SETUP.md). If you are on macOS, follow these steps instead.
4+
5+
## 1. Install Miniconda
6+
7+
Download and install Miniconda for macOS from https://docs.conda.io/en/latest/miniconda.html.
8+
Choose the Apple Silicon installer if you have an Apple Silicon Mac, or the Intel installer otherwise. You can check with:
9+
```bash
10+
uname -m
11+
# arm64 = Apple Silicon, x86_64 = Intel
12+
```
13+
14+
After installation, close and reopen your terminal for conda to be available.
15+
16+
## 2. Create a conda environment
17+
18+
The package requires Python `>=3.10, <3.13`. Python 3.13+ is not currently supported.
19+
```bash
20+
conda create -n malariagen python=3.11
21+
conda activate malariagen
22+
```
23+
24+
## 3. Fork and clone this repo
25+
26+
Fork the repository on GitHub, then clone your fork:
27+
```bash
28+
git clone git@github.com:[username]/malariagen-data-python.git
29+
cd malariagen-data-python
30+
pip install -e ".[dev]"
31+
```
32+
33+
## 4. Install pre-commit hooks
34+
```bash
35+
pre-commit install
36+
```
37+
38+
Run pre-commit checks manually:
39+
```bash
40+
pre-commit run --all-files
41+
```
42+
43+
## 5. Run tests
44+
45+
Run fast unit tests using simulated data:
46+
```bash
47+
pytest -v tests/anoph
48+
```
49+
50+
## 6. Google Cloud authentication (for legacy tests)
51+
52+
To run legacy tests which read data from GCS, you'll need to [request access to MalariaGEN data on GCS](https://malariagen.github.io/vector-data/vobs/vobs-data-access.html).
53+
54+
Once access has been granted, install the Google Cloud CLI:
55+
```bash
56+
brew install google-cloud-sdk
57+
```
58+
59+
Then authenticate:
60+
```bash
61+
gcloud auth application-default login
62+
```
63+
64+
This opens a browser — log in with any Google account.
65+
66+
Once authenticated, run legacy tests:
67+
```bash
68+
pytest --ignore=tests/anoph -v tests
69+
```
70+
71+
Tests will run slowly the first time, as data will be read from GCS and cached locally in the `gcs_cache` folder.
72+
73+
## 7. VS Code terminal integration
74+
75+
To use the `code` command from the terminal:
76+
77+
Open VS Code → `Cmd + Shift + P` → type `Shell Command: Install 'code' command in PATH` → press Enter.
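
Section 2 of the macOS guide states that the package requires Python `>=3.10, <3.13`. The sketch below uses only the standard library to show how a contributor might confirm that an interpreter falls in that range before installing. The helper name `is_supported` is hypothetical and is not part of malariagen_data.

```python
import sys

# Bounds taken from the guide: Python >=3.10, <3.13.
MIN_VERSION = (3, 10)
MAX_EXCLUSIVE = (3, 13)


def is_supported(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) pair lies in the supported range."""
    major_minor = tuple(version_info[:2])
    return MIN_VERSION <= major_minor < MAX_EXCLUSIVE


if __name__ == "__main__":
    current = ".".join(map(str, sys.version_info[:3]))
    status = "supported" if is_supported() else "not supported"
    print(f"Python {current} is {status}")
```

Run inside the activated conda environment, this reports whether the interpreter created by `conda create -n malariagen python=3.11` is within the stated range.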

README.md

Lines changed: 4 additions & 1 deletion
@@ -44,8 +44,11 @@ for release notes.
 
 ## Developer setup
 
-To get setup for development, see [this video if you prefer VS Code](https://youtu.be/zddl3n1DCFM), or [this older video if you prefer PyCharm](https://youtu.be/QniQi-Hoo9A), and the instructions below.
+To get setup for development, see [this video if you prefer VS Code](https://youtu.be/zddl3n1DCFM), or [this older video if you prefer PyCharm](https://youtu.be/QniQi-Hoo9A).
 
+For detailed setup instructions, see:
+- [Linux setup guide](LINUX_SETUP.md)
+- [macOS setup guide](MACOS_SETUP.md)
 Detailed instructions can be found in the [Contributors guide](https://github.com/malariagen/malariagen-data-python/blob/master/CONTRIBUTING.md).
 
 ## AI use policy and guidelines
5053

5154
## AI use policy and guidelines

docs/source/index.rst

Lines changed: 1 addition & 1 deletion
@@ -92,7 +92,7 @@ Some data from MalariaGEN are subject to **terms of use** which may include an e
 public communication of any analysis results without permission from data owners. If you
 have any questions about terms of use please email support@malariagen.net.
 
-By default, this sofware package accesses data directly from the **MalariaGEN cloud data repository**
+By default, this software package accesses data directly from the **MalariaGEN cloud data repository**
 hosted in Google Cloud Storage in the US. Note that data access will be more efficient if your
 computations are also run within the same region. Google Colab provides a convenient and free
 service which you can use to explore data and run computations. If you have any questions about

malariagen_data/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -1,4 +1,5 @@
 # flake8: noqa
+from .adar1 import Adar1
 from .adir1 import Adir1
 from .af1 import Af1
 from .ag3 import Ag3
