Commit 1f53b51

Merge branch 'master' into fix/replace-print-with-warnings-warn
2 parents e88d4b4 + be405d1 commit 1f53b51

5 files changed

Lines changed: 184 additions & 1 deletion

LINUX_SETUP.md

Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
+# Developer setup (Linux)
+
+To get setup for development, see [this video if you prefer VS Code](https://youtu.be/zddl3n1DCFM), or [this older video if you prefer PyCharm](https://youtu.be/QniQi-Hoo9A), and the instructions below.
+
+## 1. Fork and clone this repo
+```bash
+git clone git@github.com:[username]/malariagen-data-python.git
+cd malariagen-data-python
+```
+
+## 2. Install Python
+```bash
+sudo add-apt-repository ppa:deadsnakes/ppa
+sudo apt install python3.10 python3.10-venv
+```
+
+## 3. Install pipx and poetry
+```bash
+python3.10 -m pip install --user pipx
+python3.10 -m pipx ensurepath
+pipx install poetry
+```
+
+## 4. Create and activate development environment
+```bash
+poetry install
+poetry shell
+```
+
+## 5. Install pre-commit hooks
+```bash
+pipx install pre-commit
+pre-commit install
+```
+
+Run pre-commit checks manually:
+```bash
+pre-commit run --all-files
+```
+
+## 6. Run tests
+
+Run fast unit tests using simulated data:
+```bash
+poetry run pytest -v tests/anoph
+```
+
+## 7. Google Cloud authentication (for legacy tests)
+
+To run legacy tests which read data from GCS, you'll need to [request access to MalariaGEN data on GCS](https://malariagen.github.io/vector-data/vobs/vobs-data-access.html).
+
+Once access has been granted, [install the Google Cloud CLI](https://cloud.google.com/sdk/docs/install):
+```bash
+./install_gcloud.sh
+```
+
+Then obtain application-default credentials:
+```bash
+./google-cloud-sdk/bin/gcloud auth application-default login
+```
+
+Once authenticated, run legacy tests:
+```bash
+poetry run pytest --ignore=tests/anoph -v tests
+```
+
+Tests will run slowly the first time, as data will be read from GCS and cached locally in the `gcs_cache` folder.
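
Note on step 7: before starting the slow legacy tests, it may help to confirm that `gcloud auth application-default login` actually left credentials on disk. The following is a minimal sketch, not part of this commit. `find_adc_file` is a hypothetical helper, and it assumes the default Linux/macOS credentials location with `CLOUDSDK_CONFIG` left unset.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_adc_file(
    environ: Mapping[str, str] = os.environ,
    home: Optional[Path] = None,
) -> Optional[Path]:
    """Return the application-default credentials file if one can be found.

    Hypothetical helper: it only checks the two common locations and does
    not validate the credentials themselves.
    """
    # An explicit override via the environment takes precedence.
    env_path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if env_path and Path(env_path).is_file():
        return Path(env_path)

    # Default location written by `gcloud auth application-default login`
    # on Linux and macOS (assumes CLOUDSDK_CONFIG has not been changed).
    home = home if home is not None else Path.home()
    default = home / ".config" / "gcloud" / "application_default_credentials.json"
    return default if default.is_file() else None


adc = find_adc_file()
print("credentials found" if adc else "no application-default credentials found")
```

A missing file only tells you the login step has not completed. A present file does not prove that access to the MalariaGEN buckets has been granted.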

MACOS_SETUP.md

Lines changed: 77 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,77 @@
1+
# Developer setup (macOS)
2+
3+
The Linux setup guide is available in [LINUX_SETUP.md](LINUX_SETUP.md). If you are on macOS, follow these steps instead.
4+
5+
## 1. Install Miniconda
6+
7+
Download and install Miniconda for macOS from https://docs.conda.io/en/latest/miniconda.html.
8+
Choose the Apple Silicon installer if you have an Apple Silicon Mac, or the Intel installer otherwise. You can check with:
9+
```bash
10+
uname -m
11+
# arm64 = Apple Silicon, x86_64 = Intel
12+
```
13+
14+
After installation, close and reopen your terminal for conda to be available.
15+
16+
## 2. Create a conda environment
17+
18+
The package requires Python `>=3.10, <3.13`. Python 3.13+ is not currently supported.
19+
```bash
20+
conda create -n malariagen python=3.11
21+
conda activate malariagen
22+
```
23+
24+
## 3. Fork and clone this repo
25+
26+
Fork the repository on GitHub, then clone your fork:
27+
```bash
28+
git clone git@github.com:[username]/malariagen-data-python.git
29+
cd malariagen-data-python
30+
pip install -e ".[dev]"
31+
```
32+
33+
## 4. Install pre-commit hooks
34+
```bash
35+
pre-commit install
36+
```
37+
38+
Run pre-commit checks manually:
39+
```bash
40+
pre-commit run --all-files
41+
```
42+
43+
## 5. Run tests
44+
45+
Run fast unit tests using simulated data:
46+
```bash
47+
pytest -v tests/anoph
48+
```
49+
50+
## 6. Google Cloud authentication (for legacy tests)
51+
52+
To run legacy tests which read data from GCS, you'll need to [request access to MalariaGEN data on GCS](https://malariagen.github.io/vector-data/vobs/vobs-data-access.html).
53+
54+
Once access has been granted, install the Google Cloud CLI:
55+
```bash
56+
brew install google-cloud-sdk
57+
```
58+
59+
Then authenticate:
60+
```bash
61+
gcloud auth application-default login
62+
```
63+
64+
This opens a browser — log in with any Google account.
65+
66+
Once authenticated, run legacy tests:
67+
```bash
68+
pytest --ignore=tests/anoph -v tests
69+
```
70+
71+
Tests will run slowly the first time, as data will be read from GCS and cached locally in the `gcs_cache` folder.
72+
73+
## 7. VS Code terminal integration
74+
75+
To use the `code` command from the terminal:
76+
77+
Open VS Code → `Cmd + Shift + P` → type `Shell Command: Install 'code' command in PATH` → press Enter.
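
Note on step 2 of the macOS guide: the stated constraint (`>=3.10, <3.13`) can be checked from inside the activated environment. This is a small sketch, not part of the commit. The bounds are copied from the guide text above, and `is_supported` is a hypothetical name.

```python
import sys

# Bounds taken from the guide text: Python >=3.10, <3.13.
MIN_VERSION = (3, 10)
MAX_VERSION_EXCLUSIVE = (3, 13)


def is_supported(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) pair falls within the stated range."""
    return MIN_VERSION <= tuple(version_info[:2]) < MAX_VERSION_EXCLUSIVE


status = "supported" if is_supported() else "not supported"
print(f"Python {sys.version_info.major}.{sys.version_info.minor} is {status}")
```

Comparing `(major, minor)` tuples avoids the pitfalls of comparing version strings, where `"3.9"` sorts after `"3.10"`.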

README.md

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -44,8 +44,11 @@ for release notes.

 ## Developer setup

-To get setup for development, see [this video if you prefer VS Code](https://youtu.be/zddl3n1DCFM), or [this older video if you prefer PyCharm](https://youtu.be/QniQi-Hoo9A), and the instructions below.
+To get setup for development, see [this video if you prefer VS Code](https://youtu.be/zddl3n1DCFM), or [this older video if you prefer PyCharm](https://youtu.be/QniQi-Hoo9A).

+For detailed setup instructions, see:
+- [Linux setup guide](LINUX_SETUP.md)
+- [macOS setup guide](MACOS_SETUP.md)
 Detailed instructions can be found in the [Contributors guide](https://github.com/malariagen/malariagen-data-python/blob/master/CONTRIBUTING.md).

 ## AI use policy and guidelines

malariagen_data/util.py

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -899,6 +899,12 @@ def __init__(
                 handler = logging.FileHandler(out)
         self._handler = handler

+        # Remove any pre-existing handlers from the singleton logger to prevent
+        # accumulation (and FileHandler FD leaks) on repeated instantiation.
+        for existing_handler in logger.handlers[:]:
+            logger.removeHandler(existing_handler)
+            existing_handler.close()
+
         # configure handler
         if handler is not None:
             if debug:
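
Note on the hunk above: `logging.getLogger(name)` returns the same logger object for a given name, so every additional `addHandler` call makes each message come out one more time. The standalone sketch below reproduces the removal pattern outside `LoggingHelper`. `attach_fresh_handler` and the logger name are hypothetical, and the real constructor may differ in detail.

```python
import io
import logging


def attach_fresh_handler(name: str, out: io.StringIO) -> logging.Logger:
    """Hypothetical helper mirroring the clean-up pattern in the hunk above."""
    logger = logging.getLogger(name)  # same name -> same logger object
    logger.setLevel(logging.INFO)
    # Iterate over a copy, because removeHandler mutates logger.handlers.
    for existing in logger.handlers[:]:
        logger.removeHandler(existing)
        existing.close()
    logger.addHandler(logging.StreamHandler(out))
    return logger


out = io.StringIO()
for _ in range(3):
    logger = attach_fresh_handler("handler_accumulation_demo", out)
logger.info("sentinel")

handler_count = len(logger.handlers)
sentinel_count = out.getvalue().count("sentinel")
print(handler_count, sentinel_count)
```

Without the removal loop, three handlers would all write to `out` and the message would appear three times. Closing a plain `StreamHandler` does not close the stream it wraps, which is why `out` stays usable across iterations.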

tests/anoph/test_base.py

Lines changed: 30 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,6 @@
+import io
+import logging
+
 import numpy as np
 import pandas as pd
 import pytest
@@ -8,6 +11,7 @@
 from malariagen_data import ag3 as _ag3
 from malariagen_data import adir1 as _adir1
 from malariagen_data.anoph.base import AnophelesBase
+from malariagen_data.util import LoggingHelper


 @pytest.fixture
@@ -258,6 +262,32 @@ def test_lookup_study(fixture, api):
         api.lookup_study("foobar")


+def test_logging_helper_no_handler_accumulation():
+    # Regression test: repeated LoggingHelper construction on the same logger
+    # name must not accumulate handlers (StreamHandler leak, FileHandler FD leak).
+    logger_name = "test_logging_helper_no_handler_accumulation"
+    for _ in range(10):
+        LoggingHelper(name=logger_name, out=io.StringIO())
+    logger = logging.getLogger(logger_name)
+    assert (
+        len(logger.handlers) <= 1
+    ), f"Handler leak: {len(logger.handlers)} handlers after 10 instantiations"
+
+
+def test_logging_helper_no_duplicate_output():
+    # Regression test: a message emitted after N instantiations must appear
+    # exactly once in the output stream.
+    logger_name = "test_logging_helper_no_duplicate_output"
+    out = io.StringIO()
+    for _ in range(5):
+        helper = LoggingHelper(name=logger_name, out=out)
+    helper.info("sentinel")
+    output = out.getvalue()
+    assert (
+        output.count("sentinel") == 1
+    ), f"Duplicate log output: 'sentinel' appeared {output.count('sentinel')} times"
+
+
 def _strip_terms_of_use_from_manifest(manifest_path):
     """Rewrite a manifest TSV file without terms-of-use columns."""
     df = pd.read_csv(manifest_path, sep="\t")
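
Note on the new tests: both pass `io.StringIO` streams, so they exercise handler accumulation but not the file-descriptor path named in the code comment. The sketch below shows one way that path could be checked with the standard library alone. It is an illustration, not part of this commit. `replace_handlers` is a hypothetical stand-in for the clean-up loop and does not call `LoggingHelper`.

```python
import logging
import os
import tempfile


def replace_handlers(logger: logging.Logger, new_handler: logging.Handler) -> None:
    """Hypothetical stand-in for the clean-up loop added in util.py."""
    for existing in logger.handlers[:]:
        logger.removeHandler(existing)
        existing.close()
    logger.addHandler(new_handler)


with tempfile.TemporaryDirectory() as tmp_dir:
    log_path = os.path.join(tmp_dir, "demo.log")
    logger = logging.getLogger("file_handler_close_demo")

    first = logging.FileHandler(log_path)
    replace_handlers(logger, first)
    first_stream = first.stream  # keep a reference to the open file object

    second = logging.FileHandler(log_path)
    replace_handlers(logger, second)

    # The first handler's file object should now be closed.
    first_stream_closed = first_stream.closed
    remaining_handlers = len(logger.handlers)

    # Close the last FileHandler so the temporary directory can be removed.
    replace_handlers(logger, logging.NullHandler())

print(first_stream_closed, remaining_handlers)
```

Holding a reference to `first.stream` before the swap is what makes the check possible, since `FileHandler.close()` drops the handler's own reference to the stream.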
