Commit b01febc

Merge remote-tracking branch 'upstream/master' into fix/pr817-ci-cleanup
# Conflicts:
#	malariagen_data/anoph/dipclust.py
2 parents 172dd2f + 2d3d2f9 commit b01febc

26 files changed

Lines changed: 918 additions & 331 deletions

.github/workflows/coverage.yml

Lines changed: 27 additions & 27 deletions
Original file line number | Diff line number | Diff line change
@@ -1,34 +1,34 @@
11
name: coverage
22
on:
3-
push:
4-
branches:
5-
- master
6-
pull_request:
7-
branches:
8-
- master
3+
push:
4+
branches:
5+
- master
6+
pull_request:
7+
branches:
8+
- master
99
jobs:
10-
coverage:
11-
strategy:
12-
fail-fast: true
13-
runs-on: ubuntu-latest
14-
steps:
15-
- name: Checkout source
16-
uses: actions/checkout@v4
10+
coverage:
11+
strategy:
12+
fail-fast: true
13+
runs-on: ubuntu-latest
14+
steps:
15+
- name: Checkout source
16+
uses: actions/checkout@v4
1717

18-
- name: Setup python
19-
uses: actions/setup-python@v5
20-
with:
21-
python-version: '3.12'
22-
cache: 'pip'
18+
- name: Setup python
19+
uses: actions/setup-python@v5
20+
with:
21+
python-version: "3.10"
22+
cache: "pip"
2323

24-
- name: Install package
25-
run: pip install .[dev]
24+
- name: Install package
25+
run: pip install .[dev]
2626

27-
- name: Run unit tests with coverage
28-
run: pytest -v tests --ignore tests/integration --cov malariagen_data/anoph --cov-report=xml
27+
- name: Run unit tests with coverage
28+
run: pytest -v tests --ignore tests/integration --cov malariagen_data/anoph --cov-report=xml
2929

30-
- name: Upload coverage report
31-
uses: codecov/codecov-action@v3
32-
with:
33-
files: ./coverage.xml
34-
verbose: true
30+
- name: Upload coverage report
31+
uses: codecov/codecov-action@v3
32+
with:
33+
files: ./coverage.xml
34+
verbose: true
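
The coverage job above writes `coverage.xml` via `--cov-report=xml` and uploads it. As a rough illustration of what that report carries, the sketch below reads the overall line rate from the root `<coverage>` element using only the standard library. The sample XML and the helper name `read_line_rate` are invented for this example; a real report has more attributes and nested package and class elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for the coverage.xml that
# `pytest --cov-report=xml` produces; the numbers are invented.
SAMPLE_REPORT = """<coverage line-rate="0.875" lines-covered="700" lines-valid="800">
  <packages/>
</coverage>
"""


def read_line_rate(xml_text: str) -> float:
    """Return the overall line coverage as a fraction between 0 and 1."""
    root = ET.fromstring(xml_text)
    if root.tag != "coverage":
        raise ValueError(f"unexpected root element: {root.tag!r}")
    return float(root.attrib["line-rate"])


line_rate = read_line_rate(SAMPLE_REPORT)
print(f"line coverage: {line_rate:.1%}")
```

A check like this could be used locally to inspect a report before pushing; the upload itself is handled by `codecov/codecov-action` in the workflow.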

.github/workflows/integration_tests.yml

Lines changed: 0 additions & 3 deletions
@@ -3,9 +3,6 @@ on:
33
push:
44
branches:
55
- master
6-
pull_request:
7-
branches:
8-
- master
96
jobs:
107
integration_tests:
118
strategy:

.github/workflows/notebooks.yml

Lines changed: 0 additions & 3 deletions
@@ -3,9 +3,6 @@ on:
33
push:
44
branches:
55
- master
6-
pull_request:
7-
branches:
8-
- master
96
jobs:
107
notebooks:
118
strategy:

.github/workflows/tests.yml

Lines changed: 5 additions & 6 deletions
@@ -11,12 +11,8 @@ jobs:
1111
strategy:
1212
fail-fast: true
1313
matrix:
14-
python-version: ["3.10", "3.11", "3.12"]
15-
numpy-version:
16-
- "numpy==1.26.4" # current colab version
17-
- "numpy~=2.0" # 2.0 series
18-
# numba (0.60.0) does not yet support numpy 2.1, disable this for now
19-
# - "numpy~=2.1" # 2.1 series
14+
python-version: ["3.10"]
15+
numpy-version: ["numpy==1.26.4"]
2016
runs-on: ubuntu-latest
2117

2218
steps:
@@ -32,5 +28,8 @@ jobs:
3228
- name: Install package
3329
run: pip install "${{ matrix.numpy-version }}" .[dev]
3430

31+
- name: Verify NumPy version
32+
run: python -c "import numpy; print('NumPy version:', numpy.__version__)"
33+
3534
- name: Run unit tests
3635
run: pytest -v tests --ignore tests/integration --typeguard-packages=malariagen_data,malariagen_data.anoph
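
The new "Verify NumPy version" step only prints the installed version. As a minimal sketch of how that output could be compared against the matrix entry, the helpers below check an installed version string against an exact `==` pin. `parse_exact_pin` and `satisfies_pin` are hypothetical names and not part of the workflow. Only exact pins are handled; other specifiers such as `~=` are rejected rather than interpreted.

```python
def parse_exact_pin(requirement: str) -> tuple[str, str]:
    """Split an exact pin such as 'numpy==1.26.4' into (name, version)."""
    name, sep, version = requirement.partition("==")
    name, version = name.strip(), version.strip()
    if not sep or not name or not version:
        raise ValueError(f"not an exact '==' pin: {requirement!r}")
    return name, version


def satisfies_pin(installed_version: str, requirement: str) -> bool:
    """Return True if the installed version equals the pinned version."""
    _, pinned = parse_exact_pin(requirement)
    return installed_version == pinned


# Literal values stand in for numpy.__version__ so the sketch runs
# without NumPy installed.
print(satisfies_pin("1.26.4", "numpy==1.26.4"))
print(satisfies_pin("2.0.2", "numpy==1.26.4"))
```

In CI the first argument would come from `numpy.__version__` and the second from `matrix.numpy-version`; failing the job on a mismatch would catch a dependency resolver silently overriding the pin.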

AI-POLICY.md

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
1+
# AI use policy and guidelines
2+
3+
The goal of the MalariaGEN data API is to make access, use, and interpretation of the genomic data collected by our partners as easy and intuitive as possible. Maintainers have limited time and attention to focus on reviews, which means that each review request has to be for code that you can be proud of.
4+
5+
Any tool that helps you produce better code and better understand the existing codebase, including AI tools, can be used. The key questions are: “Is this an improvement?” and “Why is the code better now?”
6+
7+
NEVER submit an AI-generated PR if you are not able to understand and explain the changes and why they matter. Maintainers WILL close PRs without reviewing them if they feel like they are a waste of time.
8+
9+
## Using AI as a coding assistant
10+
11+
1. Understanding and familiarising yourself with the codebase is key. No matter how good the AI code assistant is, it will return useless code if you do not give it a sufficiently precise and accurate prompt.
12+
2. Always check that your changes make sense. LLMs are terrible at saying no to a prompt and will make false claims rather than admit they cannot do something. This is particularly true when they lack key information.
13+
3. Each commit should be one coherent change. LLMs like to do everything at once, but small, digestible changes are easier to understand and review.
14+
4. Commenting your code is important, but LLMs really like to listen to themselves talking and will be very verbose. A small comment explaining why you made a choice is better than a paragraph explaining how a loop iterates through a list.
15+
16+
## Using AI for communication
17+
18+
As noted above, maintainers have a limited amount of time to spend on MalariaGEN data API maintenance and do not want to waste it going through long, sloppy PR descriptions of simple issues. We strongly prefer clear and concise communication, even if it means we have to ask questions when more details are needed.
19+
20+
You are responsible for your own PRs and comments. Even if you use an LLM to write a PR description or comment, you are expected to read through everything and make sure that it accurately and concisely reflects your opinions, ideas and contributions. If reading your own PRs and comments is too much work for you, it is going to be the same for everyone else.
21+
Here are some concrete guidelines for using AI as part of your communication toolbox.
22+
23+
1. In general, the question that needs answering is why, not what. Maintainers can see the files and lines of code that were modified; what they want to know is the reasoning behind the choices. Sadly, LLMs are not great at explaining their reasoning, so you will probably have to chip in.
24+
2. In the same way, if you are responding to a comment or a review, you will need to justify your choice and explain how you made the decision.
25+
3. Make sure that the description of your work is accurate. Errors can happen but it is fairly obvious when an LLM claims more than it delivers.
26+
4. We are aware that English is not everyone’s first language. The grammar of your communications isn’t as important as the quality of your contribution. Feel free to use AI to improve your writing style, but make sure that you still understand the message, that its content is preserved and that it doesn’t turn into an epic poem.
27+
5. Maintainers are more interested in your ideas and thoughts than in the standard answer provided by an LLM. We work with genomic data, and contributors are not expected to be experts in computer science, software engineering, genomics, entomology, … You are allowed not to know or not to be sure and it is miles better to say so than it is to regurgitate an answer that you do not understand.

CONTRIBUTING.md

Lines changed: 238 additions & 0 deletions
@@ -0,0 +1,238 @@
1+
# Contributing to malariagen-data-python
2+
3+
Thanks for your interest in contributing to this project! This guide will help you get started.
4+
5+
## About the project
6+
7+
This package provides Python tools for accessing and analyzing genomic data from [MalariaGEN](https://www.malariagen.net/), a global research network studying the genomic epidemiology of malaria and its vectors. It provides access to data on _Anopheles_ mosquito species and _Plasmodium_ malaria parasites, with functionality for variant analysis, haplotype clustering, population genetics, and visualization.
8+
9+
## Setting up your development environment
10+
11+
### Prerequisites
12+
13+
You'll need:
14+
15+
- Python 3.10.x (CI-tested version)
16+
- [Poetry](https://python-poetry.org/) for dependency management
17+
- [Git](https://git-scm.com/) for version control
18+
19+
### Initial setup
20+
21+
1. **Fork and clone the repository**
22+
23+
Fork the repository on GitHub, then clone your fork:
24+
25+
```bash
26+
git clone git@github.com:[your-username]/malariagen-data-python.git
27+
cd malariagen-data-python
28+
```
29+
30+
2. **Add the upstream remote**
31+
32+
```bash
33+
git remote add upstream https://github.com/malariagen/malariagen-data-python.git
34+
```
35+
36+
3. **Install Poetry** (if not already installed)
37+
38+
```bash
39+
pipx install poetry
40+
```
41+
42+
4. **Install the project and its dependencies**
43+
44+
```bash
45+
poetry install
46+
```
47+
48+
**Recommended**: Use `poetry run` to run commands inside the virtual environment:
49+
50+
```bash
51+
poetry run pytest
52+
poetry run python script.py
53+
```
54+
55+
**Optional**: If you prefer an interactive shell session, install the shell plugin first:
56+
57+
```bash
58+
poetry self add poetry-plugin-shell
59+
```
60+
61+
Then activate the environment with:
62+
63+
```bash
64+
poetry shell
65+
```
66+
67+
After activation, commands run directly inside the virtual environment:
68+
69+
```bash
70+
pytest
71+
python script.py
72+
```
73+
74+
5. **Install pre-commit hooks**
75+
76+
```bash
77+
pipx install pre-commit
78+
pre-commit install
79+
```
80+
81+
Pre-commit hooks will automatically run `ruff` (linter and formatter) on your changes before each commit.
82+
83+
## Development workflow
84+
85+
### Creating a new feature or fix
86+
87+
1. **Sync with upstream**
88+
89+
```bash
90+
git checkout master
91+
git pull upstream master
92+
```
93+
94+
2. **Create a feature branch**
95+
96+
If an issue does not already exist for your change, [create one](https://github.com/malariagen/malariagen-data-python/issues/new) first. Then create a branch using the convention `GH{issue number}-{short description}`:
97+
98+
```bash
99+
git checkout -b GH123-fix-broken-filter
100+
# or
101+
git checkout -b GH456-add-new-analysis
102+
```
103+
104+
3. **Make your changes**
105+
106+
Write your code, add tests, update documentation as needed.
107+
108+
4. **Run tests locally**
109+
110+
Fast unit tests (no external data access):
111+
112+
```bash
113+
poetry run pytest -v tests/anoph
114+
```
115+
116+
All unit tests (requires setting up credentials for legacy tests):
117+
118+
```bash
119+
poetry run pytest -v tests --ignore tests/integration
120+
```
121+
122+
5. **Check code quality**
123+
124+
The pre-commit hooks will run automatically, but you can also run them manually:
125+
126+
```bash
127+
pre-commit run --all-files
128+
```
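
The branch convention from step 2, `GH{issue number}-{short description}`, can be checked mechanically. The sketch below is one possible reading of that convention (lowercase, hyphen-separated description); the pattern and the helper name `parse_branch_name` are assumptions made for illustration and are not enforced by the repository.

```python
import re

# Assumed reading of the convention: "GH", the issue number, a hyphen,
# then a lowercase, hyphen-separated description.
BRANCH_PATTERN = re.compile(
    r"GH(?P<issue>\d+)-(?P<description>[a-z0-9]+(?:-[a-z0-9]+)*)"
)


def parse_branch_name(branch: str) -> tuple[int, str]:
    """Return (issue number, description) or raise if the name does not fit."""
    match = BRANCH_PATTERN.fullmatch(branch)
    if match is None:
        raise ValueError(f"branch name does not follow GH<issue>-<text>: {branch!r}")
    return int(match["issue"]), match["description"]


print(parse_branch_name("GH123-fix-broken-filter"))
print(parse_branch_name("GH456-add-new-analysis"))
```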
129+
130+
### Code style
131+
132+
We use `ruff` for both linting and formatting. The configuration is in `pyproject.toml`. Key points:
133+
134+
- Line length: 88 characters (black default)
135+
- Follow PEP 8 conventions
136+
- Use type hints where appropriate
137+
- Write clear docstrings (we use numpydoc format)
138+
139+
The pre-commit hooks will handle most formatting automatically. If you want to run ruff manually:
140+
141+
```bash
142+
ruff check .
143+
ruff format .
144+
```
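
To make the style points above concrete, here is a small function written with type hints and a numpydoc-format docstring, with lines kept under 88 characters. The function `allele_frequency` is a made-up example and does not exist in `malariagen_data`.

```python
def allele_frequency(alt_count: int, total_count: int) -> float:
    """Compute the frequency of an alternate allele.

    Parameters
    ----------
    alt_count : int
        Number of observed copies of the alternate allele.
    total_count : int
        Total number of alleles observed at the site.

    Returns
    -------
    float
        The alternate allele frequency, between 0 and 1.

    Raises
    ------
    ValueError
        If `total_count` is not positive or `alt_count` is out of range.

    """
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    if not 0 <= alt_count <= total_count:
        raise ValueError("alt_count must be between 0 and total_count")
    return alt_count / total_count


print(allele_frequency(5, 20))
```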
145+
146+
### Testing
147+
148+
- **Write tests for new functionality**: Add unit tests in the `tests/` directory
149+
- **Test coverage**: Aim to maintain or improve test coverage
150+
- **Fast tests**: Unit tests should use simulated data when possible (see `tests/anoph/`)
151+
- **Integration tests**: Tests requiring GCS data access are slower and run separately
152+
153+
Run the tests with runtime type checking (typeguard) enabled:
154+
155+
```bash
156+
poetry run pytest -v tests --typeguard-packages=malariagen_data,malariagen_data.anoph
157+
```
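
As a rough sketch of the "simulated data" guideline above, the test below generates reproducible genotype calls with a seeded random number generator and checks an invariant that holds regardless of the data. The helpers `simulate_genotypes` and `count_alleles` are hypothetical and far simpler than the simulators under `tests/anoph/`.

```python
import random


def simulate_genotypes(
    n_samples: int, alt_freq: float, seed: int
) -> list[tuple[int, int]]:
    """Simulate diploid genotype calls at one biallelic site (0=ref, 1=alt)."""
    rng = random.Random(seed)  # fixed seed keeps the test reproducible
    return [
        (int(rng.random() < alt_freq), int(rng.random() < alt_freq))
        for _ in range(n_samples)
    ]


def count_alleles(genotypes: list[tuple[int, int]]) -> tuple[int, int]:
    """Return (ref_count, alt_count) across all genotype calls."""
    alt = sum(a + b for a, b in genotypes)
    return 2 * len(genotypes) - alt, alt


def test_count_alleles_on_simulated_data() -> None:
    genotypes = simulate_genotypes(n_samples=50, alt_freq=0.3, seed=42)
    ref, alt = count_alleles(genotypes)
    # Every diploid sample contributes exactly two alleles.
    assert ref + alt == 2 * 50
    assert 0 <= alt <= 100


# pytest would collect the test by name; it is called directly here so the
# sketch also runs as a plain script.
test_count_alleles_on_simulated_data()
```

Testing invariants rather than exact values keeps such tests fast and independent of any external data access.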
158+
159+
### Documentation
160+
161+
- Update docstrings if you modify public APIs
162+
- Documentation is built using Sphinx with the pydata theme
163+
- API docs are auto-generated from docstrings
164+
- Follow the [numpydoc](https://numpydoc.readthedocs.io/) style guide
165+
166+
## Submitting your contribution
167+
168+
### Before opening a pull request
169+
170+
- [ ] Tests pass locally
171+
- [ ] Pre-commit hooks pass (or run `pre-commit run --all-files`)
172+
- [ ] Code is well-documented
173+
- [ ] Commit messages are clear and descriptive
174+
175+
### Opening a pull request
176+
177+
1. **Push your branch**
178+
179+
```bash
180+
git push origin your-branch-name
181+
```
182+
183+
2. **Create the pull request**
184+
- Go to the [repository on GitHub](https://github.com/malariagen/malariagen-data-python)
185+
- Click "Pull requests" → "New pull request"
186+
- Select your fork and branch
187+
- Write a clear title and description
188+
189+
3. **Pull request description should include:**
190+
- What problem does this solve?
191+
- How does it solve it?
192+
- Any relevant issue numbers (e.g., "Fixes #123")
193+
- Testing done
194+
- Any breaking changes or migration notes
195+
196+
### Review process
197+
198+
- PRs require approval from a project maintainer
199+
- CI tests must pass (pytest on Python 3.10 with NumPy 1.26.4)
200+
- Address review feedback by pushing new commits to your branch
201+
- Once approved, a maintainer will merge your PR
202+
203+
## AI-assisted contributions
204+
205+
We welcome contributions that involve AI tools (like GitHub Copilot, ChatGPT, or similar). If you use AI assistance:
206+
207+
- Review and understand any AI-generated code before submitting
208+
- Ensure the code follows project conventions and passes all tests
209+
- You remain responsible for the quality and correctness of the contribution
210+
- Disclosure of AI usage is optional
211+
212+
## Communication
213+
214+
- **Issues**: Use [GitHub Issues](https://github.com/malariagen/malariagen-data-python/issues) for bug reports and feature requests
215+
- **Discussions**: For questions and general discussion, use [GitHub Discussions](https://github.com/malariagen/malariagen-data-python/discussions)
216+
- **Pull requests**: Use PR comments for code review discussions
217+
- **Email**: For data access questions, contact [support@malariagen.net](mailto:support@malariagen.net)
218+
219+
## Finding something to work on
220+
221+
- Look for issues labeled [`good first issue`](https://github.com/malariagen/malariagen-data-python/labels/good%20first%20issue)
222+
- Check for issues labeled [`help wanted`](https://github.com/malariagen/malariagen-data-python/labels/help%20wanted)
223+
- Improve documentation or add examples
224+
- Increase test coverage
225+
226+
## Questions?
227+
228+
If you're unsure about anything, feel free to:
229+
230+
- Open an issue to ask
231+
- Start a discussion on GitHub Discussions
232+
- Ask in your pull request
233+
234+
We appreciate your contributions and will do our best to help you succeed!
235+
236+
## License
237+
238+
By contributing to this project, you agree that your contributions will be licensed under the [MIT License](LICENSE).
