Commit aba27a3

Merge pull request #860 from adilraza99/docs/add-contributing-md
Add comprehensive CONTRIBUTING.md guide for new contributors
2 parents 2c1964a + 29a37af commit aba27a3

1 file changed

Lines changed: 238 additions & 0 deletions

CONTRIBUTING.md

@@ -0,0 +1,238 @@
# Contributing to malariagen-data-python

Thanks for your interest in contributing to this project! This guide will help you get started.

## About the project

This package provides Python tools for accessing and analyzing genomic data from [MalariaGEN](https://www.malariagen.net/), a global research network studying the genomic epidemiology of malaria and its vectors. It provides access to data on _Anopheles_ mosquito species and _Plasmodium_ malaria parasites, with functionality for variant analysis, haplotype clustering, population genetics, and visualization.

## Setting up your development environment

### Prerequisites

You'll need:

- Python 3.10.x (CI-tested version)
- [Poetry](https://python-poetry.org/) for dependency management
- [Git](https://git-scm.com/) for version control

### Initial setup

1. **Fork and clone the repository**

   Fork the repository on GitHub, then clone your fork:

   ```bash
   git clone git@github.com:[your-username]/malariagen-data-python.git
   cd malariagen-data-python
   ```

2. **Add the upstream remote**

   ```bash
   git remote add upstream https://github.com/malariagen/malariagen-data-python.git
   ```

3. **Install Poetry** (if not already installed)

   ```bash
   pipx install poetry
   ```

4. **Install the project and its dependencies**

   ```bash
   poetry install
   ```

   **Recommended**: Use `poetry run` to run commands inside the virtual environment:

   ```bash
   poetry run pytest
   poetry run python script.py
   ```

   **Optional**: If you prefer an interactive shell session, install the shell plugin first:

   ```bash
   poetry self add poetry-plugin-shell
   ```

   Then activate the environment with:

   ```bash
   poetry shell
   ```

   After activation, commands run directly inside the virtual environment:

   ```bash
   pytest
   python script.py
   ```

5. **Install pre-commit hooks**

   ```bash
   pipx install pre-commit
   pre-commit install
   ```

   Pre-commit hooks will automatically run `ruff` (linter and formatter) on your changes before each commit.

## Development workflow

### Creating a new feature or fix

1. **Sync with upstream**

   ```bash
   git checkout master
   git pull upstream master
   ```

2. **Create a feature branch**

   If an issue does not already exist for your change, [create one](https://github.com/malariagen/malariagen-data-python/issues/new) first. Then create a branch using the convention `GH{issue number}-{short description}`:

   ```bash
   git checkout -b GH123-fix-broken-filter
   # or
   git checkout -b GH456-add-new-analysis
   ```

3. **Make your changes**

   Write your code, add tests, and update documentation as needed.

4. **Run tests locally**

   Fast unit tests (no external data access):

   ```bash
   poetry run pytest -v tests/anoph
   ```

   All unit tests (requires setting up credentials for legacy tests):

   ```bash
   poetry run pytest -v tests --ignore tests/integration
   ```

5. **Check code quality**

   The pre-commit hooks will run automatically, but you can also run them manually:

   ```bash
   pre-commit run --all-files
   ```
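
The branch naming convention from step 2 can also be checked mechanically. The sketch below is illustrative only: `BRANCH_PATTERN` and `parse_branch_name` are hypothetical names that do not exist in this repository, and the pattern assumes a lowercase, hyphen-separated description, which the convention itself does not strictly require.

```python
from __future__ import annotations

import re

# Hypothetical pattern for GH{issue number}-{short description}.
# Assumption: the description is lowercase words joined by hyphens.
BRANCH_PATTERN = re.compile(
    r"^GH(?P<issue>\d+)-(?P<description>[a-z0-9]+(?:-[a-z0-9]+)*)$"
)


def parse_branch_name(name: str) -> tuple[int, str] | None:
    """Return (issue number, description), or None if the name does not match."""
    match = BRANCH_PATTERN.match(name)
    if match is None:
        return None
    return int(match.group("issue")), match.group("description")
```

For example, `parse_branch_name("GH123-fix-broken-filter")` yields the issue number and description, while a name without the `GH` prefix yields `None`.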

### Code style

We use `ruff` for both linting and formatting. The configuration is in `pyproject.toml`. Key points:

- Line length: 88 characters (the Black default)
- Follow PEP 8 conventions
- Use type hints where appropriate
- Write clear docstrings (we use numpydoc format)

The pre-commit hooks will handle most formatting automatically. If you want to run ruff manually:

```bash
ruff check .
ruff format .
```
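
To illustrate the type hint and docstring points above, here is a minimal sketch of a function written in that style. `allele_frequency` is a hypothetical example, not a function from this package; it only shows how type hints and numpydoc sections (`Parameters`, `Returns`, `Raises`) fit together.

```python
def allele_frequency(alt_count: int, total_count: int) -> float:
    """Compute the frequency of an alternate allele.

    Parameters
    ----------
    alt_count : int
        Number of observed alternate alleles.
    total_count : int
        Total number of observed alleles. Must be positive.

    Returns
    -------
    float
        The alternate allele frequency, between 0 and 1.

    Raises
    ------
    ValueError
        If `total_count` is not positive or `alt_count` is out of range.

    """
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    if not 0 <= alt_count <= total_count:
        raise ValueError("alt_count must be between 0 and total_count")
    return alt_count / total_count
```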

### Testing

- **Write tests for new functionality**: Add unit tests in the `tests/` directory
- **Test coverage**: Aim to maintain or improve test coverage
- **Fast tests**: Unit tests should use simulated data when possible (see `tests/anoph/`)
- **Integration tests**: Tests requiring GCS data access are slower and run separately
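
As a rough sketch of what a fast test on simulated data can look like, the example below generates small random genotype calls with a fixed seed and checks invariants on the result. All names here (`count_alt_alleles`, `simulate_genotypes`, and the test itself) are hypothetical and only use the standard library; the existing tests in `tests/anoph/` may be structured differently.

```python
from __future__ import annotations

import random


def count_alt_alleles(genotypes: list[tuple[int, int]]) -> int:
    """Count alternate (non-zero) alleles across diploid genotype calls."""
    return sum(1 for call in genotypes for allele in call if allele > 0)


def simulate_genotypes(n_samples: int, seed: int = 42) -> list[tuple[int, int]]:
    """Simulate diploid biallelic calls: 0 is reference, 1 is alternate."""
    rng = random.Random(seed)  # fixed seed keeps the test deterministic
    return [(rng.randint(0, 1), rng.randint(0, 1)) for _ in range(n_samples)]


def test_count_alt_alleles_on_simulated_data():
    genotypes = simulate_genotypes(n_samples=100)
    n_alt = count_alt_alleles(genotypes)
    n_ref = sum(1 for call in genotypes for allele in call if allele == 0)
    # Check invariants rather than exact values, so the test does not
    # depend on the particular random draw.
    assert 0 <= n_alt <= 2 * len(genotypes)
    assert n_alt + n_ref == 2 * len(genotypes)
```

Because the data is generated in memory, a test like this needs no credentials or network access and runs in milliseconds.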

Run the tests with runtime type checking enabled (via the typeguard pytest plugin):

```bash
poetry run pytest -v tests --typeguard-packages=malariagen_data,malariagen_data.anoph
```

### Documentation

- Update docstrings if you modify public APIs
- Documentation is built using Sphinx with the pydata theme
- API docs are auto-generated from docstrings
- Follow the [numpydoc](https://numpydoc.readthedocs.io/) style guide

## Submitting your contribution

### Before opening a pull request

- [ ] Tests pass locally
- [ ] Pre-commit hooks pass (run `pre-commit run --all-files` to check)
- [ ] Code is well-documented
- [ ] Commit messages are clear and descriptive

### Opening a pull request

1. **Push your branch**

   ```bash
   git push origin your-branch-name
   ```

2. **Create the pull request**
   - Go to the [repository on GitHub](https://github.com/malariagen/malariagen-data-python)
   - Click "Pull requests" → "New pull request"
   - Select your fork and branch
   - Write a clear title and description

3. **Pull request description should include:**
   - What problem does this solve?
   - How does it solve it?
   - Any relevant issue numbers (e.g., "Fixes #123")
   - Testing done
   - Any breaking changes or migration notes

### Review process

- PRs require approval from a project maintainer
- CI tests must pass (pytest on Python 3.10 with NumPy 1.26.4)
- Address review feedback by pushing new commits to your branch
- Once approved, a maintainer will merge your PR

## AI-assisted contributions

We welcome contributions that involve AI tools (like GitHub Copilot, ChatGPT, or similar). If you use AI assistance:

- Review and understand any AI-generated code before submitting
- Ensure the code follows project conventions and passes all tests
- You remain responsible for the quality and correctness of the contribution, regardless of the tools used
- Disclosure of AI usage is optional

## Communication

- **Issues**: Use [GitHub Issues](https://github.com/malariagen/malariagen-data-python/issues) for bug reports and feature requests
- **Discussions**: For questions and general discussion, use [GitHub Discussions](https://github.com/malariagen/malariagen-data-python/discussions)
- **Pull requests**: Use PR comments for code review discussions
- **Email**: For data access questions, contact [support@malariagen.net](mailto:support@malariagen.net)

## Finding something to work on

- Look for issues labeled [`good first issue`](https://github.com/malariagen/malariagen-data-python/labels/good%20first%20issue)
- Check for issues labeled [`help wanted`](https://github.com/malariagen/malariagen-data-python/labels/help%20wanted)
- Improve documentation or add examples
- Increase test coverage

## Questions?

If you're unsure about anything, feel free to:

- Open an issue to ask
- Start a discussion on GitHub Discussions
- Ask in your pull request

We appreciate your contributions and will do our best to help you succeed!

## License

By contributing to this project, you agree that your contributions will be licensed under the [MIT License](LICENSE).
