Commit b69d587

Add comprehensive CONTRIBUTING.md guide for new contributors
- Add structured contributor onboarding guide
- Document development workflow and testing
- Clarify contribution and PR process
1 parent 2d63f0d commit b69d587

1 file changed

Lines changed: 215 additions & 0 deletions

CONTRIBUTING.md

@@ -0,0 +1,215 @@
1+
# Contributing to malariagen-data-python
2+
3+
Thanks for your interest in contributing to this project! This guide will help you get started.
4+
5+
## About the project
6+
7+
This package provides Python tools for accessing and analyzing genomic data from [MalariaGEN](https://www.malariagen.net/), a global research network studying the genomic epidemiology of malaria and its vectors. The package supports multiple data releases (Ag3, Af1, Amin1, Adir1, Pf7, Pf8, Pv4) and provides functionality for variant analysis, haplotype clustering, population genetics, and visualization.
8+
9+
## Setting up your development environment
10+
11+
### Prerequisites
12+
13+
You'll need:
14+
15+
- Python 3.10.x (CI-tested version)
16+
- [Poetry](https://python-poetry.org/) for dependency management
17+
- [Git](https://git-scm.com/) for version control
18+
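A quick way to confirm that the active interpreter matches the CI-tested series is a small check like the one below. This is only a sketch: the helper name `is_ci_tested_python` is hypothetical and not part of the package.

```python
import sys

# CI-tested minor series, taken from the prerequisites above.
CI_TESTED = (3, 10)


def is_ci_tested_python(version_info=sys.version_info):
    """Return True if the given version tuple is in the CI-tested series.

    Hypothetical helper for illustration; not part of malariagen_data.
    """
    return tuple(version_info[:2]) == CI_TESTED


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    status = "matches" if is_ci_tested_python() else "differs from"
    print(f"Python {major}.{minor} {status} the CI-tested 3.10 series")
```

Other Python versions may well work; the check only tells you whether your local results are directly comparable with CI.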
19+
### Initial setup
20+
21+
1. **Fork and clone the repository**
22+
23+
Fork the repository on GitHub, then clone your fork:
24+
25+
```bash
git clone git@github.com:[your-username]/malariagen-data-python.git
cd malariagen-data-python
```
29+
30+
2. **Add the upstream remote**
31+
32+
```bash
git remote add upstream https://github.com/malariagen/malariagen-data-python.git
```
35+
36+
3. **Install Poetry** (if not already installed)
37+
38+
```bash
pipx install poetry
```
41+
42+
4. **Create and activate the development environment**
43+
44+
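```bash
poetry install
# Poetry 2.x no longer bundles `poetry shell` (it moved to the
# poetry-plugin-shell plugin); without it, prefix commands with `poetry run`.
poetry shell
```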
48+
49+
5. **Install pre-commit hooks**
50+
51+
```bash
pipx install pre-commit
pre-commit install
```
55+
56+
Pre-commit hooks will automatically run `ruff` (linter and formatter) on your changes before each commit.
57+
58+
## Development workflow
59+
60+
### Creating a new feature or fix
61+
62+
1. **Sync with upstream**
63+
64+
```bash
git checkout master
git pull upstream master
```
68+
69+
2. **Create a feature branch**
70+
71+
Use a descriptive name:
72+
73+
```bash
git checkout -b feature/your-feature-name
# or
git checkout -b fix/issue-description
# or
git checkout -b docs/update-contributing-guide
```
80+
81+
3. **Make your changes**
82+
83+
Write your code, add tests, and update documentation as needed.
84+
85+
4. **Run tests locally**
86+
87+
Fast unit tests (no external data access):
88+
89+
```bash
poetry run pytest -v tests/anoph
```
92+
93+
All unit tests (requires setting up credentials for legacy tests):
94+
95+
```bash
poetry run pytest -v tests --ignore tests/integration
```
98+
99+
5. **Check code quality**
100+
101+
The pre-commit hooks will run automatically, but you can also run them manually:
102+
103+
```bash
pre-commit run --all-files
```
106+
107+
### Code style
108+
109+
We use `ruff` for both linting and formatting. The configuration is in `pyproject.toml`. Key points:
110+
111+
- Line length: 88 characters (the Black default)
112+
- Follow PEP 8 conventions
113+
- Use type hints where appropriate
114+
- Write clear docstrings (we use numpydoc format)
115+
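As an illustration of the last two points, a function with type hints and a numpydoc-style docstring might look like the sketch below. The function `allele_frequency` is hypothetical, written only for this example, and is not part of the package's API.

```python
from typing import Sequence


def allele_frequency(allele_counts: Sequence[int], allele: int = 1) -> float:
    """Compute the frequency of one allele from per-allele counts.

    Parameters
    ----------
    allele_counts : sequence of int
        Number of observations of each allele, indexed by allele number
        (0 is the reference allele).
    allele : int, optional
        Index of the allele whose frequency is returned. Defaults to 1.

    Returns
    -------
    float
        Frequency of the requested allele, between 0 and 1.

    Raises
    ------
    ValueError
        If no alleles were observed, so the frequency is undefined.

    """
    total = sum(allele_counts)
    if total == 0:
        raise ValueError("allele_counts must contain at least one observation")
    return allele_counts[allele] / total
```

For example, `allele_frequency([6, 2])` returns `0.25`, because 2 of the 8 observed alleles are the alternate allele.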
116+
The pre-commit hooks will handle most formatting automatically. If you want to run ruff manually:
117+
118+
```bash
ruff check .
ruff format .
```
122+
123+
### Testing
124+
125+
- **Write tests for new functionality**: Add unit tests in the `tests/` directory
126+
- **Test coverage**: Aim to maintain or improve test coverage
127+
- **Fast tests**: Unit tests should use simulated data when possible (see `tests/anoph/`)
128+
- **Integration tests**: Tests requiring access to data in Google Cloud Storage (GCS) are slower and run separately
129+
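To make the "simulated data" point concrete, here is a minimal sketch of a fast unit test that needs no external data access. It does not reproduce the fixtures used in `tests/anoph/`; the names `simulate_genotypes` and `count_alleles` are hypothetical stand-ins, and only the standard library is used so that the example is self-contained.

```python
import random


def simulate_genotypes(n_samples, seed=42):
    """Simulate diploid biallelic genotype calls as (allele, allele) pairs."""
    rng = random.Random(seed)  # fixed seed keeps the test deterministic
    return [(rng.randint(0, 1), rng.randint(0, 1)) for _ in range(n_samples)]


def count_alleles(genotypes):
    """Count reference (0) and alternate (1) alleles across all calls.

    Hypothetical function under test; it stands in for real package code.
    """
    counts = [0, 0]
    for call in genotypes:
        for allele in call:
            counts[allele] += 1
    return counts


def test_count_alleles_on_simulated_data():
    genotypes = simulate_genotypes(n_samples=100)
    counts = count_alleles(genotypes)
    # Every diploid call contributes exactly two alleles.
    assert sum(counts) == 2 * len(genotypes)
    # Counts can never be negative.
    assert all(c >= 0 for c in counts)
```

Asserting on properties that must hold for any input (here, the total allele count) keeps the test valid even if the simulation parameters change.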
130+
Run the tests with runtime type checking enabled (via the `typeguard` pytest plugin):
131+
132+
```bash
poetry run pytest -v tests --typeguard-packages=malariagen_data,malariagen_data.anoph
```
135+
136+
### Documentation
137+
138+
- Update docstrings if you modify public APIs
139+
- Documentation is built using Sphinx with the pydata theme
140+
- API docs are auto-generated from docstrings
141+
- Follow the [numpydoc](https://numpydoc.readthedocs.io/) style guide
142+
143+
## Submitting your contribution
144+
145+
### Before opening a pull request
146+
147+
- [ ] Tests pass locally
148+
- [ ] Pre-commit hooks pass (run `pre-commit run --all-files` to check)
149+
- [ ] Code is well-documented
150+
- [ ] Commit messages are clear and descriptive
151+
152+
### Opening a pull request
153+
154+
1. **Push your branch**
155+
156+
```bash
git push origin your-branch-name
```
159+
160+
2. **Create the pull request**
161+
- Go to the [repository on GitHub](https://github.com/malariagen/malariagen-data-python)
162+
- Click "Pull requests" → "New pull request"
163+
- Select your fork and branch
164+
- Write a clear title and description
165+
166+
3. **Pull request description should include:**
167+
- What problem does this solve?
168+
- How does it solve it?
169+
- Any relevant issue numbers (e.g., "Fixes #123")
170+
- Testing done
171+
- Any breaking changes or migration notes
172+
173+
### Review process
174+
175+
- PRs require approval from a project maintainer
176+
- CI tests must pass (pytest on Python 3.10 with NumPy 1.26.4)
177+
- Address review feedback by pushing new commits to your branch
178+
- Once approved, a maintainer will merge your PR
179+
180+
## AI-assisted contributions
181+
182+
We welcome contributions that involve AI tools (like GitHub Copilot, ChatGPT, or similar). If you use AI assistance:
183+
184+
- Review and understand any AI-generated code before submitting
185+
- Ensure the code follows project conventions and passes all tests
186+
- You remain responsible for the quality and correctness of the contribution
187+
- Disclosure of AI usage is optional
188+
189+
## Communication
190+
191+
- **Issues**: Use [GitHub Issues](https://github.com/malariagen/malariagen-data-python/issues) for bug reports and feature requests
192+
- **Discussions**: For questions and general discussion, use [GitHub Discussions](https://github.com/malariagen/malariagen-data-python/discussions)
193+
- **Pull requests**: Use PR comments for code review discussions
194+
- **Email**: For data access questions, contact [support@malariagen.net](mailto:support@malariagen.net)
195+
196+
## Finding something to work on
197+
198+
- Look for issues labeled [`good first issue`](https://github.com/malariagen/malariagen-data-python/labels/good%20first%20issue)
199+
- Check for issues labeled [`help wanted`](https://github.com/malariagen/malariagen-data-python/labels/help%20wanted)
200+
- Improve documentation or add examples
201+
- Increase test coverage
202+
203+
## Questions?
204+
205+
If you're unsure about anything, feel free to:
206+
207+
- Open an issue to ask
208+
- Start a discussion on GitHub Discussions
209+
- Ask in your pull request
210+
211+
We appreciate your contributions and will do our best to help you succeed!
212+
213+
## License
214+
215+
By contributing to this project, you agree that your contributions will be licensed under the [MIT License](LICENSE).
