| 1 | +# Contributing to malariagen-data-python |
| 2 | + |
| 3 | +Thanks for your interest in contributing to this project! This guide will help you get started. |
| 4 | + |
| 5 | +## About the project |
| 6 | + |
| 7 | +This package provides Python tools for accessing and analyzing genomic data from [MalariaGEN](https://www.malariagen.net/), a global research network studying the genomic epidemiology of malaria and its vectors. It covers data on _Anopheles_ mosquito species and _Plasmodium_ malaria parasites, with functionality for variant analysis, haplotype clustering, population genetics, and visualization. |
| 8 | + |
| 9 | +## Setting up your development environment |
| 10 | + |
| 11 | +### Prerequisites |
| 12 | + |
| 13 | +You'll need: |
| 14 | + |
| 15 | +- Python 3.10.x (CI-tested version) |
| 16 | +- [Poetry](https://python-poetry.org/) for dependency management |
| 17 | +- [Git](https://git-scm.com/) for version control |
| 18 | + |
| 19 | +### Initial setup |
| 20 | + |
| 21 | +1. **Fork and clone the repository** |
| 22 | + |
| 23 | + Fork the repository on GitHub, then clone your fork: |
| 24 | + |
| 25 | + ```bash |
| 26 | + git clone git@github.com:[your-username]/malariagen-data-python.git |
| 27 | + cd malariagen-data-python |
| 28 | + ``` |
| 29 | + |
| 30 | +2. **Add the upstream remote** |
| 31 | + |
| 32 | + ```bash |
| 33 | + git remote add upstream https://github.com/malariagen/malariagen-data-python.git |
| 34 | + ``` |
| 35 | + |
| 36 | +3. **Install Poetry** (if not already installed) |
| 37 | + |
| 38 | + ```bash |
| 39 | + pipx install poetry |
| 40 | + ``` |
| 41 | + |
| 42 | +4. **Install the project and its dependencies** |
| 43 | + |
| 44 | + ```bash |
| 45 | + poetry install |
| 46 | + ``` |
| 47 | + |
| 48 | + **Recommended**: Use `poetry run` to run commands inside the virtual environment: |
| 49 | + |
| 50 | + ```bash |
| 51 | + poetry run pytest |
| 52 | + poetry run python script.py |
| 53 | + ``` |
| 54 | + |
| 55 | + **Optional**: If you prefer an interactive shell session, install the shell plugin first: |
| 56 | + |
| 57 | + ```bash |
| 58 | + poetry self add poetry-plugin-shell |
| 59 | + ``` |
| 60 | + |
| 61 | + Then activate the environment with: |
| 62 | + |
| 63 | + ```bash |
| 64 | + poetry shell |
| 65 | + ``` |
| 66 | + |
| 67 | + After activation, commands run directly inside the virtual environment: |
| 68 | + |
| 69 | + ```bash |
| 70 | + pytest |
| 71 | + python script.py |
| 72 | + ``` |
| 73 | + |
| 74 | +5. **Install pre-commit hooks** |
| 75 | + |
| 76 | + ```bash |
| 77 | + pipx install pre-commit |
| 78 | + pre-commit install |
| 79 | + ``` |
| 80 | + |
| 81 | + Pre-commit hooks will automatically run `ruff` (linter and formatter) on your changes before each commit. |
| 82 | + |
| 83 | +## Development workflow |
| 84 | + |
| 85 | +### Creating a new feature or fix |
| 86 | + |
| 87 | +1. **Sync with upstream** |
| 88 | + |
| 89 | + ```bash |
| 90 | + git checkout master |
| 91 | + git pull upstream master |
| 92 | + ``` |
| 93 | + |
| 94 | +2. **Create a feature branch** |
| 95 | + |
| 96 | + If an issue does not already exist for your change, [create one](https://github.com/malariagen/malariagen-data-python/issues/new) first. Then create a branch using the convention `GH{issue number}-{short description}`: |
| 97 | + |
| 98 | + ```bash |
| 99 | + git checkout -b GH123-fix-broken-filter |
| 100 | + # or |
| 101 | + git checkout -b GH456-add-new-analysis |
| 102 | + ``` |
| 103 | + |
| 104 | +3. **Make your changes** |
| 105 | + |
| 106 | + Write your code, add tests, and update documentation as needed. |
| 107 | + |
| 108 | +4. **Run tests locally** |
| 109 | + |
| 110 | + Fast unit tests (no external data access): |
| 111 | + |
| 112 | + ```bash |
| 113 | + poetry run pytest -v tests/anoph |
| 114 | + ``` |
| 115 | + |
| 116 | + All unit tests (requires setting up credentials for legacy tests): |
| 117 | + |
| 118 | + ```bash |
| 119 | + poetry run pytest -v tests --ignore tests/integration |
| 120 | + ``` |
| 121 | + |
| 122 | +5. **Check code quality** |
| 123 | + |
| 124 | + The pre-commit hooks will run automatically, but you can also run them manually: |
| 125 | + |
| 126 | + ```bash |
| 127 | + pre-commit run --all-files |
| 128 | + ``` |
| 129 | + |
| 130 | +### Code style |
| 131 | + |
| 132 | +We use `ruff` for both linting and formatting. The configuration is in `pyproject.toml`. Key points: |
| 133 | + |
| 134 | +- Line length: 88 characters (black default) |
| 135 | +- Follow PEP 8 conventions |
| 136 | +- Use type hints where appropriate |
| 137 | +- Write clear docstrings (we use numpydoc format) |
| 138 | + |
| 139 | +The pre-commit hooks will handle most formatting automatically. If you want to run ruff manually: |
| 140 | + |
| 141 | +```bash |
| 142 | +ruff check . |
| 143 | +ruff format . |
| 144 | +``` |
| 145 | + |
| 146 | +### Testing |
| 147 | + |
| 148 | +- **Write tests for new functionality**: Add unit tests in the `tests/` directory |
| 149 | +- **Test coverage**: Aim to maintain or improve test coverage |
| 150 | +- **Fast tests**: Unit tests should use simulated data when possible (see `tests/anoph/`) |
| 151 | +- **Integration tests**: Tests that require access to data in Google Cloud Storage (GCS) are slower and run separately |
| 152 | + |
| 153 | +Run the tests with runtime type checking (via the typeguard pytest plugin) with: |
| 154 | + |
| 155 | +```bash |
| 156 | +poetry run pytest -v tests --typeguard-packages=malariagen_data,malariagen_data.anoph |
| 157 | +``` |
| 158 | + |
| 159 | +### Documentation |
| 160 | + |
| 161 | +- Update docstrings if you modify public APIs |
| 162 | +- Documentation is built using Sphinx with the pydata theme |
| 163 | +- API docs are auto-generated from docstrings |
| 164 | +- Follow the [numpydoc](https://numpydoc.readthedocs.io/) style guide |
| 165 | + |
| 166 | +## Submitting your contribution |
| 167 | + |
| 168 | +### Before opening a pull request |
| 169 | + |
| 170 | +- [ ] Tests pass locally |
| 171 | +- [ ] Pre-commit hooks pass (run `pre-commit run --all-files` to check) |
| 172 | +- [ ] Code is well-documented |
| 173 | +- [ ] Commit messages are clear and descriptive |
| 174 | + |
| 175 | +### Opening a pull request |
| 176 | + |
| 177 | +1. **Push your branch** |
| 178 | + |
| 179 | + ```bash |
| 180 | + git push origin your-branch-name |
| 181 | + ``` |
| 182 | + |
| 183 | +2. **Create the pull request** |
| 184 | + - Go to the [repository on GitHub](https://github.com/malariagen/malariagen-data-python) |
| 185 | + - Click "Pull requests" → "New pull request" |
| 186 | + - Select your fork and branch |
| 187 | + - Write a clear title and description |
| 188 | + |
| 189 | +3. **Pull request description should include:** |
| 190 | + - What problem does this solve? |
| 191 | + - How does it solve it? |
| 192 | + - Any relevant issue numbers (e.g., "Fixes #123") |
| 193 | + - Testing done |
| 194 | + - Any breaking changes or migration notes |
| 195 | + |
| 196 | +### Review process |
| 197 | + |
| 198 | +- PRs require approval from a project maintainer |
| 199 | +- CI tests must pass (pytest on Python 3.10 with NumPy 1.26.4) |
| 200 | +- Address review feedback by pushing new commits to your branch |
| 201 | +- Once approved, a maintainer will merge your PR |
| 202 | + |
| 203 | +## AI-assisted contributions |
| 204 | + |
| 205 | +We welcome contributions that involve AI tools (like GitHub Copilot, ChatGPT, or similar). If you use AI assistance: |
| 206 | + |
| 207 | +- Review and understand any AI-generated code before submitting |
| 208 | +- Ensure the code follows project conventions and passes all tests |
| 209 | +- You remain responsible for the quality and correctness of the contribution |
| 210 | +- Disclosure of AI usage is optional |
| 211 | + |
| 212 | +## Communication |
| 213 | + |
| 214 | +- **Issues**: Use [GitHub Issues](https://github.com/malariagen/malariagen-data-python/issues) for bug reports and feature requests |
| 215 | +- **Discussions**: For questions and general discussion, use [GitHub Discussions](https://github.com/malariagen/malariagen-data-python/discussions) |
| 216 | +- **Pull requests**: Use PR comments for code review discussions |
| 217 | +- **Email**: For data access questions, contact [support@malariagen.net](mailto:support@malariagen.net) |
| 218 | + |
| 219 | +## Finding something to work on |
| 220 | + |
| 221 | +- Look for issues labeled [`good first issue`](https://github.com/malariagen/malariagen-data-python/labels/good%20first%20issue) |
| 222 | +- Check for issues labeled [`help wanted`](https://github.com/malariagen/malariagen-data-python/labels/help%20wanted) |
| 223 | +- Improve documentation or add examples |
| 224 | +- Increase test coverage |
| 225 | + |
| 226 | +## Questions? |
| 227 | + |
| 228 | +If you're unsure about anything, feel free to: |
| 229 | + |
| 230 | +- Open an issue to ask |
| 231 | +- Start a discussion on GitHub Discussions |
| 232 | +- Ask in your pull request |
| 233 | + |
| 234 | +We appreciate your contributions and will do our best to help you succeed! |
| 235 | + |
| 236 | +## License |
| 237 | + |
| 238 | +By contributing to this project, you agree that your contributions will be licensed under the [MIT License](LICENSE). |