Commit 4ee0c9c

Merge branch 'master' into GH856-AI-policy-document
2 parents 9f79668 + 71c0b23 commit 4ee0c9c

7 files changed

Lines changed: 260 additions & 11 deletions

CONTRIBUTING.md

Lines changed: 238 additions & 0 deletions
@@ -0,0 +1,238 @@
# Contributing to malariagen-data-python

Thanks for your interest in contributing to this project! This guide will help you get started.

## About the project

This package provides Python tools for accessing and analyzing genomic data from [MalariaGEN](https://www.malariagen.net/), a global research network studying the genomic epidemiology of malaria and its vectors. It provides access to data on _Anopheles_ mosquito species and _Plasmodium_ malaria parasites, with functionality for variant analysis, haplotype clustering, population genetics, and visualization.

## Setting up your development environment

### Prerequisites

You'll need:

- Python 3.10.x (CI-tested version)
- [Poetry](https://python-poetry.org/) for dependency management
- [Git](https://git-scm.com/) for version control

### Initial setup

1. **Fork and clone the repository**

   Fork the repository on GitHub, then clone your fork:

   ```bash
   git clone git@github.com:[your-username]/malariagen-data-python.git
   cd malariagen-data-python
   ```

2. **Add the upstream remote**

   ```bash
   git remote add upstream https://github.com/malariagen/malariagen-data-python.git
   ```

3. **Install Poetry** (if not already installed)

   ```bash
   pipx install poetry
   ```

4. **Install the project and its dependencies**

   ```bash
   poetry install
   ```

   **Recommended**: Use `poetry run` to run commands inside the virtual environment:

   ```bash
   poetry run pytest
   poetry run python script.py
   ```

   **Optional**: If you prefer an interactive shell session, install the shell plugin first:

   ```bash
   poetry self add poetry-plugin-shell
   ```

   Then activate the environment with:

   ```bash
   poetry shell
   ```

   After activation, commands run directly inside the virtual environment:

   ```bash
   pytest
   python script.py
   ```

5. **Install pre-commit hooks**

   ```bash
   pipx install pre-commit
   pre-commit install
   ```

   Pre-commit hooks will automatically run `ruff` (linter and formatter) on your changes before each commit.

## Development workflow

### Creating a new feature or fix

1. **Sync with upstream**

   ```bash
   git checkout master
   git pull upstream master
   ```

2. **Create a feature branch**

   If an issue does not already exist for your change, [create one](https://github.com/malariagen/malariagen-data-python/issues/new) first. Then create a branch using the convention `GH{issue number}-{short description}`:

   ```bash
   git checkout -b GH123-fix-broken-filter
   # or
   git checkout -b GH456-add-new-analysis
   ```

3. **Make your changes**

   Write your code, add tests, and update documentation as needed.

4. **Run tests locally**

   Fast unit tests (no external data access):

   ```bash
   poetry run pytest -v tests/anoph
   ```

   All unit tests (requires setting up credentials for legacy tests):

   ```bash
   poetry run pytest -v tests --ignore tests/integration
   ```

5. **Check code quality**

   The pre-commit hooks will run automatically, but you can also run them manually:

   ```bash
   pre-commit run --all-files
   ```

### Code style

We use `ruff` for both linting and formatting. The configuration is in `pyproject.toml`. Key points:

- Line length: 88 characters (the Black default)
- Follow PEP 8 conventions
- Use type hints where appropriate
- Write clear docstrings (we use numpydoc format)

The pre-commit hooks will handle most formatting automatically. If you want to run ruff manually:

```bash
ruff check .
ruff format .
```

### Testing

- **Write tests for new functionality**: Add unit tests in the `tests/` directory
- **Test coverage**: Aim to maintain or improve test coverage
- **Fast tests**: Unit tests should use simulated data when possible (see `tests/anoph/`)
- **Integration tests**: Tests requiring GCS data access are slower and run separately

Run the tests with runtime type checking (via typeguard) enabled:

```bash
poetry run pytest -v tests --typeguard-packages=malariagen_data,malariagen_data.anoph
```

### Documentation

- Update docstrings if you modify public APIs
- Documentation is built using Sphinx with the pydata theme
- API docs are auto-generated from docstrings
- Follow the [numpydoc](https://numpydoc.readthedocs.io/) style guide

## Submitting your contribution

### Before opening a pull request

- [ ] Tests pass locally
- [ ] Pre-commit hooks pass (run `pre-commit run --all-files` to check)
- [ ] Code is well-documented
- [ ] Commit messages are clear and descriptive

### Opening a pull request

1. **Push your branch**

   ```bash
   git push origin your-branch-name
   ```

2. **Create the pull request**
   - Go to the [repository on GitHub](https://github.com/malariagen/malariagen-data-python)
   - Click "Pull requests" → "New pull request"
   - Select your fork and branch
   - Write a clear title and description

3. **Pull request description should include:**
   - What problem does this solve?
   - How does it solve it?
   - Any relevant issue numbers (e.g., "Fixes #123")
   - Testing done
   - Any breaking changes or migration notes

### Review process

- PRs require approval from a project maintainer
- CI tests must pass (pytest on Python 3.10 with NumPy 1.26.4)
- Address review feedback by pushing new commits to your branch
- Once approved, a maintainer will merge your PR

## AI-assisted contributions

We welcome contributions that involve AI tools (like GitHub Copilot, ChatGPT, or similar). If you use AI assistance:

- Review and understand any AI-generated code before submitting
- Ensure the code follows project conventions and passes all tests
- You remain responsible for the quality and correctness of the contribution
- Disclosure of AI usage is optional

## Communication

- **Issues**: Use [GitHub Issues](https://github.com/malariagen/malariagen-data-python/issues) for bug reports and feature requests
- **Discussions**: For questions and general discussion, use [GitHub Discussions](https://github.com/malariagen/malariagen-data-python/discussions)
- **Pull requests**: Use PR comments for code review discussions
- **Email**: For data access questions, contact [support@malariagen.net](mailto:support@malariagen.net)

## Finding something to work on

- Look for issues labeled [`good first issue`](https://github.com/malariagen/malariagen-data-python/labels/good%20first%20issue)
- Check for issues labeled [`help wanted`](https://github.com/malariagen/malariagen-data-python/labels/help%20wanted)
- Improve documentation or add examples
- Increase test coverage

## Questions?

If you're unsure about anything, feel free to:

- Open an issue to ask
- Start a discussion on GitHub Discussions
- Ask in your pull request

We appreciate your contributions and will do our best to help you succeed!

## License

By contributing to this project, you agree that your contributions will be licensed under the [MIT License](LICENSE).
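
The branch naming convention from the guide above, `GH{issue number}-{short description}`, can be checked mechanically. The sketch below is one possible reading of that convention and is not part of the project: the helper name `parse_branch_name` and the exact regular expression (digits for the issue number, then letters, digits and hyphens for the description) are assumptions made for illustration.

```python
import re

# Hypothetical pattern: one interpretation of `GH{issue number}-{short description}`.
# The project does not publish a regex; the allowed characters are an assumption.
BRANCH_PATTERN = re.compile(
    r"^GH(?P<issue>\d+)-(?P<description>[A-Za-z0-9][A-Za-z0-9-]*)$"
)


def parse_branch_name(name):
    """Return (issue_number, description), or None if the name does not match."""
    match = BRANCH_PATTERN.match(name)
    if match is None:
        return None
    return int(match.group("issue")), match.group("description")


# The branch merged in this commit follows the convention.
print(parse_branch_name("GH856-AI-policy-document"))
# A name without an issue number does not.
print(parse_branch_name("fix-broken-filter"))
```

A check like this could run in a local Git hook, though whether the project wants to enforce the convention automatically is not stated in the guide.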

README.md

Lines changed: 1 addition & 1 deletion
@@ -65,7 +65,7 @@ modifying the `docs/source/_static/switcher.json` file accordingly.

 If you use the `malariagen_data` package in a publication
 or include any of its functions or code in other materials (_e.g._ training resources),
-please cite: [doi.org/10.5281/zenodo.11173411](doi.org/10.5281/zenodo.11173411)
+please cite: [doi.org/10.5281/zenodo.11173411](https://doi.org/10.5281/zenodo.11173411)

 Some functions may require additional citations to acknowledge specific contributions. These are indicated in the description for each relevant function.

malariagen_data/anoph/cnv_data.py

Lines changed: 6 additions & 2 deletions
@@ -886,11 +886,13 @@ def plot_cnv_hmm_heatmap_track(
         width: gplt_params.width = gplt_params.width_default,
         row_height: gplt_params.row_height = 7,
         height: Optional[gplt_params.height] = None,
-        palette: Optional[gplt_params.colors] = cnv_params.colorscale_default,
+        palette: Optional[gplt_params.colors] = None,
         show: gplt_params.show = True,
         output_backend: gplt_params.output_backend = gplt_params.output_backend_default,
     ) -> gplt_params.optional_figure:
         debug = self._log.debug
+        if palette is None:
+            palette = cnv_params.colorscale_default

         import bokeh.models as bkmod
         import bokeh.plotting as bkplt
@@ -1028,13 +1030,15 @@ def plot_cnv_hmm_heatmap(
         width: gplt_params.width = gplt_params.width_default,
         row_height: gplt_params.row_height = 7,
         track_height: Optional[gplt_params.track_height] = None,
-        palette: Optional[gplt_params.colors] = cnv_params.colorscale_default,
+        palette: Optional[gplt_params.colors] = None,
         genes_height: gplt_params.genes_height = gplt_params.genes_height_default,
         show: gplt_params.show = True,
         gene_labels: Optional[gplt_params.gene_labels] = None,
         gene_labelset: Optional[gplt_params.gene_labelset] = None,
     ) -> gplt_params.optional_figure:
         debug = self._log.debug
+        if palette is None:
+            palette = cnv_params.colorscale_default

         import bokeh.layouts as bklay
         import bokeh.plotting as bkplt
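
The `palette` changes in this file (and the matching `cnv_colorscale` change in `dipclust.py`) follow the common "`None` as sentinel" pattern: the signature defaults to `None`, and the real default is filled in at the top of the function body. The commit does not state its motivation; a typical reason is that the default then agrees with an `Optional[...]` annotation, and a caller who passes `None` explicitly still gets the default. A minimal sketch, where `COLORSCALE_DEFAULT` and `resolve_palette` are hypothetical stand-ins and not names from the library:

```python
# Hypothetical stand-in for cnv_params.colorscale_default; the real value differs.
COLORSCALE_DEFAULT = ["blue", "white", "red"]


def resolve_palette(palette=None):
    """Return the caller's palette, or the module default when none is given."""
    if palette is None:
        # Resolve the default inside the body rather than in the signature.
        palette = COLORSCALE_DEFAULT
    return palette


# Omitting the argument and passing None explicitly behave the same way.
assert resolve_palette() is COLORSCALE_DEFAULT
assert resolve_palette(None) is COLORSCALE_DEFAULT
# An explicit value is passed through unchanged.
assert resolve_palette(["black", "yellow"]) == ["black", "yellow"]
```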

malariagen_data/anoph/dipclust.py

Lines changed: 3 additions & 1 deletion
@@ -622,7 +622,7 @@ def plot_diplotype_clustering_advanced(
         snp_filter_min_maf: float = 0.05,
         snp_query: Optional[base_params.snp_query] = AA_CHANGE_QUERY,
         cnv_region: Optional[base_params.regions] = None,
-        cnv_colorscale: plotly_params.color_continuous_scale = cnv_params.colorscale_default,
+        cnv_colorscale: plotly_params.color_continuous_scale = None,
         cnv_max_coverage_variance: cnv_params.max_coverage_variance = 0.2,
         site_mask: Optional[base_params.site_mask] = None,
         sample_sets: Optional[base_params.sample_sets] = None,
@@ -657,6 +657,8 @@ def plot_diplotype_clustering_advanced(
         chunks: base_params.chunks = base_params.native_chunks,
         inline_array: base_params.inline_array = base_params.inline_array_default,
     ):
+        if cnv_colorscale is None:
+            cnv_colorscale = cnv_params.colorscale_default
         if cohort_size and snp_transcript:
             cohort_size = None
             print(

malariagen_data/anoph/genome_features.py

Lines changed: 2 additions & 2 deletions
@@ -552,5 +552,5 @@ def _transcript_to_parent_name(self, transcript):
             return self._gene_name_overrides[parent_id]
         except KeyError:
             rec_parent = df_genome_features.loc[parent_id]
-            # Try to access "Name" attribute, fall back to "ID" if not present.
-            return rec_parent.get("Name", parent_id)
+            # Try to access gene name attribute, fall back to "ID" if not present.
+            return rec_parent.get(self._gff_gene_name_attribute, parent_id)
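
The change above replaces the hard-coded `"Name"` key with a configurable attribute name while keeping the `.get(key, default)` fallback to the parent ID. The sketch below illustrates that lookup with plain dictionaries standing in for the pandas row (a `pandas.Series` offers a `.get(key, default)` method with the same shape). The function name `parent_display_name` and the record contents are hypothetical, chosen only for illustration.

```python
def parent_display_name(record, parent_id, name_attribute="Name"):
    """Return the record's name attribute, falling back to the parent ID."""
    # `record` is a dict here; in the library the record is a pandas row.
    return record.get(name_attribute, parent_id)


# Hypothetical records: one annotated with "Name", one with "description",
# and one with no name-like attribute at all.
with_name = {"ID": "gene-1", "Name": "example-gene"}
with_description = {"ID": "gene-2", "description": "another-gene"}
bare = {"ID": "gene-3"}

assert parent_display_name(with_name, "gene-1") == "example-gene"
# A hard-coded "Name" lookup would miss this record; a configurable key finds it.
assert parent_display_name(with_description, "gene-2") == "gene-2"
assert parent_display_name(with_description, "gene-2", "description") == "another-gene"
# With no matching attribute, the ID is returned.
assert parent_display_name(bare, "gene-3") == "gene-3"
```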

malariagen_data/veff.py

Lines changed: 3 additions & 3 deletions
@@ -84,7 +84,7 @@ def get_ref_allele_coords(self, chrom, pos, ref):
         ref_seq = self.get_ref_seq(chrom, ref_start, ref_stop).lower()
         assert ref_seq == ref.lower(), (
             "reference allele does not match reference sequence, "
-            "expected %r, found %r" % (ref_seq, ref.lower())
+            f"expected {ref_seq!r}, found {ref.lower()!r}"
         )

         return ref_start, ref_stop
@@ -262,11 +262,11 @@ def _get_within_cds_effect(ann, base_effect, cds, cdss):
     base_effect = base_effect._replace(
         ref_codon=ref_codon,
         alt_codon=alt_codon,
-        codon_change="%s/%s" % (ref_codon, alt_codon),
+        codon_change=f"{ref_codon}/{alt_codon}",
         aa_pos=aa_pos,
         ref_aa=ref_aa,
         alt_aa=alt_aa,
-        aa_change="%s%s%s" % (ref_aa, aa_pos, alt_aa),
+        aa_change=f"{ref_aa}{aa_pos}{alt_aa}",
     )

     if len(ref) == 1 and len(alt) == 1:
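
The `veff.py` changes swap `%`-style formatting for f-strings without altering the resulting text: `%r` and the `!r` conversion both call `repr()`, and `%s` matches the default `str()` formatting of an f-string field. A small check of that equivalence, using made-up values (the sequences, codons and position below are illustrative and not taken from the code base):

```python
# Illustrative values only.
ref_seq, ref = "acgt", "ACGA"
ref_codon, alt_codon = "TTA", "TTT"
ref_aa, aa_pos, alt_aa = "L", 995, "F"

# %r and !r both use repr(), so the quoted output is identical.
old_message = "expected %r, found %r" % (ref_seq, ref.lower())
new_message = f"expected {ref_seq!r}, found {ref.lower()!r}"
assert old_message == new_message

# %s and a plain f-string field both use str(), including for the integer.
old_codon_change = "%s/%s" % (ref_codon, alt_codon)
new_codon_change = f"{ref_codon}/{alt_codon}"
assert old_codon_change == new_codon_change

old_aa_change = "%s%s%s" % (ref_aa, aa_pos, alt_aa)
new_aa_change = f"{ref_aa}{aa_pos}{alt_aa}"
assert old_aa_change == new_aa_change
```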

tests/anoph/test_genome_features.py

Lines changed: 7 additions & 2 deletions
@@ -169,7 +169,9 @@ def test_plot_genes_with_gene_labels(fixture, api: AnophelesGenomeFeaturesData):
     # For each contig in the fixture...
     for contig in fixture.contigs:
         # Get the genes for this contig.
-        genes_df = api.genome_features(region=contig).query("type == 'gene'")
+        genes_df = api.genome_features(region=contig).query(
+            f"type == '{api._gff_gene_type}'"
+        )

         # If there are no genes, we cannot label them.
         if not genes_df.empty:
@@ -181,7 +183,10 @@ def test_plot_genes_with_gene_labels(fixture, api: AnophelesGenomeFeaturesData):

             # Put the random gene "ID" and its "Name" in a dictionary.
             random_gene_labels = dict(
-                zip(random_sample_genes_df["ID"], random_sample_genes_df["Name"])
+                zip(
+                    random_sample_genes_df["ID"],
+                    random_sample_genes_df[api._gff_gene_name_attribute],
+                )
             )

             # Check that we get a Bokeh figure from plot_genes() with these gene_labels.
