Commit 05e89bf

Merge branch 'master' into docs/plot-frequencies-time-series-params
2 parents 47c32f1 + b0db6ed commit 05e89bf

53 files changed

Lines changed: 2655 additions & 928 deletions

.github/actions/setup-python/action.yaml

Lines changed: 1 addition & 1 deletion
@@ -19,4 +19,4 @@ runs:
       shell: bash
       run: |
         poetry env use ${{ inputs.python-version }}
-        poetry install --extras dev
+        poetry install --with dev,test,docs

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -1,8 +1,10 @@
 .idea
 .vscode
 __pycache__
+.mypy_cache
 *.pyc
 dist
+.venv/
 .coverage
 coverage.xml
 .ipynb_checkpoints/

CONTRIBUTING.md

Lines changed: 6 additions & 2 deletions
@@ -12,7 +12,7 @@ This package provides Python tools for accessing and analyzing genomic data from

 You'll need:

-- [pipx](https://python-poetry.org/) for installing Python tools
+- [pipx](https://pipx.pypa.io/) for installing Python tools
 - [git](https://git-scm.com/) for version control

 Both of these can be installed using your distribution's package manager or [Homebrew](https://brew.sh/) on Mac.
@@ -52,9 +52,13 @@ Both of these can be installed using your distribution's package manager or [Hom

 ```bash
 poetry env use 3.12
-poetry install --extras dev
+poetry install --with dev,test,docs
 ```

+This installs the runtime dependencies along with the `dev`, `test`, and `docs`
+[dependency groups](https://python-poetry.org/docs/managing-dependencies/#dependency-groups).
+If you only need to run tests, `poetry install --with test` is sufficient.
+
 **Recommended**: Use `poetry run` to run commands inside the virtual environment:

 ```bash

README.md

Lines changed: 1 addition & 0 deletions
@@ -49,6 +49,7 @@ To get setup for development, see [this video if you prefer VS Code](https://you
 For detailed setup instructions, see:
 - [Linux setup guide](LINUX_SETUP.md)
 - [macOS setup guide](MACOS_SETUP.md)
+- [Google Colab (TPU) setup guide](docs/source/colab_tpu_runtime.rst)
 Detailed instructions can be found in the [Contributors guide](https://github.com/malariagen/malariagen-data-python/blob/master/CONTRIBUTING.md).

 ## AI use policy and guidelines

docs/source/colab_tpu_runtime.rst

Lines changed: 111 additions & 0 deletions
@@ -0,0 +1,111 @@
+Google Colab Installation Guide
+===============================
+Prerequisites
+-------------
+
+Before installing the package, configure the runtime environment
+correctly:
+
+1. Open a new notebook in Google Colab.
+2. Navigate to ``Runtime → Change runtime type``.
+3. Set the runtime configuration as follows:
+
+   - **Runtime type:** Python 3
+   - **Hardware accelerator:** TPU
+   - **TPU type:** v2-8
+
+4. Click **Save** to apply the configuration.
+
+Using the recommended TPU configuration ensures compatibility with
+workflows that may require TPU-based computation.
+
+
+Installation Procedure
+----------------------
+
+In a new notebook cell, install the package:
+
+.. code-block:: bash
+
+   !pip install malariagen_data
+
+After installation completes, verify that the package is available:
+
+.. code-block:: python
+
+   import malariagen_data
+
+If the import executes without errors, the installation was successful.
+
+If dependency-related warnings or conflicts occur, follow one of the
+resolution options described below.
+
+
+Resolution Options
+------------------
+
+Resolution Option 1: Uninstall Panel
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If your notebook does not require ``panel``, uninstall it before
+installing ``malariagen_data``.
+
+.. code-block:: bash
+
+   !pip uninstall -y panel
+   !pip install malariagen_data
+
+Verify installation:
+
+.. code-block:: python
+
+   import malariagen_data
+
+
+Resolution Option 2: Install Compatible Panel Version
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If your workflow depends on ``panel``, install a compatible version:
+
+.. code-block:: bash
+
+   !pip install panel==1.7.0
+   !pip install malariagen_data
+
+Restart the runtime:
+
+``Runtime → Restart runtime``
+
+Then verify:
+
+.. code-block:: python
+
+   import malariagen_data
+
+
+Resolution Option 3: Install Required Blinker Version
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If a ``blinker`` version conflict occurs:
+
+.. code-block:: bash
+
+   !pip install blinker==1.9.0 --ignore-installed
+   !pip install malariagen_data
+
+Restart the runtime and verify:
+
+.. code-block:: python
+
+   import malariagen_data
+
+
+Final Verification
+------------------
+
+After completing any of the procedures above:
+
+- Ensure that ``malariagen_data`` installs without dependency errors.
+- Confirm that ``import malariagen_data`` runs successfully.
+- Restart the runtime whenever core dependencies are modified.
+- Avoid mixing incompatible package versions within the same Colab session.
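
Reviewer note: the final verification steps in the guide could also be scripted. The following is a minimal sketch and not part of the commit. The helper name `installed_version` is hypothetical, and it assumes that both the distribution and the importable module are named `malariagen_data`. It uses only the standard library (`importlib.util.find_spec` and `importlib.metadata.version`).

```python
from importlib import metadata, util
from typing import Optional


def installed_version(dist_name: str, module_name: str) -> Optional[str]:
    """Return the installed version string, or None if the package is absent.

    Hypothetical helper for illustration only.
    """
    # find_spec returns None when a top-level module cannot be located.
    if util.find_spec(module_name) is None:
        return None
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The module is importable but no distribution metadata was found.
        return None


if __name__ == "__main__":
    # Assumed names; adjust if the distribution is published under another name.
    print(installed_version("malariagen_data", "malariagen_data"))
```

Running this in a fresh cell after a runtime restart reports either a version string or `None`, without importing the package itself.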

malariagen_data/anoph/aim_data.py

Lines changed: 1 addition & 3 deletions
@@ -341,6 +341,4 @@ def plot_aim_heatmap(

         if show:  # pragma: no cover
             fig.show(renderer=renderer)
-            return None
-        else:
-            return fig
+        return fig

malariagen_data/anoph/base.py

Lines changed: 3 additions & 3 deletions
@@ -607,9 +607,9 @@ def _read_sample_sets_manifest(self, *, single_release: str):
         # Get today's date in ISO format
         today_date_iso = date.today().isoformat()
         # Add an "unrestricted_use" column, set to True if terms-of-use expiry date <= today's date.
-        df["unrestricted_use"] = df[terms_of_use_expiry_date_column].apply(
-            lambda d: True if pd.isna(d) else (d <= today_date_iso)
-        )
+        # Vectorized operation: True if NaN, else (d <= today_date_iso)
+        s = df[terms_of_use_expiry_date_column]
+        df["unrestricted_use"] = s.isna() | (s <= today_date_iso)
         # Make the "unrestricted_use" column a nullable boolean, to allow missing data.
         df["unrestricted_use"] = df["unrestricted_use"].astype(pd.BooleanDtype())
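
Reviewer note: the change from `.apply` to a vectorized expression should preserve behaviour. The sketch below checks that on a toy column. It is not part of the commit: the function names are hypothetical, and it assumes the expiry column holds ISO-format date strings with `NaN` for missing values, as the surrounding comments suggest.

```python
import numpy as np
import pandas as pd


def unrestricted_use_apply(expiry: pd.Series, today_iso: str) -> pd.Series:
    # Original approach: evaluate a Python lambda once per row.
    return expiry.apply(lambda d: True if pd.isna(d) else (d <= today_iso))


def unrestricted_use_vectorized(expiry: pd.Series, today_iso: str) -> pd.Series:
    # New approach: a missing expiry date counts as unrestricted;
    # otherwise compare the ISO date strings lexicographically.
    return expiry.isna() | (expiry <= today_iso)


# Toy data (assumed shape): one expired date, one missing, one far in the future.
expiry = pd.Series(["2020-01-01", np.nan, "2999-12-31"], dtype=object)
today_iso = "2024-06-01"

a = unrestricted_use_apply(expiry, today_iso).astype(pd.BooleanDtype())
b = unrestricted_use_vectorized(expiry, today_iso).astype(pd.BooleanDtype())
```

ISO 8601 dates in `YYYY-MM-DD` form sort lexicographically in date order, which is why a plain string comparison works in both versions.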

malariagen_data/anoph/base_params.py

Lines changed: 12 additions & 1 deletion
@@ -69,7 +69,10 @@
     str,
     """
     A pandas query string to be evaluated against the sample metadata, to
-    select samples to be included in the returned data.
+    select samples to be included in the returned data. E.g.,
+    "country == 'Uganda'". If the query returns zero results, a warning
+    will be emitted with fuzzy-match suggestions for possible typos or
+    case mismatches.
     """,
 ]
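
Reviewer note: the docstring promises fuzzy-match suggestions when a query returns no results, but the matching code is not shown in this part of the diff. The sketch below shows one way such suggestions could be produced with the standard library's `difflib.get_close_matches`. The function names and the warning text are hypothetical and are not taken from the library.

```python
import difflib
import warnings
from typing import List


def suggest_values(value: str, known_values: List[str], n: int = 3) -> List[str]:
    """Suggest known values that resemble `value` (hypothetical helper)."""
    # A pure case mismatch is the cheapest thing to detect, so check it first.
    by_lower = {v.lower(): v for v in known_values}
    if value.lower() in by_lower:
        return [by_lower[value.lower()]]
    # Otherwise fall back to similarity matching (case-sensitive in difflib).
    return difflib.get_close_matches(value, known_values, n=n, cutoff=0.6)


def warn_on_empty_result(value: str, known_values: List[str]) -> None:
    """Emit a warning with suggestions; assumes the query matched nothing."""
    suggestions = suggest_values(value, known_values)
    if suggestions:
        warnings.warn(f"No samples matched {value!r}. Did you mean {suggestions}?")
    else:
        warnings.warn(f"No samples matched {value!r}.")


countries = ["Uganda", "Ghana", "Kenya"]
case_fix = suggest_values("uganda", countries)
typo_fix = suggest_values("Ugnada", countries)
```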

@@ -186,6 +189,14 @@ def _validate_sample_selection_params(
     "Random seed used for reproducible down-sampling.",
 ]

+gene: TypeAlias = Annotated[
+    str,
+    """
+    Gene identifier. Can be either a gene ID or gene name.
+    Gene names are matched case-insensitively.
+    """,
+]
+
 transcript: TypeAlias = Annotated[
     str,
     "Gene transcript identifier.",

malariagen_data/anoph/cnv_data.py

Lines changed: 4 additions & 12 deletions
@@ -810,9 +810,7 @@ def plot_cnv_hmm_coverage_track(

         if show:
             bkplt.show(fig)
-            return None
-        else:
-            return fig
+        return fig

     @_check_types
     @doc(
@@ -884,9 +882,7 @@ def plot_cnv_hmm_coverage(

         if show:
             bkplt.show(fig)
-            return None
-        else:
-            return fig
+        return fig

     @_check_types
     @doc(
@@ -1028,9 +1024,7 @@ def plot_cnv_hmm_heatmap_track(

         if show:
             bkplt.show(fig)
-            return None
-        else:
-            return fig
+        return fig

     @_check_types
     @doc(
@@ -1100,6 +1094,4 @@ def plot_cnv_hmm_heatmap(

         if show:
             bkplt.show(fig)
-            return None
-        else:
-            return fig
+        return fig
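
Reviewer note: these hunks change the plotting functions so that the figure is returned whether or not `show` is set. The sketch below illustrates the new control flow with a stand-in figure class. `FakeFigure` and `plot_example` are hypothetical and exist only so the example runs without a plotting backend.

```python
class FakeFigure:
    """Stand-in for a plotting figure; records how often it was shown."""

    def __init__(self) -> None:
        self.shown = 0

    def show(self) -> None:
        self.shown += 1


def plot_example(show: bool = True) -> FakeFigure:
    fig = FakeFigure()
    if show:
        # Displaying is a side effect; it no longer decides the return value.
        fig.show()
    # The caller always receives the figure and can customise or save it.
    return fig


fig = plot_example(show=True)
```

One consequence worth checking in notebooks: when the call is the last expression in a cell, the returned object may also be rendered by the notebook's display hook, depending on the figure type.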

malariagen_data/anoph/cnv_frq.py

Lines changed: 15 additions & 3 deletions
@@ -90,7 +90,11 @@ def _gene_cnv(
         inline_array,
     ):
         # Sanity check.
-        assert isinstance(region, Region)
+        if not isinstance(region, Region):
+            raise TypeError(
+                f"Expected region to be a Region object, "
+                f"got {type(region).__name__}: {region!r}"
+            )

         # Access genes within the region of interest.
         df_genome_features = self.genome_features(region=region)
@@ -260,7 +264,11 @@ def _gene_cnv_frequencies(
         debug = self._log.debug

         debug("sanity check - this function is one region at a time")
-        assert isinstance(region, Region)
+        if not isinstance(region, Region):
+            raise TypeError(
+                f"Expected region to be a Region object, "
+                f"got {type(region).__name__}: {region!r}"
+            )

         debug("get gene copy number data")
         ds_cnv = self.gene_cnv(
@@ -504,7 +512,11 @@ def _gene_cnv_frequencies_advanced(
         debug = self._log.debug

         debug("sanity check - here we deal with one region only")
-        assert isinstance(region, Region)
+        if not isinstance(region, Region):
+            raise TypeError(
+                f"Expected region to be a Region object, "
+                f"got {type(region).__name__}: {region!r}"
+            )

         debug("access gene CNV calls")
         ds_cnv = self.gene_cnv(
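
Reviewer note: replacing `assert` with an explicit `TypeError` matters because Python strips `assert` statements when run with `-O`, and a bare assertion gives the caller no message. The sketch below reproduces the new check in isolation. The `Region` dataclass is a hypothetical stand-in for the library's class, and `require_region` is an illustrative name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical stand-in for the library's Region class."""

    contig: str
    start: int
    end: int


def require_region(region: object) -> Region:
    # Explicit check: still runs under `python -O`, unlike an assert,
    # and reports both the offending type and value.
    if not isinstance(region, Region):
        raise TypeError(
            f"Expected region to be a Region object, "
            f"got {type(region).__name__}: {region!r}"
        )
    return region


try:
    require_region("2L:1-100")
except TypeError as exc:
    message = str(exc)
```

Since the same three-line check now appears in three functions, a small shared helper along these lines might be worth considering.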
