
Commit 9a3868d

Merge branch 'master' into docs/colab-tpu-runtime
2 parents 2bd6a63 + b160cc2 commit 9a3868d

153 files changed

Lines changed: 10674 additions & 2488 deletions


.github/actions/setup-python/action.yaml

Lines changed: 1 addition & 1 deletion
@@ -19,4 +19,4 @@ runs:
 shell: bash
 run: |
 poetry env use ${{ inputs.python-version }}
-poetry install --extras dev
+poetry install --with dev,test,docs

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -1,8 +1,10 @@
 .idea
 .vscode
 __pycache__
+.mypy_cache
 *.pyc
 dist
+.venv/
 .coverage
 coverage.xml
 .ipynb_checkpoints/

CONTRIBUTING.md

Lines changed: 50 additions & 12 deletions
@@ -12,9 +12,10 @@ This package provides Python tools for accessing and analyzing genomic data from

 You'll need:

-- Python 3.10.x (CI-tested version)
-- [Poetry](https://python-poetry.org/) for dependency management
-- [Git](https://git-scm.com/) for version control
+- [pipx](https://pipx.pypa.io/) for installing Python tools
+- [git](https://git-scm.com/) for version control
+
+Both of these can be installed using your distribution's package manager or [Homebrew](https://brew.sh/) on Mac.

 ### Initial setup

@@ -33,18 +34,31 @@ You'll need:
 git remote add upstream https://github.com/malariagen/malariagen-data-python.git
 ```

-3. **Install Poetry** (if not already installed)
+3. **Install Poetry**

 ```bash
 pipx install poetry
 ```

-4. **Install the project and its dependencies**
+4. **Install Python 3.12**
+
+Python 3.12 is tested in the CI-system and is the recommended version to use.
+
+```bash
+poetry python install 3.12
+```
+
+5. **Install the project and its dependencies**

 ```bash
-poetry install
+poetry env use 3.12
+poetry install --with dev,test,docs
 ```

+This installs the runtime dependencies along with the `dev`, `test`, and `docs`
+[dependency groups](https://python-poetry.org/docs/managing-dependencies/#dependency-groups).
+If you only need to run tests, `poetry install --with test` is sufficient.
+
 **Recommended**: Use `poetry run` to run commands inside the virtual environment:

 ```bash
@@ -71,7 +85,7 @@ You'll need:
 python script.py
 ```

-5. **Install pre-commit hooks**
+6. **Install pre-commit hooks**

 ```bash
 pipx install pre-commit
@@ -107,16 +121,40 @@ You'll need:

 4. **Run tests locally**

-Fast unit tests (no external data access):
+Fast unit tests using simulated data (no external data access):

 ```bash
-poetry run pytest -v tests/anoph
+poetry run pytest -v tests --ignore tests/integration
 ```

-All unit tests (requires setting up credentials for legacy tests):
+To run integration tests which read data from GCS, you'll need to [request access to MalariaGEN data on GCS](https://malariagen.github.io/vector-data/vobs/vobs-data-access.html).
+
+Once access has been granted, [install the Google Cloud CLI](https://cloud.google.com/sdk/docs/install). E.g., if on Linux:

 ```bash
-poetry run pytest -v tests --ignore tests/integration
+./install_gcloud.sh
+```
+
+You'll then need to obtain application-default credentials, e.g.:
+
+```bash
+./google-cloud-sdk/bin/gcloud auth application-default login
+```
+
+Once this is done, you can run integration tests:
+
+```bash
+poetry run pytest -v tests/integration
+```
+
+Tests will run slowly the first time, as data required for testing will be read from GCS. Subsequent runs will be faster as data will be cached locally in the "gcs_cache" folder.
+
+6. **Run typechecking**
+
+Run static typechecking with mypy:
+
+```bash
+poetry run mypy malariagen_data tests --ignore-missing-imports
 ```

 5. **Check code quality**
@@ -150,7 +188,7 @@ ruff format .
 - **Fast tests**: Unit tests should use simulated data when possible (see `tests/anoph/`)
 - **Integration tests**: Tests requiring GCS data access are slower and run separately

-Run type checking with:
+Run dynamic type checking with:

 ```bash
 poetry run pytest -v tests --typeguard-packages=malariagen_data,malariagen_data.anoph

LINUX_SETUP.md

Lines changed: 88 additions & 15 deletions
@@ -1,65 +1,138 @@
 # Developer setup (Linux)

-To get setup for development, see [this video if you prefer VS Code](https://youtu.be/zddl3n1DCFM), or [this older video if you prefer PyCharm](https://youtu.be/QniQi-Hoo9A), and the instructions below.
+## 1. Install Git
+
+Choose the command for your Linux distribution:
+
+**Ubuntu, Debian, and Mint:**
+
+```bash
+sudo apt update
+sudo apt install -y git
+```
+
+**Fedora:**

-## 1. Fork and clone this repo
 ```bash
-git clone git@github.com:[username]/malariagen-data-python.git
+sudo dnf install -y git
+```
+
+**Arch Linux:**
+
+```bash
+sudo pacman -S sudo
+sudo pacman -S git
+sudo pacman -S openssh
+```
+
+If your Arch install does not have `sudo` configured yet, run the commands above as `root`, then configure `sudo` for your user.
+
+## 2. Fork and clone this repo
+
+After forking the repository on GitHub, clone your fork.
+
+Use SSH if your SSH keys are set up:
+
+```bash
+git clone git@github.com:[YOUR_GITHUB_USERNAME]/malariagen-data-python.git
 cd malariagen-data-python
 ```

-## 2. Install Python
+Use HTTPS if you prefer, or if you do not have SSH keys configured (common on WSL):
+
+```bash
+git clone https://github.com/[YOUR_GITHUB_USERNAME]/malariagen-data-python.git
+cd malariagen-data-python
+```
+
+## 3. Install pipx
+
+Choose the command for your Linux distribution:
+
+**Ubuntu, Debian, and Mint:**
+
+```bash
+sudo apt update
+sudo apt install -y pipx
+pipx ensurepath
+```
+
+**Fedora:**
+
+```bash
+sudo dnf install -y pipx
+pipx ensurepath
+```
+
+**Arch Linux:**
+
 ```bash
-sudo add-apt-repository ppa:deadsnakes/ppa
-sudo apt install python3.10 python3.10-venv
+sudo pacman -S python-pipx
+pipx ensurepath
 ```

-## 3. Install pipx and poetry
+Close and reopen your terminal to apply PATH changes.
+If you prefer to reload the shell in-place, run:
+
+```bash
+exec bash
+```
+
+## 4. Install Poetry and Python 3.12
+
+The package requires `>=3.10,<3.13`. We use Poetry's built-in installer to handle the Python version universally across all distributions.
+
 ```bash
-python3.10 -m pip install --user pipx
-python3.10 -m pipx ensurepath
 pipx install poetry
+poetry python install 3.12
 ```

-## 4. Create and activate development environment
+## 5. Create development environment
+
 ```bash
-poetry install
-poetry shell
+poetry env use 3.12
+poetry install --extras dev
 ```

-## 5. Install pre-commit hooks
+## 6. Install pre-commit hooks
+
 ```bash
 pipx install pre-commit
 pre-commit install
 ```

 Run pre-commit checks manually:
+
 ```bash
 pre-commit run --all-files
 ```

-## 6. Run tests
+## 7. Run tests

 Run fast unit tests using simulated data:
+
 ```bash
 poetry run pytest -v tests/anoph
 ```

-## 7. Google Cloud authentication (for legacy tests)
+## 8. Google Cloud authentication (for legacy tests)

 To run legacy tests which read data from GCS, you'll need to [request access to MalariaGEN data on GCS](https://malariagen.github.io/vector-data/vobs/vobs-data-access.html).

 Once access has been granted, [install the Google Cloud CLI](https://cloud.google.com/sdk/docs/install):
+
 ```bash
 ./install_gcloud.sh
 ```

 Then obtain application-default credentials:
+
 ```bash
 ./google-cloud-sdk/bin/gcloud auth application-default login
 ```

 Once authenticated, run legacy tests:
+
 ```bash
 poetry run pytest --ignore=tests/anoph -v tests
 ```
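
The Google Cloud authentication step above can be checked before starting a slow GCS-backed test run. The following is a minimal sketch, not part of this commit: it looks for an application-default credentials file first via the `GOOGLE_APPLICATION_CREDENTIALS` environment variable and then in the default Linux/macOS location written by `gcloud auth application-default login`. The function name `find_adc_file` and its parameters are hypothetical, and a missing file does not by itself prove that authentication will fail (for example, on a machine with an attached service account).

```python
import os
from pathlib import Path


def find_adc_file(env=None, home=None):
    """Return the path of an application-default credentials file, or None.

    Checks GOOGLE_APPLICATION_CREDENTIALS first, then the default
    Linux/macOS location used by `gcloud auth application-default login`.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    # An explicitly configured credentials file takes precedence.
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit and Path(explicit).is_file():
        return Path(explicit)

    # Otherwise fall back to the per-user default location.
    default = home / ".config" / "gcloud" / "application_default_credentials.json"
    return default if default.is_file() else None


if __name__ == "__main__":
    found = find_adc_file()
    if found:
        print(f"credentials file: {found}")
    else:
        print("no credentials file found; run the gcloud login step first")
```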

README.md

Lines changed: 2 additions & 0 deletions
@@ -49,6 +49,8 @@ To get setup for development, see [this video if you prefer VS Code](https://you
 For detailed setup instructions, see:
 - [Linux setup guide](LINUX_SETUP.md)
 - [macOS setup guide](MACOS_SETUP.md)
+- [Windows setup guide](WINDOWS_SETUP.md)
+- [Google Colab (TPU) setup guide](docs/source/colab_tpu_runtime.rst)
 Detailed instructions can be found in the [Contributors guide](https://github.com/malariagen/malariagen-data-python/blob/master/CONTRIBUTING.md).

 ## AI use policy and guidelines

WINDOWS_SETUP.md

Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
+# Windows Setup Guide
+
+To get setup for development on Windows, see
+[this video if you prefer VS Code](https://youtu.be/zddl3n1DCFM),
+or [this older video if you prefer PyCharm](https://youtu.be/QniQi-Hoo9A),
+and the instructions below.
+
+## 1. Fork and clone this repo
+```bash
+git clone https://github.com/[username]/malariagen-data-python.git
+cd malariagen-data-python
+```
+
+## 2. Install Python
+
+Download and install Python 3.10 from the official website:
+https://www.python.org/downloads/windows/
+
+During installation, check the box that says Add Python to PATH
+before clicking Install.
+
+Verify the installation worked:
+```bash
+python --version
+```
+
+## 3. Install pipx and poetry
+```bash
+python -m pip install --user pipx
+python -m pipx ensurepath
+pipx install poetry
+```
+
+After running ensurepath, close and reopen PowerShell before continuing.
+
+## 4. Create and activate development environment
+```bash
+poetry install
+poetry shell
+```
+
+## 5. Install pre-commit hooks
+```bash
+pipx install pre-commit
+pre-commit install
+```
+
+## 6. Add upstream remote and get latest code
+```bash
+git remote add upstream https://github.com/malariagen/malariagen-data-python
+git pull upstream master
+```
+
+Note: On Windows the default branch is called master, not main.
+
+## 7. Verify everything works
+```bash
+python -c "import malariagen_data; print('Setup successful!')"
+```
+
+## Common Issues on Windows
+
+**poetry not found after install**
+
+Close and reopen PowerShell, then try again.
+
+**git not recognized**
+
+Install Git from https://git-scm.com/download/win
+and restart PowerShell.
+
+**python not recognized**
+
+Reinstall Python and make sure to check
+Add Python to PATH during installation.
+
+**fatal: not a git repository**
+
+Make sure you are inside the malariagen-data-python
+folder before running any git commands.
+```bash
+cd malariagen-data-python
+```
+
+**error: pathspec main did not match**
+
+On Windows use master instead of main.
+```bash
+git checkout master
+```
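
Several of the common issues listed above come down to a tool missing from PATH or an unexpected Python version. The sketch below is an illustration rather than part of this commit: it reports which of the tools named in the setup guides cannot be found, and whether the running interpreter falls inside the `>=3.10,<3.13` range quoted in LINUX_SETUP.md. The function names and the tool list are assumptions chosen for the example.

```python
import shutil
import sys

# Tools the setup guides install or rely on (an illustrative list).
REQUIRED_TOOLS = ("git", "pipx", "poetry", "pre-commit")

# Supported interpreter range, from the ">=3.10,<3.13" constraint quoted in LINUX_SETUP.md.
MIN_VERSION = (3, 10)
MAX_VERSION_EXCLUSIVE = (3, 13)


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def python_supported(version=None):
    """Return True if the (major, minor) version falls in the supported range."""
    major_minor = tuple(sys.version_info[:2] if version is None else version[:2])
    return MIN_VERSION <= major_minor < MAX_VERSION_EXCLUSIVE


if __name__ == "__main__":
    for tool in missing_tools():
        print(f"not found on PATH: {tool} (reopen the terminal after installing)")
    if not python_supported():
        print(f"Python {sys.version_info.major}.{sys.version_info.minor} is outside >=3.10,<3.13")
```

Note that the version check applies to whichever interpreter runs the script; to check the project environment rather than the system Python, it would need to be run through `poetry run python`.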
