
Commit e4d94c9

oops readding curation now
1 parent 156b5fe commit e4d94c9

4 files changed

Lines changed: 89 additions & 86 deletions


README.md

Lines changed: 12 additions & 86 deletions
@@ -1,89 +1,15 @@
-# `malariagen_data` - analyse MalariaGEN data from Python
+# Curation metadata
 
-This Python package provides methods for accessing and analysing data from MalariaGEN.
+Summary statistics used during our sequence QC process are available within each sample set subdirectory, in a file named "sequence_qc_stats.csv". Each file contains the following fields:
 
-## Installation
+- `sample_id` (string) - MalariaGEN sample identifier
+- `mean_cov` (float) - mean coverage
+- `median_cov` (int) - median coverage
+- `modal_cov` (int) - modal coverage
+- `mean_cov_{contig}` (float) - mean coverage for a particular contig
+- `median_cov_{contig}` (int) - median coverage for a particular contig
+- `mode_cov_{contig}` (int) - modal coverage for a particular contig
+- `frac_gen_cov` (float) - fraction of the genome covered
+- `divergence` (float) - divergence
 
-The `malariagen_data` Python package is available from the Python
-package index (PyPI) and can be installed via `pip`, e.g.:
-
-```bash
-pip install malariagen-data
-```
-
-## Documentation
-
-Documentation of classes and methods in the public API are available
-from the following locations:
-
-- [Ag3 API
-docs](https://malariagen.github.io/malariagen-data-python/latest/Ag3.html)
-
-- [Af1 API
-docs](https://malariagen.github.io/malariagen-data-python/latest/Af1.html)
-
-- [Amin1 API
-docs](https://malariagen.github.io/malariagen-data-python/latest/Amin1.html)
-
-- [Adir1 API
-docs](https://malariagen.github.io/malariagen-data-python/latest/Adir1.html)
-
-- [Pf8 API
-docs](https://malariagen.github.io/parasite-data/pf8/api.html)
-
-- [Pf7 API
-docs](https://malariagen.github.io/parasite-data/pf7/api.html)
-
-- [Pv4 API
-docs](https://malariagen.github.io/parasite-data/pv4/api.html)
-
-## Release notes (change log)
-
-See [GitHub releases](https://github.com/malariagen/malariagen-data-python/releases)
-for release notes.
-
-## Developer setup
-
-To get setup for development, see [this video if you prefer VS Code](https://youtu.be/zddl3n1DCFM), or [this older video if you prefer PyCharm](https://youtu.be/QniQi-Hoo9A).
-
-For detailed setup instructions, see:
-- [Linux setup guide](LINUX_SETUP.md)
-- [macOS setup guide](MACOS_SETUP.md)
-- [Windows setup guide](WINDOWS_SETUP.md)
-- [Google Colab (TPU) setup guide](docs/source/colab_tpu_runtime.rst)
-Detailed instructions can be found in the [Contributors guide](https://github.com/malariagen/malariagen-data-python/blob/master/CONTRIBUTING.md).
-
-## AI use policy and guidelines
-
-See [AI use policy and guidelines](https://github.com/malariagen/malariagen-data-python/blob/master/AI-POLICY.md) for more details.
-
-## Release process
-
-Create a new GitHub release. That's it. This will automatically
-trigger publishing of a new release to PyPI and a new version of
-the documentation via GitHub Actions.
-
-The version switcher for the documentation can then be updated by
-modifying the `docs/source/_static/switcher.json` file accordingly.
-
-## Citation
-
-If you use the `malariagen_data` package in a publication
-or include any of its functions or code in other materials (_e.g._ training resources),
-please cite: [doi.org/10.5281/zenodo.11173411](https://doi.org/10.5281/zenodo.11173411)
-
-Some functions may require additional citations to acknowledge specific contributions. These are indicated in the description for each relevant function.
-
-For any questions, please feel free to contact us at: [support@malariagen.net](mailto:support@malariagen.net)
-
-
-## Sponsorship
-
-This project is currently supported by the following grants:
-
-* [BMGF INV-068808](https://www.gatesfoundation.org/about/committed-grants/2024/04/inv-068808)
-* [BMGF INV-062921](https://www.gatesfoundation.org/about/committed-grants/2024/07/inv-062921)
-
-This project was previously supported by the following grants:
-
-* [BMGF INV-001927](https://www.gatesfoundation.org/about/committed-grants/2019/11/inv001927)
+For further information or queries contact support@malariagen.net.
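
The field list in the new README maps directly onto typed columns. As a minimal sketch (not part of the commit), the Python below parses the text of a `sequence_qc_stats.csv` file using only the standard library and converts each column to the type the README documents. The function name `parse_qc_stats` and the prefix-based typing rule are assumptions made for illustration; the example row is copied from one of the CSV files added in this commit.

```python
import csv
import io

# Column-name prefixes the README documents as integers (median and modal
# coverage). Every other column except sample_id is treated as a float.
INT_PREFIXES = ("median_cov", "modal_cov", "mode_cov")


def parse_qc_stats(text):
    """Parse sequence_qc_stats.csv text into a list of typed dicts."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {}
        for name, value in raw.items():
            if name == "sample_id":
                row[name] = value
            elif name.startswith(INT_PREFIXES):
                row[name] = int(value)
            else:
                row[name] = float(value)
        rows.append(row)
    return rows


# Header and one data row taken from the diff.
EXAMPLE = (
    "sample_id,mean_cov,median_cov,modal_cov,"
    "mean_cov_2RL,median_cov_2RL,mode_cov_2RL,"
    "mean_cov_3RL,median_cov_3RL,mode_cov_3RL,"
    "mean_cov_X,median_cov_X,mode_cov_X,frac_gen_cov,divergence\n"
    "VMF00318-0001,21.39,21,20,21.44,21,20,21.11,21,20,22.34,21,20,0.953,0.01965\n"
)

stats = parse_qc_stats(EXAMPLE)
print(stats[0]["sample_id"], stats[0]["mean_cov"], stats[0]["median_cov_X"])
```

To read a real file, pass its contents to `parse_qc_stats`; the path depends on the sample set subdirectory, which this diff does not show.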
@@ -0,0 +1,22 @@
+sample_id,mean_cov,median_cov,modal_cov,mean_cov_2RL,median_cov_2RL,mode_cov_2RL,mean_cov_3RL,median_cov_3RL,mode_cov_3RL,mean_cov_X,median_cov_X,mode_cov_X,frac_gen_cov,divergence
+VMF00318-0001,21.39,21,20,21.44,21,20,21.11,21,20,22.34,21,20,0.953,0.01965
+VMF00318-0002,26.37,26,25,26.45,26,25,26.16,26,25,26.88,25,25,0.954,0.01949
+VMF00318-0003,40.61,40,40,40.83,40,40,40.25,40,40,41.17,39,39,0.954,0.01927
+VMF00318-0004,22.26,22,21,22.34,22,21,22.09,22,21,22.61,21,21,0.953,0.0195
+VMF00318-0006,32.69,32,32,32.78,32,32,32.39,32,32,33.58,32,31,0.953,0.01936
+VMF00318-0007,27.86,27,27,27.94,27,27,27.47,27,27,29.1,27,26,0.955,0.01939
+VMF00318-0008,25.07,24,24,25.18,25,24,24.82,24,24,25.57,24,24,0.953,0.01942
+VMF00318-0009,30.95,30,30,31.03,30,30,30.67,30,30,31.73,30,29,0.953,0.01939
+VMF00318-0010,26.33,26,25,26.46,26,25,26.07,26,25,26.85,26,25,0.952,0.01945
+VMF00318-0011,27.77,27,27,27.89,27,27,27.59,27,27,28.0,27,26,0.953,0.01946
+VMF00318-0012,38.43,38,38,38.6,38,38,38.16,38,38,38.73,37,37,0.955,0.01929
+VMF00318-0013,26.28,26,25,26.4,26,25,25.96,25,25,27.1,25,25,0.953,0.01949
+VMF00318-0014,22.27,22,21,22.41,22,21,22.04,22,21,22.6,21,21,0.953,0.01962
+VMF00318-0015,26.28,26,25,26.44,26,25,26.0,26,25,26.77,25,25,0.954,0.01935
+VMF00318-0016,24.79,24,24,24.87,24,24,24.51,24,24,25.56,24,24,0.953,0.01952
+VMF00318-0017,19.94,19,19,20.17,19,19,19.67,19,19,20.08,19,19,0.952,0.01988
+VMF00318-0018,27.63,27,26,27.77,27,27,27.34,27,26,28.25,27,26,0.954,0.01939
+VMF00318-0019,25.8,25,25,26.04,25,25,25.53,25,25,25.93,25,24,0.952,0.01945
+VMF00318-0020,25.39,25,24,25.52,25,24,25.14,25,24,25.91,24,24,0.952,0.01952
+VMF00318-0021,23.72,23,23,23.81,23,23,23.47,23,23,24.37,23,23,0.953,0.0196
+VMF00318-0022,22.61,22,21,22.81,22,22,22.37,22,21,22.76,22,21,0.952,0.0196
@@ -0,0 +1,25 @@
+sample_id,mean_cov,median_cov,modal_cov,mean_cov_2RL,median_cov_2RL,mode_cov_2RL,mean_cov_3RL,median_cov_3RL,mode_cov_3RL,mean_cov_X,median_cov_X,mode_cov_X,frac_gen_cov,divergence
+VMF00320-0001,25.23,25,24,25.29,25,24,24.92,24,24,26.27,25,24,0.951,0.0201
+VMF00320-0003,19.5,19,18,19.56,19,18,19.22,19,18,20.36,19,18,0.95,0.02057
+VMF00320-0004,26.65,26,26,26.74,26,26,26.37,26,26,27.38,26,25,0.952,0.02001
+VMF00320-0005,22.31,21,21,22.45,22,21,21.92,21,21,23.28,21,21,0.952,0.02019
+VMF00320-0006,21.82,21,21,21.9,21,21,21.46,21,21,22.99,21,21,0.951,0.0202
+VMF00320-0007,22.83,22,22,22.93,22,22,22.51,22,22,23.71,22,22,0.953,0.02
+VMF00320-0008,20.07,19,19,20.14,19,19,19.79,19,19,20.91,19,19,0.953,0.02024
+VMF00320-0009,22.45,22,22,22.53,22,22,22.13,22,21,23.36,22,21,0.953,0.02008
+VMF00320-0010,24.74,24,24,24.89,24,24,24.37,24,24,25.61,24,24,0.952,0.02005
+VMF00320-0012,23.1,22,22,23.15,22,22,22.74,22,22,24.31,22,22,0.954,0.02002
+VMF00320-0013,24.23,23,23,24.23,24,23,23.82,23,23,25.92,24,23,0.952,0.02003
+VMF00320-0014,24.33,24,23,24.46,24,23,23.94,24,23,25.31,24,23,0.953,0.01994
+VMF00320-0015,23.78,23,23,23.84,23,23,23.43,23,23,24.94,23,23,0.951,0.02015
+VMF00320-0016,22.06,21,21,22.11,21,21,21.8,21,21,22.91,21,21,0.953,0.02009
+VMF00320-0017,28.2,28,27,28.32,28,27,27.87,28,27,29.09,27,27,0.953,0.01995
+VMF00320-0018,43.42,43,43,43.51,43,43,43.02,43,43,44.65,43,42,0.955,0.01984
+VMF00320-0019,19.93,19,18,20.07,19,19,19.64,19,18,20.52,19,18,0.952,0.02033
+VMF00320-0020,22.89,22,22,22.96,22,22,22.61,22,22,23.7,22,22,0.951,0.02024
+VMF00320-0021,23.32,22,22,23.53,23,22,22.96,22,22,23.89,22,22,0.951,0.02028
+VMF00320-0022,27.43,27,27,27.53,27,27,27.09,27,27,28.4,27,26,0.952,0.01998
+VMF00320-0023,24.37,24,23,24.39,24,23,24.02,24,23,25.7,24,23,0.954,0.01997
+VMF00320-0024,22.3,22,21,22.41,22,21,22.08,21,21,22.76,21,21,0.951,0.02017
+VMF00320-0025,19.25,18,18,19.38,18,18,18.86,18,18,20.27,19,18,0.952,0.02052
+VMF00320-0026,20.41,20,19,20.45,20,19,20.08,20,19,21.54,20,19,0.952,0.02036
@@ -0,0 +1,30 @@
+sample_id,mean_cov,median_cov,modal_cov,mean_cov_2RL,median_cov_2RL,mode_cov_2RL,mean_cov_3RL,median_cov_3RL,mode_cov_3RL,mean_cov_X,median_cov_X,mode_cov_X,frac_gen_cov,divergence
+VMF00339-0001,17.37,16,14,17.38,16,14,17.11,16,14,18.42,15,13,0.946,0.02138
+VMF00339-0002,24.61,24,24,24.74,24,24,24.37,24,24,24.97,24,23,0.953,0.01913
+VMF00339-0003,28.79,28,28,29.0,28,28,28.58,28,28,28.77,28,28,0.953,0.01901
+VMF00339-0004,28.2,28,27,28.38,28,28,28.04,28,27,28.08,27,27,0.952,0.01922
+VMF00339-0005,22.43,22,22,23.57,23,23,23.16,23,22,14.47,12,11,0.95,0.01924
+VMF00339-0006,17.63,17,17,18.55,18,17,18.2,18,17,11.34,9,8,0.951,0.01932
+VMF00339-0007,22.79,22,23,23.93,23,23,23.5,23,23,14.87,12,11,0.951,0.01928
+VMF00339-0008,21.8,21,22,22.94,22,22,22.43,22,22,14.26,11,10,0.952,0.01928
+VMF00339-0009,19.93,19,20,20.95,20,20,20.53,20,20,13.0,10,9,0.95,0.01927
+VMF00339-0010,22.39,22,22,23.55,23,22,23.04,23,22,14.63,12,11,0.951,0.01925
+VMF00339-0011,23.43,23,23,24.62,24,23,24.17,24,23,15.21,12,11,0.95,0.0193
+VMF00339-0012,25.73,25,26,27.09,26,26,26.67,26,26,16.0,13,12,0.951,0.01917
+VMF00339-0013,23.46,23,24,24.63,24,24,24.33,24,24,14.79,12,12,0.953,0.01913
+VMF00339-0014,22.16,22,22,23.32,23,22,22.87,22,22,14.22,12,11,0.951,0.01925
+VMF00339-0015,23.42,23,23,24.64,24,24,24.16,24,23,15.1,12,11,0.951,0.01925
+VMF00339-0016,22.75,22,22,22.88,22,22,22.53,22,22,23.08,21,20,0.955,0.01911
+VMF00339-0017,26.5,26,27,27.89,27,27,27.37,27,27,16.9,14,13,0.951,0.01925
+VMF00339-0018,22.36,22,22,22.53,22,22,22.13,22,22,22.61,21,21,0.952,0.01908
+VMF00339-0019,28.62,28,28,28.78,28,28,28.33,28,28,29.12,28,28,0.953,0.01914
+VMF00339-0020,22.78,22,23,24.03,23,23,23.5,23,23,14.35,12,11,0.951,0.01919
+VMF00339-0021,27.09,27,26,27.23,27,27,26.83,27,26,27.54,26,26,0.953,0.01919
+VMF00339-0022,24.24,24,24,24.39,24,24,23.94,24,24,24.86,24,24,0.952,0.01911
+VMF00339-0025,21.49,21,21,22.65,22,22,22.15,22,21,13.77,11,10,0.95,0.0193
+VMF00339-0026,24.5,24,25,25.8,25,25,25.22,25,25,15.97,13,12,0.951,0.01923
+VMF00339-0027,22.08,22,22,23.3,23,22,22.74,22,22,14.06,12,11,0.951,0.01931
+VMF00339-0028,29.04,29,28,29.18,29,28,28.81,29,28,29.42,28,28,0.952,0.01918
+VMF00339-0029,23.81,23,24,25.03,24,24,24.6,24,24,15.31,12,11,0.951,0.01933
+VMF00339-0030,23.05,23,23,24.24,23,23,23.78,23,23,14.87,12,11,0.951,0.01928
+VMF00339-0031,23.85,23,24,25.12,24,24,24.61,24,24,15.21,12,12,0.951,0.01928
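
Several rows in this last file report X coverage well below their 2RL and 3RL coverage (for example VMF00339-0005), while others are roughly level (for example VMF00339-0004). The commit does not say what this difference means. As a rough sketch of how such rows could be flagged, the Python below computes the ratio of `mean_cov_X` to the average of `mean_cov_2RL` and `mean_cov_3RL`. The helper name `x_to_autosome_ratio` and the 0.75 cut-off are illustrative assumptions, not thresholds taken from the QC process.

```python
import csv
import io


def x_to_autosome_ratio(row):
    """Return mean X coverage divided by the average of 2RL and 3RL coverage."""
    autosome = (float(row["mean_cov_2RL"]) + float(row["mean_cov_3RL"])) / 2
    return float(row["mean_cov_X"]) / autosome


# Header plus two rows copied from the file above.
EXAMPLE = (
    "sample_id,mean_cov,median_cov,modal_cov,"
    "mean_cov_2RL,median_cov_2RL,mode_cov_2RL,"
    "mean_cov_3RL,median_cov_3RL,mode_cov_3RL,"
    "mean_cov_X,median_cov_X,mode_cov_X,frac_gen_cov,divergence\n"
    "VMF00339-0004,28.2,28,27,28.38,28,28,28.04,28,27,28.08,27,27,0.952,0.01922\n"
    "VMF00339-0005,22.43,22,22,23.57,23,23,23.16,23,22,14.47,12,11,0.95,0.01924\n"
)

ratios = {
    row["sample_id"]: x_to_autosome_ratio(row)
    for row in csv.DictReader(io.StringIO(EXAMPLE))
}

# Hypothetical cut-off, chosen only for this illustration.
LOW_X_CUTOFF = 0.75
low_x = sorted(sid for sid, ratio in ratios.items() if ratio < LOW_X_CUTOFF)
print(low_x)
```

A ratio is used rather than the raw X coverage so that samples sequenced to different overall depths remain comparable.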
