Merged
23 commits
31a0c31
init As1 class file hooray
tristanpwdennis Mar 31, 2026
d7fa6df
add docs
tristanpwdennis Mar 31, 2026
76c5d89
Update base class, init, start adding test coverage
tristanpwdennis Apr 1, 2026
e529a78
Merge branch 'malariagen:master' into add-As1
tristanpwdennis Apr 1, 2026
b57322f
merge
tristanpwdennis Apr 1, 2026
569105e
test genome features
tristanpwdennis Apr 1, 2026
7851553
accidentally mangled conftest, thanks claude code, working now
tristanpwdennis Apr 5, 2026
38fe863
Merge branch 'master' into add-As1
tristanpwdennis Apr 5, 2026
ceaf546
fix phasing analysis misspec
tristanpwdennis Apr 6, 2026
774e93f
x
tristanpwdennis Apr 6, 2026
d761029
add index entry and grid image
tristanpwdennis Apr 6, 2026
c9e8202
ci: trigger test run
tristanpwdennis Apr 7, 2026
a5ffbdf
Merge branch 'master' into add-As1
tristanpwdennis Apr 7, 2026
9d1cba2
add tests for as1. add cnv flag to skip tests for classes without cn…
tristanpwdennis Apr 7, 2026
46ecca7
Merge branch 'add-As1' of https://github.com/tristanpwdennis/malariag…
tristanpwdennis Apr 7, 2026
fed7e9f
tidy metadata in fixture
tristanpwdennis Apr 7, 2026
156b5fe
remove ghostly surv flags from abdi
tristanpwdennis Apr 7, 2026
e4d94c9
oops readding curation now
tristanpwdennis Apr 7, 2026
e3cd4dd
re-add dummy qc cols to test data
tristanpwdennis Apr 7, 2026
7535986
fix readme
tristanpwdennis Apr 10, 2026
97017b4
fix more typosd
tristanpwdennis Apr 10, 2026
a09b28d
Merge branch 'master' into add-As1
jonbrenas Apr 13, 2026
a146798
Merge branch 'master' into add-As1
ahernank Apr 15, 2026
98 changes: 12 additions & 86 deletions README.md
@@ -1,89 +1,15 @@
# `malariagen_data` - analyse MalariaGEN data from Python
# Curation metadata
What happened here?

No idea but it's been banished now


This Python package provides methods for accessing and analysing data from MalariaGEN.
Summary statistics used during our sequence QC process are available within each sample set subdirectory, in a file named "sequence_qc_stats.csv". Each file contains the following fields:

## Installation
- `sample_id` (string) - MalariaGEN sample identifier
- `mean_cov` (float) - mean coverage
- `median_cov` (int) - median coverage
- `modal_cov` (int) - modal coverage
- `mean_cov_{contig}` (float) - mean coverage for a particular contig
- `median_cov_{contig}` (int) - median coverage for a particular contig
- `mode_cov_{contig}` (int) - modal coverage for a particular contig
- `frac_gen_cov` (float) - fraction of the genome covered
- `divergence` (float) - divergence
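
As a rough illustration of the field list above, the following sketch parses a `sequence_qc_stats.csv`-style file with the standard library and converts each column to its documented type. The sample rows, the contig name `chr2`, and the helper names `field_type` and `read_qc_stats` are assumptions made up for this example; they are not part of `malariagen_data`.

```python
import csv
import io

# Hypothetical rows: the sample IDs, values and the contig name "chr2"
# are invented for illustration only.
EXAMPLE_CSV = """sample_id,mean_cov,median_cov,modal_cov,mean_cov_chr2,median_cov_chr2,mode_cov_chr2,frac_gen_cov,divergence
SAMPLE-0001,31.5,31,30,32.1,32,31,0.98,0.012
SAMPLE-0002,28.4,28,27,28.9,29,28,0.97,0.015
"""


def field_type(name):
    """Return the Python type documented for a column name."""
    if name == "sample_id":
        return str
    # Median and modal coverage are documented as integers,
    # both genome-wide and per contig.
    if name.startswith(("median_cov", "modal_cov", "mode_cov")):
        return int
    # mean_cov*, frac_gen_cov and divergence are documented as floats.
    return float


def read_qc_stats(handle):
    """Parse an open QC stats CSV into a list of typed dicts."""
    reader = csv.DictReader(handle)
    return [
        {name: field_type(name)(value) for name, value in row.items()}
        for row in reader
    ]


rows = read_qc_stats(io.StringIO(EXAMPLE_CSV))

# Example query: samples whose genome-wide median coverage is below 30.
low_cov = [r["sample_id"] for r in rows if r["median_cov"] < 30]
```

For a real file, `io.StringIO(EXAMPLE_CSV)` would be replaced with `open(path, newline="")` on the sample set's own `sequence_qc_stats.csv`.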

The `malariagen_data` Python package is available from the Python
package index (PyPI) and can be installed via `pip`, e.g.:

```bash
pip install malariagen-data
```

## Documentation

Documentation of classes and methods in the public API are available
from the following locations:

- [Ag3 API
docs](https://malariagen.github.io/malariagen-data-python/latest/Ag3.html)

- [Af1 API
docs](https://malariagen.github.io/malariagen-data-python/latest/Af1.html)

- [Amin1 API
docs](https://malariagen.github.io/malariagen-data-python/latest/Amin1.html)

- [Adir1 API
docs](https://malariagen.github.io/malariagen-data-python/latest/Adir1.html)

- [Pf8 API
docs](https://malariagen.github.io/parasite-data/pf8/api.html)

- [Pf7 API
docs](https://malariagen.github.io/parasite-data/pf7/api.html)

- [Pv4 API
docs](https://malariagen.github.io/parasite-data/pv4/api.html)

## Release notes (change log)

See [GitHub releases](https://github.com/malariagen/malariagen-data-python/releases)
for release notes.

## Developer setup

To get setup for development, see [this video if you prefer VS Code](https://youtu.be/zddl3n1DCFM), or [this older video if you prefer PyCharm](https://youtu.be/QniQi-Hoo9A).

For detailed setup instructions, see:
- [Linux setup guide](LINUX_SETUP.md)
- [macOS setup guide](MACOS_SETUP.md)
- [Windows setup guide](WINDOWS_SETUP.md)
- [Google Colab (TPU) setup guide](docs/source/colab_tpu_runtime.rst)
Detailed instructions can be found in the [Contributors guide](https://github.com/malariagen/malariagen-data-python/blob/master/CONTRIBUTING.md).

## AI use policy and guidelines

See [AI use policy and guidelines](https://github.com/malariagen/malariagen-data-python/blob/master/AI-POLICY.md) for more details.

## Release process

Create a new GitHub release. That's it. This will automatically
trigger publishing of a new release to PyPI and a new version of
the documentation via GitHub Actions.

The version switcher for the documentation can then be updated by
modifying the `docs/source/_static/switcher.json` file accordingly.

## Citation

If you use the `malariagen_data` package in a publication
or include any of its functions or code in other materials (_e.g._ training resources),
please cite: [doi.org/10.5281/zenodo.11173411](https://doi.org/10.5281/zenodo.11173411)

Some functions may require additional citations to acknowledge specific contributions. These are indicated in the description for each relevant function.

For any questions, please feel free to contact us at: [support@malariagen.net](mailto:support@malariagen.net)


## Sponsorship

This project is currently supported by the following grants:

* [BMGF INV-068808](https://www.gatesfoundation.org/about/committed-grants/2024/04/inv-068808)
* [BMGF INV-062921](https://www.gatesfoundation.org/about/committed-grants/2024/07/inv-062921)

This project was previously supported by the following grants:

* [BMGF INV-001927](https://www.gatesfoundation.org/about/committed-grants/2019/11/inv001927)
For further information or queries contact support@malariagen.net.
141 changes: 141 additions & 0 deletions docs/source/As1.rst
@@ -0,0 +1,141 @@
As1
=====

This page provides a curated list of functions and properties available in the ``malariagen_data`` API
for data on *Anopheles stephensi* mosquitoes.

To set up the API, use the following code::

    import malariagen_data
    as1 = malariagen_data.As1()

All the functions below can then be accessed as methods on the ``as1`` object. E.g., to call the
``sample_metadata()`` function, do::

    df_samples = as1.sample_metadata()

For more information about the data and terms of use, please see the
`MalariaGEN website <https://www.malariagen.net/data>`_ or contact support@malariagen.net.

.. currentmodule:: malariagen_data.as1.As1

Basic data access
-----------------
.. autosummary::
    :toctree: generated/

    releases
    sample_sets
    lookup_release
    lookup_study

Reference genome data access
----------------------------
.. autosummary::
    :toctree: generated/

    contigs
    genome_sequence
    genome_features
    plot_transcript
    plot_genes

Sample metadata access
----------------------
.. autosummary::
    :toctree: generated/

    sample_metadata
    add_extra_metadata
    clear_extra_metadata
    lookup_sample
    count_samples
    plot_samples_bar
    plot_samples_interactive_map
    plot_sample_location_mapbox
    plot_sample_location_geo
    wgs_data_catalog
    cohorts

SNP data access
---------------
.. autosummary::
    :toctree: generated/

    site_mask_ids
    snp_calls
    snp_allele_counts
    plot_snps
    site_annotations
    is_accessible
    biallelic_snp_calls
    biallelic_diplotypes
    biallelic_snps_to_plink

SNP frequency analysis
----------------------
.. autosummary::
    :toctree: generated/

    snp_allele_frequencies
    snp_allele_frequencies_advanced
    aa_allele_frequencies
    aa_allele_frequencies_advanced
    plot_frequencies_heatmap
    plot_frequencies_time_series
    plot_frequencies_interactive_map

Principal components analysis (PCA)
-----------------------------------
.. autosummary::
    :toctree: generated/

    pca
    plot_pca_variance
    plot_pca_coords
    plot_pca_coords_3d

Genetic distance and neighbour-joining trees (NJT)
--------------------------------------------------
.. autosummary::
    :toctree: generated/

    plot_njt
    njt
    biallelic_diplotype_pairwise_distances

Heterozygosity analysis
-----------------------
.. autosummary::
    :toctree: generated/

    plot_heterozygosity
    roh_hmm
    plot_roh

Diversity analysis
------------------
.. autosummary::
    :toctree: generated/

    cohort_diversity_stats
    diversity_stats
    plot_diversity_stats

Diplotype clustering
--------------------
.. autosummary::
    :toctree: generated/

    plot_diplotype_clustering

Fst analysis
------------
.. autosummary::
    :toctree: generated/

    average_fst
    pairwise_average_fst
    plot_pairwise_average_fst
    fst_gwss
    plot_fst_gwss
11 changes: 11 additions & 0 deletions docs/source/index.rst
@@ -33,6 +33,17 @@ API documentation
:align: center
:width: 100%

.. grid-item-card:: ``As1``
:link: As1
:link-type: doc

*Anopheles stephensi*.

.. image:: ./_static/images/anopheles_stephensi.jpg
:alt: Anopheles stephensi mosquito
:align: center
:width: 100%

.. grid-item-card:: ``Amin1``
:link: Amin1
:link-type: doc
1 change: 1 addition & 0 deletions malariagen_data/__init__.py
@@ -4,6 +4,7 @@
from .af1 import Af1
from .ag3 import Ag3
from .amin1 import Amin1
from .as1 import As1
from .anopheles import AnophelesDataResource, Region
from .pf7 import Pf7
from .pf8 import Pf8