Commit 4a5100d

docs: move is_surveillance to general metadata, clarify conditional columns
1 parent 4ef994f commit 4a5100d

1 file changed

Lines changed: 20 additions & 17 deletions

malariagen_data/anoph/sample_metadata.py

@@ -732,8 +732,10 @@ def clear_extra_metadata(self):
 - ``terms_of_use_expiry_date`` - Expiry date of terms of use for the sample.
 - ``terms_of_use_url`` - URL of the terms of use for the sample.
 - ``unrestricted_use`` - Whether the sample can be used without restrictions.
+- ``is_surveillance`` - Whether the sample can be used for surveillance.

-**Sequence QC metadata**:
+**Sequence QC metadata** (present for all sample sets, values may
+be missing if QC data is unavailable for a given sample set):

 - ``mean_cov`` - Mean sequencing coverage across the genome.
 - ``median_cov`` - Median sequencing coverage across the genome.
@@ -747,7 +749,7 @@ def clear_extra_metadata(self):
 - ``mean_cov_3L`` - Mean coverage on chromosome arm 3L.
 - ``median_cov_3L`` - Median coverage on chromosome arm 3L.
 - ``mode_cov_3L`` - Modal coverage on chromosome arm 3L.
-- ``mean_cov_3R`` - Mean coverage on chromosome arm 3R.=
+- ``mean_cov_3R`` - Mean coverage on chromosome arm 3R.
 - ``median_cov_3R`` - Median coverage on chromosome arm 3R.
 - ``mode_cov_3R`` - Modal coverage on chromosome arm 3R.
 - ``mean_cov_X`` - Mean coverage on chromosome X.
@@ -758,25 +760,24 @@ def clear_extra_metadata(self):
 - ``contam_pct`` - Estimated contamination percentage.
 - ``contam_LLR`` - Log-likelihood ratio for contamination estimate.

-**Surveillance flags**:
-
-- ``is_surveillance`` - Whether the sample can be used for surveillance.
-
-**AIM (Ancestry-Informative Marker) metadata** (if available):
+**AIM (Ancestry-Informative Marker) metadata** (only present when
+an AIM analysis is available for the data resource, e.g., *Ag3*):

-- ``aim_species_fraction_arab`` - Fraction of gambcolu vs. arabiensis AIMs
-indicating arabiensis.
+- ``aim_species_fraction_arab`` - Fraction of gambcolu vs. arabiensis
+AIMs indicating arabiensis.
 - ``aim_species_fraction_colu`` - Fraction of gambiae vs. coluzzii AIMs
 indicating coluzzii.
-- ``aim_species_fraction_colu_no2l`` - Fraction of gambiae vs. coluzzii AIMs
-indicating coluzzii, excluding chromosome arm 2L.
+- ``aim_species_fraction_colu_no2l`` - Fraction of gambiae vs. coluzzii
+AIMs indicating coluzzii, excluding chromosome arm 2L.
 - ``aim_species_gambcolu_arabiensis`` - Taxon assigned by gambcolu vs.
 arabiensis AIMs.
 - ``aim_species_gambiae_coluzzii`` - Taxon assigned by gambiae vs.
 coluzzii AIMs.
 - ``aim_species`` - Final species assignment combining both AIM analyses.

-**Cohort metadata** (if available):
+**Cohort metadata** (only present when a cohorts analysis is available
+for the data resource; quarter columns are only present for cohorts
+analyses from 20230223 onwards):

 - ``country_iso`` - ISO code of the country of collection.
 - ``admin1_name`` - Name of the first-level administrative region.
@@ -785,14 +786,16 @@ def clear_extra_metadata(self):
 - ``taxon`` - Taxon assigned by combining AIM and cohort analyses.
 - ``cohort_admin1_year`` - Cohort grouping by admin level 1 and year.
 - ``cohort_admin1_month`` - Cohort grouping by admin level 1 and month.
-- ``cohort_admin1_quarter`` - Cohort grouping by admin level 1 and quarter.
+- ``cohort_admin1_quarter`` - Cohort grouping by admin level 1 and
+quarter (cohorts analysis >= 20230223 only).
 - ``cohort_admin2_year`` - Cohort grouping by admin level 2 and year.
 - ``cohort_admin2_month`` - Cohort grouping by admin level 2 and month.
-- ``cohort_admin2_quarter`` - Cohort grouping by admin level 2 and quarter.
+- ``cohort_admin2_quarter`` - Cohort grouping by admin level 2 and
+quarter (cohorts analysis >= 20230223 only).

-The exact columns present depend on the sample sets requested and
-which analyses are available. The returned DataFrame is a copy and
-can be safely modified without affecting internal caches.
+The exact columns present depend on the data resource and sample sets
+requested. The returned DataFrame is a copy and can be safely modified
+without affecting internal caches.
 """,
 )
 def sample_metadata(
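
Since the docstring now states that the AIM and cohort columns are conditional, calling code may want to check for them before use. The following is only a minimal sketch of that idea, not part of this commit or of the library: `present_columns` and `has_quarter_columns` are hypothetical helpers, the column list stands in for `df.columns`, and `"20211101"` is a made-up earlier analysis stamp used for illustration. The `20230223` threshold is taken from the docstring text above.

```python
from typing import Iterable, List

# Threshold quoted in the docstring: quarter columns are documented as
# present only for cohorts analyses from this date stamp onwards.
QUARTER_COLUMNS_SINCE = "20230223"


def present_columns(available: Iterable[str], wanted: Iterable[str]) -> List[str]:
    """Hypothetical helper: keep the wanted names that exist, in order."""
    available_set = set(available)
    return [name for name in wanted if name in available_set]


def has_quarter_columns(cohorts_analysis: str) -> bool:
    """Hypothetical helper: compare a YYYYMMDD analysis stamp to the threshold."""
    # Both values are YYYYMMDD stamps, so integer comparison orders them by date.
    return int(cohorts_analysis) >= int(QUARTER_COLUMNS_SINCE)


# Illustrative column list standing in for `df.columns`; no AIM or quarter
# columns are included, as could happen for some data resources.
columns = ["unrestricted_use", "is_surveillance", "mean_cov", "cohort_admin1_year"]
wanted = ["is_surveillance", "aim_species", "cohort_admin1_quarter"]

print(present_columns(columns, wanted))   # ['is_surveillance']
print(has_quarter_columns("20230223"))    # True
print(has_quarter_columns("20211101"))    # False
```

With a DataFrame `df` returned by `sample_metadata()`, an expression such as `df[present_columns(df.columns, wanted)]` would select only the columns that actually exist, rather than raising a `KeyError` for the absent ones.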
