Cholera is an acute diarrheal disease caused by Vibrio cholerae and remains a significant public health threat, particularly in regions with limited access to safe water, sanitation, and hygiene. Cholera outbreaks can spread rapidly and cross national borders, requiring timely surveillance, coordinated response, and robust data sharing.
Advances in whole-genome sequencing (WGS) and genomic epidemiology have greatly enhanced the ability to investigate cholera outbreaks, track transmission pathways, monitor the emergence and spread of virulence and antimicrobial resistance determinants, and inform public health interventions. The impact of these approaches depends on the availability of high-quality, standardized metadata that provide essential epidemiological, clinical, and environmental context.
This repository contains the Cholera Metadata Standard, a structured and harmonized framework designed to support genomic epidemiology of cholera. The standard aims to enable consistent data collection, interoperability across studies and platforms, and meaningful comparison and reuse of genomic datasets at local, regional, and global scales.
Cholera is a waterborne infectious disease characterized by acute watery diarrhea that can lead to severe dehydration and death if untreated. It is endemic in many parts of the world and frequently associated with outbreaks driven by environmental, climatic, and socio-economic factors. Effective cholera control relies on early detection, rapid response, and sustained surveillance.
Genomic epidemiology has become a critical tool for cholera research and public health by enabling:
- High-resolution tracking of outbreak sources and transmission chains
- Differentiation of endemic persistence versus reintroduction events
- Characterization of toxigenic lineages and virulence factors
- Monitoring of antimicrobial resistance and evolutionary dynamics
Interpreting genomic data in these contexts requires standardized metadata describing the case, location, time, environment, and laboratory processes associated with each isolate or sample.
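As a rough illustration of the context categories named above (case, location, time, environment, and laboratory process), a single isolate's metadata could be represented along these lines. This is only a sketch: the class name `IsolateRecord` and every field name and value are hypothetical placeholders invented for this example, not fields defined by the standard.

```python
from dataclasses import dataclass, asdict
from datetime import date


@dataclass(frozen=True)
class IsolateRecord:
    """Hypothetical record; field names are illustrative, not from the standard."""

    sample_id: str            # case / sample identifier
    country: str              # location
    collection_date: date     # time
    sample_source: str        # environmental or host context
    sequencing_platform: str  # laboratory process


# All values below are made-up placeholders.
record = IsolateRecord(
    sample_id="EXAMPLE-0001",
    country="Exampleland",
    collection_date=date(2024, 1, 15),
    sample_source="stool",
    sequencing_platform="example-platform",
)

# Serialize the date as ISO 8601 text so the record can be exchanged as plain data.
row = {**asdict(record), "collection_date": record.collection_date.isoformat()}
print(row)
```

Keeping the date as a typed value until serialization is one way to avoid ambiguous day/month ordering when records from different sources are combined.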
Metadata standards define a common structure, vocabulary, and set of expectations for describing data. In genomic epidemiology, standardized metadata:
- Improve data quality, completeness, and consistency
- Enable interoperability across databases, analytical pipelines, and surveillance systems
- Facilitate data sharing and reuse in alignment with FAIR principles (Findable, Accessible, Interoperable, Reusable)
- Support reproducibility and transparent interpretation of genomic analyses
For cholera, where data are often generated across multiple countries, sectors, and outbreak contexts, a shared metadata standard is essential to support coordinated regional and global analyses.
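To make the idea of "a common structure, vocabulary, and set of expectations" concrete, the following minimal sketch checks a record against a set of required fields, a controlled vocabulary, and a date format. The field names, the vocabulary terms, and the `validate` helper are assumptions made up for this example; they are not taken from the standard itself.

```python
from datetime import date

# Hypothetical rules for illustration only; not the standard's actual fields or terms.
REQUIRED_FIELDS = {"sample_id", "country", "collection_date"}
ALLOWED_SAMPLE_SOURCES = {"stool", "rectal swab", "water"}


def validate(record: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the record passed."""
    problems = []

    # Structure: every required field must be present.
    for name in sorted(REQUIRED_FIELDS - set(record)):
        problems.append(f"missing required field: {name}")

    # Vocabulary: if given, sample_source must be one of the agreed terms.
    source = record.get("sample_source")
    if source is not None and source not in ALLOWED_SAMPLE_SOURCES:
        problems.append(f"sample_source not in vocabulary: {source!r}")

    # Expectations: if given, collection_date must be an ISO 8601 calendar date.
    collected = record.get("collection_date")
    if collected is not None:
        try:
            date.fromisoformat(collected)
        except (TypeError, ValueError):
            problems.append(
                f"collection_date is not ISO 8601 (YYYY-MM-DD): {collected!r}"
            )

    return problems
```

For example, `validate({"sample_id": "X", "collection_date": "15/01/2024", "sample_source": "soil"})` reports three problems: the missing `country` field, the out-of-vocabulary source, and the non-ISO date. Returning every problem at once, rather than stopping at the first, tends to suit batch submissions where a data manager wants to fix a record in a single pass.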
This metadata standard is intended to support genomic epidemiology of cholera by defining a harmonized set of metadata elements relevant to pathogen genomics, epidemiology, and public health surveillance.
This standard:
- Focuses on metadata accompanying Vibrio cholerae genomic data
- Supports outbreak investigation, surveillance, and research use cases
- Is applicable across diverse geographic, laboratory, and public health settings, including cross-border contexts
This standard is not intended to:
- Replace clinical case definitions or treatment guidelines for cholera
- Serve as a comprehensive electronic health record or laboratory information management system
- Function as a regulatory or clinical decision-support tool
This repository serves as the authoritative home for the Cholera Metadata Standard and its supporting materials. It is intended for use by researchers, public health practitioners, laboratorians, bioinformaticians, and data stewards working in cholera surveillance and genomic epidemiology.
Specifically, this repository aims to:
- Provide a clear and well-documented metadata specification for cholera genomic data
- Support consistent implementation of the standard across projects, institutions, and countries
- Enable testing, validation, and iterative improvement of the standard
- Promote transparency, collaboration, and community engagement in standard development
This repository is intended for:
- Public health professionals involved in cholera surveillance and outbreak response
- Genomic epidemiology researchers studying Vibrio cholerae
- Laboratory scientists, bioinformaticians, and data managers generating or analyzing genomic data
- Standards developers and stakeholders interested in public health data harmonization
Contributions, feedback, and issue reports are welcome. Community input is essential to ensure the metadata standard remains relevant, practical, and aligned with public health needs. Please see the contribution guidelines for details on how to get involved.
If you use, adapt, or reference this metadata standard in your work, please credit:
Public Health Alliance for Genomic Epidemiology (PHA4GE)
Africa Centres for Disease Control and Prevention (Africa CDC)
A formal citation and citation file (CITATION.cff) may be added in future releases.
License information is provided in this repository.