
Enhance variant effect annotation: classify in-frame complex variants properly in veff.py #1180

@vinitjain2005

Description:

In the veff.py module, specifically within the function _get_within_cds_effect, complex in-frame variants involving combinations of MNPs (multi-nucleotide polymorphisms) and INDELs (insertions/deletions) are currently assigned a placeholder effect:

effect = base_effect._replace(
    effect="TODO in-frame complex variation (MNP + INDEL)", impact="UNKNOWN"
)

This placeholder leaves these biologically important variants effectively unclassified: the TODO text becomes the effect label and the impact is "UNKNOWN", which reduces the accuracy and usefulness of variant effect annotations in downstream analyses.
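To make "in-frame complex (MNP + INDEL)" concrete, here is a minimal sketch of how a within-CDS variant could be categorized from its REF and ALT alleles. The helpers `trim_shared_bases` and `classify_cds_variant` and the category labels are hypothetical and only illustrate the distinction; the actual branching in `_get_within_cds_effect` may work differently (for example, from positions and codon phase rather than trimmed allele strings).

```python
from typing import Tuple


def trim_shared_bases(ref: str, alt: str) -> Tuple[str, str]:
    """Remove bases shared at the start, then at the end, of both alleles."""
    while ref and alt and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    return ref, alt


def classify_cds_variant(ref: str, alt: str) -> str:
    """Hypothetical classifier for a variant lying within a CDS."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("REF and ALT alleles are identical")
    r, a = trim_shared_bases(ref, alt)
    if len(r) == len(a):
        # Pure substitution: one base (SNP) or several (MNP).
        return "SNP" if len(r) == 1 else "MNP"
    # Net length change; Python's % also gives 0 for negative multiples of 3.
    frame = "INFRAME" if (len(a) - len(r)) % 3 == 0 else "FRAMESHIFT"
    if not r:
        kind = "INSERTION"
    elif not a:
        kind = "DELETION"
    else:
        # Bases are both substituted and gained/lost: the MNP + INDEL case.
        kind = "COMPLEX"
    return f"{frame}_{kind}"


for ref, alt in [("A", "G"), ("AC", "GT"), ("A", "ATTT"), ("ACG", "TTCGAA"), ("ACG", "T")]:
    print(ref, alt, classify_cds_variant(ref, alt))
```

Run on the listed pairs, this prints SNP, MNP, INFRAME_INSERTION, INFRAME_COMPLEX and FRAMESHIFT_COMPLEX respectively. Only the INFRAME_COMPLEX case corresponds to the placeholder branch; trimming shared bases first keeps VCF-style anchor bases from turning a simple insertion or deletion into a "complex" one.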


Proposed Fix:

Replace the placeholder with a meaningful effect classification and impact level by updating the code to:

effect = base_effect._replace(
    effect="INFRAME_COMPLEX_VARIANT", impact="MODERATE"
)

This change labels these variants with the descriptive effect term "INFRAME_COMPLEX_VARIANT", indicating a complex but in-frame coding change, and assigns a "MODERATE" impact, consistent with other in-frame variants.
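Since `_replace` is the namedtuple method for "copy with some fields changed", the proposed line presumably operates on a namedtuple. The sketch below uses a cut-down, hypothetical `VariantEffect` record and helper `resolve_inframe_complex` (neither is taken from veff.py) to show what the fix does: only `effect` and `impact` change, all other fields carry over, and `base_effect` itself is left untouched.

```python
from collections import namedtuple

# Hypothetical stand-in for the effect record in veff.py; the real record
# likely has more fields, but _replace behaves the same on any namedtuple.
VariantEffect = namedtuple("VariantEffect", ["ref", "alt", "effect", "impact"])


def resolve_inframe_complex(base_effect: VariantEffect) -> VariantEffect:
    # _replace returns a new namedtuple: fields not named here are copied
    # from base_effect, and base_effect itself is not modified.
    return base_effect._replace(effect="INFRAME_COMPLEX_VARIANT", impact="MODERATE")


base_effect = VariantEffect(ref="ACG", alt="TTCGAA", effect=None, impact=None)
effect = resolve_inframe_complex(base_effect)
print(effect)
print(base_effect)
```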


Why this matters:

  • Completeness: Proper classification of in-frame complex variants closes a gap in the current annotation logic.
  • Interpretability: Users and downstream tools can interpret variant consequences more easily with a clear, consistent effect term.
  • Robustness: Improves the overall quality of variant effect annotation in the MalariaGEN data Python package.
  • Simplicity: The fix is straightforward but impactful, making it an excellent candidate for contributions and learning.

Suggested next steps:

  • Implement the fix by replacing the placeholder code in _get_within_cds_effect inside veff.py.
  • Add unit tests to cover cases with in-frame complex variants (MNP + INDEL).
  • Ensure existing tests pass and new tests validate the new effect classification.
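The new tests could start from allele pairs like the ones below, covering both the positive case and its near misses. This is a sketch: `is_inframe_complex` is a hypothetical stand-in for whatever condition `_get_within_cds_effect` uses to reach the placeholder branch, and the allele pairs are illustrative rather than taken from the package's test data. The test is a plain `test_` function, so pytest can collect it without any extra imports.

```python
def is_inframe_complex(ref: str, alt: str) -> bool:
    """Hypothetical stand-in for the in-frame complex (MNP + INDEL) check.

    True when the net length change is a non-zero multiple of 3 and, after
    trimming bases shared at both ends, both alleles still contain bases
    (a substitution combined with an insertion or deletion).
    """
    while ref and alt and ref[0] == alt[0]:  # trim shared prefix
        ref, alt = ref[1:], alt[1:]
    while ref and alt and ref[-1] == alt[-1]:  # trim shared suffix
        ref, alt = ref[:-1], alt[:-1]
    delta = len(alt) - len(ref)
    return delta != 0 and delta % 3 == 0 and bool(ref) and bool(alt)


CASES = [
    ("ACG", "TTCGAA", True),   # MNP + 3 bp insertion
    ("ACGTAC", "TTC", True),   # MNP + 3 bp deletion
    ("A", "ATTT", False),      # simple in-frame insertion
    ("ACGT", "A", False),      # simple in-frame deletion
    ("ACG", "T", False),       # complex but frameshift (net -2)
    ("AC", "GT", False),       # MNP with no length change
]


def test_is_inframe_complex():
    for ref, alt, expected in CASES:
        assert is_inframe_complex(ref, alt) is expected, (ref, alt)
```

In the package's own test suite, the assertions for the positive cases would more usefully check the annotation itself, i.e. that such variants come back with effect "INFRAME_COMPLEX_VARIANT" and impact "MODERATE".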

This issue will help improve the biological relevance and accuracy of variant annotations in the MalariaGEN genomic analysis tools.

Please assign this issue to me so I can start contributing: @vinitjain2005
