veff.py: in-frame complex variants (MNP+INDEL) return literal "TODO" string as effect — invalid scientific output #1315

@Tanisha127

Summary

In malariagen_data/veff.py, the function _get_within_cds_effect() has an incomplete branch for in-frame complex variants — variants that are simultaneously a multi-nucleotide polymorphism (MNP) AND an indel, where the net length change is a multiple of 3 (no frameshift).

Current broken behaviour

Lines 447-449 currently contain:

effect = base_effect._replace(
    effect="TODO in-frame complex variation (MNP + INDEL)",
    impact="UNKNOWN"
)

This means a researcher who calls snp_effects() on such a variant receives a DataFrame where:

  • The effect column contains the literal string "TODO in-frame complex variation (MNP + INDEL)"
  • The impact column contains "UNKNOWN"

Why this is serious

  1. "UNKNOWN" is not a valid impact level anywhere else in this codebase. All other values are "HIGH", "MODERATE", "LOW", or "MODIFIER". Downstream code that filters or groups by those four levels, for example df[df["impact"] == "HIGH"], silently omits these variants without raising an error.
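
A downstream guard can make both problems visible. The following is a minimal sketch, assuming the effects are available as plain dicts with "effect" and "impact" keys (in practice these would be rows of the DataFrame returned by snp_effects()). VALID_IMPACTS and find_suspect_rows are hypothetical names used for illustration, not part of malariagen_data.

```python
# Impact levels listed in this issue as the only valid ones.
VALID_IMPACTS = {"HIGH", "MODERATE", "LOW", "MODIFIER"}


def find_suspect_rows(rows):
    """Return rows with an unrecognised impact or a TODO placeholder effect."""
    return [
        row
        for row in rows
        if row["impact"] not in VALID_IMPACTS or row["effect"].startswith("TODO")
    ]


# Illustrative rows only; the second mirrors the output described above.
rows = [
    {"effect": "CODON_CHANGE", "impact": "MODERATE"},
    {"effect": "TODO in-frame complex variation (MNP + INDEL)", "impact": "UNKNOWN"},
]

# A filter over the known levels drops the placeholder row without any error.
kept = [row for row in rows if row["impact"] in VALID_IMPACTS]
suspect = find_suspect_rows(rows)
print(len(kept), len(suspect))
```

A check like this could run in analysis code until the fix lands, so that affected variants are reported rather than dropped.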

  2. The TODO string is a developer note leaking directly into scientific output with no warning or error raised.

How to reproduce

Call snp_effects() on any variant where:

  • len(ref) > 1 and len(alt) > 1 (not a simple insertion or deletion)
  • len(ref) != len(alt) (not a pure MNP)
  • (len(alt) - len(ref)) % 3 == 0 (in-frame, not a frameshift)

Example: ref="GCC", alt="GCCATG" at a CDS position.
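
The three conditions above can be expressed as a single predicate. This is a minimal sketch for checking candidate variants; is_in_frame_complex is a hypothetical helper, not a function in veff.py, and it does not check whether the position falls within a CDS.

```python
def is_in_frame_complex(ref: str, alt: str) -> bool:
    """True when ref/alt meet the three conditions listed in this issue."""
    multi_base = len(ref) > 1 and len(alt) > 1  # not a simple insertion or deletion
    length_changes = len(ref) != len(alt)       # not a pure MNP
    in_frame = (len(alt) - len(ref)) % 3 == 0   # net change is a whole number of codons
    return multi_base and length_changes and in_frame


# The example from this issue satisfies all three conditions.
print(is_in_frame_complex("GCC", "GCCATG"))
```

Python's % returns a non-negative result for a positive modulus, so net deletions such as ref="GCCATG", alt="GCC" are also classified as in-frame.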

Proposed fix

Replace the TODO branch with:

effect = base_effect._replace(effect="CODON_CHANGE", impact="MODERATE")

This is consistent with how pure MNPs are already handled in the elif branch directly above. Both cases represent in-frame changes to one or more codons with no frameshift.
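
The shape of a regression check for this change might look like the sketch below. It assumes the effect record behaves like a namedtuple, as the _replace call suggests. Effect and its ref and alt fields are stand-ins invented for illustration; the real record in veff.py may have different fields.

```python
from collections import namedtuple

# Stand-in for the effect record; field names other than effect/impact are assumed.
Effect = namedtuple("Effect", ["effect", "impact", "ref", "alt"])

base_effect = Effect(effect=None, impact=None, ref="GCC", alt="GCCATG")

# The proposed replacement: _replace returns a new record, changing only
# effect and impact and leaving the other fields untouched.
effect = base_effect._replace(effect="CODON_CHANGE", impact="MODERATE")

# Regression-style checks: no placeholder text and a recognised impact level.
assert "TODO" not in effect.effect
assert effect.impact in {"HIGH", "MODERATE", "LOW", "MODIFIER"}
```

A real test would instead exercise the in-frame complex branch of _get_within_cds_effect() with a variant such as the example above and make the same two assertions on its result.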

Related

This is related to issue #1180. I will submit a PR with the fix and a regression test.
