Improve error messaging when haplotypes_frequencies_advanced yields no cohorts #1170

@karthik642006

Bug: haplotypes_frequencies_advanced fails silently or with cryptic errors when cohort grouping produces zero cohorts

Summary

When haplotypes_frequencies_advanced is called with parameters that result in zero cohorts (for example a very high min_cohort_size, a restrictive sample_query, or a narrow area_by / period_by / taxon_by combination), the function does not fail fast. Instead, it proceeds past the cohort construction step and either fails later with confusing downstream errors or silently returns empty outputs. There is no explicit guard for this scenario.


Expected Behavior

If cohort grouping produces zero cohorts, the function should raise a clear, user-friendly ValueError immediately after cohort construction, before any downstream computation begins. The error message should explain what went wrong and suggest actionable fixes.

Suggested error message:

No cohorts found after applying filters. Try lowering min_cohort_size,
adjusting area_by/period_by/taxon_by, or relaxing sample_query.

Actual Behavior

  • No explicit check is performed after cohort construction to detect an empty df_cohorts.
  • The function continues execution and either:
    • Raises a non-obvious downstream error unrelated to the root cause, or
    • Returns an empty or malformed output with no indication of what went wrong.
  • Users are left without a clear signal that the cohort grouping step itself produced no results.
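The silent path can be illustrated without the library itself. The following is a minimal sketch, assuming the downstream step loops over the rows of df_cohorts; the column names and the compute_frequencies helper are hypothetical stand-ins, not the actual implementation.

```python
import pandas as pd

# Hypothetical cohorts table with zero rows, as produced when every
# candidate cohort is filtered out (e.g. by a very high min_cohort_size).
df_cohorts = pd.DataFrame(columns=["label", "size", "area", "period"])


def compute_frequencies(df_cohorts):
    """Stand-in for the downstream step: one output column per cohort."""
    columns = {}
    for cohort in df_cohorts.itertuples():
        # Never reached when df_cohorts has no rows.
        columns["frq_" + cohort.label] = [0.0]
    return pd.DataFrame(columns)


df_frq = compute_frequencies(df_cohorts)
print(df_frq.empty)  # True
```

Nothing in this sketch raises: the loop body simply never runs, so the caller receives an empty frame with no hint that the cohort step was the cause.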

Steps to Reproduce

# Setup (added so the examples run as written)
import malariagen_data

ag3 = malariagen_data.Ag3()

# Example: min_cohort_size set unrealistically high
ag3.haplotypes_frequencies_advanced(
    region="3L",
    analysis="gamb_colu",
    min_cohort_size=99999,
    area_by="admin1_iso",
    period_by="year",
)

# Example: overly restrictive sample_query
ag3.haplotypes_frequencies_advanced(
    region="3L",
    analysis="gamb_colu",
    sample_query="country == 'Nonexistent'",
    area_by="admin1_iso",
    period_by="year",
)

Impact

  • Users waste time debugging cryptic downstream errors that obscure the true root cause.
  • In exploratory analyses or batch jobs iterating over many parameter combinations, silent empty outputs can lead to misinterpreted results.
  • The failure mode is inconsistent with other parts of the API that already apply fail-fast "no data" patterns.
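For the batch-job case, a fail-fast ValueError would let callers record skipped parameter combinations explicitly. This is a minimal sketch, assuming the expected behavior described above; run_analysis is a hypothetical stand-in for the real call, and the cohort sizes are made up for illustration.

```python
def run_analysis(min_cohort_size):
    """Hypothetical stand-in for the library call, assuming a fail-fast guard."""
    available_cohort_sizes = [12, 30, 45]  # made-up sizes for illustration
    kept = [n for n in available_cohort_sizes if n >= min_cohort_size]
    if not kept:
        raise ValueError("No cohorts found after applying filters.")
    return kept


results = {}
skipped = {}
for min_cohort_size in (10, 40, 99999):
    try:
        results[min_cohort_size] = run_analysis(min_cohort_size)
    except ValueError as err:
        # Record why this combination produced nothing instead of
        # storing an empty result that could be misread later.
        skipped[min_cohort_size] = str(err)

print(sorted(results))  # [10, 40]
print(sorted(skipped))  # [99999]
```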

Proposed Fix

Add an explicit guard immediately after cohort construction:

if df_cohorts.empty:
    raise ValueError(
        "No cohorts found after applying filters. Try lowering min_cohort_size, "
        "adjusting area_by/period_by/taxon_by, or relaxing sample_query."
    )

This aligns with the fail-fast conventions used elsewhere in the API and ensures users receive an immediately actionable error message close to the source of the problem.
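The guard can be exercised in isolation. Below is a minimal sketch, assuming df_cohorts is a pandas DataFrame; the check_cohorts helper and the example column names are hypothetical and only serve to show both the empty and the non-empty path.

```python
import pandas as pd

NO_COHORTS_MESSAGE = (
    "No cohorts found after applying filters. Try lowering min_cohort_size, "
    "adjusting area_by/period_by/taxon_by, or relaxing sample_query."
)


def check_cohorts(df_cohorts):
    """Hypothetical helper: fail fast when the cohorts table has no rows."""
    if df_cohorts.empty:
        raise ValueError(NO_COHORTS_MESSAGE)
    return df_cohorts


# Empty table: the guard raises immediately.
empty = pd.DataFrame(columns=["label", "size"])
try:
    check_cohorts(empty)
    raised = False
    message = ""
except ValueError as err:
    raised = True
    message = str(err)

# Non-empty table: passed through unchanged.
non_empty = pd.DataFrame({"label": ["cohort_a"], "size": [25]})
passed_through = check_cohorts(non_empty)
```

A regression test could assert the same two outcomes against the real function once the guard is in place.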


Additional Context

  • This pattern of failing early on empty data conditions is already established in other parts of the codebase; haplotypes_frequencies_advanced should be consistent with that approach.
  • The fix is minimal, low-risk, and purely additive: no existing behavior is changed when cohorts are non-empty.
