Bug: phenotypes.py swallows all exceptions via broad except Exception, silently masking data and logic errors #1160

@khushthecoder

Fix: Replace Broad except Exception Anti-Pattern in phenotypes.py

Description

While exploring the API and testing robust error handling for a GSoC 2026 proposal, a recurring anti-pattern was identified in malariagen_data/anoph/phenotypes.py. The code uses broad except Exception: clauses that catch every error and downgrade it to a warning. As a result, real bugs are silently ignored and callers receive empty or incomplete data instead of a loud failure.


Where the Problem Is

There are 5 instances of except Exception: in phenotypes.py doing exactly this:

1. Lines 73–75: Loading phenotype data

except Exception as e:
    warnings.warn(f"Unexpected error loading phenotype data for {sample_set}: {e}")

2. Lines 310–313: Merging phenotype and variant data

except Exception as e:
    warnings.warn(f"Unexpected error selecting/merging variant data: {e}")
return ds

3. Lines 351–353: Fetching sample metadata

except Exception as e:
    warnings.warn(f"Error fetching sample metadata: {e}")
    return pd.DataFrame()

4. Lines 376–378: Evaluating sample_query

except Exception as e:
    warnings.warn(f"Error applying sample_query '{sample_query}': {e}")
    return pd.DataFrame()

5. Lines 495–498: Checking if phenotype path exists

try:
    if self._fs.exists(phenotype_path):
        phenotype_sample_sets.append(sample_set)
except Exception:
    continue

Why This Is a Serious Problem

1. Silently Masked Bugs

A simple typo in a query string, a KeyError from a missing sample_id column, or a TypeError from Pandas will all trigger the except Exception block. Instead of raising a traceback to tell the user what went wrong, the code silently warns and returns an empty pd.DataFrame() or an unmodified dataset.
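The masking effect can be reproduced without pandas. The sketch below is illustrative only: `ROWS`, `filter_broad`, and `filter_narrow` are hypothetical names, not code from phenotypes.py. A typo in the column name raises KeyError in both versions, but only the narrow version lets the caller see it.

```python
import warnings

# Hypothetical stand-in for a phenotype table.
ROWS = [
    {"sample_id": "A1", "mortality": 0.9},
    {"sample_id": "B2", "mortality": 0.2},
]


def filter_broad(rows, column, threshold):
    """Broad pattern: every failure becomes a warning plus an empty result."""
    try:
        limit = float(threshold)
        return [r for r in rows if r[column] >= limit]
    except Exception as e:
        warnings.warn(f"Error applying filter: {e}")
        return []


def filter_narrow(rows, column, threshold):
    """Narrow pattern: handle only the malformed-threshold case we expect."""
    try:
        limit = float(threshold)
    except ValueError as e:
        warnings.warn(f"Invalid threshold {threshold!r}: {e}")
        return []
    # A misspelled column raises KeyError here and propagates to the caller.
    return [r for r in rows if r[column] >= limit]
```

With the misspelled column "mortaliti", `filter_broad` returns an empty list that looks like a legitimate "no matches" result, while `filter_narrow` raises KeyError and points at the typo.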

2. Data Integrity Risk

Users might continue their downstream scientific analysis without realizing their phenotype query failed or returned truncated results. In bioinformatics pipelines, an empty result caused by a TypeError should fail the pipeline, not just print a warning and proceed with 0 records.

3. Debugging Nightmare

If a bug originates deep inside Dask, Pandas, or the fsspec filesystem layer, catching Exception and reporting only the message text discards the stack trace, making it much harder for maintainers to fix issues that users report.
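A small sketch of what is lost, using only the standard library. The function names are hypothetical, and `read_chunk` stands in for a failure deep inside a library call. The warning carries only `str(e)`, while the traceback names the frame that actually failed.

```python
import traceback
import warnings


def read_chunk():
    # Stand-in for an error raised deep inside a dependency.
    raise KeyError("sample_id")


def load_with_broad_except():
    try:
        read_chunk()
    except Exception as e:
        # Only the message text is reported; the call stack is dropped.
        warnings.warn(f"Unexpected error loading phenotype data: {e}")


def capture_traceback():
    try:
        read_chunk()
    except KeyError:
        # format_exc() renders the full traceback of the active exception.
        return traceback.format_exc()


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    load_with_broad_except()

warning_text = str(caught[0].message)
traceback_text = capture_traceback()
```

Here `warning_text` contains only the quoted key name after the prefix, with no hint of where the error came from, whereas `traceback_text` includes the `read_chunk` frame.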


Proposed Fix

Replace these broad catch-all exceptions with the specific exception classes we actually expect to handle (e.g., FileNotFoundError, pd.errors.UndefinedVariableError), and let unexpected errors crash naturally with a full traceback.

Example 1: Applying a query (Lines 376–378)

Before:

except Exception as e:
    warnings.warn(f"Error applying sample_query '{sample_query}': {e}")
    return pd.DataFrame()

After:

except pd.errors.UndefinedVariableError as e:  # Query references an undefined column or variable
    warnings.warn(f"Invalid sample_query '{sample_query}': {e}")
    return pd.DataFrame()
# Let unexpected TypeErrors/KeyErrors propagate naturally

Example 2: Checking if phenotype path exists (Lines 495–498)

Before:

try:
    if self._fs.exists(phenotype_path):
        phenotype_sample_sets.append(sample_set)
except Exception:
    continue

After:

# Rather than try/except Exception, let network/auth errors propagate
if self._fs.exists(phenotype_path):
    phenotype_sample_sets.append(sample_set)
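To illustrate why removing the try/except matters, here is a minimal sketch with a stub filesystem. `StubFS`, the two helper functions, and the path layout are all hypothetical; the real `self._fs` object and its error behaviour may differ. The point is only that, under a broad except, an access failure is indistinguishable from "no phenotype data".

```python
class StubFS:
    """Hypothetical stand-in for a filesystem object with an exists() method."""

    def __init__(self, present, forbidden):
        self._present = set(present)
        self._forbidden = set(forbidden)

    def exists(self, path):
        if path in self._forbidden:
            raise PermissionError(f"access denied: {path}")
        return path in self._present


def sets_with_phenotypes_broad(fs, sample_sets):
    found = []
    for sample_set in sample_sets:
        try:
            if fs.exists(f"{sample_set}/phenotypes.csv"):
                found.append(sample_set)
        except Exception:
            continue  # an auth failure is skipped as if the data were absent
    return found


def sets_with_phenotypes_strict(fs, sample_sets):
    found = []
    for sample_set in sample_sets:
        # No try/except: access errors reach the caller with a traceback.
        if fs.exists(f"{sample_set}/phenotypes.csv"):
            found.append(sample_set)
    return found
```

With a sample set whose path raises PermissionError, the broad version quietly omits it from the result, while the strict version surfaces the error so the user knows the listing is incomplete.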

Summary

| Location | Current Behaviour | Proposed Fix |
| --- | --- | --- |
| Lines 73–75 | Catches all errors on load, warns | Catch FileNotFoundError, IOError only |
| Lines 310–313 | Catches all merge errors, returns partial ds | Catch KeyError, ValueError only |
| Lines 351–353 | Catches all metadata errors, returns empty DataFrame | Catch KeyError, AttributeError only |
| Lines 376–378 | Catches all query errors, returns empty DataFrame | Catch pd.errors.UndefinedVariableError only |
| Lines 495–498 | Swallows all filesystem errors silently | Remove try/except, let errors propagate |
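One caveat when choosing the classes for the load step: in Python 3, IOError is an alias of OSError, so catching it also catches PermissionError, TimeoutError, and other OS-level failures. The helper below (`caught_by` is a hypothetical name) is a quick way to check what a given except clause would actually catch.

```python
def caught_by(exc_type, handled):
    """Return True if `except handled:` would catch an exception of exc_type."""
    return issubclass(exc_type, handled)


# The pair proposed for the load step, and a narrower alternative.
load_handlers = (FileNotFoundError, IOError)
narrow_handlers = (FileNotFoundError,)

print(IOError is OSError)                           # True
print(caught_by(PermissionError, load_handlers))    # True
print(caught_by(TimeoutError, load_handlers))       # True
print(caught_by(KeyError, load_handlers))           # False
print(caught_by(PermissionError, narrow_handlers))  # False
```

If the intent is to tolerate only a missing file, catching FileNotFoundError alone may be the closer match. Whether other OS-level errors should also be tolerated is a design decision for the maintainers.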

This change will make the codebase safer, more debuggable, and better aligned with the principle of failing loudly when something unexpected goes wrong.
