Fix: Replace Broad except Exception Anti-Pattern in phenotypes.py
Description
While exploring the API and testing error handling for a GSoC 2026 proposal, I found a recurring anti-pattern in malariagen_data/anoph/phenotypes.py. The code uses broad except Exception: clauses that catch every error and downgrade it to a warning, so real bugs are silently ignored and callers receive empty or incomplete data instead of a loud failure.
Where the Problem Is
There are 5 instances of except Exception: in phenotypes.py doing exactly this:
1. Lines 73–75: Loading phenotype data
except Exception as e:
    warnings.warn(f"Unexpected error loading phenotype data for {sample_set}: {e}")
2. Lines 310–313: Merging phenotype and variant data
except Exception as e:
    warnings.warn(f"Unexpected error selecting/merging variant data: {e}")
    return ds
3. Lines 351–353: Fetching sample metadata
except Exception as e:
    warnings.warn(f"Error fetching sample metadata: {e}")
    return pd.DataFrame()
4. Lines 376–378: Evaluating sample_query
except Exception as e:
    warnings.warn(f"Error applying sample_query '{sample_query}': {e}")
    return pd.DataFrame()
5. Lines 495–498: Checking if phenotype path exists
try:
    if self._fs.exists(phenotype_path):
        phenotype_sample_sets.append(sample_set)
except Exception:
    continue
Why This Is a Serious Problem
1. Silently Masked Bugs
A typo in a query string, a KeyError from a missing sample_id column, or a TypeError raised inside pandas will all trigger the except Exception block. Instead of raising a traceback that tells the user what went wrong, the code emits a warning and returns an empty pd.DataFrame() or an unmodified dataset.
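The masking effect is easy to reproduce in isolation. The sketch below uses a hypothetical select_samples helper (not code from phenotypes.py) that mirrors the warn-and-return-empty pattern. A misspelled key raises KeyError inside the try block, and the caller receives an empty result that looks identical to a query with no matches.

```python
import warnings


def select_samples(records, column, value):
    """Hypothetical helper mirroring the warn-and-return-empty pattern."""
    try:
        # A misspelled column name raises KeyError on the first record.
        return [r for r in records if r[column] == value]
    except Exception as e:
        warnings.warn(f"Error selecting samples: {e}")
        return []


records = [
    {"sample_id": "S1", "phenotype": "alive"},
    {"sample_id": "S2", "phenotype": "dead"},
]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    no_match = select_samples(records, "phenotype", "unknown")  # genuinely empty
    typo = select_samples(records, "phenotpye", "alive")  # bug: misspelled column

# Both calls return the same value, so the caller cannot tell them apart.
print(no_match == typo)
print(len(caught))
```

The only trace of the bug is a single warning, which is easy to miss in a notebook or a long pipeline log.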
2. Data Integrity Risk
Users might continue their downstream scientific analysis without realizing their phenotype query failed or returned truncated results. In bioinformatics pipelines, an empty result caused by a TypeError should fail the pipeline, not just print a warning and proceed with 0 records.
3. Debugging Nightmare
If a bug originates deep inside Dask, pandas, or the fsspec filesystem layer, catching Exception and reporting only its message discards the stack trace, making it much harder for maintainers to fix issues that users report.
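To make the loss concrete, the sketch below compares the text a warning built from the exception message retains with the traceback the user would have seen had the error propagated. It uses the standard library only, and parse_dose and load_values are hypothetical names chosen for illustration.

```python
import traceback


def parse_dose(raw):
    # Hypothetical deep helper where the real bug lives.
    return float(raw)


def load_values(rows):
    return [parse_dose(r) for r in rows]


try:
    load_values(["0.5", "n/a"])
except Exception as e:
    # What the warn-only pattern keeps: just the message.
    warning_text = f"Unexpected error loading phenotype data: {e}"
    # What a propagated error would have shown: every frame down to the failure.
    full_trace = "".join(traceback.format_exception(type(e), e, e.__traceback__))

print("parse_dose" in warning_text)
print("parse_dose" in full_trace)
```

The warning text carries no hint of which function failed, while the traceback names parse_dose directly.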
Proposed Fix
Replace these broad catch-all exceptions with the specific exception classes we actually expect to handle (e.g., FileNotFoundError, pd.errors.UndefinedVariableError), and let unexpected errors crash naturally with a full traceback.
Example 1: Applying a query (Lines 376–378)
Before:
except Exception as e:
    warnings.warn(f"Error applying sample_query '{sample_query}': {e}")
    return pd.DataFrame()
After:
except pd.errors.UndefinedVariableError as e:  # Query references an undefined column or variable
    warnings.warn(f"Invalid sample_query '{sample_query}': {e}")
    return pd.DataFrame()
# Let unexpected TypeErrors/KeyErrors propagate naturally
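A self-contained sketch of the narrowed handler is shown below. It assumes a pandas version that exposes UndefinedVariableError under pd.errors, and apply_sample_query is a hypothetical wrapper rather than the actual method in phenotypes.py. A query naming a non-existent column is handled with a warning, while any other failure reaches the caller.

```python
import warnings

import pandas as pd


def apply_sample_query(df, sample_query):
    """Hypothetical helper: handle only the failure expected from a bad query."""
    try:
        return df.query(sample_query)
    except pd.errors.UndefinedVariableError as e:
        # Raised when the query names a column or variable that does not exist.
        warnings.warn(f"Invalid sample_query '{sample_query}': {e}")
        return pd.DataFrame()


df = pd.DataFrame({"sample_id": ["S1", "S2"], "country": ["Ghana", "Kenya"]})

matched = apply_sample_query(df, "country == 'Ghana'")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    unknown = apply_sample_query(df, "contry == 'Ghana'")  # misspelled column

print(len(matched), unknown.empty)
```

Passing something that is not a valid query at all, such as a non-string value, is not covered by the handler and raises from df.query as usual.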
Example 2: Checking if phenotype path exists (Lines 495–498)
Before:
try:
    if self._fs.exists(phenotype_path):
        phenotype_sample_sets.append(sample_set)
except Exception:
    continue
After:
# Rather than try/except Exception, let network/auth errors propagate
if self._fs.exists(phenotype_path):
    phenotype_sample_sets.append(sample_set)
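The difference matters most when the filesystem itself is failing. The sketch below uses a hypothetical DeniedFS stand-in (not fsspec) whose exists method always raises PermissionError, and two hypothetical helper functions, to compare the two loop styles.

```python
class DeniedFS:
    """Hypothetical stand-in for a filesystem whose credentials have expired."""

    def exists(self, path):
        raise PermissionError(f"access denied: {path}")


def find_sets_swallowing(fs, paths):
    found = []
    for sample_set, path in paths.items():
        try:
            if fs.exists(path):
                found.append(sample_set)
        except Exception:
            continue  # an auth failure now looks like "no phenotype data"
    return found


def find_sets_propagating(fs, paths):
    # No try/except: filesystem failures reach the caller with a traceback.
    return [sample_set for sample_set, path in paths.items() if fs.exists(path)]


paths = {"set-A": "phenotypes/set-A.csv", "set-B": "phenotypes/set-B.csv"}

print(find_sets_swallowing(DeniedFS(), paths))

try:
    find_sets_propagating(DeniedFS(), paths)
except PermissionError as e:
    print(f"surfaced: {e}")
```

With the broad handler, an expired credential is reported as an empty list of sample sets. Without it, the user sees the permission error immediately.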
Summary
| Location | Current Behaviour | Proposed Fix |
|---|---|---|
| Lines 73–75 | Catches all errors on load, warns | Catch FileNotFoundError, IOError only |
| Lines 310–313 | Catches all merge errors, returns partial ds | Catch KeyError, ValueError only |
| Lines 351–353 | Catches all metadata errors, returns empty DataFrame | Catch KeyError, AttributeError only |
| Lines 376–378 | Catches all query errors, returns empty DataFrame | Catch pd.errors.UndefinedVariableError only |
| Lines 495–498 | Swallows all filesystem errors silently | Remove try/except, let errors propagate |
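One detail in the Lines 73–75 entry is worth checking before implementation. In Python 3, IOError is an alias of OSError, and FileNotFoundError is one of its subclasses, so a clause listing both behaves the same as except OSError and would also catch permission and connection errors. The snippet below verifies those relationships.

```python
# IOError has been an alias of OSError since Python 3.3.
print(IOError is OSError)

# FileNotFoundError and PermissionError are both OSError subclasses,
# so "except (FileNotFoundError, IOError)" catches both of them.
print(issubclass(FileNotFoundError, OSError))
print(issubclass(PermissionError, OSError))
```

If the intent is to tolerate only a missing phenotype file, catching FileNotFoundError alone may be the narrower choice. This is a suggestion for discussion, not something the current code confirms.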
This change will make the codebase safer, more debuggable, and better aligned with the principle of failing loudly when something unexpected goes wrong.