Description
Multiple methods pass user-supplied sample_query strings directly to pandas.DataFrame.eval() using the Python engine, which can execute arbitrary Python code. In base.py:984, hap_data.py:423, and genome_features.py:121, user-controlled strings are interpolated into .query() / .eval() without any sanitization or allowlisting.
How to Reproduce
import malariagen_data
ag3 = malariagen_data.Ag3()
# Benign query — works as expected:
ag3.sample_metadata(sample_query="country == 'Ghana'")
# Malicious query — executes arbitrary Python via pandas eval(engine="python"):
ag3.sample_metadata(sample_query="__import__('os').system('echo PWNED') or country == 'Ghana'")
The vulnerability exists because base.py:984 calls:
loc_samples = df_samples.eval(sample_query, **sample_query_options, engine="python")
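With engine="python", pandas evaluates the expression using the Python interpreter rather than the more restricted numexpr engine, so the expression is not limited to the arithmetic and comparison operations that numexpr supports.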
Similarly, genome_features.py:121 interpolates the contig parameter directly:
df = df.query(f"contig == '{contig}'")
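One way to avoid the interpolation is to compare against the value with a boolean mask, so the contig string is treated as data and never parsed as an expression. The following is a minimal sketch, not the library's code: the filter_by_contig helper and the toy DataFrame (including its start column) are hypothetical, and it assumes the features table has a string-valued contig column.

```python
import pandas as pd


def filter_by_contig(df: pd.DataFrame, contig: str) -> pd.DataFrame:
    """Hypothetical helper: select rows for one contig without building a query string."""
    # The value is compared as data, so quotes or operators inside it are never parsed.
    return df[df["contig"] == contig]


# Toy data standing in for the genome features table (assumed column names).
df = pd.DataFrame({"contig": ["2L", "2R", "3L"], "start": [100, 200, 300]})

safe = filter_by_contig(df, "2L")

# A crafted value that would break out of the quoted literal in an f-string query
# is just an unmatched string here, so it selects no rows.
crafted = filter_by_contig(df, "2L' or start > 0 or contig == '")
```

If the .query() call style is preferred, pandas also lets a query refer to a local variable with the @ prefix, as in df.query("contig == @contig"), which likewise passes the value as a variable rather than as query text.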
Why It Is Important
This is a code injection vulnerability — any function accepting a sample_query parameter is an attack surface.
- The library is used in shared Jupyter notebook environments (Google Colab, institutional JupyterHubs) where queries may come from URL parameters, config files, or shared notebooks.
- While pandas documents this risk, a scientific library should not expose it to researchers who may not be security-aware.
- OWASP classifies injection as a top-10 vulnerability category.
Affected Locations
| File | Line | Sink |
|---|---|---|
| base.py | 984 | df_samples.eval(sample_query, engine="python") |
| hap_data.py | 423 | DataFrame.eval() with user input |
| genome_features.py | 121 | df.query(f"contig == '{contig}'") |
Expected Impact After Resolution
- User inputs are validated against an allowlist of safe column names and operators before reaching eval(), or the numexpr engine is used where possible.
- Researchers using the library in shared environments are protected from injection via query strings.
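The allowlist approach could be sketched with Python's standard ast module: parse the query, then reject any syntax element or name that is not explicitly permitted. This is an illustration under stated assumptions, not a proposed patch. The validate_query function and ALLOWED_NODES tuple are hypothetical names, and the sketch accepts only plain Python comparison syntax, so pandas-specific query features such as backtick-quoted column names and @variable references would be rejected and would need separate handling.

```python
import ast

# Node types permitted in a query: comparisons, boolean logic, names, and literals.
# Calls, attribute access, subscripts, lambdas, and so on are deliberately absent.
ALLOWED_NODES = (
    ast.Expression, ast.BoolOp, ast.And, ast.Or,
    ast.UnaryOp, ast.Not, ast.USub,
    ast.Compare, ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE,
    ast.In, ast.NotIn,
    ast.Name, ast.Load, ast.Constant, ast.List, ast.Tuple,
)


def validate_query(query: str, allowed_columns: set) -> str:
    """Hypothetical validator: return the query unchanged, or raise ValueError."""
    try:
        tree = ast.parse(query, mode="eval")
    except SyntaxError as exc:
        raise ValueError(f"Invalid query syntax: {query!r}") from exc
    for node in ast.walk(tree):
        # Reject any syntax element that is not on the allowlist.
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"Disallowed expression element: {type(node).__name__}")
        # Names may only refer to known columns.
        if isinstance(node, ast.Name) and node.id not in allowed_columns:
            raise ValueError(f"Unknown column name: {node.id!r}")
    return query


# Assumed column names, for illustration only.
columns = {"country", "year"}

ok = validate_query("country == 'Ghana' and year > 2010", columns)

try:
    validate_query("__import__('os').system('echo PWNED') or country == 'Ghana'", columns)
    rejected = False
except ValueError:
    rejected = True
```

In this sketch the set of allowed names would come from the DataFrame's own columns, and the query would be passed on to eval() only after validate_query returns without raising.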