Unsanitized User Input Passed to DataFrame.eval() Enables Arbitrary Code Execution #1292

@kunal-10-cloud

Description

Multiple methods pass user-supplied sample_query strings directly to pandas.DataFrame.eval() with engine="python", which can execute arbitrary Python code. At base.py:984, hap_data.py:423, and genome_features.py:121, user-controlled strings are passed to or interpolated into .query() / .eval() without any sanitization or allowlisting.


How to Reproduce

import malariagen_data
ag3 = malariagen_data.Ag3()
 
# Benign query — works as expected:
ag3.sample_metadata(sample_query="country == 'Ghana'")
 
# Malicious query — executes arbitrary Python via pandas eval(engine="python"):
ag3.sample_metadata(sample_query="__import__('os').system('echo PWNED') or country == 'Ghana'")

The vulnerability exists because base.py:984 calls:

loc_samples = df_samples.eval(sample_query, **sample_query_options, engine="python")

With engine="python", pandas evaluates the expression in the Python interpreter itself rather than handing it to the more restricted numexpr engine, so the query string has access to the full Python runtime.

Similarly, genome_features.py:121 interpolates the contig parameter directly:

df = df.query(f"contig == '{contig}'")
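
One way to avoid the interpolation shown above is a plain boolean mask, which compares the value as data and never parses it as an expression. This is a minimal sketch, assuming only that the frame has a contig column; the helper name select_contig and the sample data are hypothetical and not taken from the library:

```python
import pandas as pd


def select_contig(df: pd.DataFrame, contig: str) -> pd.DataFrame:
    """Return the rows whose contig column equals `contig` exactly."""
    # Element-wise comparison treats `contig` purely as data, so quotes or
    # Python syntax inside the string cannot change the filter logic.
    return df[df["contig"] == contig]


# Hypothetical sample data, for illustration only.
df = pd.DataFrame({"contig": ["2L", "2R", "3L"], "start": [1, 10, 100]})

safe = select_contig(df, "2L")
# A string crafted to break out of the quoted f-string is just a non-matching value here.
hostile = select_contig(df, "2L' or '1' == '1")
```

With this approach the crafted string matches no rows instead of rewriting the query. If the .query() call style is preferred, pandas also supports referencing a local variable as @contig inside the expression, which likewise avoids string interpolation.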

Why It Is Important

This is a code injection vulnerability — any function accepting a sample_query parameter is an attack surface.

  • The library is used in shared Jupyter notebook environments (Google Colab, institutional JupyterHubs) where queries may come from URL parameters, config files, or shared notebooks.
  • While pandas documents this risk, a scientific library should not expose it to researchers who may not be security-aware.
  • OWASP classifies injection as a top-10 vulnerability category.

Affected Locations

| File | Line | Sink |
| --- | --- | --- |
| base.py | 984 | df_samples.eval(sample_query, engine="python") |
| hap_data.py | 423 | DataFrame.eval() with user input |
| genome_features.py | 121 | df.query(f"contig == '{contig}'") |

Expected Impact After Resolution

  • User inputs are validated against an allowlist of safe column names and operators before reaching eval(), or the numexpr engine is used where possible.
  • Researchers using the library in shared environments are protected from injection via query strings.
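
The allowlist idea in the first bullet could be sketched with the standard-library ast module: parse the query, reject any syntax node outside a small permitted set, and reject any name that is not a known column. This is an illustrative sketch only, not the library's implementation; validate_query, _ALLOWED_NODES, and the example column set are hypothetical names introduced here:

```python
import ast

# Node types permitted in a query: boolean logic, comparisons, names, literals.
# Calls, attribute access, subscripts, and everything else are rejected.
_ALLOWED_NODES = (
    ast.Expression, ast.BoolOp, ast.And, ast.Or,
    ast.UnaryOp, ast.Not, ast.USub,
    ast.Compare, ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE,
    ast.In, ast.NotIn,
    ast.Name, ast.Load, ast.Constant, ast.List, ast.Tuple,
)


def validate_query(query: str, allowed_columns: set[str]) -> str:
    """Return `query` unchanged if it uses only allowlisted syntax and names."""
    try:
        tree = ast.parse(query, mode="eval")
    except SyntaxError as exc:
        raise ValueError(f"invalid query syntax: {query!r}") from exc
    for node in ast.walk(tree):
        if not isinstance(node, _ALLOWED_NODES):
            raise ValueError(f"disallowed syntax: {type(node).__name__}")
        if isinstance(node, ast.Name) and node.id not in allowed_columns:
            raise ValueError(f"unknown column: {node.id!r}")
    return query


# Hypothetical column set; a real caller would derive it from the DataFrame.
columns = {"country", "year"}
validate_query("country == 'Ghana' and year > 2010", columns)  # passes
```

The query from the reproduction above is rejected because it contains a function call. Note that this sketch is deliberately strict: it also rejects pandas-specific syntax such as @variable references, backtick-quoted column names, and the & / | operators, so a real fix would need to decide which of those to support.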
