Skip to content

datafusion-contrib/datafusion-fuzzer

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

87 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

DataFusion Fuzzer

A fuzzing tool for Apache DataFusion that tests SQL query execution and helps find potential bugs, crashes, and inconsistencies in query results.

Overview

This fuzzer primarily:

  1. Generates random tables and SQL queries.
  2. Runs them on DataFusion and checks whether the results satisfy an oracle-defined consistency rule.

Example

Oracle: TLP (Ternary Logic Partitioning)

Random query (Q1):
    SELECT * FROM t1;

Mutated query (Q2):
    SELECT * FROM t1 WHERE v1 > 0
    UNION ALL
    SELECT * FROM t1 WHERE NOT (v1 > 0)
    UNION ALL
    SELECT * FROM t1 WHERE (v1 > 0) IS NULL;

Consistency check:
    Q1 and Q2 should return the same multiset of rows.

This project is inspired by SQLancer.

For an introduction to database fuzzing techniques, see this talk by the author of SQLancer: https://youtu.be/Np46NQ6lqP8?si=lSVAU7Jy3H-QtrWV

Quick Start

To run the fuzzer with the default sample configuration:

cargo run --release -- --config fuzzer-default.toml

This runs the fuzzer against the DataFusion version specified in Cargo.toml.

The config file controls options such as round count, timeout, and log directory.

If a bug is found, use the CLI output and generated log files to reproduce it.

To override values from the configuration file by using CLI arguments:

cargo run --release -- --config fuzzer-default.toml --rounds 5 --queries-per-round 20

See fuzzer-default.toml for supported options.

Command Line Options

Options:
  -c, --config <FILE>                    Path to config file
  -s, --seed <SEED>                      Random seed [default: 42]
  -r, --rounds <ROUNDS>                  Number of rounds to run
  -q, --queries-per-round <QUERIES>      Number of queries per round
  -t, --timeout <TIMEOUT>                Query timeout in seconds
  -l, --log-path <LOG_PATH>              Path to log file
  -d, --display-logs                     Display logs
      --enable-tui                       Enable TUI display
  -h, --help                             Print help
  -V, --version                          Print version

Roadmap

Implemented Oracles

The runner currently chooses one oracle at random for each test case:

  • NoCrashOracle: checks for non-whitelisted crashes and errors.
  • TlpWhereOracle: validates TLP partitioning over WHERE (p, NOT p, p IS NULL) using value-level multiset comparison.
  • TlpHavingOracle: validates TLP partitioning over HAVING (p, NOT p, p IS NULL) using value-level multiset comparison.
  • NoREC (planned): paper

SQL Features

  • WHERE
  • SORT + LIMIT/OFFSET
  • AGGREGATE
  • HAVING
  • JOIN
  • UNION/UNION ALL/INTERSECT/EXCEPT

SQL Subqueries

  • Views
  • Scalar subquery
  • Relation-like subquery

Expressions

  • Operators
  • Scalar functions
  • Aggregate Functions
  • Window Functions

Types

  • Complete primitive type coverage
  • Time-related types
  • Array types
  • Struct/JSON

Infrastructure

  • CLI
  • Oracle interface

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages