A fuzzing tool for Apache DataFusion that tests SQL query execution and helps find potential bugs, crashes, and inconsistencies in query results.
This fuzzer primarily:
- Generates random tables and SQL queries.
- Runs them on DataFusion and checks whether the results satisfy an oracle-defined consistency rule.
Oracle: TLP (Ternary Logic Partitioning)
Random query (Q1):
SELECT * FROM t1;
Mutated query (Q2):
SELECT * FROM t1 WHERE v1 > 0
UNION ALL
SELECT * FROM t1 WHERE NOT (v1 > 0)
UNION ALL
SELECT * FROM t1 WHERE (v1 > 0) IS NULL;
Consistency check:
Q1 and Q2 should return the same multiset of rows.
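The TLP check above can be sketched in a few lines of Rust. This is a minimal illustration, not the fuzzer's actual implementation: the names `tlp_where_query`, `same_multiset`, and the `Row` alias are hypothetical, and rows are modeled as vectors of nullable strings rather than Arrow record batches.

```rust
use std::collections::HashMap;

/// Hypothetical row model: each value is rendered as a string, NULL as `None`.
type Row = Vec<Option<String>>;

/// Builds the three-way TLP partition of `base_query` over `predicate`.
/// Assumes `base_query` has no WHERE clause and no trailing semicolon.
fn tlp_where_query(base_query: &str, predicate: &str) -> String {
    format!(
        "{q} WHERE ({p})\n\
         UNION ALL\n\
         {q} WHERE NOT ({p})\n\
         UNION ALL\n\
         {q} WHERE ({p}) IS NULL",
        q = base_query,
        p = predicate
    )
}

/// Counts how many times each distinct row occurs.
fn row_counts(rows: &[Row]) -> HashMap<&Row, usize> {
    let mut counts = HashMap::new();
    for row in rows {
        *counts.entry(row).or_insert(0) += 1;
    }
    counts
}

/// Returns true when both results hold the same rows with the same
/// multiplicities, regardless of order.
fn same_multiset(a: &[Row], b: &[Row]) -> bool {
    a.len() == b.len() && row_counts(a) == row_counts(b)
}
```

Calling `tlp_where_query("SELECT * FROM t1", "v1 > 0")` yields a query equivalent to Q2 above. The oracle would then run Q1 and Q2 and report a bug when `same_multiset` returns false. Comparing counts per row rather than sorted output avoids depending on result order, which `UNION ALL` does not guarantee.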
This project is inspired by SQLancer.
For an introduction to database fuzzing techniques, see this talk by the author of SQLancer: https://youtu.be/Np46NQ6lqP8?si=lSVAU7Jy3H-QtrWV
To run the fuzzer with the default sample configuration:
cargo run --release -- --config fuzzer-default.toml
This runs the fuzzer against the DataFusion version specified in Cargo.toml.
The config file controls options such as round count, timeout, and log directory.
If a bug is found, use the CLI output and generated log files to reproduce it.
To override values from the configuration file with CLI arguments:
cargo run --release -- --config fuzzer-default.toml --rounds 5 --queries-per-round 20
See fuzzer-default.toml for supported options.
Options:
  -c, --config <FILE>                Path to config file
  -s, --seed <SEED>                  Random seed [default: 42]
  -r, --rounds <ROUNDS>              Number of rounds to run
  -q, --queries-per-round <QUERIES>  Number of queries per round
  -t, --timeout <TIMEOUT>            Query timeout in seconds
  -l, --log-path <LOG_PATH>          Path to log file
  -d, --display-logs                 Display logs
      --enable-tui                   Enable TUI display
  -h, --help                         Print help
  -V, --version                      Print version
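The override behavior described above, where CLI arguments take precedence over the configuration file, can be modeled with `Option` fields. This is a sketch under assumptions: the `Options` struct, its field names, and the `merge` function are hypothetical and only mirror a few of the flags listed above. The fuzzer's real configuration types may differ.

```rust
/// Hypothetical subset of options that may come from the config file or the CLI.
/// `None` means "not specified by this source".
#[derive(Debug, Default, Clone, PartialEq)]
struct Options {
    rounds: Option<u32>,
    queries_per_round: Option<u32>,
    timeout_secs: Option<u64>,
}

/// Values set on the CLI win; anything left unset falls back to the file.
fn merge(cli: Options, file: Options) -> Options {
    Options {
        rounds: cli.rounds.or(file.rounds),
        queries_per_round: cli.queries_per_round.or(file.queries_per_round),
        timeout_secs: cli.timeout_secs.or(file.timeout_secs),
    }
}
```

With a file that sets all three values and a command line that passes only `--rounds 5 --queries-per-round 20`, `merge` keeps the file's timeout and takes the other two values from the CLI.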
The runner currently chooses one oracle at random for each test case:
- NoCrashOracle: checks for non-whitelisted crashes and errors.
- TlpWhereOracle: validates TLP partitioning over WHERE (p, NOT p, p IS NULL) using value-level multiset comparison.
- TlpHavingOracle: validates TLP partitioning over HAVING (p, NOT p, p IS NULL) using value-level multiset comparison.
- NoREC (planned): paper
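To illustrate the HAVING variant of TLP listed above, the following sketch builds the partitioned query for a grouped base query. The function name `tlp_having_query` is hypothetical, and the sketch assumes the base query ends with its GROUP BY clause (no HAVING, ORDER BY, LIMIT, or trailing semicolon). It is not taken from the fuzzer's source.

```rust
/// Builds the three-way TLP partition of a grouped query over a HAVING predicate.
fn tlp_having_query(grouped_query: &str, predicate: &str) -> String {
    // The three partitions: p is TRUE, p is FALSE, p is NULL.
    let filters = [
        format!("({})", predicate),
        format!("NOT ({})", predicate),
        format!("({}) IS NULL", predicate),
    ];
    filters
        .iter()
        .map(|f| format!("{} HAVING {}", grouped_query, f))
        .collect::<Vec<_>>()
        .join("\nUNION ALL\n")
}
```

For example, `tlp_having_query("SELECT v1, COUNT(*) FROM t1 GROUP BY v1", "COUNT(*) > 1")` produces three HAVING-filtered copies of the grouped query joined by `UNION ALL`. Every group satisfies exactly one of the three filters, so the combined result should match the unfiltered grouped query as a multiset.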
- WHERE
- SORT + LIMIT/OFFSET
- AGGREGATE
- HAVING
- JOIN
- UNION/UNION ALL/INTERSECT/EXCEPT
- Views
- Scalar subquery
- Relation-like subquery
- Operators
- Scalar functions
- Aggregate Functions
- Window Functions
- Complete primitive type coverage
- Time-related types
- Array types
- Struct/JSON
- CLI
- Oracle interface