Commit ce15e5d ("fix")
1 parent e99c796

2 files changed: 12 additions & 5 deletions

datafusion/datasource-parquet/benches/parquet_nested_filter_pushdown.rs
11 additions & 4 deletions
@@ -24,7 +24,9 @@ use arrow::array::{
 use arrow::datatypes::{DataType, Field, Schema, SchemaRef};
 use criterion::{Criterion, Throughput, criterion_group, criterion_main};
 use datafusion_common::ScalarValue;
-use datafusion_datasource_parquet::{ParquetFileMetrics, build_row_filter, SelectivityTracker};
+use datafusion_datasource_parquet::{
+    ParquetFileMetrics, SelectivityTracker, build_row_filter,
+};
 use datafusion_expr::{Expr, col};
 use datafusion_functions_nested::expr_fn::array_has;
 use datafusion_physical_expr::planner::logical2physical;
@@ -115,9 +117,14 @@ fn scan_with_predicate(
     let file_metrics = ParquetFileMetrics::new(0, &path.display().to_string(), &metrics);

     let builder = if pushdown {
-        if let Some(row_filter) =
-            build_row_filter(predicate, file_schema, &metadata, false, &file_metrics, &SelectivityTracker::default())?
-        {
+        if let Some(row_filter) = build_row_filter(
+            predicate,
+            file_schema,
+            &metadata,
+            false,
+            &file_metrics,
+            &SelectivityTracker::default(),
+        )? {
             builder.with_row_filter(row_filter)
         } else {
             builder

docs/source/user-guide/configs.md
1 addition & 1 deletion
@@ -92,7 +92,7 @@ The following configuration settings are available:
 | datafusion.execution.parquet.coerce_int96 | NULL | (reading) If true, parquet reader will read columns of physical type int96 as originating from a different resolution than nanosecond. This is useful for reading data from systems like Spark which stores microsecond resolution timestamps in an int96 allowing it to write values with a larger date range than 64-bit timestamps with nanosecond resolution. |
 | datafusion.execution.parquet.bloom_filter_on_read | true | (reading) Use any available bloom filters when reading parquet files |
 | datafusion.execution.parquet.max_predicate_cache_size | NULL | (reading) The maximum predicate cache size, in bytes. When `pushdown_filters` is enabled, sets the maximum memory used to cache the results of predicate evaluation between filter evaluation and output generation. Decreasing this value will reduce memory usage, but may increase IO and CPU usage. None means use the default parquet reader setting. 0 means no caching. |
-| datafusion.execution.parquet.filter_effectiveness_threshold | 0.8 | (reading) Minimum filter effectiveness threshold for adaptive filter pushdown. Only filters that filter out at least this fraction of rows will be promoted to row filters during adaptive filter pushdown. A value of 1.0 means only filters that filter out all rows will be promoted. A value of 0.0 means all filters will be promoted. Because there can be a high I/O cost to pushing down ineffective filters, recommended values are in the range [0.8, 0.95], depending on random I/0 costs. |
+| datafusion.execution.parquet.filter_effectiveness_threshold | 0.8 | (reading) Minimum filter effectiveness threshold for adaptive filter pushdown. Only filters that filter out at least this fraction of rows will be promoted to row filters during adaptive filter pushdown. A value of 1.0 means only filters that filter out all rows will be promoted. A value of 0.0 means all filters will be promoted. Because there can be a high I/O cost to pushing down ineffective filters, recommended values are in the range [0.8, 0.95], depending on random I/0 costs. |
 | datafusion.execution.parquet.data_pagesize_limit | 1048576 | (writing) Sets best effort maximum size of data page in bytes |
 | datafusion.execution.parquet.write_batch_size | 1024 | (writing) Sets write_batch_size in rows |
 | datafusion.execution.parquet.writer_version | 1.0 | (writing) Sets parquet writer version valid values are "1.0" and "2.0" |
