Migrate Avro reader to arrow-avro and remove internal conversion code (#17861)
## Which issue does this PR close?
- Closes #14097
## Rationale for this change
DataFusion previously maintained custom Avro-to-Arrow conversion logic.
This PR migrates Avro reading to `arrow-avro` to align behavior with
upstream Arrow and remove duplicated implementation.
## What changes are included in this PR?
- Switched DataFusion Avro reader path to `arrow-avro` (`ReaderBuilder`)
- Removed internal/legacy Avro conversion paths that are no longer
needed
- Updated crate wiring to use `arrow-avro` and removed prior
`apache-avro` dependency usage in affected paths
- Updated Avro projection flow to use `arrow-avro` projection support
- Added/updated upgrade documentation for Avro API and behavior changes
## Are these changes tested?
Yes.
- Added/updated Avro reader unit tests in `datafusion/datasource-avro`
(including projection and timestamp logical types)
- Updated SQL logic tests in
`datafusion/sqllogictest/test_files/avro.slt`
- Integration is covered by existing CI/test suites for affected crates
## Are there any user-facing changes?
Yes.
1. `DataFusionError::AvroError` is removed.
2. `From<apache_avro::Error> for DataFusionError` is removed.
3. Re-export changed from `datafusion::apache_avro` to
`datafusion::arrow_avro`.
4. Avro feature wiring changed:
   - `datafusion` crate `avro` feature no longer enables
     `datafusion-common/avro`
   - `datafusion-proto` crate `avro` feature no longer enables
     `datafusion-common/avro`
5. Avro decoding behavior now follows `arrow-avro` semantics, including:
   - Avro `string` values being read as Arrow `Binary` in this path
   - `timestamp-*` logical types read as UTC timezone-aware timestamps
     (`Timestamp(..., Some("+00:00"))`)
   - `local-timestamp-*` remaining timezone-naive (`Timestamp(..., None)`)
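
The decoding changes in item 5 can be summarized as a small lookup. The sketch below is illustrative only: `AvroType`, `ArrowType`, `TimeUnit`, and `decoded_type` are hypothetical stand-ins defined here, not types or functions from `arrow-avro`, `arrow`, or DataFusion, and it covers only the millisecond and microsecond variants of the `timestamp-*` and `local-timestamp-*` families as an assumed subset.

```rust
/// Hypothetical, simplified model of the Avro types named in item 5.
/// These enums are NOT the `arrow-avro` or `arrow` types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum AvroType {
    String,
    TimestampMillis,
    TimestampMicros,
    LocalTimestampMillis,
    LocalTimestampMicros,
}

/// Hypothetical stand-in for an Arrow time unit.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TimeUnit {
    Millisecond,
    Microsecond,
}

/// Hypothetical stand-in for the Arrow data types mentioned in item 5.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ArrowType {
    Binary,
    /// Unit plus optional timezone, mirroring `Timestamp(..., tz)`.
    Timestamp(TimeUnit, Option<&'static str>),
}

/// Maps an Avro type to the Arrow type described in this PR's notes.
fn decoded_type(avro: AvroType) -> ArrowType {
    match avro {
        // Avro `string` is read as Arrow `Binary` in this path.
        AvroType::String => ArrowType::Binary,
        // `timestamp-*` becomes timezone-aware with a UTC offset.
        AvroType::TimestampMillis => {
            ArrowType::Timestamp(TimeUnit::Millisecond, Some("+00:00"))
        }
        AvroType::TimestampMicros => {
            ArrowType::Timestamp(TimeUnit::Microsecond, Some("+00:00"))
        }
        // `local-timestamp-*` stays timezone-naive.
        AvroType::LocalTimestampMillis => {
            ArrowType::Timestamp(TimeUnit::Millisecond, None)
        }
        AvroType::LocalTimestampMicros => {
            ArrowType::Timestamp(TimeUnit::Microsecond, None)
        }
    }
}

fn main() {
    let all = [
        AvroType::String,
        AvroType::TimestampMillis,
        AvroType::TimestampMicros,
        AvroType::LocalTimestampMillis,
        AvroType::LocalTimestampMicros,
    ];
    for avro in all {
        println!("{:?} -> {:?}", avro, decoded_type(avro));
    }
}
```

Downstream code that compares schemas or asserts on column types after reading Avro is the most likely place to notice these differences.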
Upgrade notes are documented in:
`docs/source/library-user-guide/upgrading/53.0.0.md`
---------
Co-authored-by: Connor Sanders <170039284+jecsand838@users.noreply.github.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>