Skip to content

added support for MapFromEntries#21720

Open
athlcode wants to merge 8 commits intoapache:mainfrom
athlcode:support/MapFromEntries
Open

added support for MapFromEntries#21720
athlcode wants to merge 8 commits intoapache:mainfrom
athlcode:support/MapFromEntries

Conversation

@athlcode
Copy link
Copy Markdown

@athlcode athlcode commented Apr 18, 2026

Which issue does this PR close?

Closes # (none: prerequisite for apache/datafusion-comet#2706; follow-up to #17779 and #19274).

Rationale for this change

Spark's default mapKeyDedupPolicy is EXCEPTION, raising SparkRuntimeException with error class DUPLICATED_MAP_KEY on duplicate map keys. The existing Spark map_from_entries / map_from_arrays UDFs silently kept the last occurrence, forcing downstream engines like datafusion-comet to fall back to Spark.

What changes are included in this PR?

Duplicate map keys in Spark map_from_entries, map_from_arrays, and str_to_map now raise [DUPLICATED_MAP_KEY] Duplicate map key {key} was found, matching Spark's default behaviour and error class.

Are these changes tested?

Yes, via sqllogictest assertions covering the new error across the affected Spark map UDFs.

Are there any user-facing changes?

Yes. Duplicate keys now raise [DUPLICATED_MAP_KEY] under the default policy instead of silently collapsing to the last occurrence. No new config keys, no API changes.

@github-actions github-actions bot added common Related to common crate spark labels Apr 18, 2026
@github-actions github-actions bot added sqllogictest SQL Logic Tests (.slt) and removed common Related to common crate labels Apr 18, 2026
Copy link
Copy Markdown
Contributor

@nuno-faria nuno-faria left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @athlcode. Since this changes the current behavior, I think it would be a good opportunity to add a config that allows switching between exception and last key wins (e.g., datafusion.execution.mapKeyDedupPolicy). Otherwise existing code relying on the last key wins policy won't be able to adapt.

cc: @Jefffrey and @comphead since you both reviewed the original map_from_entries.

Comment thread datafusion/spark/src/function/map/utils.rs Outdated
@coderfender
Copy link
Copy Markdown
Contributor

Copy link
Copy Markdown
Contributor

@coderfender coderfender left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR looks good @athlcode . Left some comments and I would like to see some more tests covering various edge cases both on. the rust side and SLT side to add more confidence

Comment thread datafusion/spark/src/function/map/utils.rs Outdated
Comment thread datafusion/sqllogictest/test_files/spark/map/map_from_entries.slt
@athlcode
Copy link
Copy Markdown
Author

The PR looks good @athlcode . Left some comments and I would like to see some more tests covering various edge cases both on. the rust side and SLT side to add more confidence

yep fixing comments and will add tests

@athlcode
Copy link
Copy Markdown
Author

Thanks @athlcode. Since this changes the current behavior, I think it would be a good opportunity to add a config that allows switching between exception and last key wins (e.g., datafusion.execution.mapKeyDedupPolicy). Otherwise existing code relying on the last key wins policy won't be able to adapt.

cc: @Jefffrey and @comphead since you both reviewed the original map_from_entries.

thanks @nuno-faria. I am looking into this as well

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

spark sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants