ORC-2161: [C++] UnionColumnReader should reject out-of-range union tags #2618

Closed
wgtmac wants to merge 1 commit into apache:main from wgtmac:fix_union

Conversation

@wgtmac
Member

@wgtmac wgtmac commented May 7, 2026

What changes were proposed in this pull request?

This PR adds validation for C++ union tag values before they are used as child indexes.

The change covers:

  • UnionColumnReader::skip
  • UnionColumnReader::nextInternal
  • UnionColumnPrinter::printRow

If a malformed ORC file contains a union tag that is greater than or equal to the number of union children, the C++ reader/printer now throws ParseError instead of indexing out of bounds.
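The check described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `checkedChildIndex` and `usageExample` are hypothetical names, and `ParseError` is redefined locally as a stand-in so the snippet compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Local stand-in for the reader's ParseError so this sketch is self-contained.
struct ParseError : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Hypothetical helper: returns the tag as a child index,
// or throws ParseError when the tag does not name an existing child.
inline std::size_t checkedChildIndex(unsigned char tag, std::size_t numChildren) {
  const std::size_t index = static_cast<std::size_t>(tag);
  if (index >= numChildren) {
    throw ParseError("Union tag " + std::to_string(index) +
                     " is out of range for " + std::to_string(numChildren) +
                     " children");
  }
  return index;
}

// Usage sketch: for a two-child union, tag 1 is accepted and tag 200 is rejected.
inline void usageExample() {
  assert(checkedChildIndex(1, 2) == 1);
  bool rejected = false;
  try {
    checkedChildIndex(200, 2);
  } catch (const ParseError&) {
    rejected = true;
  }
  assert(rejected);
}
```

Validating before the tag is used as an index means the per-child state is never touched with an invalid position.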

Why are the changes needed?

Union tags are decoded from the ORC data stream as byte values, but the valid range depends on the number of union children. Malformed input can contain a tag outside that range. The C++ reader previously trusted the tag value directly when indexing per-child state.

This patch makes malformed union tags fail cleanly.

How was this patch tested?

Added C++ unit tests for a two-child union with invalid tag value 200, covering:

  • next
  • next with nulls
  • skip
  • ColumnPrinter
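To illustrate the kind of behavior these cases exercise, here is a simplified sketch of per-child counting over a batch of tags with an optional null mask. It is an assumption-laden model rather than the actual reader or test code: `countPerChild` is a hypothetical function, `ParseError` is a local stand-in, and skipping the range check for null rows is a choice made for this sketch only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Local stand-in for the reader's ParseError so this sketch is self-contained.
struct ParseError : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Hypothetical model: counts how many non-null rows select each union child.
// `notNull` may be empty, meaning the batch has no nulls.
inline std::vector<std::uint64_t> countPerChild(
    const std::vector<unsigned char>& tags,
    const std::vector<char>& notNull,
    std::size_t numChildren) {
  assert(notNull.empty() || notNull.size() == tags.size());
  std::vector<std::uint64_t> counts(numChildren, 0);
  for (std::size_t i = 0; i < tags.size(); ++i) {
    if (!notNull.empty() && !notNull[i]) {
      continue;  // null rows are ignored in this sketch
    }
    const std::size_t index = static_cast<std::size_t>(tags[i]);
    if (index >= numChildren) {
      throw ParseError("Union tag is out of range");
    }
    ++counts[index];  // safe: index was validated above
  }
  return counts;
}
```

With a two-child union, a batch containing tag 200 on a non-null row throws instead of writing past the end of `counts`.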

Was this patch authored or co-authored using generative AI tooling?

Generated-by: OpenAI Codex GPT-5.

@wgtmac wgtmac marked this pull request as ready for review May 7, 2026 08:44
Contributor

@ffacs ffacs left a comment

LGTM~ Thank you for the fix, @wgtmac.

@wgtmac
Member Author

wgtmac commented May 7, 2026

Thanks for the quick review, @ffacs!

@wgtmac wgtmac closed this in 3563ee5 May 7, 2026
wgtmac added a commit that referenced this pull request May 7, 2026
### What changes were proposed in this pull request?

This PR adds validation for C++ union tag values before they are used as child indexes.

The change covers:
- UnionColumnReader::skip
- UnionColumnReader::nextInternal
- UnionColumnPrinter::printRow

If a malformed ORC file contains a union tag that is greater than or equal to the number of union children, the C++ reader/printer now throws ParseError instead of indexing out of bounds.

### Why are the changes needed?

Union tags are decoded from the ORC data stream as byte values, but the valid range depends on the number of union children. Malformed input can contain a tag outside that range. The C++ reader previously trusted the tag value directly when indexing per-child state.

This patch makes malformed union tags fail cleanly.

### How was this patch tested?

Added C++ unit tests for a two-child union with invalid tag value 200, covering:
- next
- next with nulls
- skip
- ColumnPrinter

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: OpenAI Codex GPT-5.

Closes #2618 from wgtmac/fix_union.

Authored-by: Gang Wu <ustcwg@gmail.com>
Signed-off-by: Gang Wu <ustcwg@gmail.com>
(cherry picked from commit 3563ee5)
Signed-off-by: Gang Wu <ustcwg@gmail.com>
wgtmac added a commit that referenced this pull request May 7, 2026