Commit f686059

perf: optimize the json newline scanning
This is an alternative approach to #19687. Instead of reading the entire range in the JSON FileOpener, this commit implements an AlignedBoundaryStream that wraps the original stream returned by the ObjectStore and scans the range for newlines as the FileStream requests data from it.

This eliminates the overhead of the two extra get_opts requests needed by calculate_range and, more importantly, allows the underlying ObjectStore to implement read-ahead efficiently. Previously this was inefficient because calculate_range opened two additional streams, one from (start - 1) to file_size and another from (end - 1) to end_of_file, just to find the two relevant newlines.
1 parent 15bc6bd commit f686059

3 files changed

Lines changed: 817 additions & 31 deletions
