Commit f686059
perf: optimize the json newline scanning
This is an alternative approach to #19687.
Instead of reading the entire range in the json FileOpener, implement an
AlignedBoundaryStream which scans the range for newlines as the
FileStream requests data from the stream, by wrapping the original
stream returned by the ObjectStore.
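
A minimal sketch of the boundary-alignment idea, not the code from this commit. It uses a synchronous iterator of byte chunks in place of the async stream, and the names `AlignedBoundary` and `read_partition` are hypothetical. It assumes the wrapped stream starts at byte `start - 1` (or 0 for the first partition) and that a line belongs to the partition whose nominal range contains its first byte.

```rust
/// Hypothetical wrapper: trims a chunked byte stream to whole lines while
/// the consumer pulls data, instead of probing for newlines up front.
pub struct AlignedBoundary<I> {
    inner: I,
    pos: u64,       // absolute file offset of the next byte from `inner`
    end: u64,       // nominal exclusive end of the partition
    skipping: bool, // still looking for the newline that precedes our first line
    done: bool,
}

impl<I: Iterator<Item = Vec<u8>>> AlignedBoundary<I> {
    /// Assumption: `inner` yields the file's bytes starting at
    /// `start.saturating_sub(1)` and may run to the end of the file.
    pub fn new(inner: I, start: u64, end: u64) -> Self {
        Self {
            inner,
            pos: start.saturating_sub(1),
            end,
            skipping: start > 0,
            done: false,
        }
    }
}

impl<I: Iterator<Item = Vec<u8>>> Iterator for AlignedBoundary<I> {
    type Item = Vec<u8>;

    fn next(&mut self) -> Option<Vec<u8>> {
        while !self.done {
            let chunk = self.inner.next()?;
            let base = self.pos;
            self.pos += chunk.len() as u64;
            // The closing newline is the first one at offset >= end - 1.
            let last = self.end.saturating_sub(1);

            let mut from = 0usize;
            if self.skipping {
                match chunk.iter().position(|&b| b == b'\n') {
                    // No newline yet: the whole chunk belongs to the previous partition.
                    None => continue,
                    // No line starts inside [start, end): the partition is empty.
                    Some(i) if base + i as u64 >= last => {
                        self.done = true;
                        break;
                    }
                    Some(i) => {
                        self.skipping = false;
                        from = i + 1;
                    }
                }
            }

            // Index in this chunk where the search for the closing newline begins.
            let lo = (last.saturating_sub(base).min(chunk.len() as u64) as usize).max(from);
            let mut to = chunk.len();
            if let Some(j) = chunk[lo..].iter().position(|&b| b == b'\n') {
                to = lo + j + 1; // keep the newline itself
                self.done = true;
            }
            if to > from {
                return Some(chunk[from..to].to_vec());
            }
        }
        None
    }
}

/// Hypothetical helper: reads one partition of an in-memory "file",
/// delivering it in chunks of `chunk_size` bytes (must be non-zero).
pub fn read_partition(file: &[u8], start: u64, end: u64, chunk_size: usize) -> Vec<u8> {
    let offset = start.saturating_sub(1) as usize;
    let chunks = file[offset..].chunks(chunk_size).map(|c| c.to_vec());
    AlignedBoundary::new(chunks, start, end).flatten().collect()
}
```

Under these assumptions each partition needs a single forward read, and adjacent partitions yield every line exactly once regardless of how the bytes are chunked.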
This eliminates the overhead of the extra two get_opts requests needed
by calculate_range and more importantly, it allows for efficient
read-ahead implementations by the underlying ObjectStore. Previously
this was inefficient because the streams opened by calculate_range
included a stream from (start - 1) to file_size and another one from
(end - 1) to end_of_file, just to find the two relevant newlines.
1 parent 15bc6bd
3 files changed
Lines changed: 817 additions & 31 deletions