[BUG] Edge case fix for first line of big file if the BUFFER_SIZE is bigger than the remaining bytes #15

@navidpadid

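If the last chunk to read (the one that reaches the start of the file) is smaller than BUFFER_SIZE, the read returns more bytes than are still unread. The extra bytes overlap data already held in the buffer variable, so the first line of the file comes back duplicated. This is very unlikely to happen in practice, since a large file already contains ~millions of lines and the requested lines are normally found long before the start of the file is reached. The offending read is: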

buffer = file.read(BUFFER_SIZE) + buffer
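
To make the overlap concrete, here is a minimal sketch of the final iteration in isolation. The tiny BUFFER_SIZE, the sample bytes, and the carried-over `buffer` value are assumptions chosen so the effect is visible; they are not taken from the project.

```python
import io

BUFFER_SIZE = 8  # hypothetical, deliberately tiny

file = io.BytesIO(b"line1\nline2\nline3\n")

# State just before the last iteration: everything from byte 2 onward has
# already been consumed, and the partial first line b"ne1" was carried over.
prev_position = 2
buffer = b"ne1"

position = max(0, prev_position - BUFFER_SIZE)  # clamps to 0

# Buggy read: asks for BUFFER_SIZE bytes although only 2 are still unread.
file.seek(position)
buggy = file.read(BUFFER_SIZE) + buffer

# Bounded read: asks only for the bytes between 0 and prev_position.
file.seek(position)
fixed = file.read(prev_position) + buffer

print(buggy)  # b'line1\nline1'
print(fixed)  # b'line1'
```

The buggy read re-reads bytes that are already represented in `buffer`, which is why `line1` shows up twice.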

Approach:

            while position > 0 and len(result) < lines_to_read:
                prev_position = position
                position = max(0, position - BUFFER_SIZE)
                file.seek(position)
                if position == 0:
                    buffer = file.read(prev_position) + buffer
                else:
                    buffer = file.read(BUFFER_SIZE) + buffer

                cur_lines = buffer.split(b'\n')
                if position != 0:
                    buffer = cur_lines.pop(0)  # Keep the first (possibly partial) line; the next, earlier read completes it
                for line in reversed(cur_lines):
                    if len(line) > 0:
                        result.appendleft(line.decode() + '\n')
                        if len(result) == lines_to_read:
                            break
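
For anyone who wants to try the approach outside the project, the loop above can be wrapped in a self-contained function. This is only a sketch under assumptions: the function name `read_last_lines`, its parameters, and the small default buffer size are hypothetical, and the setup code around the loop (opening the file, seeking to the end, the `deque`) is guessed from the snippet rather than copied from the repository. It reads `prev_position - position` bytes, which equals BUFFER_SIZE for every chunk except the last one, so it behaves the same as the `if position == 0` branch above.

```python
import os
from collections import deque

BUFFER_SIZE = 8  # hypothetical, deliberately tiny so small files hit the edge case


def read_last_lines(path, lines_to_read, buffer_size=BUFFER_SIZE):
    """Return up to `lines_to_read` trailing non-empty lines, oldest first."""
    result = deque()
    if lines_to_read <= 0:
        return result
    buffer = b""
    with open(path, "rb") as file:
        file.seek(0, os.SEEK_END)
        position = file.tell()
        while position > 0 and len(result) < lines_to_read:
            prev_position = position
            position = max(0, position - buffer_size)
            file.seek(position)
            # Read only the unread gap, so the final chunk never overlaps.
            buffer = file.read(prev_position - position) + buffer

            cur_lines = buffer.split(b"\n")
            # Away from the file start, the first piece may be a partial line;
            # carry it over so the next (earlier) chunk can complete it.
            buffer = cur_lines.pop(0) if position != 0 else b""
            for line in reversed(cur_lines):
                if line:
                    result.appendleft(line.decode() + "\n")
                    if len(result) == lines_to_read:
                        break
    return result
```

Calling `read_last_lines(path, 10)` on a three-line file with the tiny buffer above walks all the way back to position 0, which is exactly the case described in this issue, and should return each line once.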
