Add each_line enumerator to IO class #10

Open: wants to merge 3 commits into base: master


@ianfixes ianfixes commented Jan 25, 2020

This enables the following:

require 'piperator'
require 'pathname'

# Open a set of files as if they were one big file, like the "cat" command does.
# Works anywhere you'd use File.open("one.txt").each_line; only one line is in memory at a time.
# Chaining enumerators with + requires Ruby 2.6 or later.
Piperator.infinite_io(Pathname.glob("*.txt").map(&:each_line).reduce(:+)) do |io|
  io.each_line { |line| puts line }
end

# Ruby 2.3 version
Piperator.infinite_io(Pathname.glob("*.txt").map(&:each_line).lazy.flat_map(&:lazy)) do |io|
  io.each_line { |line| puts line }
end

@ianfixes ianfixes force-pushed the 2020-01-25_io_each_line branch from fbe0b4e to 59eb99b Compare January 25, 2020 05:40
@lautis
Owner

lautis commented Jan 29, 2020

In the Ruby IO object, each_line takes the line separator and limit as arguments. Would that be possible to include here?

https://ruby-doc.org/core-2.5.1/IO.html#method-i-each_line
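For reference, here is a small sketch of the two optional arguments being asked about. It uses StringIO as a stand-in for a core IO object (an assumption made only so the example is self-contained); the sample string and variable names are made up for illustration.

```ruby
require 'stringio'

io = StringIO.new("alpha,beta,gamma")

# Separator only: split on "," instead of "\n". The separator stays attached.
by_comma = io.each_line(",").to_a
# by_comma is ["alpha,", "beta,", "gamma"]

io.rewind

# Limit only: no newline is present, so each chunk is capped at 4 bytes.
by_limit = io.each_line(4).to_a
# by_limit is ["alph", "a,be", "ta,g", "amma"]

io.rewind

# Separator and limit together: a chunk ends at "," or after 4 bytes,
# whichever comes first.
by_both = io.each_line(",", 4).to_a
```

An each_line that only accepts a block would silently drop both arguments, which is the gap raised in this comment.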

@ianfixes
Author

I ran into trouble with this, and the only way around it seems to be to use the .eager method that was added in Ruby 2.7 to turn a lazy enumerator into a "regular" one.

My goal here is to be able to take a lazy enumerator (like you'd use for an infinite sequence) and use it to create an infinitely long IO stream. In other words, to drive a stream of bytes from a generator function.

The problem I'm running into here is that either I implement it as a lazy enumerator (which causes each_line to return no data), or as a "regular" enumerator (which causes an out of memory error -- it tries to read the entire infinite stream before continuing).
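To make the lazy versus "regular" distinction concrete, here is a minimal sketch. The `counter` generator is a hypothetical infinite source, not something from this branch.

```ruby
# Hypothetical infinite generator: yields "line 1\n", "line 2\n", ... forever.
counter = Enumerator.new do |y|
  n = 0
  loop { y << "line #{n += 1}\n" }
end

# Lazy: map is deferred, so only the three requested elements are produced.
first_three = counter.lazy.map(&:upcase).first(3)
# first_three is ["LINE 1\n", "LINE 2\n", "LINE 3\n"]

# Eager: counter.map(&:upcase) would try to build the whole (infinite) array
# before returning, which is the out-of-memory case described above.

# External iteration pulls one element at a time from its own fiber.
pulled = counter.next
# pulled is "line 1\n"
```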

But yes, if I can find a way around that, I'd add those other methods.

Do you have any ideas for how to accomplish this? The last thing I tried was a subclass of StringIO that wrote data into itself from Enumerator#next whenever its buffer ran empty, but it didn't seem to work.

@ianfixes
Author

I found a solution to this while developing an unrelated project: use IO.pipe to handle all the buffering. That solves both the lazy/eager enumerator problem and the buffering. Contributing it here in case it is of interest.
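A rough sketch of the IO.pipe idea follows. This is not the code in this branch: `each_chunk_as_io` is a hypothetical helper name, and feeding the pipe from a background thread is an assumption about how such a helper could be built.

```ruby
# Hypothetical helper: stream an enumerable of strings through a pipe and
# yield the read end, which is an ordinary core IO object.
def each_chunk_as_io(chunks)
  reader, writer = IO.pipe

  # The producer blocks whenever the pipe buffer is full, so an infinite
  # enumerator is only consumed as fast as the reader drains it.
  producer = Thread.new do
    begin
      chunks.each { |chunk| writer.write(chunk) }
    rescue Errno::EPIPE, IOError
      # The reader was closed before the source ran out; stop quietly.
    ensure
      writer.close unless writer.closed?
    end
  end

  yield reader
ensure
  reader.close if reader && !reader.closed?
  producer.join if producer
end

# Infinite source: only as much as the block reads is ever generated.
counter = Enumerator.new do |y|
  n = 0
  loop { y << "line #{n += 1}\n" }
end
first_lines = each_chunk_as_io(counter.lazy) { |io| io.each_line.first(3) }
# first_lines is ["line 1\n", "line 2\n", "line 3\n"]

# Because the block receives a real IO, the separator argument works as usual.
fields = each_chunk_as_io(["a,b,", "c"]) { |io| io.each_line(",").to_a }
# fields is ["a,", "b,", "c"]
```

The operating system's pipe buffer does the bookkeeping that the StringIO subclass attempt had to do by hand, at the cost of one extra thread per stream.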

@ianfixes ianfixes force-pushed the 2020-01-25_io_each_line branch 3 times, most recently from a51c4ae to 1be2c16 Compare June 29, 2021 17:19
@ianfixes ianfixes force-pushed the 2020-01-25_io_each_line branch from 1be2c16 to 466417e Compare June 29, 2021 17:20
@ianfixes
Author

@lautis

> In the Ruby IO object, each_line takes the line separator and limit as arguments. Would that be possible to include here?

I addressed your suggestion by yielding an actual core IO object from the function, so each_line on it accepts the usual separator and limit arguments.

3 participants