Make ordered as the default behavior #54

Open
wookayin opened this issue Jul 16, 2020 · 3 comments

Comments

@wookayin

wookayin commented Jul 16, 2020

Having map() preserve input order is the more intuitive behavior. Python's built-in Pool executor, ray, joblib, etc. all work this way.

I realize that one can still pipe to pl.process.ordered, but its documentation is limited and it is quite difficult to use.

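import random
import time
import pypeln as pl

N = 4  # worker count; illustrative value, any N > 1 shows the reordering

def slow_identity(x):
    time.sleep(random.random())
    return x

s = list(range(100)) | pl.process.map(slow_identity, workers=N)
list(s)     # should be ordered by default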
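For comparison, here is a minimal standard-library sketch of the order-preserving behavior referred to above. It uses concurrent.futures with threads only to keep the example short; `ordered_map` is a hypothetical helper name, and the sleep is scaled down so the snippet runs quickly.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


def slow_identity(x):
    # Sleep for a short random time so tasks finish out of order.
    time.sleep(random.random() * 0.01)
    return x


def ordered_map(items, workers=4):
    # Executor.map yields results in input order, regardless of
    # which task happens to finish first.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(slow_identity, items))


result = ordered_map(range(100))
print(result == list(range(100)))
```

The executor still runs the calls concurrently; it only holds back finished results until everything before them has been returned.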
@cgarciae
Owner

Hey @wookayin,

Implementing ordering efficiently can get very tricky if you consider multi-stage pipelines containing transformations like filter and flat_map. The current implementation of ordered is pessimistic and has to wait for all the elements to come in before yielding.

I think the example from the ordered documentation should be enough for people to get started, but I would be happy to improve it if you give some feedback.
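To illustrate what "pessimistic" means here, the following is a rough standard-library sketch, not pypeln's actual implementation: each element is tagged with its input index, processed in whatever order the workers finish, and only sorted back once every element has arrived. The names `unordered_map` and `pessimistic_ordered` are made up for this example.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def slow_identity(x):
    # Sleep for a short random time so tasks finish out of order.
    time.sleep(random.random() * 0.01)
    return x


def unordered_map(fn, items, workers=4):
    """Yield (index, result) pairs in completion order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fn, item): i for i, item in enumerate(items)}
        for future in as_completed(futures):
            yield futures[future], future.result()


def pessimistic_ordered(indexed_results):
    """Wait for every (index, result) pair, then yield results by index."""
    collected = sorted(indexed_results, key=lambda pair: pair[0])
    for _, result in collected:
        yield result


out = list(pessimistic_ordered(unordered_map(slow_identity, range(50))))
print(out == list(range(50)))
```

Because nothing is yielded until the upstream stage is exhausted, this approach stays correct even if intermediate stages drop or multiply elements, but it gives up streaming and holds the full output in memory.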

@cgarciae
Owner

cgarciae commented Jul 16, 2020

I don't agree that stages should order by default, since it's a slower operation. We can optimize ordered for simple cases and add an ordered shortcut flag to map.

What do you think?
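As a rough sketch of what an optimized simple case could look like (an assumption for discussion, not pypeln's API or implementation): for a plain one-to-one map, a small reorder buffer can emit each result as soon as all earlier indices have been emitted, instead of waiting for the whole input. `map_stage` and its `ordered` flag are hypothetical names.

```python
import heapq
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def slow_identity(x):
    # Sleep for a short random time so tasks finish out of order.
    time.sleep(random.random() * 0.01)
    return x


def map_stage(fn, items, workers=4, ordered=False):
    """Hypothetical map with an `ordered` flag; not pypeln's API."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fn, item): i for i, item in enumerate(items)}
        pending = []    # min-heap of (index, result) pairs that arrived early
        next_index = 0  # index of the next result allowed to be emitted
        for future in as_completed(futures):
            index, result = futures[future], future.result()
            if not ordered:
                yield result
                continue
            heapq.heappush(pending, (index, result))
            # Emit every buffered result that is now next in line.
            while pending and pending[0][0] == next_index:
                yield heapq.heappop(pending)[1]
                next_index += 1


print(list(map_stage(slow_identity, range(10), ordered=True)))
```

This relies on every input producing exactly one output with a known index, which is why it only covers the simple map case; stages like filter or flat_map break the contiguous-index assumption.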

@gtadamson

I agree with @cgarciae that ordering by default would be undesirable if it slowed down pypeln as a whole.
