Describe the bug

NOTE: This is also a response to #112
Hey,

I too am about to start using your great lib in production, so I'd like to eventually help out. I do not currently satisfy the clauses in #112 to become a maintainer, but I think I have found a bug, and maybe I can write a PR to fix it. Before I go digging, though, I'd like some feedback on whether I am doing something wrong or whether I am onto something. That way I might satisfy those clauses after this potential bug hunt.
tldr;

In a mixed async-task/multiprocess pipeline I observe a file descriptor leak. Looking at (h)top you can see the leaking subprocesses; the included script also measures the leak using psutil. The leak only occurs in the mixed setting, and only when the async stage precedes the multiprocess one.
Minimal code to reproduce
```python
import pypeln as pl
import time
import psutil
import asyncio
import gc

# ---- unit functions -----
def filter_int(x):
    return isinstance(x, int)

# --- sync implems ---
def op_1(x, cost_sec=0.01):
    time.sleep(cost_sec)
    return x + 1

def op_2(x, cost_sec=0.01):
    time.sleep(cost_sec)
    return x ** 2

# --- async implems ----
async def aop_1(x, cost_sec=0.01):
    await asyncio.sleep(cost_sec)
    return x + 1

async def aop_2(x, cost_sec=0.01):
    await asyncio.sleep(cost_sec)
    return x ** 2

def sink_print(x):
    print("===:", x, end="\r")

if __name__ == "__main__":
    REPEATS = 10
    # grab master process info
    self_proc = psutil.Process()
    fds = []
    for rep in range(REPEATS):
        xs = range(100)
        stage = (
            xs
            | pl.sync.filter(filter_int)
            # run the first stage as an async task
            | pl.task.map(aop_1, workers=4)
            # the leak does not arise if both stages are process based
            # | pl.process.map(op_1, workers=4)
            | pl.process.map(op_2, workers=4)
            | pl.sync.each(sink_print)
        )
        # without partial:
        # stage = pl.sync.filter(filter_int, xs)
        # stage = pl.task.map(aop_1, stage, workers=4)
        # stage = pl.process.map(op_2, stage, workers=4)
        # stage = pl.sync.each(sink_print, stage)
        pl.sync.run(stage)
        fds.append(self_proc.num_fds())
        print(f"\n[{rep}] FDS:", self_proc.num_fds())
        # forcing collection does not solve it
        # gc.collect()
    print(fds)
```
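To see what kind of descriptors accumulate (not just how many), a small helper like the hypothetical `dump_fds` below can be called between iterations. It is Linux-only, since it reads `/proc/<pid>/fd`; leaked multiprocessing pipes would show up as `pipe:[...]` entries.

```python
import os
import psutil

def dump_fds(proc=None):
    """List what each open descriptor of the process points to (Linux-only)."""
    proc = proc or psutil.Process()
    fd_dir = f"/proc/{proc.pid}/fd"
    for fd in sorted(os.listdir(fd_dir), key=int):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            continue  # descriptor was closed between listdir and readlink
        print(fd, "->", target)
```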
Observed
The output when using the sync -> task -> process combination shows a growing number of open file descriptors.

Expected behavior

A constant number of open file descriptors, which is what happens if we do not mix process and async stages. Note that the actual number of FDs (6) is not important; what matters is that it stays constant.
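The invariant I expect could be written as an assertion, as in this sketch (`build_stage` is a hypothetical factory wrapping the pipeline construction from the repro):

```python
# build_stage is hypothetical, standing in for the pipeline construction above.
baseline = self_proc.num_fds()
for rep in range(REPEATS):
    pl.sync.run(build_stage())
    assert self_proc.num_fds() == baseline, f"leaked FDs on iteration {rep}"
```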
Library Info
pypeln == 0.4.9
psutil == 5.9.8
Additional context
Say we have a 4-stage pipeline where:

- stage 1 is a fixed filter,
- stages 2-3 are map operations with a 10 millisecond time cost, standing in for some CPU-bound load,
- stage 4 is a sequential write operation.
Say we have, for stages 2 and 3, the option to execute the code locally (as a regular def), or to simulate delegating to a service RPC-style (here represented by the async def version of the operation unit-function). In the small example this means changing the time.sleep that fakes the CPU-bound load into an asyncio.sleep simulating the same load off-loaded over the network.
So we could have mixed variations such as:

filter -> op_1 (proc) -> op_2 (task) -> write

or

filter -> op_1 (task) -> op_2 (proc) -> write
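Concretely, using the unit functions from the repro above, the two orderings would look like this sketch; per the tldr, only the second variant (task before process) leaks:

```python
# Variant A: process stage before the async-task stage -- no leak observed
stage_a = (
    xs
    | pl.sync.filter(filter_int)
    | pl.process.map(op_1, workers=4)
    | pl.task.map(aop_2, workers=4)
    | pl.sync.each(sink_print)
)

# Variant B: async-task stage before the process stage -- FD count grows
stage_b = (
    xs
    | pl.sync.filter(filter_int)
    | pl.task.map(aop_1, workers=4)
    | pl.process.map(op_2, workers=4)
    | pl.sync.each(sink_print)
)
```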