Collaborator

@sneakers-the-rat sneakers-the-rat commented Jan 13, 2026

Fix: #69
Fix: #96

Builds on: #127

Opening as an early draft just to track progress.

currently at a point where the iterators exhaust and prematurely close the runner

  • Separate start/stop and init/deinit - have intermediate state where nodes stay inited but aren't actively running. also need to keep scheduler and store after failures, so probably move clearing those to the init method.
  • iter method to iterate through results
  • run method with unlimited n
  • run method with fixed n
  • run method with iter
  • test store clearing on all nodes
  • figure out why zmqrunner command node can escape pytest stdout capture
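
For reference, a minimal sketch of the start/stop vs. init/deinit split described in the list above. Everything here is hypothetical (`SketchRunner`, `RunnerState`, and the placeholder "results" are illustration only, not the project's actual classes); the point is just that exhausting an iterator should drop back to an inited-but-idle state rather than tearing the runner down, and that the store is only cleared on init.

```python
from enum import Enum, auto


class RunnerState(Enum):
    DEINITED = auto()
    INITED = auto()  # nodes are set up, but nothing is actively running
    RUNNING = auto()


class SketchRunner:
    """Hypothetical stand-in for a runner; not the real implementation."""

    def __init__(self):
        self.state = RunnerState.DEINITED
        self.store = []

    def init(self):
        # Clear the store on init rather than on stop, so results
        # survive a stop (or a failure) until the next full init.
        if self.state is RunnerState.DEINITED:
            self.store.clear()
            self.state = RunnerState.INITED

    def start(self):
        self.init()
        self.state = RunnerState.RUNNING

    def stop(self):
        # Stopping only pauses: the runner stays inited.
        if self.state is RunnerState.RUNNING:
            self.state = RunnerState.INITED

    def deinit(self):
        self.stop()
        self.state = RunnerState.DEINITED

    def iter(self, n=None):
        """Yield results; run without a limit when n is None."""
        self.start()
        try:
            count = 0
            while n is None or count < n:
                result = count  # placeholder for one epoch's result
                self.store.append(result)
                yield result
                count += 1
        finally:
            # Exhausting the iterator pauses the runner instead of closing it.
            self.stop()

    def run(self, n):
        return list(self.iter(n))
```

In this sketch, `SketchRunner().run(3)` leaves the runner in `RunnerState.INITED` with its store intact, and only `deinit()` followed by another `init()` clears it.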

Along the way we resolved a few other things

  • Figured out that one source of the zmq thread/async deadlock might be threading locks held within the logging modules when the processes are forked, so ensured that loggers in child processes create their own file and stdout loggers. this ends up being much cleaner than the prior implementation: we now have one logging file per process rather than one per process per logger, which created an ungodly spray of log files
  • along the same lines, we also added a 'ping' loop to the zmq startup, where sometimes (esp on mac in the github runner) the nodes are slow to start listening to events in pub/sub sockets. this loop prompts them to re-identify themselves, which then sends additional announce messages, which then causes them to update their states, which then appropriately releases the "ready" lock in the command node.
  • this let us turn off debug logging by default in testing, which was a weird crutch that kept things running on mac in the CI runners...
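
A rough sketch of the per-process logging idea from the first bullet above, using only the standard library. The helper name `configure_child_logging`, the `"sketch"` logger name, and the file naming scheme are assumptions for illustration, not what the PR actually does: a child process would call something like this right after it starts, dropping whatever handlers it inherited and installing its own file and stdout handlers.

```python
import logging
import os
import sys
from pathlib import Path


def configure_child_logging(log_dir, root_name="sketch"):
    """Give the current process its own handlers (hypothetical helper)."""
    logger = logging.getLogger(root_name)

    # Drop handlers inherited from the parent: their state (locks, open
    # file objects) may have been copied mid-use when the process forked.
    for handler in list(logger.handlers):
        logger.removeHandler(handler)

    # One file per process; every logger under `root_name` propagates here.
    log_file = Path(log_dir) / f"{root_name}.{os.getpid()}.log"
    formatter = logging.Formatter(
        "%(asctime)s %(process)d %(name)s %(levelname)s %(message)s"
    )
    for handler in (logging.FileHandler(log_file), logging.StreamHandler(sys.stdout)):
        handler.setFormatter(formatter)
        logger.addHandler(handler)

    # Avoid duplicate output through any handlers on the root logger.
    logger.propagate = False
    return logger, log_file
```

Because loggers like `sketch.node_a` propagate up to `sketch`, this yields one log file per process rather than one per process per logger.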

📚 Documentation preview 📚: https://noob--129.org.readthedocs.build/en/129/


coveralls commented Jan 13, 2026

Coverage Status

coverage: 87.551% (+1.3%) from 86.203%
when pulling 197fbd0 on zmq-freerun
into 90ed7df on main.


codspeed-hq bot commented Jan 16, 2026

Merging this PR will not alter performance

✅ 7 untouched benchmarks
⏩ 7 skipped benchmarks [1]


Comparing zmq-freerun (197fbd0) with main (90ed7df)


Footnotes

  1. 7 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@sneakers-the-rat sneakers-the-rat force-pushed the zmq-freerun branch 2 times, most recently from 0db6288 to 3d0c251 Compare January 17, 2026 06:25
@sneakers-the-rat sneakers-the-rat linked an issue Jan 23, 2026 that may be closed by this pull request
@sneakers-the-rat sneakers-the-rat marked this pull request as ready for review January 24, 2026 00:32
nonlocal finished

finished = True
with work_ready:
Collaborator

is this ok to be not nonlocal? clearly it's working since tests are passing but why

Collaborator Author

not sure what you mean? this is nonlocal because it's referring to finished, which is defined in the outer function scope, and we are modifying it, so it needs to be nonlocal (otherwise assigning to it would just create a new finished in the inner _done scope).

Collaborator

oh sorry i was asking about work_ready

Collaborator Author

Aha my b. No, that's fine since we don't rebind work_ready
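
To make the distinction in this thread concrete, here is a small self-contained sketch. `finished`, `work_ready`, and `_done` come from the snippet under review; the surrounding `run_worker` function, the use of `threading.Condition`, and the `notify_all` call are assumptions made so the example runs on its own.

```python
import threading


def run_worker():
    finished = False
    work_ready = threading.Condition()

    def _done():
        # Assignment rebinds the name, so `nonlocal` is required; without
        # it, Python would create a new local `finished` inside `_done`.
        nonlocal finished
        finished = True
        # `work_ready` is only looked up and used, never reassigned, so
        # the ordinary closure lookup finds the outer object.
        with work_ready:
            work_ready.notify_all()

    worker = threading.Thread(target=_done)
    with work_ready:
        worker.start()
        # Wait until the worker has flipped the flag in the outer scope.
        work_ready.wait_for(lambda: finished, timeout=5)
    worker.join()
    return finished
```

Calling methods on `work_ready` (or entering it as a context manager) mutates the object, not the name, which is why no `nonlocal` declaration is needed for it.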

@sneakers-the-rat sneakers-the-rat merged commit 5fff4b7 into main Jan 27, 2026
27 of 28 checks passed
@sneakers-the-rat sneakers-the-rat deleted the zmq-freerun branch January 27, 2026 01:47


Development

Successfully merging this pull request may close these issues.

Spradically Hanging ZMQ Runner
Free running mode for zmq runner

3 participants