@justinwlin
Contributor

@justinwlin justinwlin commented Jun 12, 2025

Description

Running runpod --help crashes the runpod Python SDK because a multiprocessing Manager is started at import time.

The fix is to not create the object until its methods are actually called. At initialization we set the members to None, and only when the class is acted on do we initialize the member variables used for multiprocessing.
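
Roughly, the lazy-initialization idea looks like the sketch below. This is an illustration only, not the SDK's code: the class name LazyJobsProgress, the manager_factory parameter, and the method bodies are assumptions made up for the example. The factory defaults to multiprocessing.Manager and is injectable so the behavior can be checked without starting a real manager process.

```python
import multiprocessing
from typing import Any, Callable, Optional


class LazyJobsProgress:
    """Hypothetical stand-in for JobsProgress; not the SDK's actual class."""

    def __init__(self, manager_factory: Callable[[], Any] = multiprocessing.Manager):
        # Nothing multiprocessing-related is started here, so creating this
        # object at import time does not spawn a process.
        self._manager_factory = manager_factory
        self._manager: Optional[Any] = None
        self._jobs: Optional[Any] = None

    def _ensure_initialized(self) -> None:
        # Start the manager (and its shared dict) only on first real use.
        if self._manager is None:
            self._manager = self._manager_factory()
            self._jobs = self._manager.dict()

    def add(self, job_id: str) -> None:
        self._ensure_initialized()
        self._jobs[job_id] = True

    def remove(self, job_id: str) -> None:
        self._ensure_initialized()
        self._jobs.pop(job_id, None)

    def get_job_list(self) -> Optional[str]:
        self._ensure_initialized()
        return ",".join(self._jobs.keys()) or None
```

With this shape, a module-level job_progress = LazyJobsProgress() stays cheap, and the manager process only appears once a worker actually calls add, remove, or get_job_list.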

Debug Stack:

Error when I ran --help

(venv) justinwlin@Justins-MBP-2 runpod-python % runpod
...
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

        To fix this issue, refer to the "Safe importing of main module"
        section in https://docs.python.org/3/library/multiprocessing.html
       ...
    buf = self._recv(4)
  File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/multiprocessing/connection.py", line 399, in _recv
    raise EOFError
EOFError
(venv) justinwlin@Justins-MBP-2 runpod-python % 

The main thing to look at is:

RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

Looking deeper into why this happens:

runpod --help
  ↓
imports runpod.cli
  ↓  
imports runpod.cli.groups.pod
  ↓
imports runpod.serverless.modules.rp_job 
  ↓
job_progress = JobsProgress()  # This line runs at import time
  ↓
JobsProgress.__init__() runs
  ↓
Manager() gets created  # ❌ Multiprocessing object created during import

rp_job.py

JOB_GET_URL = str(os.environ.get("RUNPOD_WEBHOOK_GET_JOB")).replace("$ID", WORKER_ID)

log = RunPodLogger()
job_progress = JobsProgress()
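
To make the import-time behavior concrete, here is a small self-contained sketch. It does not use the SDK: MODULE_SOURCE is a made-up module body that mimics the pattern in rp_job.py, the counter is a harmless stand-in for Manager() being started, and load_module is a hypothetical helper that executes the source the way an import would.

```python
import types

# Hypothetical module source mimicking the pattern above: an object is
# instantiated at module level.
MODULE_SOURCE = '''
class JobsProgress:
    created = 0

    def __init__(self):
        JobsProgress.created += 1  # stand-in for Manager() being started


job_progress = JobsProgress()  # runs as soon as the module body executes
'''


def load_module(name: str, source: str) -> types.ModuleType:
    """Execute source as a module body, roughly as an import does."""
    module = types.ModuleType(name)
    exec(compile(source, name, "exec"), module.__dict__)
    return module


demo = load_module("demo_rp_job", MODULE_SOURCE)

# No method was called explicitly, yet the constructor has already run.
print(demo.JobsProgress.created)  # prints 1
```

Any command that merely imports such a module, including --help, pays for whatever the constructor does.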

How to Test:

I wasn't sure how to test this, so I reran the existing tests and had Claude generate tests for the --help command as a sanity check for the future.
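
A sanity check of this kind could be sketched as below. The helper name help_exits_cleanly and its timeout parameter are assumptions for illustration, not the tests added in this PR. It runs a command in a subprocess and treats a non-zero exit code or a RuntimeError in stderr as a failure.

```python
import subprocess
import sys
from typing import Sequence


def help_exits_cleanly(argv: Sequence[str], timeout: float = 30.0) -> bool:
    """Run a command and report whether it exited 0 without a RuntimeError."""
    result = subprocess.run(
        list(argv),
        capture_output=True,  # collect stdout and stderr for inspection
        text=True,
        timeout=timeout,
    )
    return result.returncode == 0 and "RuntimeError" not in result.stderr


if __name__ == "__main__":
    # Demonstration with a trivial command instead of the real CLI.
    print(help_exits_cleanly([sys.executable, "-c", "print('usage: demo')"]))
```

In a real test this would presumably be called with the actual CLI invocation, for example ["runpod", "--help"], assuming the runpod entry point is installed and on PATH.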

Passing status quo cases

Screenshot 2025-06-12 at 6 17 52 PM

Showing that, without the fix, the new tests catch the --help error:

Screenshot 2025-06-12 at 6 38 15 PM

@justinwlin justinwlin requested a review from deanq June 12, 2025 22:19
@justinwlin justinwlin force-pushed the fix-error-lazy-initialization-singleton branch 3 times, most recently from a755a8e to 9ac6602 Compare June 12, 2025 22:26
@justinwlin justinwlin force-pushed the fix-error-lazy-initialization-singleton branch from 9ac6602 to 6290b27 Compare June 12, 2025 22:27
@deanq
Member

deanq commented Jun 15, 2025

After many iterations, I have decided to start from scratch to fix this without hacks. Thanks for the initial sweep, @justinwlin. Please refer to PR #430 for the final fix.

@deanq deanq closed this Jun 15, 2025
@deanq deanq deleted the fix-error-lazy-initialization-singleton branch June 17, 2025 03:51