"No processes were matched" after set parameter KINETO_DAEMON_INIT_DELAY_S #286

Open
KepingYan opened this issue Aug 22, 2024 · 4 comments

@KepingYan

Test script: scripts/pytorch/linear_model_example.py
If I run it with the command KINETO_USE_DAEMON=1 python3 scripts/pytorch/linear_model_example.py, I hit the error described in the docs, which manifests as a segfault.
I then added the KINETO_DAEMON_INIT_DELAY_S parameter and ran the script as KINETO_USE_DAEMON=1 KINETO_DAEMON_INIT_DELAY_S=3 python3 scripts/pytorch/linear_model_example.py. The segfault is gone, but the trace file is not generated as expected:

$ dyno gputrace --log-file /tmp/libkineto_trace.json
Kineto config =
ACTIVITIES_LOG_FILE=/tmp/libkineto_trace.json
PROFILE_START_TIME=0
ACTIVITIES_DURATION_MSECS=500
PROFILE_REPORT_INPUT_SHAPES=false
PROFILE_PROFILE_MEMORY=false
PROFILE_WITH_STACK=false
PROFILE_WITH_FLOPS=false
PROFILE_WITH_MODULES=false
response length = 133
response = {"activityProfilersBusy":0,"activityProfilersTriggered":[],"eventProfilersBusy":0,"eventProfilersTriggered":[],"processesMatched":[]}
No processes were matched, please check --job-id or --pids flags
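
For scripted checks, the "response = ..." line printed above can be parsed to see whether any process was matched. This is only a minimal sketch: the helper name matched_processes is hypothetical, and it assumes the dyno output keeps the "response = " prefix followed by a JSON object, as in the log above.

```python
import json

RESPONSE_PREFIX = "response = "


def matched_processes(dyno_output: str) -> list:
    """Return the processesMatched list from dyno's printed response line."""
    for line in dyno_output.splitlines():
        line = line.strip()
        if line.startswith(RESPONSE_PREFIX):
            payload = json.loads(line[len(RESPONSE_PREFIX):])
            return payload.get("processesMatched", [])
    raise ValueError("no 'response = ...' line found in dyno output")


# Output copied from the report above.
sample = (
    "response length = 133\n"
    'response = {"activityProfilersBusy":0,"activityProfilersTriggered":[],'
    '"eventProfilersBusy":0,"eventProfilersTriggered":[],"processesMatched":[]}\n'
    "No processes were matched, please check --job-id or --pids flags\n"
)

matched = matched_processes(sample)
if not matched:
    print("dynolog did not match any registered process")
```

An empty list here means the daemon had no registered process to hand the trace request to, which is the symptom reported in this issue.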

Environment:

Ubuntu 22.04
torch     2.4.0
dynolog install:
wget https://github.com/facebookincubator/dynolog/releases/download/v0.3.2/dynolog_0.3.2-0-amd64.deb
sudo dpkg -i dynolog_0.3.2-0-amd64.deb
sudo systemctl restart dynolog
@briancoutinho
Contributor

Hi @KepingYan, do you see a message like "INFO:2022-10-22 00:59:13 151209:151209 init.cpp:98] Registering daemon config loader" printed, as shown in the docs?
Running dynolog, the example script, and the dyno command in three separate shells should give the expected behavior.

Also, to help debug: does the machine have GPUs? Ideally this should work on both CPU-only and CPU+GPU systems, but I'm just checking.

@htbig

htbig commented Oct 1, 2024

I have hit the same problem. My machine has a GPU, and I also see "INFO:2024-10-01 12:29:42 271363:271364 init.cpp:131] Registering daemon config loader, cpuOnly = 0", but I still get:
response = {"activityProfilersBusy":0,"activityProfilersTriggered":[],"eventProfilersBusy":0,"eventProfilersTriggered":[],"processesMatched":[]}
No processes were matched, please check --job-id or --pids flags
I specified --pids like this: sudo ./build/bin/dyno gputrace --log-file ./a.log --pids 271363
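
To avoid copying the PID by hand, the value passed to --pids can be pulled out of the "Registering daemon config loader" line. The sketch below assumes the two colon-separated numbers after the timestamp are the process ID and thread ID, in that order; that reading matches the PID used above but is not confirmed in this thread. The helper name pid_from_kineto_log is hypothetical.

```python
import re

# Assumed prefix layout: LEVEL:YYYY-MM-DD HH:MM:SS <pid>:<tid> file:line]
LOG_PREFIX = re.compile(
    r"^(?P<level>[A-Z]+):"
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<pid>\d+):(?P<tid>\d+) "
)


def pid_from_kineto_log(line: str) -> int:
    """Extract the first number of the <pid>:<tid> pair from a Kineto log line."""
    match = LOG_PREFIX.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    return int(match.group("pid"))


# Log line copied from the comment above.
log_line = (
    "INFO:2024-10-01 12:29:42 271363:271364 init.cpp:131] "
    "Registering daemon config loader, cpuOnly = 0"
)

pid = pid_from_kineto_log(log_line)
print(f"--pids {pid}")
```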

@htbig

htbig commented Oct 2, 2024

@KepingYan, I resolved this problem by adding time.sleep(5) before the first use of the GPU device in my script. You could give it a try.
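
A minimal sketch of where such a delay could go. The helper name wait_before_gpu_use is hypothetical, the 5-second default simply mirrors the value mentioned above, and the torch lines are left as comments so the snippet runs without PyTorch installed. This is a workaround reported for one environment, not a confirmed fix.

```python
import time


def wait_before_gpu_use(delay_s: float = 5.0) -> float:
    """Pause before the first GPU call and return the time actually waited."""
    if delay_s < 0:
        raise ValueError("delay_s must be non-negative")
    start = time.monotonic()
    time.sleep(delay_s)
    return time.monotonic() - start


# Intended placement in a training script (kept as comments on purpose):
#
#   wait_before_gpu_use()             # let the profiler register with the daemon
#   device = torch.device("cuda:0")   # first GPU use happens after the pause
#   model = model.to(device)
```

The delay only helps if the failure is a timing issue between profiler registration and the first GPU call; if dyno still reports no matched processes, the cause is likely elsewhere.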

@KepingYan
Author

Hi @briancoutinho, this is the log from the terminal running python scripts/pytorch/linear_model_example.py. It was executed on a CPU+GPU (V100) system.

$ KINETO_USE_DAEMON=1 KINETO_DAEMON_INIT_DELAY_S=1 python3 scripts/pytorch/linear_model_example.py
INFO:2024-10-09 14:45:19 384686:384687 init.cpp:131] Registering daemon config loader, cpuOnly =  0
ERROR: External init callback must run in same thread as registerClient (-932395456 != 2078357312)
cuda:0
99 874.3150024414062
10099 8.817167282104492
20099 8.817167282104492
30099 8.817168235778809
40099 8.817168235778809
50099 8.817168235778809
60099 8.817168235778809
70099 8.817168235778809
80099 8.817168235778809
90099 8.817168235778809
100099 8.817168235778809
110099 8.817168235778809
120099 8.817168235778809
130099 8.817168235778809
140099 8.817168235778809
150099 8.817168235778809
160099 8.817168235778809
170099 8.817168235778809
180099 8.817168235778809
190099 8.817168235778809
Result: y = 1.862860976586944e-09 + 0.8567266464233398 x + -1.0942008188408181e-08 x^2 + -0.09332837164402008 x^3

I've tried @htbig's advice, but it doesn't work in my environment.
