net.Dial doesn't work in zabbix_agentd #5
Comments
What operating system and version of Go and Zabbix are you using? |
I can reproduce this issue with Go 1.5.1 and Zabbix 2.2.10 on CentOS 7.1. I'll report back with what I find. It might be an issue with goroutines firing up in the |
System call trace:
|
I have a similar issue with
A standalone executable works fine, but the shared library hangs on this function, which is probably related to the net package. Strace:
Go stacktrace 1 from zabbix.log, when I send SIGQUIT signal:
Go stacktrace 2:
|
Thanks Jan, that's really helpful. I'll dig right into this one ASAP. |
The problem is probably cgo. The net package doesn't have a full native Go implementation. That's also the reason why the .so is dynamically linked:
Still no success. |
Same issue occurs in https://github.com/spektroskop/zabbix-socket/. |
I think I have the same issue with my Zabbix module for MongoDB based on On my local dev VM and on a small cloud instance, the module seems to work perfectly... 9 out of 10 times. Occasionally the first connection attempt hangs, but if I restart the Zabbix agent it will work again. Once the first connection works, it looks like all the subsequent ones also do. These 2 VMs only have 1 CPU/1 core. Is there any dirty or temporary workaround? |
Thanks for posting those details. Unfortunately I haven't been able to find a solution or even a workaround. The best I can offer is that it looks like a deadlock somewhere in the Go runtime's goroutines when it is loaded as a shared library. Behind the scenes, all of the Go stdlib's socket based APIs (such as Someone recently validated this on @jangaraj's stack overflow post: http://stackoverflow.com/a/36898305/5809680. This will likely need to be fleshed out and escalated to the Go dev team for resolution, or Zabbix will need to change the way modules are loaded so that it happens post-fork. |
I've reproduced this issue outside of Zabbix and raised issue #15538 with the Go team. This might be a more general C/Linux issue however, due to the way memory is mapped when a process is forked. I may need to suggest to the Zabbix team to load agent modules after the worker processes are forked to fix this issue. |
This code reproduces the issue: https://gist.github.com/cavaliercoder/688a3cd7dac20c8edb0c0f6f2851b54d |
Thanks a lot for your investigations. Looking at the answer from the Go team, it looks like the only option would be to convince the Zabbix team to change the way they load external modules. From what I understand, any Zabbix plugin using threads would have the same issue, right? If so, that would be very restrictive. |
Raised with Zabbix: https://support.zabbix.com/browse/ZBX-10751 |
I'm struggling to find a way forward with this issue. I think IPC will be the only way, in which case a separate project may be in order to start, watch and communicate with processes written in any language. Here's what I've discovered so far:
Problems:
|
This is a note for me to experiment with
@m-barthelemy notes above that g2z behaves okay on single core machines (where Go will run on a single OS thread). |
I tested this with |
Yes, it has always been working fine on single core. And I confirm that setting It's been a while, but IIRC the way the Zabbix agent does its forking prevents users from making plugins in any language using threads by default (or using any library that does so), right? |
When I run
zabbix_agentd -t key
everything works fine. When I try zabbix_get with the same key, the library hangs on net.Dial. The log file contains "Before dial" but never "After dial" or "Dial ok". The timeout in Dial also doesn't work.