This is a file descriptor leak when the aggregator and the sampler are using different auth types #1319
Comments
Nice catch @johnstile. We'll fix this error path.
@nichamon, @narategithub have either of you had a chance to look at this yet?
@tom95858 I'm looking into this right now.
The issue is easily reproduced by 1 sampler daemon and
Thanks @narategithub
@narategithub maybe where the fd is removed from the epoll set?
@tom95858 I think so too. I enabled
@tom95858 I think I know why. When the I'm fixing this in
Thank you @johnstile for submitting the issue. @tom95858 Here's some updates:
@johnstile is this resolved?
In this case the aggregator ldmsd is configured to use munge authentication to the sampler ldmsd, but the sampler ldmsd is configured to use ovis authentication.
The aggregator ldmsd connects to the sampler ldmsd using munge authentication, which is rejected, but the sampler ldmsd never closes the file descriptor completely.
The socket in the thread gets `shutdown()` but it then hangs forever in `epoll()`. The aggregator then reconnects, creating another thread which will never die and holds a file descriptor open. Lather, rinse, repeat, explode.
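To illustrate why `shutdown()` alone does not give the descriptor back, here is a minimal Linux-only sketch. It is not the ldmsd code: `count_open_fds` and `leaked_fds_after_reject` are hypothetical helpers, and a `socketpair()` stands in for the rejected aggregator connection. It only shows that a socket that has been shut down still occupies a descriptor until it is removed from the epoll set and passed to `close()`.

```c
#include <dirent.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: count this process's open descriptors via
 * /proc/self/fd (Linux only). The directory stream itself holds one
 * descriptor, but it is counted on every call, so differences between
 * two calls are still meaningful. */
int count_open_fds(void)
{
    DIR *d = opendir("/proc/self/fd");
    if (!d)
        return -1;
    int n = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (e->d_name[0] != '.')   /* skip "." and ".." */
            n++;
    }
    closedir(d);
    return n;
}

/* Hypothetical helper: simulate one rejected connection and return how
 * many descriptors are still held afterwards (-1 on setup failure).
 * do_close == 0 mimics the reported behaviour (shutdown() only);
 * do_close != 0 also removes the socket from the epoll set and closes it. */
int leaked_fds_after_reject(int do_close)
{
    int before = count_open_fds();
    int sv[2];
    struct epoll_event ev = { .events = EPOLLIN };

    int epfd = epoll_create1(0);
    if (epfd < 0)
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        close(epfd);
        return -1;
    }
    ev.data.fd = sv[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &ev) < 0) {
        close(sv[0]);
        close(sv[1]);
        close(epfd);
        return -1;
    }

    /* Authentication "rejected": the connection is shut down, but the
     * descriptor sv[0] is still allocated to the process. */
    shutdown(sv[0], SHUT_RDWR);

    if (do_close) {
        epoll_ctl(epfd, EPOLL_CTL_DEL, sv[0], NULL);
        close(sv[0]);              /* only this releases the descriptor */
    }

    close(sv[1]);                  /* the peer side goes away */
    close(epfd);
    return count_open_fds() - before;
}
```

Under these assumptions, `leaked_fds_after_reject(0)` reports one descriptor left behind per rejected connection, while `leaked_fds_after_reject(1)` reports none. The same count can be watched on a running sampler with `ls /proc/<pid>/fd | wc -l` while the aggregator keeps reconnecting.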
Tracing the threads leads to an `epoll_wait(88, <unfinished ...>`
gdb shows this as:
This is the code of the thread. I suspect the call through the function pointer at line 1412 is how it goes off and does the auth checking:
aggregator ldmsd:
aggregator conf: /aggregator.conf
sampler ldmsd
sampler conf: sampler.conf
When the sampler ldmsd is configured to use munge, everything works as expected.