Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix print race in ldmsd_stream_subscribe #1254

Conversation

narategithub
Copy link
Collaborator

The msglog printing in ldmsd_stream_subscribe does not lock the print mutex for the entire EVENT printing, leading to interleaving EVENT records in the output. This patch fixed the bug.

The msglog printing in ldmsd_stream_subscribe does not lock the print
mutex for the entire EVENT printing, leading to interleaving EVENT
records in the output. This patch fixed the bug.
@tom95858
Copy link
Collaborator

tom95858 commented Aug 9, 2023

@narategithub I don't understand this fix. The static variable should be identical to declaring it global. There should only be a single instance of it. There mutex is taken inside the function instead of outside. What am I missing?

@morrone
Copy link
Collaborator

morrone commented Aug 9, 2023

I think the purpose of moving the lock to global file scope is so that the lock can be held across multiple calls to msglog(). If the lock is static function scope, the lock can only be held within a single msglog() call.

@narategithub
Copy link
Collaborator Author

@tom95858 It's what @morrone said. The problem was the multiple calls of msglog() here

https://github.com/ovis-hpc/ovis/pull/1254/files#diff-6b46340af13fc098b2850942cefccb52c53db44579e025f2c230f492fd00af88L168-R180

However, I think I'll close this PR in favor of a simpler patch #1255 .

@@ -175,6 +178,7 @@ static int stream_recv_ev(ldms_stream_event_t ev, void *arg)
if (!events_raw)
msglog("}");
msglog("\n");
MSGLOG_UNLOCK();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe it has anything to do with the "scope of the lock" it has to do with the need to hold the lock across multiple calls to msglog(). This is probably what @morrone meant; so it's the description of the problem and the solution that was giving me pause. I was interpreting "scope" to mean "symbol scope."

@@ -27,18 +27,20 @@ static int daemon_io;
static int daemon_noroot;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@narategithub why did you close this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#1255 is a simpler fix to this problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants