Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a --debug flag to the CLI to help retrieve more logs. #941

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

almet
Copy link
Contributor

@almet almet commented Oct 7, 2024

While adding Github Issue Templates on this repository (#939) we found that the commands we require the user to enter in their terminals can be quite complex. Some of them might require some bash-fu, for instance to define the location of the custom seccomp profile we are using, as it differs depending the OS.

This is a proposal to add a --debug flag to the dangerzone-cli to simplify the process of generating proper logs.

When the flag is set:

  • The RUNSC_DEBUG=1 environment variable is added to the outer container ;
  • the stderr from the outer container is attached to the exception, and displayed to the user on failures.

Info: tests are currently failing and I didn't put too much effort in making them pass, as this is here mainly to see if it could make sense to add this in the first place.

@almet almet mentioned this pull request Oct 7, 2024
Copy link
Contributor

@apyrgio apyrgio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a nice step towards getting debug logs from users without asking them to run weird Docker/Podman commands. I've commented on a few things that we can improve, but overall I like where this is going.

dangerzone/cli.py Outdated Show resolved Hide resolved
dangerzone/conversion/errors.py Outdated Show resolved Hide resolved
dangerzone/cli.py Show resolved Hide resolved
@almet
Copy link
Contributor Author

almet commented Oct 14, 2024

This seems correlated to #442.

@apyrgio
Copy link
Contributor

apyrgio commented Nov 21, 2024

We had a discussion with Alexis. He mentioned that adding the --debug flag makes the Dangezone CLI hang. The reason behind this bug is the fact that we read from stderr only after the conversion is done. However, with RUNSC_DEBUG=1, gVisor logs to stderr early, and thus the container hangs, because we don't read from the pipe.

@almet
Copy link
Contributor Author

almet commented Nov 26, 2024

I've rebased this branch on top of the latest main (actually replayed the changes manually) and it shows that it hangs.

@almet
Copy link
Contributor Author

almet commented Dec 18, 2024

This is currently blocking, because the docker process writes to stderr but the pipe is never called by the other side, resulting in a process hang (the buffer is full and so it stops there).

Our conversion process is currently running in a thread pool,

After a discussion with @apyrgio this morning on the matter, there seem to be two different outcomes for this:

  • One "involved" way where we change all of the caller code to be async;
  • We start a thread reading stderr

I went with the second option for now, but the first one could also make sense.

@almet almet force-pushed the debug-cli branch 2 times, most recently from bea256d to 4f4c523 Compare December 19, 2024 09:14
dangerzone/isolation_provider/base.py Outdated Show resolved Hide resolved
dangerzone/isolation_provider/container.py Outdated Show resolved Hide resolved
dangerzone/isolation_provider/container.py Outdated Show resolved Hide resolved
dangerzone/logic.py Outdated Show resolved Hide resolved
dangerzone/isolation_provider/base.py Outdated Show resolved Hide resolved
@almet almet force-pushed the debug-cli branch 3 times, most recently from 9e86988 to a4e34ad Compare January 8, 2025 10:46
Copy link
Contributor

@apyrgio apyrgio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the changes. I think we're 🤏 close to merging it. I've left a few comments, but overall I like the new approach.

Oh, let's also have an issue to ask people to use the --debug flag, once we have a release out.

Comment on lines 357 to 358
self.stderr.seek(0)
debug_log = read_debug_text(self.stderr, MAX_CONVERSION_LOG_CHARS)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, if the stderr thread is still writing to the BytesIO buffer (e.g., due to a race), then seeking to 0 risks overwriting the first bytes in the buffer. Also, we may get incomplete logs, without us realizing, which was not the case before (because we use p.poll() is not None.

I propose we make some minor adjustments to the above logic:

  1. Check if the thread is alive after waiting for 1 second (.join(timeout=1) and .is_alive())
  2. Grab the BytesIO buffer with BytesIO.getvalue(), which doesn't require seeking.
  3. Adjust read_debug_text() to sanitize the bytes buffer of (2), instead of reading the BytesIO object.
  4. In case the thread is still alive, add an (incomplete) indication to the ----- DOC TO PIXELS LOG START ----- message.

How does that sound, do you have a better idea?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added this in 3a56f51, let me know if that suits you.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks a lot for the change. I have one important comment, and then some nits that you can ignore if you want to:

I think that MAX_CONVERSION_LOG_CHARS will actually truncate the debug output of gVisor. It amounts to 150 lines of 50 chars roughly, but I think gVisor's debug logs are more than that. Now that we have a non-blocking way to read the stderr, I think we can drop the MAX_CONVERSION_LOG_CHARS constant altogether, since it no longer serves the original purpose.

Nits:

  • I'd maybe rename read_debug_text() to sanitize_debug_text().
  • It would be nice to know if the debug log is incomplete, without scrolling to the end.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About MAX_CONVERSION_LOGS_CHARS, I felt exactly the same so thanks for the feedback! Dropped now and renamed the function in 8532688

About the "incomplete" log, I feel that having in the end is enough as it's where we would expect to have some more info. I might be missing some practical information you have. Do you have anything specific in mind?

Copy link
Contributor

@apyrgio apyrgio Jan 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About MAX_CONVERSION_LOGS_CHARS, I felt exactly the same so thanks for the feedback! Dropped now and renamed the function in 8532688

Great, thanks!

About the "incomplete" log, I feel that having in the end is enough as it's where we would expect to have some more info. I might be missing some practical information you have. Do you have anything specific in mind?

Nothing fancy, something like:

DOC_TO_PIXELS_LOG_START = "----- DOC TO PIXELS LOG START -----"
DOC_TO_PIXELS_LOG_START_INCOMPLETE = "----- DOC TO PIXELS LOG START (incomplete) -----"

log_header = DOC_TO_PIXELS_LOG_START
if stderr_thread.is_alive():
    log_header = DOC_TO_PIXELS_LOG_START_INCOMPLETE

log.info(
    "Conversion output (doc to pixels)\n"
    f"{log_header}\n"
    f"{debug_log}"  # no need for an extra newline here
    f"{DOC_TO_PIXELS_LOG_END}"
)

But I'll leave it to you, really. I've approved the PR, so you can resolve this thread and merge it.

dangerzone/isolation_provider/container.py Outdated Show resolved Hide resolved
dangerzone/logic.py Outdated Show resolved Hide resolved
dangerzone/isolation_provider/base.py Outdated Show resolved Hide resolved
almet added 2 commits January 13, 2025 15:07
When the flag is set, the `RUNSC_DEBUG=1` environment variable is added
to the outer container, and stderr is captured in a separate thread, before printing its output.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants