Handle cases when LibreOffice hangs #878

Open
apyrgio opened this issue Jul 25, 2024 · 2 comments
Labels
container, enhancement, timeout

Comments

Contributor

apyrgio commented Jul 25, 2024

When running Dangerzone against our large test set, we found that some files (e.g., fdo78883.docx and ofz21168-1.doc) make LibreOffice 7.6 hang.

We opened a bug report for these files, but until the underlying issue is solved, we need a way to detect such hangs, and stop the conversion.

apyrgio added the bug, container, and timeout labels Jul 25, 2024
Contributor Author

apyrgio commented Jul 25, 2024

Re-introducing timeouts for the whole document is a solution I'd personally like to avoid. They have bitten us a lot in the past (#749), they are arbitrary (documents with many pages need very long timeouts), and we recently decided to ditch them altogether (#687).

What makes more sense to me is the following:

  • Get the number of pages in the document. Here's a way to do so: https://askubuntu.com/questions/305633/how-can-i-determine-the-page-count-of-odt-doc-docx-and-other-office-documents
  • Spin up an UNO server (https://github.com/unoconv/unoserver). This server is responsible for loading a document, and listening for API requests.
  • Send API requests to the UNO server, and ask to convert the document a single page at a time.
    • See some supported export parameters that LibreOffice offers by default, and the blog post by the LibreOffice dev that added those. The PageRange option is of interest to us.
    • The UNO server project provides a command-line client (unoconvert), but maybe we can send these API requests programmatically.
  • Set a timeout for each API request. Since at the API request level we know we're dealing with a single document page, we can set a sensible timeout (e.g., 3 minutes). Anecdotally, converting a .docx of ~2000 pages took 18 minutes on my laptop, so this timeout is more than reasonable.
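
A rough Python sketch of the steps above, under stated assumptions. `docx_page_count` reads the page count that a .docx stores in `docProps/app.xml`; that value is metadata and may be stale or missing. `convert_pages` runs one conversion per page with a per-page timeout. The helper names, the output file naming, and the exact `unoconvert` flags (`--convert-to`, `--filter-option PageRange=N`) are assumptions for illustration and should be checked against the installed unoserver version.

```python
import subprocess
import xml.etree.ElementTree as ET
import zipfile

# Namespace used by docProps/app.xml in OOXML packages such as .docx.
EXT_PROPS_NS = (
    "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties"
)


def docx_page_count(path):
    """Return the page count recorded in a .docx, or None if unavailable.

    The value is written by whichever application last saved the file,
    so treat it as a hint rather than ground truth.
    """
    with zipfile.ZipFile(path) as zf:
        try:
            raw = zf.read("docProps/app.xml")
        except KeyError:
            return None
    pages = ET.fromstring(raw).find(f"{{{EXT_PROPS_NS}}}Pages")
    if pages is None or not (pages.text or "").strip().isdigit():
        return None
    return int(pages.text)


def unoconvert_cmd(src, dst, page):
    """Build a single-page unoconvert call (flag names are assumptions)."""
    return [
        "unoconvert", "--convert-to", "pdf",
        "--filter-option", f"PageRange={page}",
        src, dst,
    ]


def convert_pages(src, num_pages, build_cmd=unoconvert_cmd, timeout=180):
    """Convert pages 1..num_pages one at a time.

    Raises TimeoutError as soon as a single page exceeds `timeout` seconds.
    """
    outputs = []
    for page in range(1, num_pages + 1):
        dst = f"{src}.page{page}.pdf"  # hypothetical naming scheme
        try:
            # subprocess.run kills the child process when the timeout expires.
            subprocess.run(build_cmd(src, dst, page), check=True, timeout=timeout)
        except subprocess.TimeoutExpired as exc:
            raise TimeoutError(f"page {page} exceeded {timeout}s") from exc
        outputs.append(dst)
    return outputs
```

Note that the timeout here only stops the client process. If the UNO server itself is stuck on the document, it would presumably have to be killed and restarted separately.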

Some extra benefits of this approach:

  • UNO server has an option to send files back as binary data, instead of writing them to the filesystem. This will help with Defense in Depth - Traceless Sanitization #633.
  • We aim to introduce file previewing in Dangerzone (see the File Preview PoC, #758). One concern we have with file previewing is that LibreOffice documents may take a while to be converted to PDFs before we can stream their pixels. With this method, we can start streaming from the very first page.
  • Compared with providing PageRange arguments via the LibreOffice CLI, UNO server loads the document in memory once, so it offers faster conversion times for documents with lots of pages.
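
To illustrate the streaming benefit, here is a minimal sketch of a generator that hands each page to the consumer as soon as it is converted. `stream_pages` and the `convert_page` callable are hypothetical names; `convert_page` stands in for whatever per-page request ends up being sent to the UNO server.

```python
from typing import Callable, Iterator


def stream_pages(
    num_pages: int, convert_page: Callable[[int], bytes]
) -> Iterator[bytes]:
    """Yield each page's PDF bytes as soon as that page has been converted."""
    for page in range(1, num_pages + 1):
        # Later pages are not converted until the consumer asks for them,
        # so work on page 1 can begin before the rest of the document is done.
        yield convert_page(page)
```

A consumer such as a previewer could iterate over `stream_pages(...)` and start rendering page 1 while later pages are still pending.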

apyrgio added the enhancement label and removed the bug label Jul 25, 2024
apyrgio added a commit that referenced this issue Jul 25, 2024
Some of the files in our large test set can make LibreOffice hang. We
do not have a proper solution for this yet, but we can at least make
the tests timeout quickly, so that they can finish at some point.

Refs #878
Member

eloquence commented Jul 25, 2024

(Leaving unmilestoned for now given lower potential impact)

apyrgio added a commit that referenced this issue Aug 9, 2024
Some of the files in our large test set can make LibreOffice hang. We
do not have a proper solution for this yet, but we can at least make
the tests timeout quickly, so that they can finish at some point.

Refs #878