CreateDerivatives Job fails on large PDF #567

Open
dolsysmith opened this issue Sep 4, 2024 · 0 comments

Creation of derivatives fails for a PDF of over 300 MB. The logs show a heap memory error on Solr, which suggests that the problem occurs while the PDF is being indexed. (Consistent with that, keyword searches on terms in the full text of the PDF return no results.)

Note that, at least in this case, a file of this size can't be added through the GWSS UI, since Nginx (as currently configured) doesn't allow such a large upload. So the error occurs during the Bulkrax ingest.

It would be useful to determine whether the failure is triggered by the size of the PDF or by something else. If size is the cause, perhaps the ingest process could include a check that skips derivative creation for files that are too large. (Eventually the job fails completely, but Sidekiq keeps retrying it for a while, which in one case seems to have caused issues with the Solr instance that prevented other works from being indexed properly.)
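
A size check of the kind suggested above could be as small as the following sketch in plain Ruby. The constant `MAX_DERIVATIVE_BYTES` and the method `derivable?` are hypothetical names, not part of Hyrax or Bulkrax, and the 300 MB limit is only a placeholder taken from the size of the failing PDF. Where such a guard would hook into the ingest is left open.

```ruby
# Hypothetical threshold (an assumption): 300 MB, picked only because the
# failing PDF was larger than this. The real limit needs to be measured.
MAX_DERIVATIVE_BYTES = 300 * 1024 * 1024

# Hypothetical helper: returns true when the file at `path` is small enough
# that derivative creation should be attempted, false otherwise.
def derivable?(path, limit: MAX_DERIVATIVE_BYTES)
  File.size(path) <= limit
end
```

A caller could skip the derivative step (and log a warning) when `derivable?` returns false, instead of letting the job fail and be retried.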

Troubleshooting steps

  • Check container memory usage while the job runs (docker stats).
  • Check Java settings on the Solr instance.
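
For the second step, one way to see how much heap Solr actually has is to read the JVM memory figures that Solr reports about itself. The sketch below only parses such a report. It assumes the JSON has a `jvm.memory.raw` section with `used` and `max` given in bytes, as Solr's system info endpoint (`/solr/admin/info/system`) is expected to return. That layout, and the helper name `solr_heap_summary`, are assumptions to verify against the running instance.

```ruby
require "json"

# Hypothetical helper: summarise JVM heap usage from a Solr system-info
# JSON document. Assumes (unverified for this deployment) that the payload
# contains jvm.memory.raw with "used" and "max" expressed in bytes.
def solr_heap_summary(json_text)
  raw = JSON.parse(json_text).dig("jvm", "memory", "raw")
  return nil if raw.nil?

  used = raw["used"].to_f
  max  = raw["max"].to_f
  return nil unless max.positive?

  {
    used_mb:  (used / 1024**2).round,        # heap currently in use, in MB
    max_mb:   (max / 1024**2).round,         # configured maximum heap, in MB
    used_pct: (100.0 * used / max).round(1)  # share of the maximum in use
  }
end
```

Running this on the endpoint's response while the CreateDerivatives job is active would show whether the heap is close to its maximum at the moment of failure.
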
@dolsysmith dolsysmith added the bug label Sep 4, 2024