
XML backups get stuck #168

Open
eloiferrer opened this issue Oct 22, 2024 · 8 comments

Comments

@eloiferrer
Member

Describe the bug
The XML backup script gets stuck, or takes forever to finish with no apparent output. Filtering by namespace does not help.

For instance:

/usr/local/bin/php /var/www/html/maintenance/dumpBackup.php --current --output=gzip:/data/xml_backup_test.gz --filter=namespace:0 --conf /shared/LocalSettings.php

just hangs there, showing no output. Only after waiting several minutes does it show progress output for 1,000 pages and an estimated finishing time of more than a month. Then it hangs again.
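
To check whether the dump makes any progress at all, a smaller test run could help. Assuming the --start, --end and --report options of dumpBackup.php behave as described on its manual page (the output file name is just a placeholder):

/usr/local/bin/php /var/www/html/maintenance/dumpBackup.php --current --start=1 --end=1000 --report=100 --output=gzip:/data/xml_backup_range_test.gz --filter=namespace:0 --conf /shared/LocalSettings.php

This would restrict the dump to the first 1,000 page IDs and report progress every 100 pages.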

Expected behavior
(A clear and concise description of what you expected to happen.)

To Reproduce
Steps to reproduce the behavior:

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See error

Screenshots
(If applicable, add screenshots to help explain your problem.)

Additional context
Add any other context about the problem here.

  • For example, information about the device used to reproduce the bug

Checklist for this issue:
(Some checks for making sure this issue is completely formulated)

  • Assignee has been set for this issue
  • All fields of the issue have been filled
  • The main MaRDI project has been assigned as the project
@eloiferrer added the bug label Oct 22, 2024
@eloiferrer
Member Author

If we don't need the XML backups, I suggest we deactivate them permanently.
Regardless of that, we can create JSON backups of only the Wikibase entities.
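
A sketch of what that could look like, assuming Wikibase is installed under /var/www/html/extensions/Wikibase and its dumpJson.php maintenance script is available (the output path is only an example):

/usr/local/bin/php /var/www/html/extensions/Wikibase/repo/maintenance/dumpJson.php --output /data/wikibase_entities.json --conf /shared/LocalSettings.php

That would dump the Wikibase entities as JSON without going through the XML exporter at all.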

@physikerwelt
Member

I would first try to exclude items and properties as well and test again.

@eloiferrer
Member Author

That is what I have done. Even with a much more restrictive filter such as --filter=namespace:0, which should only include the main namespace, it gets stuck.

@physikerwelt
Member

@eloiferrer did you see the following

Only after waiting several minutes does it show progress output for 1,000 pages

for ns:0?

@eloiferrer
Member Author

eloiferrer commented Oct 23, 2024

Yes, for ns:0 and with any other configuration I've tried. The script seems to get stuck from the very beginning, no matter the options. It appears to go through all the pages anyway, and gives some output as they get processed.

@physikerwelt self-assigned this Oct 23, 2024
@physikerwelt
Member

I would like to understand why this is the case. There should be fewer than 1,000 pages in ns:0, so it should finish in an hour or less.
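
That number could be verified directly against the page table, for example via the sql.php maintenance script (assuming it accepts a --query option on our MediaWiki version):

/usr/local/bin/php /var/www/html/maintenance/sql.php --conf /shared/LocalSettings.php --query "SELECT COUNT(*) FROM page WHERE page_namespace = 0;"

If the count really is below 1,000, the long runtime would point to the exporter iterating over all pages rather than to the size of ns:0.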

@physikerwelt
Member

I found my own documentation at https://www.mediawiki.org/wiki/Manual:DumpBackup.php :-) and I think the filter should go into the query, but the plugin would still process every page.

@physikerwelt
Member

@eloiferrer you are right. The namespace filter is not applied in the query (https://gerrit.wikimedia.org/r/c/mediawiki/core/+/504792), and I share the view of @brightbyte. Especially with our memory limit for the single-database setup, it will be impossible to run this daily. One way around it would be to rewrite the query via the hook.
Given the effort, I think we should exclude XML backups from the daily runs.

@physikerwelt removed their assignment Oct 23, 2024