When files are deleted or temporarily unavailable, some sites replace those files with new content explaining the problem. For example, Imgur and Reddit both replace deleted images with a fixed "not found" image, and Bunkr returns a "Server under maintenance" video for temporarily unavailable files. Unfortunately, some hosts, including Bunkr, don't use HTTP 4xx status codes, so gallery-dl has no way of knowing that the files it downloaded were actually junk. This is clearly the site's fault, but I'm looking to find user-oriented fixes.
I can imagine trying to fix this by maintaining my own registry of known error file sizes and hashes, in order to use an exec postprocessor to detect known error files, delete them, and mark the download as failed. Importantly, I'd like the failure in the postprocessor to prevent the original file from being added to the download archive, since the failure may be temporary. From my testing, that doesn't currently seem to be possible. Would the project be interested in adding support for postprocessors blocking the download archive entry?
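To make the registry idea concrete, here is a minimal sketch of the kind of checker script I have in mind. Everything in it is hypothetical: the `KNOWN_ERROR_FILES` contents, the function names, and the exit-code convention are my own placeholders, not anything gallery-dl defines. It assumes the script receives the downloaded file's path as its only argument.

```python
import hashlib
import os
import sys

# Hypothetical registry of known placeholder files, keyed by
# (size in bytes, SHA-256 hex digest). The single entry here is only
# an illustration: it matches an empty file.
KNOWN_ERROR_FILES = {
    (0, hashlib.sha256(b"").hexdigest()),
}


def file_signature(path, chunk_size=1 << 20):
    """Return (size, sha256 hex digest) of a file, reading it in chunks."""
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return size, digest.hexdigest()


def main(argv, registry=KNOWN_ERROR_FILES):
    """Delete the file and return 1 if it is a known error file, else return 0."""
    if len(argv) != 2:
        print("usage: check_junk.py FILE", file=sys.stderr)
        return 2
    path = argv[1]
    if file_signature(path) in registry:
        os.remove(path)
        return 1  # non-zero exit status signals "this download was junk"
    return 0

# As a script entry point, this would end with: sys.exit(main(sys.argv))
```

The non-zero return value is where the missing piece shows up: the script can detect and delete the junk file, but as far as I can tell nothing on the gallery-dl side reacts to that status when deciding whether to write the archive entry.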
Alternatively, is there a way for a script to ask gallery-dl to remove certain entries from its archive? Honestly, any tooling to work with the download archive would be a welcome addition, but I'm not sure this is a great solution here because of race conditions. Thoughts?
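For reference, this is roughly what I imagine an external cleanup script would look like. It is only a sketch under assumptions I have not confirmed: that the archive is a SQLite file with a single table named `archive` holding one text column named `entry`, and that I can work out the entry keys for the junk files myself. The function name `remove_archive_entries` is hypothetical.

```python
import sqlite3


def remove_archive_entries(archive_path, entries):
    """Delete the given entry keys from the archive; return the number of rows removed.

    Assumes (unverified) a table `archive` with a text column `entry`.
    """
    connection = sqlite3.connect(archive_path, timeout=30)
    try:
        before = connection.total_changes
        with connection:  # commits on success, rolls back on error
            connection.executemany(
                "DELETE FROM archive WHERE entry = ?",
                [(entry,) for entry in entries],
            )
        return connection.total_changes - before
    finally:
        connection.close()
```

The `timeout` only makes the script wait for a database lock; it does nothing about the race I'm worried about, where a running gallery-dl process writes the entry after my script has already looked for it.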