Fetch all attachments using xhr/fetch in the Connector and save directly to Zotero and Server #474

adomasven · 2024-05-16T13:24:07Z

It seems that when the Connector was initially written, XHR did not support the Blob and ArrayBuffer response types. Fetching attachments in the Connector and passing them on to Zotero is technically not very efficient, but we are facing increasingly complicated bot protection, and that is unlikely to improve in the future.
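A minimal sketch of what fetching an attachment as binary data could look like. The function name `fetchAttachment`, the injectable `fetchImpl` parameter and the MIME-type check are assumptions made for illustration, not the Connector's actual code:

```javascript
// Hypothetical helper: download an attachment as raw bytes with fetch().
// `fetchImpl` is injectable so the function can be exercised without a network.
async function fetchAttachment(url, expectedMimeType, fetchImpl = fetch) {
  // Send cookies so that authenticated or proxied sessions are honored
  const response = await fetchImpl(url, { credentials: 'include' });
  if (!response.ok) {
    throw new Error(`Attachment request failed with HTTP ${response.status}`);
  }
  // Strip parameters such as "; charset=..." before comparing
  const contentType = (response.headers.get('Content-Type') || '')
    .split(';')[0].trim().toLowerCase();
  // A bot-protection page typically arrives as text/html instead of the file
  if (expectedMimeType && contentType !== expectedMimeType) {
    throw new Error(`Expected ${expectedMimeType}, got ${contentType || 'unknown'}`);
  }
  // arrayBuffer() resolves to the raw bytes of the response body
  const bytes = await response.arrayBuffer();
  return { bytes, contentType };
}
```

The returned `bytes` could then be posted to Zotero or the server; how that upload happens is outside this sketch.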

dstillman · 2024-05-17T05:23:00Z

(Inspired by https://forums.zotero.org/discussion/114431/pdf-will-not-save-to-zotero, among others)

dstillman · 2024-06-10T20:05:47Z

https://forums.zotero.org/discussion/comment/465245/#Comment_465245

adomasven · 2024-08-07T08:09:29Z

https://forums.zotero.org/discussion/116557/zotero-ezproxy-issue

adomasven · 2024-10-04T14:54:10Z

While this works well for regular downloads, for some sites that use a JS redirect (ScienceDirect!) it will not work without custom handling:

Add a hidden iframe to the page with sandbox settings that disallow file downloads (so the user doesn't get prompted to save a file).
Monitor for attempts to navigate to a resource of the expected MIME type (either the iframe sandbox aborts the navigation after receiving headers with Content-Disposition, or the navigation succeeds, which means we needlessly load the file in the iframe).
Then refetch the same page using XHR.
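The three steps above can be sketched roughly as follows. All function names are hypothetical, the sandbox token set is an assumption, and `details` is assumed to have the shape of a `webRequest.onHeadersReceived` event (`url` plus a `responseHeaders` array of `{name, value}` objects); the real implementation may monitor navigation differently:

```javascript
// Step 1: hidden iframe whose sandbox omits the "allow-downloads" token,
// so a file response cannot trigger a save prompt.
function createSandboxedFrame(doc, url) {
  const frame = doc.createElement('iframe');
  frame.hidden = true;
  // Scripts must run for the JS redirect to happen (assumed token set)
  frame.setAttribute('sandbox', 'allow-scripts allow-same-origin');
  frame.src = url;
  doc.body.appendChild(frame);
  return frame;
}

// Step 2: decide whether an observed response has the expected MIME type.
function isExpectedAttachment(responseHeaders, expectedMimeType) {
  const header = (responseHeaders || [])
    .find((h) => h.name.toLowerCase() === 'content-type');
  if (!header) return false;
  const mimeType = header.value.split(';')[0].trim().toLowerCase();
  return mimeType === expectedMimeType;
}

// Step 3: once the final URL is known, refetch it to get the bytes.
// Resolves to null when the observed response is not the attachment.
async function refetchIfAttachment(details, expectedMimeType, fetchImpl = fetch) {
  if (!isExpectedAttachment(details.responseHeaders, expectedMimeType)) {
    return null;
  }
  const response = await fetchImpl(details.url, { credentials: 'include' });
  if (!response.ok) {
    throw new Error(`Refetch failed with HTTP ${response.status}`);
  }
  return response.arrayBuffer();
}
```

Passing `doc` and `fetchImpl` in as parameters keeps the sketch testable outside a browser; in an extension they would simply be `document` and `fetch`.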

Unfortunately, at least for ScienceDirect on Safari, this doesn't work without some additional custom hand-holding, because we do not have full cookie access there. We need to make the fetch request for the PDF from the ScienceDirect page (not the background page), as their CSP currently allows it, but this can change at any point.

The biggest drawback here is that we're not equipped to display a captcha window in the browser if anything goes wrong, and even if we opened a new captcha tab, we wouldn't be able to do much with it, since we cannot grab the loaded file directly from the browser without using XHR. Opening such a tab might also trigger a save-file prompt.

Also, if some of this breaks, we will be hostage to the browser extension approval processes and cannot update as fast as we can with the client.

The other option is to continue using Zotero BrowserDownload for pages that need a JS redirect, but that defeats one of the more exciting parts of this change.

adomasven · 2024-10-04T15:02:15Z

> Unfortunately at least for ScienceDirect on Safari this doesn't work without some additional custom handholding, because we do not have full cookie access there. We need to make the fetch request for the PDF from the sciencedirect page (not background) as it's allowed per their CSP policy, but this can change at any point.

I guess the default policy for all attachment XHRs should be to attempt to fetch them using the content XHR and only fall back to the background XHR if that fails (most likely due to CORS). That might then fail too on Safari, but at least we won't need to add an exception.

This would also mean that Safari might work better than it does now, since cookies would be sent more often. On the other hand, for multiple saves from Google Scholar and similar websites, we'd be needlessly sending content XHRs that would generally all fail due to CSP.
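The content-first policy described above could look something like this. It is a sketch only: `fetchAttachmentWithFallback`, `contentFetch` and `backgroundFetch` are hypothetical names standing in for whatever the Connector uses to issue requests from the page and from the background context:

```javascript
// Hypothetical policy: try the request from the page (content) context first,
// and only fall back to the background context if that attempt fails.
async function fetchAttachmentWithFallback(url, contentFetch, backgroundFetch) {
  try {
    const response = await contentFetch(url);
    if (!response.ok) {
      throw new Error(`Content fetch failed with HTTP ${response.status}`);
    }
    return { source: 'content', bytes: await response.arrayBuffer() };
  } catch (contentError) {
    // fetch() rejects with a TypeError when CORS or CSP blocks the request,
    // so both blocked requests and HTTP errors end up here
    const response = await backgroundFetch(url);
    if (!response.ok) {
      throw new Error(`Background fetch failed with HTTP ${response.status}`);
    }
    return { source: 'background', bytes: await response.arrayBuffer() };
  }
}
```

The cost noted above is visible here: on sites whose CSP blocks the content request, every attachment pays for one failed attempt before the background request is made.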
