Fetch all attachments using xhr/fetch in the Connector and save directly to Zotero and Server #474

Open
adomasven opened this issue May 16, 2024 · 5 comments

Comments

@adomasven
Member

It seems that when the Connector was initially written, XHR had no support for Blob and ArrayBuffer response types. Fetching attachments in the Connector and passing them on is technically not very efficient, but we are facing increasingly complicated bot protection, and that is unlikely to improve in the future.
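
As a rough illustration of fetching an attachment as binary data, here is a minimal sketch. It is not the Connector's actual code: `fetchAttachment`, `normalizeMimeType`, and the `FetchLike` shape are hypothetical names, and the fetch implementation is passed in as a parameter so the sketch does not depend on any particular browser API.

```typescript
// Hypothetical minimal shapes, so the sketch does not depend on DOM typings.
interface MinimalResponse {
  ok: boolean;
  status: number;
  headers: { get(name: string): string | null };
  arrayBuffer(): Promise<ArrayBuffer>;
}

type FetchLike = (
  url: string,
  init?: { credentials?: "include" | "omit" | "same-origin" }
) => Promise<MinimalResponse>;

interface FetchedAttachment {
  bytes: ArrayBuffer;
  contentType: string;
}

// Drop parameters such as "; charset=..." and lowercase the media type.
function normalizeMimeType(header: string | null): string {
  const [mediaType = ""] = (header ?? "").split(";");
  return mediaType.trim().toLowerCase();
}

// Fetch the attachment with cookies and return its raw bytes.
async function fetchAttachment(
  url: string,
  expectedMimeType: string,
  fetchImpl: FetchLike
): Promise<FetchedAttachment> {
  const response = await fetchImpl(url, { credentials: "include" });
  if (!response.ok) {
    throw new Error(`Attachment request failed with HTTP ${response.status}`);
  }
  const contentType = normalizeMimeType(response.headers.get("Content-Type"));
  if (contentType !== expectedMimeType) {
    // An unexpected type often means a login or bot-protection page was served.
    throw new Error(`Expected ${expectedMimeType}, got ${contentType || "unknown"}`);
  }
  return { bytes: await response.arrayBuffer(), contentType };
}
```

In a browser context the global `fetch` could presumably be passed as `fetchImpl`; how the resulting bytes are then handed to Zotero or the server is left out here.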

@dstillman
Member

@dstillman
Member

@adomasven
Member Author

@adomasven
Member Author

While this works well for regular downloads, for some sites that use a JS redirect (ScienceDirect!) it will not work without custom handling:

  • Add a hidden iframe to the page with sandbox settings that disallow file downloads (so the user doesn't get prompted to save a file)
  • Monitor for attempts to navigate to a response with the expected MIME type (either the iframe sandbox aborts the navigation after receiving headers with Content-Disposition, or the navigation succeeds, which means we needlessly load the file in the iframe)
  • Then refetch the same page using XHR
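
The first two steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: `configureHiddenSandboxedFrame`, `isExpectedAttachment`, and the `SANDBOX_TOKENS` list are hypothetical, and the header list shape is assumed to mirror what a webRequest-style listener provides. It relies on the fact that a sandboxed iframe blocks downloads unless the `allow-downloads` token is present, in browsers that implement that token.

```typescript
// Minimal structural types so the sketch does not depend on DOM typings.
interface IframeLike {
  src: string;
  style: { display: string };
  setAttribute(name: string, value: string): void;
}

interface HeaderEntry {
  name: string;
  value?: string;
}

// Assumed token set: scripts and same-origin access so the JS redirect can run,
// but no "allow-downloads", so a download triggered inside the frame is blocked.
const SANDBOX_TOKENS = ["allow-scripts", "allow-same-origin"];

function configureHiddenSandboxedFrame(frame: IframeLike, url: string): void {
  frame.style.display = "none";
  // Set the sandbox before src so the restrictions apply to the first navigation.
  frame.setAttribute("sandbox", SANDBOX_TOKENS.join(" "));
  frame.src = url;
}

// True when the response headers announce the expected MIME type.
function isExpectedAttachment(headers: HeaderEntry[], expectedMimeType: string): boolean {
  const contentType = headers.filter((h) => h.name.toLowerCase() === "content-type")[0];
  const value = contentType && contentType.value ? contentType.value : "";
  const [rawType = ""] = value.split(";");
  return rawType.trim().toLowerCase() === expectedMimeType.toLowerCase();
}
```

In a content script, the frame would presumably come from `document.createElement("iframe")` and be appended to the page after configuration. Once `isExpectedAttachment` reports a match for a navigation inside the frame, the same URL would be refetched using XHR, as in the last step.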

Unfortunately, at least for ScienceDirect on Safari, this doesn't work without some additional custom handholding, because we do not have full cookie access there. We need to make the fetch request for the PDF from the ScienceDirect page (not the background page), as that is allowed by their CSP, but this can change at any point.

The biggest drawback here is that we're not equipped to display a captcha window in the browser if anything goes wrong. Even if we opened a new captcha tab, we wouldn't be able to do much with it, since we cannot grab the loaded file directly from the browser without using XHR. Opening such a tab might also trigger a save-file prompt.

Also, if some of this breaks, we will be hostage to the browser extension approval processes and cannot update as fast as we can with the client.

The other option is to continue using Zotero BrowserDownload for pages that need a JS redirect, but that defeats one of the more exciting parts of this change.

@adomasven
Member Author

> Unfortunately at least for ScienceDirect on Safari this doesn't work without some additional custom handholding, because we do not have full cookie access there. We need to make the fetch request for the PDF from the sciencedirect page (not background) as it's allowed per their CSP policy, but this can change at any point.

I guess the default policy for all attachment XHRs should be to attempt to fetch them using the content XHR and only fall back to the background XHR if that fails (most likely due to CORS). The background XHR might then fail too on Safari, but at least we won't need to add an exception.

This would also mean that Safari might work better than it does now, since cookies would be sent more often. On the other hand, for multiple saves from Google Scholar and similar websites, we'd be needlessly sending content XHRs that would generally all fail due to CSP.
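
The fallback policy described in this comment could be sketched as below. This is a minimal sketch, not the Connector's implementation: `fetchWithFallback` and the `AttachmentFetcher` type are hypothetical, and the two fetchers stand in for whatever the content-page and background-page XHR paths actually are.

```typescript
// A fetcher resolves with the attachment bytes or rejects on any failure.
type AttachmentFetcher = (url: string) => Promise<ArrayBuffer>;

interface FetchOutcome {
  bytes: ArrayBuffer;
  via: "content" | "background";
}

// Try the content-page fetch first; fall back to the background fetch on failure.
async function fetchWithFallback(
  url: string,
  contentFetch: AttachmentFetcher,
  backgroundFetch: AttachmentFetcher
): Promise<FetchOutcome> {
  try {
    return { bytes: await contentFetch(url), via: "content" };
  } catch (contentError) {
    // Most likely blocked by CORS or the page's CSP; retry from the background.
    try {
      return { bytes: await backgroundFetch(url), via: "background" };
    } catch (backgroundError) {
      throw new Error(
        `Both attempts failed for ${url}: content (${String(contentError)}), ` +
          `background (${String(backgroundError)})`
      );
    }
  }
}
```

Recording which path succeeded (`via`) would make it possible to see how often the content attempt is wasted, for example on multi-item saves from search result pages.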
