Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

copy from Zotero storage #82

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
49 changes: 47 additions & 2 deletions paperqa/contrib/zotero.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
# This file gets PDF files from the user's Zotero library
import os
import shutil
from typing import Union, Optional
from pathlib import Path
import logging
Expand Down Expand Up @@ -44,8 +45,22 @@ def __init__(
library_id: Optional[str] = None,
api_key: Optional[str] = None,
storage: Optional[StrPath] = None,
zotero_storage: Union[StrPath,bool] = True,
**kwargs,
):
"""Initialize the ZoteroDB object.

Parameters
----------
storage: str, optional
The path to the directory where PDFs will be stored. Defaults to
`~/.paperqa/zotero`.
zotero_storage: str, optional
The path to storage directory where Zotero stores PDFs. Defaults to
`~/Zotero/storage/`. Set this to use previously-downloaded PDFs. Set to `False` to
disable this feature.
"""

self.logger = logging.getLogger("ZoteroDB")

if library_id is None:
Expand Down Expand Up @@ -76,9 +91,21 @@ def __init__(

if storage is None:
storage = CACHE_PATH.parent / "zotero"



if isinstance(zotero_storage, StrPath):
self.zotero_storage = Path(zotero_storage).expanduser()
if not self.zotero_storage.exists():
raise FileNotFoundError(f"Zotero storage directory {zotero_storage} does not exist.")

elif zotero_storage:
self.zotero_storage = Path.home() / "Zotero" / "storage"
if not self.zotero_storage.exists():
self.logger.warning(f"Zotero storage directory {zotero_storage} does not exist. Disabling copy from Zotero storage.")
self.zotero_storage = False

self.logger.info(f"Using cache location: {storage}")
self.storage = storage
self.storage = Path(storage)

super().__init__(
library_type=library_type, library_id=library_id, api_key=api_key, **kwargs
Expand Down Expand Up @@ -107,6 +134,24 @@ def get_pdf(self, item: dict) -> Union[Path, None]:
pdf_path = self.storage / (pdf_key + ".pdf")

if not pdf_path.exists():
if self.zotero_storage:
self.logger.info(f"| Looking for existing PDF for: {_get_citation_key(item)}")
try:
zotero_doc_folder = self.zotero_storage / pdf_key

if zotero_doc_folder.exists():
pdf_files = list(zotero_doc_folder.glob("*.pdf"))
if len(pdf_files) == 1:
self.logger.info(f"| Copying existing PDF for {_get_citation_key(item)} from Zotero storage.")
zotero_pdf_path = zotero_doc_folder / pdf_files[0]
shutil.copy(zotero_pdf_path, pdf_path)
return pdf_path
g-simmons marked this conversation as resolved.
Show resolved Hide resolved
else:
self.logger.warning("| Found more than one PDF for {_get_citation_key(item)}, so skipping.")

except Exception as e:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What other exceptions are you thinking of here? Maybe catch them explicitly, and throw the error if something unexpected comes up?

Copy link
Author

@g-simmons g-simmons Apr 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you suggesting one or more of the three things below? Or something else?

  1. catch specific error cases and resolve them
  2. catch, warn and fall back to downloading
  3. catch, wrap with a message, and raise

For unexpected edge cases, do you think raising an error would be better or just warning the user and falling back to downloading?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't have anything specific in mind - could imagine permissions errors at the destination dir.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there was no specific error you had in mind I would just remove the try-except. If you think there could be permissions issues for the shutil.copy I would check for the specific permissions error, and warn, then fall back to downloading. If there's an unexpected error then it should be raised.

self.logger.warning(f"Could not copy file from Zotero storage, redownloading file. Error: {e}")

pdf_path.parent.mkdir(parents=True, exist_ok=True)
self.logger.info(f"| Downloading PDF for: {_get_citation_key(item)}")
self.dump(pdf_key, pdf_path)
Expand Down