copy from Zotero storage #82
base: main
Changes from 2 commits
c861de8
261acbf
8dc185f
e1f1e2a
754fdb6
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,6 @@ | ||
# This file gets PDF files from the user's Zotero library | ||
import os | ||
import shutil | ||
from typing import Union, Optional | ||
from pathlib import Path | ||
import logging | ||
|
@@ -44,8 +45,22 @@ def __init__( | |
library_id: Optional[str] = None, | ||
api_key: Optional[str] = None, | ||
storage: Optional[StrPath] = None, | ||
zotero_storage: Optional[Union[StrPath,bool]] = "~/Zotero/storage/", | ||
**kwargs, | ||
): | ||
"""Initialize the ZoteroDB object. | ||
|
||
Parameters | ||
---------- | ||
storage: str, optional | ||
The path to the directory where PDFs will be stored. Defaults to | ||
`~/.paperqa/zotero`. | ||
zotero_storage: str, optional | ||
The path to storage directory where Zotero stores PDFs. Defaults to | ||
`~/Zotero/storage/`. Set this to use previously-downloaded PDFs. Set to `False` to | ||
disable this feature. | ||
""" | ||
|
||
self.logger = logging.getLogger("ZoteroDB") | ||
|
||
if library_id is None: | ||
|
@@ -76,9 +91,14 @@ def __init__( | |
|
||
if storage is None: | ||
storage = CACHE_PATH.parent / "zotero" | ||
|
||
if zotero_storage: | ||
self.zotero_storage = Path(zotero_storage).expanduser() | ||
else: | ||
self.zotero_storage = None | ||
Here is where you can automatically set it, if the OS default exists. |
||
|
||
self.logger.info(f"Using cache location: {storage}") | ||
self.storage = storage | ||
self.storage = Path(storage) | ||
|
||
super().__init__( | ||
library_type=library_type, library_id=library_id, api_key=api_key, **kwargs | ||
|
@@ -107,6 +127,22 @@ def get_pdf(self, item: dict) -> Union[Path, None]: | |
pdf_path = self.storage / (pdf_key + ".pdf") | ||
|
||
if not pdf_path.exists(): | ||
if self.zotero_storage: | ||
self.logger.info(f"| Looking for existing PDF for: {_get_citation_key(item)}") | ||
try: | ||
zotero_doc_folder = self.zotero_storage / pdf_key | ||
|
||
if zotero_doc_folder.exists(): | ||
pdf_files = list(zotero_doc_folder.glob("*.pdf")) | ||
if len(pdf_files) == 1: | ||
self.logger.info(f"| Copying existing PDF for {_get_citation_key(item)} from Zotero storage.") | ||
zotero_pdf_path = zotero_doc_folder / pdf_files[0] | ||
shutil.copy(zotero_pdf_path, pdf_path) | ||
return pdf_path | ||
g-simmons marked this conversation as resolved.
|
||
|
||
except Exception as e: | ||
What other exceptions are you thinking of here? Maybe catch them explicitly, and throw the error if something unexpected comes up?
Are you suggesting one or more of the three things below? Or something else?
For unexpected edge cases, do you think raising an error would be better or just warning the user and falling back to downloading?
Didn't have anything specific in mind - could imagine permissions errors at the destination dir.
If there was no specific error you had in mind I would just remove the try-except. If you think there could be permissions issues for the |
||
self.logger.warning(f"Could not copy file from Zotero storage, redownloading file. Error: {e}") | ||
|
||
pdf_path.parent.mkdir(parents=True, exist_ok=True) | ||
self.logger.info(f"| Downloading PDF for: {_get_citation_key(item)}") | ||
self.dump(pdf_key, pdf_path) | ||
|
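One way the `try`/`except` discussion above could be resolved is sketched here: catch only the copy failures that were actually named (permission problems at the destination) and let anything unexpected propagate. This is a sketch under assumptions, not the code in this PR: the helper name `copy_from_zotero_storage` and the module-level `logger` are hypothetical, and catching `shutil.SameFileError` is an extra guess about what else might reasonably go wrong. In the diff as shown, the cache directory is created only after the copy attempt, so the sketch creates it first.

```python
import logging
import shutil
from pathlib import Path
from typing import Optional

logger = logging.getLogger("ZoteroDB")


def copy_from_zotero_storage(
    zotero_storage: Path, pdf_key: str, pdf_path: Path
) -> Optional[Path]:
    """Try to reuse a locally stored PDF.

    Returns the destination path on success, or None so the caller can
    fall back to downloading. (Hypothetical helper, not part of the PR.)
    """
    zotero_doc_folder = zotero_storage / pdf_key
    if not zotero_doc_folder.is_dir():
        return None

    # glob() already yields paths that include the folder, so the results
    # can be passed to shutil.copy() without joining them again.
    pdf_files = sorted(zotero_doc_folder.glob("*.pdf"))
    if len(pdf_files) != 1:
        return None  # nothing to copy, or ambiguous: fall back to downloading

    try:
        # Make sure the cache directory exists before copying into it.
        pdf_path.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy(pdf_files[0], pdf_path)
    except (PermissionError, shutil.SameFileError) as e:
        # Only the anticipated failures are downgraded to a warning;
        # any other exception propagates to the caller.
        logger.warning(f"Could not copy {pdf_files[0]} from Zotero storage: {e}")
        return None
    return pdf_path
```

With a helper like this, `get_pdf` would call it first and run the existing download path only when it returns `None`.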
And initialize it later on, based on what operating system it is.
Also, should use `pathlib.Path.home() / "Zotero" / "storage"` for safety.
Ok so we could do:
But what about
I think default True makes it more obvious that the functionality is engaged by default. WYT?
Also out of curiosity what's the diff btw `pathlib.Path.home() / "Zotero" / "storage"` and `Path("~/Zotero/storage/").expanduser()`?
It's just a more robust practice: On Windows, the "\" key is used instead of "/". So Pathlib's `/` operator will do the correct thing on each operating system. Also sometimes the "~" is not expanded correctly. Better to rely on Pathlib.
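For reference, a small script comparing the two spellings discussed above. As far as I can tell, in current CPython both go through the same home-directory lookup, so they should compare equal wherever a home directory can be determined. The practical difference is that `"~"` stays a literal path component unless `expanduser()` is called explicitly. The variable names are illustrative only.

```python
from pathlib import Path

# Built component by component, starting from the resolved home directory.
from_home = Path.home() / "Zotero" / "storage"

# Built from a string; "~" is expanded only because expanduser() is called.
from_tilde = Path("~/Zotero/storage/").expanduser()

# Without expanduser(), "~" is just an ordinary first path component.
unexpanded = Path("~/Zotero/storage/")

print("equal after expansion:", from_home == from_tilde)
print("first component when unexpanded:", unexpanded.parts[0])
```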
Being explicit is often best so I would recommend one parameter for specifying the location, and another for specifying whether you should use it or not. When variables can take on a bunch of different types it can make code confusing. (I edited my original comment with this suggestion)
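A minimal sketch of the two-parameter shape suggested above, combined with the earlier idea of picking up the default location only if it exists. Everything here is an assumption for illustration: the function name `resolve_zotero_storage`, the flag name `use_zotero_storage`, and the `StrPath` alias (a stand-in for the one used in the diff) are not part of the PR.

```python
from pathlib import Path
from typing import Optional, Union

StrPath = Union[str, Path]  # assumed stand-in for the alias used in the diff


def resolve_zotero_storage(
    zotero_storage: Optional[StrPath] = None,
    use_zotero_storage: bool = True,
) -> Optional[Path]:
    """Return the directory to look in for existing PDFs, or None if disabled."""
    if not use_zotero_storage:
        return None  # feature switched off explicitly

    if zotero_storage is not None:
        # An explicit location is taken as given.
        return Path(zotero_storage).expanduser()

    # No location given: use the default only if it exists on this machine.
    default = Path.home() / "Zotero" / "storage"
    return default if default.is_dir() else None
```

Keeping the on/off switch separate means `zotero_storage` only ever holds a path or `None`, so `__init__` no longer needs to accept `Union[StrPath, bool]`.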