feat: fetch published doi if preprint #6311
@@ -69,7 +69,7 @@
             )
             res.raise_for_status()
         except Exception as e:
-            if res.status_code == 404:
+            if e.response is not None and e.response.status_code == 404:
                 raise CrossrefDOINotFoundException from e
             else:
                 raise CrossrefFetchException("Cannot fetch metadata from Crossref") from e
@@ -82,7 +82,6 @@
         If the Crossref API URI isn't in the configuration, we will just return an empty object.
         This is to avoid calling Crossref in non-production environments.
         """
-
         res = self._fetch_crossref_payload(doi)

         try:
@@ -130,6 +129,10 @@

             # Preprint
             is_preprint = message.get("subtype") == "preprint"
+            if is_preprint:
+                published_metadata = self.fetch_published_metadata(message)
+                if published_metadata: # if not, use preprint doi
+                    return published_metadata

             return {
                 "authors": parsed_authors,
@@ -143,19 +146,11 @@
         except Exception as e:
             raise CrossrefParseException("Cannot parse metadata from Crossref") from e

-    def fetch_preprint_published_doi(self, doi):
-        """
-        Given a preprint DOI, returns the DOI of the published paper, if available.
-        """
-
-        res = self._fetch_crossref_payload(doi)
-        message = res.json()["message"]
-        is_preprint = message.get("subtype") == "preprint"
-
-        if is_preprint:
-            try:
-                published_doi = message["relation"]["is-preprint-of"]
-                if published_doi[0]["id-type"] == "doi":
-                    return published_doi[0]["id"]
-            except Exception:
-                pass
+    def fetch_published_metadata(self, doi_response_message: dict) -> dict:
+        try:
+            published_doi = doi_response_message["relation"]["is-preprint-of"]
+            # the new DOI to query for ...
+            if published_doi[0]["id-type"] == "doi":

Review comment: I'm not sure if they ever actually put anything other than a DOI, but since the API returns a list here, we should iterate and look for an entry with id-type 'doi' rather than just checking the first entry.

+                return self.fetch_metadata(published_doi[0]["id"])

Review comment: I noticed that the Crossref API returns DOIs as strings with the /'s escaped (i.e. "10.3389\/fcvm.2020.00052" from https://api.crossref.org/works/10.1101/2019.12.31.892166). I think we'd need to strip these escape characters if they exist (unless urlparse and/or requests.get already implicitly account for that).

Reply: Not needed; the single

+        except Exception: # if fetch of published doi errors out, just use preprint doi
+            pass
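
The suggestion to iterate over the `is-preprint-of` list, rather than checking only its first entry, could look roughly like the sketch below. The helper name `find_published_doi` is hypothetical and not part of this PR, and the sample message is hand-written to exercise the loop; it is not a real Crossref response.

```python
from typing import Optional


def find_published_doi(message: dict) -> Optional[str]:
    """Return the first DOI listed under relation -> is-preprint-of, or None.

    Hypothetical helper: scans every entry instead of assuming that the
    first one has id-type "doi".
    """
    relations = (message.get("relation") or {}).get("is-preprint-of") or []
    for entry in relations:
        if entry.get("id-type") == "doi" and entry.get("id"):
            return entry["id"]
    return None


# Hand-written sample: the DOI entry is deliberately not the first one.
sample_message = {
    "subtype": "preprint",
    "relation": {
        "is-preprint-of": [
            {"id-type": "uri", "id": "https://example.org/some-paper"},
            {"id-type": "doi", "id": "10.3389/fcvm.2020.00052"},
        ]
    },
}

published_doi = find_published_doi(sample_message)
```

Returning `None` when no DOI entry exists keeps the fallback to the preprint metadata explicit, so the caller would not need a broad `except Exception` just to handle a missing relation.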
danieljhegeman marked this conversation as resolved.
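
On the escaped-slash question: `\/` is a valid JSON escape for `/`, so a standard JSON parser yields a plain slash once the payload is decoded. A quick check with Python's standard `json` module is sketched below; the raw string is a hand-written fragment modeled on the DOI quoted in the comment, not a captured API response.

```python
import json

# Raw JSON text as it would appear on the wire, with the slash escaped.
raw = r'{"id-type": "doi", "id": "10.3389\/fcvm.2020.00052"}'

entry = json.loads(raw)

# After decoding, the backslash is gone and the DOI can be used directly.
decoded_doi = entry["id"]
```

Since the PR reads the payload through `res.json()`, the DOI should already be decoded by the time it reaches `fetch_metadata`, assuming the response is parsed as ordinary JSON.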