Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: fetch published doi if preprint #6311

Merged
merged 26 commits into from
Jan 6, 2024
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
Show all changes
26 commits
Select commit Hold shift + click to select a range
695bfcd
feat: fetch published doi if preprint
Dec 4, 2023
a30ea7f
iterate over relations to find first published doi
Dec 5, 2023
b48553c
explicitly return None for published metadata fetch
Dec 5, 2023
675e97a
Merge branch 'main' into dan/575-preprint-published-doi
Dec 5, 2023
ba1315f
Merge branch 'main' into dan/575-preprint-published-doi
Dec 7, 2023
f46bfc6
Merge branch 'main' into dan/575-preprint-published-doi
Dec 17, 2023
ac4def2
move doi parse logic up to allow return of parsed doi
Dec 17, 2023
c7db50c
fix: update unit tests
Jan 2, 2024
5e2e86e
fix remaining unit tests
Jan 2, 2024
a0f4b21
Merge branch 'main' into dan/575-preprint-published-doi
Jan 3, 2024
c0b14fd
fix: reference correct CrossrefProvider in unit tests
Jan 3, 2024
ffe8ca6
chore: remove unused CrossrefProvider code
Jan 3, 2024
3336ac7
Merge branch 'dan/6166-crossref-cleanup' into dan/575-preprint-publis…
Jan 3, 2024
822fdd6
reconcile newly added tests with doi changes
Jan 3, 2024
e021509
oops
Jan 3, 2024
d529ef1
Merge branch 'main' into dan/575-preprint-published-doi
Jan 3, 2024
7ebe550
Merge branch 'main' into dan/575-preprint-published-doi
Jan 3, 2024
fad622d
return Tuple of None when crossref call returns None
Jan 4, 2024
6173001
Fix processing unit test
Jan 4, 2024
6172791
fix test to comply with Tuple return
Jan 4, 2024
eb2be5f
correctly handle tuple return from call for published metadata
Jan 4, 2024
f26c72a
Do not prepend "/" to doi
Jan 5, 2024
5b015d7
do not allow bad DOI domain if curie is unchanged
Jan 5, 2024
5eae9a4
refactors
Jan 6, 2024
74c118a
remove print statements
Jan 6, 2024
5305e47
Merge branch 'main' into dan/575-preprint-published-doi
Jan 6, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
152 changes: 0 additions & 152 deletions backend/common/providers/crossref_provider.py

This file was deleted.

31 changes: 13 additions & 18 deletions backend/layers/thirdparty/crossref_provider.py
Original file line number Diff line number Diff line change
Expand Up @@ -69,7 +69,7 @@
)
res.raise_for_status()
except Exception as e:
if res.status_code == 404:
if e.response is not None and e.response.status_code == 404:
raise CrossrefDOINotFoundException from e
else:
raise CrossrefFetchException("Cannot fetch metadata from Crossref") from e
Expand All @@ -82,7 +82,6 @@
If the Crossref API URI isn't in the configuration, we will just return an empty object.
This is to avoid calling Crossref in non-production environments.
"""

res = self._fetch_crossref_payload(doi)

try:
Expand Down Expand Up @@ -130,6 +129,10 @@

# Preprint
is_preprint = message.get("subtype") == "preprint"
if is_preprint:
published_metadata = self.fetch_published_metadata(message)
if published_metadata: # if not, use preprint doi
return published_metadata

return {
"authors": parsed_authors,
Expand All @@ -143,19 +146,11 @@
except Exception as e:
raise CrossrefParseException("Cannot parse metadata from Crossref") from e

def fetch_preprint_published_doi(self, doi):
"""
Given a preprint DOI, returns the DOI of the published paper, if available.
"""

res = self._fetch_crossref_payload(doi)
message = res.json()["message"]
is_preprint = message.get("subtype") == "preprint"

if is_preprint:
try:
published_doi = message["relation"]["is-preprint-of"]
if published_doi[0]["id-type"] == "doi":
return published_doi[0]["id"]
except Exception:
pass
def fetch_published_metadata(self, doi_response_message: dict) -> dict:
try:
published_doi = doi_response_message["relation"]["is-preprint-of"]
# the new DOI to query for ...
if published_doi[0]["id-type"] == "doi":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if they ever actually put anything other than a DOI, but since the API returns a list here, we should iterate and look for an entry with id-type 'doi' rather than just checking the first entry

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if they ever actually put anything other than a DOI, but since the API returns a list here, we should iterate and look for an entry with id-type 'doi' rather than just checking the first entry

return self.fetch_metadata(published_doi[0]["id"])
Copy link
Contributor

@nayib-jose-gloria nayib-jose-gloria Dec 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that the CrossRef API returns DOIs as strings with the /'s escaped (i.e. "10.3389\/fcvm.2020.00052" from https://api.crossref.org/works/10.1101/2019.12.31.892166). I think we'd need to strip these esc characters if they exist (unless urlparse and/or requests.get already implicitly account for that)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed; the single \ in the context of \/ gets removed once you load from json into dict.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curl https://api.crossref.org/works/10.1101/2019.12.31.892166 | jq '.message.DOI'

except Exception: # if fetch of published doi errors out, just use preprint doi
pass

Check warning on line 156 in backend/layers/thirdparty/crossref_provider.py

View check run for this annotation

Codecov / codecov/patch

backend/layers/thirdparty/crossref_provider.py#L155-L156

Added lines #L155 - L156 were not covered by tests
danieljhegeman marked this conversation as resolved.
Show resolved Hide resolved
130 changes: 114 additions & 16 deletions tests/unit/backend/layers/thirdparty/test_crossref_provider.py
Original file line number Diff line number Diff line change
@@ -1,11 +1,12 @@
import copy
import json
import unittest
from unittest.mock import patch

from requests import RequestException
from requests.models import HTTPError, Response

from backend.common.providers.crossref_provider import (
from backend.layers.thirdparty.crossref_provider import (
CrossrefDOINotFoundException,
CrossrefException,
CrossrefFetchException,
Expand All @@ -15,15 +16,15 @@


class TestCrossrefProvider(unittest.TestCase):
@patch("backend.common.providers.crossref_provider.requests.get")
@patch("backend.layers.thirdparty.crossref_provider.requests.get")
def test__provider_does_not_call_crossref_in_test(self, mock_get):
provider = CrossrefProvider()
res = provider.fetch_metadata("test_doi")
self.assertIsNone(res)
with self.assertRaises(CrossrefParseException):
provider.fetch_metadata("test_doi")
mock_get.assert_not_called()

@patch("backend.common.providers.crossref_provider.requests.get")
@patch("backend.common.providers.crossref_provider.CorporaConfig")
@patch("backend.layers.thirdparty.crossref_provider.requests.get")
@patch("backend.layers.thirdparty.crossref_provider.CorporaConfig")
def test__provider_calls_crossref_if_api_key_defined(self, mock_config, mock_get):

# Defining a mocked CorporaConfig will allow the provider to consider the `crossref_api_key`
Expand Down Expand Up @@ -72,8 +73,105 @@ def test__provider_calls_crossref_if_api_key_defined(self, mock_config, mock_get

self.assertDictEqual(expected_response, res)

@patch("backend.common.providers.crossref_provider.requests.get")
@patch("backend.common.providers.crossref_provider.CorporaConfig")
@patch("backend.layers.thirdparty.crossref_provider.requests.get")
@patch("backend.layers.thirdparty.crossref_provider.CorporaConfig")
def test__published_doi_used_if_exists_for_preprint(self, mock_config, mock_get):

# Defining a mocked CorporaConfig will allow the provider to consider the `crossref_api_key`
# not None, so it will go ahead and do the mocked call.

def make_response(content):
response = Response()
response.status_code = 200
response._content = str.encode(json.dumps(content))
return response

body = {
"status": "ok",
"message": {
"author": [
{
"given": "John",
"family": "Doe",
"sequence": "first",
},
{
"given": "Jane",
"family": "Doe",
"sequence": "additional",
},
],
"published-online": {"date-parts": [[2021, 11, 10]]},
"container-title": ["Nature"],
"subtype": "preprint",
"relation": {
"is-preprint-of": [
{
"id": "published_doi",
"id-type": "doi",
}
]
},
},
}

provider = CrossrefProvider()

with self.subTest("Published DOI is used when available"):
preprint_body = copy.deepcopy(body)
response_preprint = make_response(preprint_body)

published_body = copy.deepcopy(body)
del published_body["message"]["subtype"]
published_body["message"]["author"][0]["given"] = "Jonathan"
response_published = make_response(published_body)

responses = [response_published, response_preprint]
mock_get.side_effect = lambda *x, **y: responses.pop()
res = provider.fetch_metadata("preprint_doi")

expected_response = {
"authors": [{"given": "Jonathan", "family": "Doe"}, {"given": "Jane", "family": "Doe"}],
"published_year": 2021,
"published_month": 11,
"published_day": 10,
"published_at": 1636502400.0,
"journal": "Nature",
"is_preprint": False,
}

self.assertDictEqual(expected_response, res)

with self.subTest("Preprint DOI is used when published is referenced but cannot be retrieved") and patch.object(
provider, "fetch_published_metadata"
) as fetch_published_metadata_mock:
fetch_published_metadata_mock.return_value = None

preprint_body = copy.deepcopy(body)
response_preprint = make_response(preprint_body)

published_body = copy.deepcopy(body)
del published_body["message"]["subtype"]
response_published = make_response(published_body)

responses = [response_published, response_preprint]
mock_get.side_effect = lambda *x, **y: responses.pop()
res = provider.fetch_metadata("preprint_doi")

expected_response = {
"authors": [{"given": "John", "family": "Doe"}, {"given": "Jane", "family": "Doe"}],
"published_year": 2021,
"published_month": 11,
"published_day": 10,
"published_at": 1636502400.0,
"journal": "Nature",
"is_preprint": True,
}

self.assertDictEqual(expected_response, res)

@patch("backend.layers.thirdparty.crossref_provider.requests.get")
@patch("backend.layers.thirdparty.crossref_provider.CorporaConfig")
def test__provider_parses_authors_and_dates_correctly(self, mock_config, mock_get):

response = Response()
Expand Down Expand Up @@ -138,8 +236,8 @@ def test__provider_parses_authors_and_dates_correctly(self, mock_config, mock_ge

self.assertDictEqual(expected_response, res)

@patch("backend.common.providers.crossref_provider.requests.get")
@patch("backend.common.providers.crossref_provider.CorporaConfig")
@patch("backend.layers.thirdparty.crossref_provider.requests.get")
@patch("backend.layers.thirdparty.crossref_provider.CorporaConfig")
def test__provider_throws_exception_if_request_fails(self, mock_config, mock_get):
"""
Asserts a CrossrefFetchException if the GET request fails for any reason
Expand All @@ -155,8 +253,8 @@ def test__provider_throws_exception_if_request_fails(self, mock_config, mock_get
with self.assertRaises(CrossrefException):
provider.fetch_metadata("test_doi")

@patch("backend.common.providers.crossref_provider.requests.get")
@patch("backend.common.providers.crossref_provider.CorporaConfig")
@patch("backend.layers.thirdparty.crossref_provider.requests.get")
@patch("backend.layers.thirdparty.crossref_provider.CorporaConfig")
def test__provider_throws_exception_if_request_fails_with_404(self, mock_config, mock_get):
"""
Asserts a CrossrefFetchException if the GET request fails for any reason
Expand All @@ -170,8 +268,8 @@ def test__provider_throws_exception_if_request_fails_with_404(self, mock_config,
with self.assertRaises(CrossrefDOINotFoundException):
provider.fetch_metadata("test_doi")

@patch("backend.common.providers.crossref_provider.requests.get")
@patch("backend.common.providers.crossref_provider.CorporaConfig")
@patch("backend.layers.thirdparty.crossref_provider.requests.get")
@patch("backend.layers.thirdparty.crossref_provider.CorporaConfig")
def test__provider_throws_exception_if_request_fails_with_non_2xx_code(self, mock_config, mock_get):
"""
Asserts a CrossrefFetchException if the GET request return a 500 error (any non 2xx will work)
Expand All @@ -186,8 +284,8 @@ def test__provider_throws_exception_if_request_fails_with_non_2xx_code(self, moc
with self.assertRaises(CrossrefFetchException):
provider.fetch_metadata("test_doi")

@patch("backend.common.providers.crossref_provider.requests.get")
@patch("backend.common.providers.crossref_provider.CorporaConfig")
@patch("backend.layers.thirdparty.crossref_provider.requests.get")
@patch("backend.layers.thirdparty.crossref_provider.CorporaConfig")
def test__provider_throws_exception_if_request_cannot_be_parsed(self, mock_config, mock_get):
"""
Asserts an CrossrefParseException if the GET request succeeds but cannot be parsed
Expand Down
Loading