Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

harvester: add arXiv metadata harvester #68

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ChiaraBi
Copy link
Member

No description provided.

@ChiaraBi ChiaraBi force-pushed the arxiv_harvester branch 5 times, most recently from 3ca54b5 to 1548182 Compare July 18, 2019 16:25
@coveralls
Copy link

coveralls commented Jul 18, 2019

Pull Request Test Coverage Report for Build 240

  • 14 of 44 (31.82%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.8%) to 64.662%

Changes Missing Coverage Covered Lines Changed/Added Lines %
asclepias_broker/harvester/arxiv.py 14 44 31.82%
Totals Coverage Status
Change from base Build 238: -0.8%
Covered Lines: 1226
Relevant Lines: 1896

💛 - Coveralls

@slint slint requested a review from Glignos July 19, 2019 09:03
else:
return False

def get_metadata(self, arxiv_id):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the same function name in this case (for Client and Harvester might be a bit misleading)

# Identifiers
result['Identifier'] = []
doi = metadata['doi']
if doi:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a suggestion maybe the construction of the results could be a separate function to make it more distinct.


def get_metadata(self, arxiv_id):
"""Get metadata from ArXiv."""
res = arxiv.query(query="",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if this is in the scope, but maybe we could make this function a bit more versatile by providing a by default empty query, giving us the chance in the future to harvest by query too if needed without having to refactor this.

"""."""
data = self.get_metadata(identifier)
if data:
providers = set(providers) if providers else set()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a case where we can get duplicate providers?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants