
fix: software mentions counts on homepage #1329

Merged Oct 31, 2024 (1 commit)


@dmijatovic (Contributor) commented Oct 24, 2024

Improve mention count on homepage

Closes #1326

Changes proposed in this pull request:

  • Improve the homepage_counts RPC to use the mentions_by_software RPC for the software mentions count.
  • We count unique combinations of mention ID and software ID. In other words, if multiple software packages registered in RSD are referenced from one paper, we count that reference once for each of those software packages.
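
The counting rule can be sketched as follows. This is a minimal illustration in Python, not the actual RPC; the function name `count_software_mentions` and the `(mention_id, software_id)` row shape are assumptions made for the example.

```python
from typing import Iterable, Tuple


def count_software_mentions(rows: Iterable[Tuple[str, str]]) -> int:
    """Count unique (mention_id, software_id) combinations.

    Hypothetical helper: `rows` stands in for whatever mentions_by_software
    returns; the real column names may differ.
    """
    return len(set(rows))


# Hypothetical rows: mention "m1" references two software packages,
# and the ("m1", "s1") link appears twice.
rows = [("m1", "s1"), ("m1", "s1"), ("m1", "s2"), ("m2", "s1")]
print(count_software_mentions(rows))  # 3
```

The duplicate `("m1", "s1")` row is counted once, while `"m1"` still contributes one count per software package it references.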

How to test:

  • It is easier to validate if you start without test data: docker compose build && docker compose up
  • Log in and create 2 software packages and 1 project
  • In the first software package, under reference papers, use the DOI 10.1186/s13321-017-0220-4
  • In the second software package, under reference papers, use the DOI 10.5194/gmd-7-267-2014
  • In the project output, use the DOI 10.5194/gmd-10-3167-2017
  • Let the scrapers run and collect the citations (wait 15 minutes or more).
  • Navigate to the home page and check the homepage stats. They should show 555 software mentions:
    • 10.1186/s13321-017-0220-4 produces 461 software mentions
    • 10.5194/gmd-7-267-2014 produces 16 software mentions
    • 10.1016/j.softx.2020.100549 produces 14 software mentions
    • 10.1016/j.future.2018.08.004 produces 64 software mentions
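
As a quick sanity check, the per-DOI figures listed above add up to the expected total. The dictionary below only restates those numbers; the variable names are illustrative.

```python
# Per-DOI software mention counts as listed in the test instructions above.
expected_per_doi = {
    "10.1186/s13321-017-0220-4": 461,
    "10.5194/gmd-7-267-2014": 16,
    "10.1016/j.softx.2020.100549": 14,
    "10.1016/j.future.2018.08.004": 64,
}

total = sum(expected_per_doi.values())
print(total)  # 555
```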

Example software overview (image)

PR Checklist:

  • Increase version numbers in docker-compose.yml
  • Link to a GitHub issue
  • Update documentation
  • Tests

@dmijatovic dmijatovic changed the title fix: use rpc mentions_by_software software mentions counts fix: software mentions counts on homepage Oct 24, 2024
@jmaassen (Member) commented Oct 25, 2024

Works as expected for the test cases described above.

I also tested manually adding a mention to software1 that was also found automatically. This isn't counted twice, which is correct.

When also adding this same mention manually to software2 it is counted. This is also correct.

However, if I use the same reference paper for both software 1 and software 2, the citations are only counted once. I don't think this is correct? I agree this is an artificial example though. A more realistic one is the following:

AMBER has 10.1016/j.softx.2020.100549 as one of its reference papers. This is cited 13 times.

KernelTuner has 10.1016/j.future.2018.08.004 as one of its reference papers. This is cited 64 times.

Because these are different pieces of software, the total citation count should be 13 + 64 = 77. However, the RSD only reports 75, because some citations come from the same source. For example, the book chapter 10.1007/978-3-031-69577-3_7 cites both AMBER and KernelTuner. It should count as 2, but is now only counted as 1.
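
The difference between the two counting approaches can be reproduced with a small sketch. The citation IDs below are synthetic placeholders chosen only so that the set sizes match the numbers above (13 and 64 citations, 2 of them shared, as implied by 77 versus 75); they are not real data from RSD.

```python
# Synthetic citing-work IDs (assumed for illustration, not real RSD data).
amber_citations = set(range(0, 13))          # 13 citing works: 0..12
kerneltuner_citations = set(range(11, 75))   # 64 citing works: 11..74

# Counting each citing work once, regardless of how many packages it cites.
unique_mentions = len(amber_citations | kerneltuner_citations)

# Counting each (citing work, software) combination once.
pairs = {(m, "AMBER") for m in amber_citations} | {
    (m, "KernelTuner") for m in kerneltuner_citations
}

print(unique_mentions)  # 75
print(len(pairs))       # 77
```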

@ewan-escience (Collaborator) left a comment

Works well, I'll leave it up to @jmaassen to decide if the semantics should be changed.

@dmijatovic dmijatovic force-pushed the 1326-homepage-mention-cnt branch from c07ca2d to ddcdb75 Compare October 28, 2024 11:10
@dmijatovic (Contributor, Author) commented

@jmaassen I have adjusted the RPC to count unique combinations of mention and software IDs. With this approach, one paper is counted multiple times if it references more than one RSD software entry. Can you test it again, please?

…unt unique entries per software and mention id.
@dmijatovic dmijatovic force-pushed the 1326-homepage-mention-cnt branch from ddcdb75 to 576f7fa Compare October 28, 2024 13:01
@jmaassen (Member) left a comment

Works as expected. The overall count on the landing pages makes sense now.

@dmijatovic dmijatovic merged commit 49c0401 into main Oct 31, 2024
5 checks passed
@dmijatovic dmijatovic deleted the 1326-homepage-mention-cnt branch November 11, 2024 08:55
Linked issue: RPC homepage_counts.software_mention_cnt differs from actual software mentions

3 participants