Better filtering of dead endpoints and finding new active endpoints #1

Open · vemonet opened this issue Nov 15, 2024 · 1 comment
Labels: bug (Something isn't working), enhancement (New feature or request)

vemonet commented Nov 15, 2024

Hi @GabrieleT0, thanks for this service! I tried using it a bit and have some feedback:

Dead endpoints

When browsing the website (http://www.isislab.it:12280/kgheartbeat/pages/Search), a large share of the endpoints are completely dead, and it is really hard to find the interesting ones among the 2089 endpoints listed.

The problem is that the LOD cloud dataset is not good data: it contains tons of dead datasets, and genuinely new endpoints are no longer registered there. From a user's point of view, I don't really care about endpoints that have been dead for years; I just want to see endpoints that have been active recently. I would recommend hiding endpoints that are not reachable at all by default (maybe with a button to show them again, but when browsing it would be really nice to get direct access to endpoints that are actually alive).
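A minimal sketch of the hide-by-default behaviour suggested above, in Python. The `Endpoint` record, its `last_seen_alive` field, and the 30-day window are illustrative assumptions, not the service's actual data model:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Optional


@dataclass
class Endpoint:
    # Hypothetical record; field names are illustrative only.
    url: str
    # Time of the last successful probe, or None if the endpoint never answered.
    last_seen_alive: Optional[datetime]


def visible_endpoints(
    endpoints: Iterable[Endpoint],
    now: datetime,
    max_age: timedelta = timedelta(days=30),  # assumed "recently active" window
    show_dead: bool = False,  # the "show them again" toggle
) -> List[Endpoint]:
    """Return recently alive endpoints by default, or every endpoint if show_dead is set."""
    if show_dead:
        return list(endpoints)
    cutoff = now - max_age
    return [
        e for e in endpoints
        if e.last_seen_alive is not None and e.last_seen_alive >= cutoff
    ]
```

With the default `show_dead=False`, an endpoint that never answered, or last answered a year ago, is hidden; `show_dead=True` plays the role of the button that shows them again.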

Finding new active endpoints not in LOD cloud

I worked on this problem myself a few years ago with https://index.semanticscience.org/ (code: https://github.com/vemonet/shapes-of-you).

To find endpoints automatically, I also used the LOD cloud (keeping only the endpoints that answered a query), YummyData, and scraping of GitHub and GitLab repositories for .rq files. That last approach turned up a lot of endpoints, thanks to people storing their SPARQL queries to these endpoints for https://grlc.io/.

You can find the Python code I used to scrape git repositories here, in case it helps you improve the automatic discovery of endpoints: https://github.com/vemonet/shapes-of-you/blob/main/etl/index_shapes.py#L580. You could also search for an endpoint.txt file at the root of repositories, which grlc.io also uses to get the endpoint URL.
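Once the files have been fetched, both discovery heuristics come down to parsing small text files. A minimal sketch, assuming grlc's `#+ endpoint:` decorator comment in `.rq` files and an `endpoint.txt` whose first non-comment line is the URL; fetching files from GitHub or GitLab is left out, and the function names are hypothetical:

```python
import re
from typing import List, Optional

# grlc-style decorator line, e.g. "#+ endpoint: https://example.org/sparql" (assumed syntax).
_ENDPOINT_DECORATOR = re.compile(
    r"^[ \t]*#\+[ \t]*endpoint[ \t]*:[ \t]*(\S+)", re.MULTILINE
)


def endpoints_from_rq(text: str) -> List[str]:
    """Return endpoint URLs declared in the decorator comments of a .rq query file."""
    # Strip optional YAML quotes around the value.
    return [m.strip("'\"") for m in _ENDPOINT_DECORATOR.findall(text)]


def endpoint_from_txt(text: str) -> Optional[str]:
    """Return the first non-empty, non-comment line of an endpoint.txt file, if any."""
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            return line
    return None
```

URLs collected this way would still need the same liveness check as the LOD cloud entries before being listed.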

Duplication of same endpoints

Note that the same endpoints are also duplicated multiple times: e.g. searching for "uniprot" returns 9 items. It might be worth merging endpoints on their URL to avoid duplication.
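Merging on URL works better if trivially different spellings of the same endpoint collapse to one key. A minimal sketch; the normalization rules (ignore the scheme and host case, drop a trailing slash) and the helper names are assumptions, and ignoring the scheme may be too aggressive for some hosts:

```python
from typing import Dict, Iterable, List, Tuple
from urllib.parse import urlsplit, urlunsplit


def normalize_endpoint_url(url: str) -> str:
    """Return a merge key for an endpoint URL (hypothetical normalization rules)."""
    parts = urlsplit(url.strip())
    # Ignore the scheme (http vs https) and host case; drop a trailing slash from the path.
    return urlunsplit(("", parts.netloc.lower(), parts.path.rstrip("/"), parts.query, ""))


def merge_by_url(records: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map one display URL per endpoint to every dataset name that pointed at it.

    records yields (dataset_name, endpoint_url) pairs; the first URL seen is kept for display.
    """
    first_url: Dict[str, str] = {}
    names: Dict[str, List[str]] = {}
    for name, url in records:
        key = normalize_endpoint_url(url)
        first_url.setdefault(key, url)
        names.setdefault(key, []).append(name)
    return {first_url[key]: names[key] for key in names}
```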

GabrieleT0 (Collaborator) commented

Hi @vemonet, thank you very much for the helpful feedback; I really appreciate it!

Dead endpoints

I realize that searching for active KGs is complicated. We tried to make it easier with the toggle switch below the search bar: once enabled, it only shows KGs that had a SPARQL endpoint online in the last analysis. However, so as not to exclude KGs whose SPARQL endpoint happened to be offline in the last analysis but that are still active projects, we plan to use the same mechanism as YummyData.
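One way to avoid hiding an active project because of a single failed check is to look at a short window of past analyses instead of only the last one. This is a hypothetical sketch of that idea, not a description of YummyData's actual mechanism; the `history` layout and the default window of 3 analyses are assumptions:

```python
from typing import Dict, List


def active_kgs(history: Dict[str, List[bool]], window: int = 3) -> List[str]:
    """Return KGs whose SPARQL endpoint was online in at least one of the last `window` analyses.

    history maps a KG id to one boolean per analysis, oldest first (True = endpoint online).
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    return sorted(kg for kg, runs in history.items() if any(runs[-window:]))
```

With `window=1` this reduces to the current toggle behaviour (online in the last analysis only).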

Finding new active endpoints not in LOD cloud

Thank you for the material you provided! Over the last few weeks we have been thinking about exactly how to solve this problem and include as many KGs as possible in the tool. I will definitely draw on the material you provided, and I plan to use the scraper you shared. Of course, you will receive full credit for your work.

Duplication of same endpoints

The cause of this duplication is that in the LOD Cloud, the sub-graphs of a KG are sometimes indexed separately, even when they are served by the same SPARQL endpoint. We will take steps to fix this as well, probably by enabling grouping by SPARQL endpoint.
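Grouping by SPARQL endpoint can keep every separately indexed sub-graph while showing a single row per endpoint. A minimal sketch, assuming each KG entry is available as a hypothetical `(kg_id, endpoint_url)` pair with endpoint URLs already normalized:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_subgraphs(kgs: Iterable[Tuple[str, str]]) -> List[Tuple[str, List[str]]]:
    """Group KG entries by the SPARQL endpoint that serves them.

    Returns one (endpoint_url, [kg_id, ...]) row per endpoint, sorted by endpoint,
    so a UI could render a single expandable item per endpoint.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for kg_id, endpoint in kgs:
        groups[endpoint].append(kg_id)
    return sorted((endpoint, sorted(ids)) for endpoint, ids in groups.items())
```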

Thank you again for your feedback!

GabrieleT0 self-assigned this Nov 16, 2024
GabrieleT0 added the bug (Something isn't working) and enhancement (New feature or request) labels Nov 16, 2024