Hi @GabrieleT0, thanks for this service! I tried it out a bit and have some feedback.
Dead endpoints
When navigating the website (http://www.isislab.it:12280/kgheartbeat/pages/Search), a very large number of the endpoints are completely dead, and it is really hard to find the actually interesting ones among the 2089 endpoints listed.
The problem comes from the fact that the LOD cloud dataset is not good data: there are tons of dead datasets, and new endpoints are no longer registered there. From a user's point of view, I don't care much about endpoints that have been dead for years; I just want to see endpoints that have been active recently. I would recommend completely hiding endpoints that are not reachable at all (maybe with a button to easily show them again), so that when navigating you directly get the endpoints that are actually alive.
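The "hide dead endpoints by default, with a button to show them again" idea could look roughly like the sketch below. It is only an illustration: the `EndpointStatus` record, its `last_online` field, and the 30-day window are assumptions made for the example, not how KGHeartBeat stores or filters its data.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class EndpointStatus:
    """Hypothetical record: one endpoint and the last day it answered a query."""
    url: str
    last_online: Optional[date]  # None means it was never seen online


def visible_endpoints(
    records: List[EndpointStatus],
    today: date,
    max_age_days: int = 30,   # assumed window for "active recently"
    show_dead: bool = False,  # the "show them again" button
) -> List[EndpointStatus]:
    """Return the endpoints to display, hiding stale ones unless show_dead is set."""
    if show_dead:
        return list(records)
    cutoff = today - timedelta(days=max_age_days)
    return [
        r for r in records
        if r.last_online is not None and r.last_online >= cutoff
    ]
```

Using a time window rather than only the latest check keeps an endpoint visible if it was briefly offline during a single analysis run.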
Finding new active endpoints not in LOD cloud
I worked a bit on this problem myself some years ago with https://index.semanticscience.org/ (code: https://github.com/vemonet/shapes-of-you).
To find endpoints automatically I also used the LOD cloud (only keeping endpoints that answered a query) and YummyData, and by scraping GitHub and GitLab repositories for .rq files I could find a lot of endpoints (thanks to people storing SPARQL queries to these endpoints for https://grlc.io/).
You can find the Python code used for scraping git repos here, in case it helps you improve the automatic discovery of endpoints: https://github.com/vemonet/shapes-of-you/blob/main/etl/index_shapes.py#L580. You could also search for an endpoint.txt file in the root of the repo, which grlc.io also uses to provide the endpoint URL.
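As a rough illustration of the .rq and endpoint.txt idea (this is not the code from the linked repository), the sketch below pulls endpoint URLs out of files that have already been fetched from a repo. It assumes the queries carry grlc-style `#+ endpoint:` comment decorators; the function names and the `files` mapping of path to text are made up for the example.

```python
import re
from typing import Dict, Optional, Set

# Matches a grlc-style decorator line such as:  #+ endpoint: https://example.org/sparql
_ENDPOINT_DECORATOR = re.compile(
    r"^[ \t]*#\+[ \t]*endpoint:[ \t]*(\S+)", re.MULTILINE
)


def endpoint_from_rq(rq_text: str) -> Optional[str]:
    """Return the endpoint declared in a .rq file, or None if there is none."""
    match = _ENDPOINT_DECORATOR.search(rq_text)
    if match is None:
        return None
    return match.group(1).strip("\"'")  # tolerate a quoted URL


def endpoints_in_repo(files: Dict[str, str]) -> Set[str]:
    """Collect endpoint URLs from .rq files and a root-level endpoint.txt.

    `files` maps repo-relative paths to file contents (hypothetical input shape).
    """
    found: Set[str] = set()
    for path, text in files.items():
        if path.endswith(".rq"):
            url = endpoint_from_rq(text)
            if url:
                found.add(url)
        elif path == "endpoint.txt":
            # Assume the first non-empty line holds the URL.
            lines = [line.strip() for line in text.splitlines() if line.strip()]
            if lines:
                found.add(lines[0])
    return found
```

Endpoints found this way would still need a liveness check (for example, only keeping those that answer a query) before being listed.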
Duplication of same endpoints
Note that the same endpoints also appear multiple times, e.g. if I search for "uniprot" there are 9 items for it. It might be worth merging entries on their endpoint URL to avoid duplication.
Hi @vemonet, thank you very much for the helpful feedback, I really appreciate it!
Dead endpoints
I realize that searching for active KGs is complicated. We tried to make it easier by introducing the toggle switch you see below the search bar: once active, it shows only KGs whose SPARQL endpoint was online in the last analysis. However, in order not to exclude KGs that are active projects but happened to have an offline SPARQL endpoint in the last analysis, we plan to use the same mechanism presented by YummyData.
Finding new active endpoints not in LOD cloud
Thank you for the material provided! In the last few weeks we had been thinking about exactly how to solve that problem and include as many KGs as possible in the tool. I will definitely take a cue from your material, and I plan to use the scraper you shared with me. Of course you will receive full credit for your work.
Duplication of same endpoints
The reason for this duplication is that in the LOD Cloud a sub-graph of a KG is sometimes indexed separately, even if it is under the same SPARQL endpoint. We will take steps to fix this problem as well, probably by enabling grouping by SPARQL endpoint.
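Grouping by SPARQL endpoint could be sketched as follows. This is a minimal illustration under assumptions: entries are taken to be `(name, endpoint_url)` pairs, and the normalization rules (lowercase scheme and host, drop the trailing slash and fragment) are one possible choice, not the tool's actual behavior.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple
from urllib.parse import urlsplit, urlunsplit


def normalize_endpoint(url: str) -> str:
    """Reduce trivially different spellings of an endpoint URL to one key."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    # Keep the query string, drop the fragment, lowercase scheme and host.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


def group_by_endpoint(
    entries: Iterable[Tuple[str, str]]
) -> Dict[str, List[str]]:
    """Map each normalized endpoint URL to the names of the KGs that share it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, url in entries:
        groups[normalize_endpoint(url)].append(name)
    return dict(groups)
```

With this shape the sub-graphs are not lost: each endpoint appears once in the results, and the separately indexed sub-graphs are listed under it. Note that `http` and `https` variants of the same host are deliberately left as separate keys here.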