Old style paper ids don't work #24
Comments
Hm, interesting. The search actually isn't supported by arXiv, so I'm not sure we can easily fix that part of the issue. The arXiv "id" is indeed being extracted from the title here:
The URL is less reliable than the title? The arXiv API documentation says that the id can be found from the URL by stripping the leading part. For the search, are you not using the API?
I'd trust that not to change about as much as I'd trust the title not to change :) I don't think either is reliable. Though I'd be surprised if they had changed much in the past decade, so they may be reliable in that sense.
The API is used, but only the search_query parameter is passed by our system (js/search.js#L92), so results are unfortunately empty for older IDs.
I would expect the URL to be far, far more stable than the title. The URL would be hard for them to change (it appears in many places), and it at least appears in the API description as a method for getting the id. The title, on the other hand, is just text, and I really don't know why the id appears in it at all. Of course, they really should provide an id tag that covers this. I don't know how the search is used by people in practice. Writing a generic, powerful search would require more work and isn't justified unless there is a demand for it. However, being able to search for papers by id, including old ids, would be nice. That has been needed a few times during coffee (and I might use it more frequently if I knew it worked). So it would be nice if that could also be supported, though it is not essential.
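To make the search-by-id idea concrete, here is a rough sketch of how a query could be routed. It assumes the arXiv API's `id_list` parameter is used whenever the input looks like a paper id, with a fallback to `search_query` otherwise. The function names, the regular expressions, and the `all:` prefix are illustrative assumptions on my part, not what js/search.js currently does.

```python
import re
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"

# Assumed id shapes: new style (1502.06506, 1502.06506v2) and
# old style (gr-qc/0103044v6, math.GT/0309136).
NEW_ID = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")
OLD_ID = re.compile(r"^[a-z\-]+(\.[A-Z]{2})?/\d{7}(v\d+)?$")


def looks_like_arxiv_id(text):
    """Return True if the text has the shape of a new- or old-style id."""
    text = text.strip()
    return bool(NEW_ID.match(text) or OLD_ID.match(text))


def build_query_url(user_input):
    """Use id_list for id-shaped input, search_query for everything else."""
    text = user_input.strip()
    if looks_like_arxiv_id(text):
        params = {"id_list": text}
    else:
        params = {"search_query": "all:" + text}
    # Keep "/" and ":" readable instead of percent-encoding them.
    return ARXIV_API + "?" + urlencode(params, safe="/:")


print(build_query_url("gr-qc/0103044v6"))
print(build_query_url("dark matter"))
```

The point of the split is that a free-text search and a lookup by id are different requests to the API, so an id-shaped input should not go through `search_query` at all.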
Ok; I'm getting a bit confused between searching/importing via API and importing via RSS. I dug a bit and here is a summary of the behavior:
I can also imagine this breaking if the RSS title format changes, but I can also imagine additional routes like http://arxiv.org/abs/1502.06506/v2, http://arxiv.org/astro-ph/1502.06506v2, etc. becoming available and preferable to http://arxiv.org/abs/1502.06506v2, and the RSS link field reflecting this. So things are a bit messy and liable to break if arXiv changes things; let's just hope they don't :) And so I'd like to change:
Going down the rabbit hole, I agree that the RSS feed
Also what if we attempt to search with
Old style paper ids such as gr-qc/0103044v6 do not seem to be parsed correctly and are not searchable.
This shows up in the listing of June 15, 2015 for the update of the old paper "The Meaning of Einstein's Equation". PDF and Article links are not provided in the paper listing. This appears to be due to assuming that the ids will be of the form 'arxiv:ddddd' in cron.php where the article id is parsed from the title. It may be better to use the 'link' tag or the rdf:about attribute of the item tag. Of course you also are stripping the article id information from the title, so this will also need to be done with more care.
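As a sketch of the 'link'-based approach suggested above (written in Python for brevity, although cron.php is PHP), the id and version could be pulled from the abstract URL rather than from the title. The helper name and the regular expression are hypothetical, and the sketch assumes the link always has the form `http://arxiv.org/abs/<id>[vN]`.

```python
import re

# Assumed link shape: http(s)://arxiv.org/abs/<id>, optionally followed by vN.
ABS_URL = re.compile(r"^https?://arxiv\.org/abs/(?P<id>.+?)(?P<version>v\d+)?$")


def id_from_link(link):
    """Return (paper_id, version) from an abstract URL, or None if no match.

    version is None when the link carries no vN suffix.
    """
    match = ABS_URL.match(link.strip())
    if match is None:
        return None
    return match.group("id"), match.group("version")


for link in (
    "http://arxiv.org/abs/gr-qc/0103044v6",
    "http://arxiv.org/abs/1502.06506v2",
    "http://arxiv.org/abs/1502.06506",
):
    print(link, "->", id_from_link(link))
```

Because everything after `/abs/` is taken as the id, the slash in old-style ids such as gr-qc/0103044 needs no special handling.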