Add Turtle, TriG, RDF/XML and SPARQL tests to better validate relative IRI resolution #6
Comments
Thanks @gkellogg, feel free to assign this issue to me. |
I don't have the permission to add you quite yet (which is a bit odd). Once you're added, you should be able to assign yourself. Thanks! BTW, I think I owe you a response relating to |
See PR #13 for Turtle and TriG tests. |
About RDF/XML, I wasn't sure where to put those. The RDF/XML tests have a pretty specific structure. |
I've created RDF/XML versions of these tests for my own use, which can be added to the RDF/XML test suite (I also have JSON-LD tests). If you don't get to it, @RubenVerborgh, I'll handle it later this week. SPARQL's another matter. |
Only thing that held me back on RDF/XML is the weird folder structure; wasn't sure where to put things. So I'd leave it up to you if that's okay. |
Also RDFa tests. |
I added RDF/XML tests to PR #13. |
I was pointed at the following text in RFC 3987 5.3.2.4. Path Segment Normalization
says dot-segments are only for the beginning of relative IRIs. But is that a prohibition on use elsewhere? RFC 3986: section 3.3 Paths
Also RFC 3986: section 1.2.3 Hierarchical Identifiers
seems to say dot-segments processing applies to absolute URIs as it is called out specially. All of which leaves me confused about the use of dot-segments in absolute IRIs and URIs and to some extent in relative URIs not at the beginning. Multiple readings seem possible depending on the "SHOULD NOT"/ "MUST NOT" ness of "intended only for" and "are defined". Opinions? |
Not necessarily “beginning” though. The algorithm accounts for
The problem is “intended”. What does that even mean? |
The algorithm works more generally and also works for sorting out absolute URIs as well by operating on their path. My sense is that the case of base URI having The algorithm accounts for Where
Where
The difference between When, except by explicit choice, does And it's different if the base URI is sorted out first which is what leads to oddities.
|
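For readers following the discussion, here is a minimal Python sketch of the `remove_dot_segments` procedure from RFC 3986, section 5.2.4, which is the part of the algorithm that "operates on the path". The function name follows the RFC; the implementation is illustrative only and is not taken from any parser mentioned in this thread. It shows that the procedure is indifferent to where the path came from (a relative reference or the path of an absolute IRI).

```python
def remove_dot_segments(path: str) -> str:
    """Illustrative sketch of RFC 3986, section 5.2.4 (step letters follow the RFC)."""
    output = []  # completed segments, each stored with its leading "/"
    while path:
        if path.startswith("../"):        # step A: drop a leading "../"
            path = path[3:]
        elif path.startswith("./"):       # step A: drop a leading "./"
            path = path[2:]
        elif path.startswith("/./"):      # step B: "/./" becomes "/"
            path = path[2:]
        elif path == "/.":                # step B: trailing "/." becomes "/"
            path = "/"
        elif path.startswith("/../"):     # step C: "/../" becomes "/", drop last output segment
            path = path[3:]
            if output:
                output.pop()
        elif path == "/..":               # step C: trailing "/.." becomes "/", drop last output segment
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):         # step D: a lone "." or ".." disappears
            path = ""
        else:                             # step E: move the first segment to the output
            cut = path.find("/", 1)
            if cut == -1:
                cut = len(path)
            output.append(path[:cut])
            path = path[cut:]
    return "".join(output)


print(remove_dot_segments("/a/b/c/./../../g"))  # /a/g   (example from the RFC)
print(remove_dot_segments("/a/b/.."))           # /a/
print(remove_dot_segments("/a/b/../"))          # /a/
```

Note that a trailing `/..` and a trailing `/../` give the same result once this procedure runs on the path; any difference between the two only appears when the path is merged with a relative reference before dot-segment removal.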
The thing is, RFC3986 says that “Normalization of the base URI, as described in Sections 6.2.2 and 6.2.3, is optional” but the rest of the algorithm seems to silently assume that this normalization has been performed (as there would be no obvious reason not to). That's also what many libraries that perform resolution just assume. However, the Turtle spec says that we should “[use] only the basic algorithm in section 5.2”. While I insist that this wording is ambiguous, I think that the intention of “only the basic algorithm” was to say that we should not do anything optional, which thus means not normalizing the base URI. Perhaps @cygri can help us clarify the intention. |
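To make the effect of the optional base normalization concrete, the following Python sketch implements the reference transformation of RFC 3986, section 5.2.2, using the generic splitting regex from Appendix B of the RFC. The `normalize_base` flag and the `http://example/...` IRIs are assumptions introduced for illustration; they are not the test IRIs from the PR, and the code is not claimed to match any particular parser.

```python
import re

# Generic URI splitter from RFC 3986, Appendix B.
_URI = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split(uri):
    """Return (scheme, authority, path, query, fragment); undefined parts are None."""
    m = _URI.match(uri)
    return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)


def remove_dot_segments(path):
    """Compact sketch of RFC 3986, section 5.2.4."""
    out = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./") or path.startswith("/./"):
            path = path[2:]
        elif path == "/.":
            path = "/"
        elif path.startswith("/../") or path == "/..":
            path = "/" + path[4:]
            if out:
                out.pop()
        elif path in (".", ".."):
            path = ""
        else:
            cut = path.find("/", 1)
            cut = len(path) if cut == -1 else cut
            out.append(path[:cut])
            path = path[cut:]
    return "".join(out)


def resolve(base, ref, normalize_base=False):
    """Sketch of RFC 3986, section 5.2.2; normalize_base is a hypothetical switch."""
    bs, ba, bp, bq, _ = split(base)
    if normalize_base:
        bp = remove_dot_segments(bp)  # the optional pre-normalization step
    rs, ra, rp, rq, rf = split(ref)
    if rs is not None:                # reference is already absolute
        ts, ta, tp, tq = rs, ra, remove_dot_segments(rp), rq
    else:
        if ra is not None:
            ta, tp, tq = ra, remove_dot_segments(rp), rq
        else:
            if rp == "":
                tp = bp               # base path is copied without dot-segment removal
                tq = rq if rq is not None else bq
            else:
                if rp.startswith("/"):
                    tp = remove_dot_segments(rp)
                else:
                    # merge (section 5.2.3): keep the base path up to its last "/"
                    if ba is not None and bp == "":
                        merged = "/" + rp
                    else:
                        merged = bp[: bp.rfind("/") + 1] + rp
                    tp = remove_dot_segments(merged)
                tq = rq
            ta = ba
        ts = bs
    result = (ts + ":") if ts is not None else ""
    if ta is not None:
        result += "//" + ta
    result += tp
    if tq is not None:
        result += "?" + tq
    if rf is not None:
        result += "#" + rf
    return result


print(resolve("http://example/a/b/..", "c"))                       # http://example/a/b/c
print(resolve("http://example/a/b/..", "c", normalize_base=True))  # http://example/a/c
print(resolve("http://example/a/b/../", "c"))                      # http://example/a/c
print(resolve("http://example/a/b/../c", ""))                      # http://example/a/b/../c
```

In this sketch, a base ending in `/..` gives different results depending on whether the base is normalized first, because the merge step treats the trailing `..` like any other last segment and discards it. A base ending in `/../` gives the same result either way, and an empty reference copies the base path with its dot-segments intact.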
Yes - sorting out the absolute base before even getting to the relative URI resolution seems highly likely. The text leading up to the quote from the Turtle spec makes it clear the text applies to relative IRIs so we're exposed to RFC processing on the base earlier (e.g. at the point of Absolute URIs have other issues; this is why this came up for me. And there is URI scheme So the problematic cases for http are:
Asking again: When does " The text is the same in SPARQL 1.1 (@ericprud ? can you remember the history?). |
In practice, I can't imagine why anyone would like to do that. Just like I hope nobody ever mints other very ugly URIs. In theory, however, it is possible, and the RDF allows it:
So we should avoid them, but at the same time, the spec acknowledges that they exist and are valid. For me, however, the following part of the spec is just a disaster:
Because this means that dereferencing is utterly broken, since intermediaries are allowed to do such normalization. We should just always have to normalize. But that's another discussion altogether 😉 |
Good point. If the spec calls out "/../" (trailing /), which is the case where "everything just works", then maybe the tests should mostly use that rather than "/.." (no trailing /), with just one or two tests of "/..". |
I think it's safer to test that behavior as well. If |
@RubenVerborgh what's the next step for this? Have we reached consensus on which tests to include and what to do with the .. segments in the base? It would be good to complete this issue. |
I think this is mostly up to @afs. I could live with removing the |
Focusing on the area where there is no disagreement seems like the way forward. Is that the test sets with no dot-segments in absolute IRIs, namely 01, 02, 07 and 08? |
I think we have a larger subset we agree on. I would propose everything except 5 and 6. Would that work? |
03 and 04 have dot-segments in See the external feedback which quotes RFC3987 that reference resolution is necessary when the reference is already an IRI and my testing with Redland. |
Ah, I thought only trailing dot segments were an issue. So what do we do with the other cases? We just accept that different parsers have different outcomes there, and thus we don't include them? Or we ask the spec authors what they intended? |
The case of One Turtle editor has responded and said that saying they are "absolute IRIs" did mean to him that RFC 3987 applies, and not that absolute IRIs are untouched. |
Any other opinions on this? Are 01, 02, 07 and 08 the only ones we agree on? |
@RubenVerborgh can you drive this to conclusion and simply propose a change where we can ask for objections? I'd like to get these tests integrated so we can move on. |
Yes, asked for confirmation on the mailing list. |
Turtle and TriG tests completed via PR #30. Still need similar tests for SPARQL and RDF/XML. JSON-LD and RDFa can be done elsewhere. |
I just stumbled over this issue. I have two questions:
|
@leipert Until a new Working Group is chartered to update RDF and/or SPARQL, there is really no way to move from There is, of course, a whole sub-tree for SPARQL tests, but the suites are independent, so new tests would need to be created referencing local queries and data files. Ideally, both queries and data files are the minimum necessary to test the particular feature. (chair hat off) I don't think it's appropriate to create a SPARQL test for each Turtle test in general, but you can certainly propose some specific tests for the community to consider. (chair hat back on) The way this has been done in the past is to propose a set of tests, on the rdf-tests mailing list along with either or both of public-rdf-comments or public-sparql-dev to gain consensus for the need for such tests; create a PR with changes necessary for those tests, after which we (me usually) send out a call for consensus to merge these tests into the main (gh-pages) branch of this repo. I expect, at some point, that there will be new tests to check for a consensus change to EXISTS that's been in the works for a while. |
This was suggested by @RubenVerborgh and discussed on public-rdf-comments. The gist has been updated to include evaluation tests for Turtle, JSON-LD, and RDF/XML, which can readily be translated for TriG and SPARQL.