Add Turtle, TriG, RDF/XML and SPARQL tests to better validate relative IRI resolution #6

gkellogg · 2015-09-09T22:12:24Z

This was suggested by @RubenVerborgh and discussed on public-rdf-comments. The gist has been updated to include evaluation tests for Turtle, JSON-LD, and RDF/XML, which can readily be translated for TriG and SPARQL.

RubenVerborgh · 2015-09-09T22:13:43Z

Thanks @gkellogg, feel free to assign this issue to me.

gkellogg · 2015-09-09T23:06:33Z

I don't have the permission to add you quite yet (which is a bit odd). Once you're added, you should be able to assign yourself. Thanks!

BTW, I think I owe you a response relating to urn:ex:s276, which I haven't looked into.

gkellogg · 2015-10-19T15:52:33Z

See PR #13 for Turtle and TriG tests.

RubenVerborgh · 2015-10-19T15:56:02Z

About RDF/XML, I wasn't sure where to put those. The RDF/XML tests have a pretty specific structure.

gkellogg · 2015-10-19T16:07:18Z

I've created RDF/XML versions of these tests for my own use, which can be added to the RDF/XML test suite (I also have JSON-LD tests). If you don't get to it, @RubenVerborgh, I'll handle it later this week.

SPARQL's another matter.

RubenVerborgh · 2015-10-19T16:09:23Z

Only thing that held me back on RDF/XML is the weird folder structure; wasn't sure where to put things. So I'd leave it up to you if that's okay.

gkellogg · 2015-10-19T16:23:38Z

Also RDFa tests.

gkellogg · 2015-10-19T19:21:49Z

I added RDF/XML tests to PR #13.

afs · 2015-10-20T11:58:47Z

I was pointed at the following text in RFC 3987, section 5.3.2.4 (Path Segment Normalization):

The complete path segments "." and ".." are intended only for use
within relative references

says dot-segments are only for the beginning of a relative IRI. But is it a prohibition for use elsewhere?

RFC 3986: section 3.3 Paths

The path segments "." and "..", also known as dot-segments, are
defined for relative reference within the path name hierarchy. They
are intended for use at the beginning of a relative-path reference
(Section 4.2) to indicate relative position within the hierarchical
tree of names.
...
Aside from dot-segments in hierarchical paths, a path segment is
considered opaque by the generic syntax.

Also RFC 3986: section 1.2.3 Hierarchical Identifiers

All URI references are parsed by generic syntax parsers when used.
However, because hierarchical processing has no effect on an absolute
URI used in a reference unless it contains one or more dot-segments
(complete path segments of "." or "..", as described in Section 3.3),
URI scheme specifications can define opaque identifiers by
disallowing use of slash characters, question mark characters, and
the URIs "scheme:." and "scheme:..".

This seems to say that dot-segment processing applies to absolute URIs, as it is called out specially.

All of which leaves me confused about the use of dot-segments in absolute IRIs and URIs and, to some extent, in relative URIs when not at the beginning. Multiple readings seem possible depending on the "SHOULD NOT"/"MUST NOT"-ness of "intended only for" and "are defined".

Opinions?

RubenVerborgh · 2015-10-20T12:04:29Z

says dot-segments are only for the beginning of a relative IRI.

Not necessarily “beginning” though. The algorithm accounts for ../ not at the beginning.

The complete path segments "." and ".." are intended only for use
within relative references

The problem is “intended”. What does that even mean?

afs · 2015-10-20T21:48:29Z

The algorithm works more generally, and also works for sorting out absolute URIs by operating on their path. My sense is that the case of a base URI having .. was either not given much weight, or it was considered to be sorted out as part of making the base URI in the first place, i.e. .. does not appear. "intended" == "design space". Most of the ways to determine a base URI, and usages like browser document base URLs, don't have such segments.

The algorithm accounts for .. everywhere except at the end of the base URI. There, it simply loses it at the merge step if the reference is not <>, and it has no effect.

Where .. has some action

<http://host/> <a/b/../c/d> => <http://host/a/c/d> 
<http://host/a/b/../c/> <d> => <http://host/a/c/d> 
<http://host/a/b/> <../c/d> => <http://host/a/c/d>
<http://host/x/y/> <z/..> => <http://host/x/y/>

Where .. has no effect

<http://host/a/b/..> <c/d> => <http://host/a/b/c/d>

The difference is between .. and ../, but it isn't an arbitrary split anyway: </c/d>, for example.

When, except by explicit choice, does .. appear in a base URI?

And it's different if the base URI is sorted out first, which is what leads to oddities:

BASE <http://host/>
BASE </a/b/..>
<urn:ex:s> <urn:ex:p> <c/d> .
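
The cases above can be reproduced with a minimal Python sketch of the RFC 3986 section 5.2 steps (parse, merge, remove_dot_segments, recompose). The function names `parse`, `remove_dot_segments` and `resolve` are illustrative choices, not taken from any parser mentioned in this thread, and the base URI is deliberately left unnormalized, which is one of the readings under discussion here rather than a settled answer.

```python
import re
from typing import NamedTuple, Optional

# Generic URI-reference regex from RFC 3986, Appendix B.
_URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


class Parts(NamedTuple):
    scheme: Optional[str]
    authority: Optional[str]
    path: str
    query: Optional[str]
    fragment: Optional[str]


def parse(ref: str) -> Parts:
    m = _URI_RE.match(ref)  # every string matches; all components are optional
    return Parts(m.group(2), m.group(4), m.group(5), m.group(7), m.group(9))


def remove_dot_segments(path: str) -> str:
    """RFC 3986 section 5.2.4, working on an input buffer and an output stack."""
    out = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]
        elif path == "/.":
            path = "/"
        elif path.startswith("/../") or path == "/..":
            path = "/" + path[4:]
            if out:
                out.pop()  # drop the previous segment and its leading "/"
        elif path in (".", ".."):
            path = ""
        else:
            end = path.find("/", 1)
            if end == -1:
                end = len(path)
            out.append(path[:end])
            path = path[end:]
    return "".join(out)


def resolve(base: str, ref: str) -> str:
    """RFC 3986 section 5.2.2, with no normalization of the base beforehand."""
    b, r = parse(base), parse(ref)
    if r.scheme is not None:
        scheme, authority, query = r.scheme, r.authority, r.query
        path = remove_dot_segments(r.path)
    elif r.authority is not None:
        scheme, authority, query = b.scheme, r.authority, r.query
        path = remove_dot_segments(r.path)
    elif r.path == "":
        # Empty path, e.g. <>: the base path is copied untouched.
        scheme, authority, path = b.scheme, b.authority, b.path
        query = r.query if r.query is not None else b.query
    else:
        if r.path.startswith("/"):
            path = remove_dot_segments(r.path)
        elif b.authority is not None and b.path == "":
            path = remove_dot_segments("/" + r.path)
        else:
            # Merge: everything after the last "/" of the base path is dropped,
            # which is where a trailing ".." in the base disappears.
            merged = b.path[: b.path.rfind("/") + 1] + r.path
            path = remove_dot_segments(merged)
        scheme, authority, query = b.scheme, b.authority, r.query
    result = ""
    if scheme is not None:
        result += scheme + ":"
    if authority is not None:
        result += "//" + authority
    result += path
    if query is not None:
        result += "?" + query
    if r.fragment is not None:
        result += "#" + r.fragment
    return result


print(resolve("http://host/a/b/", "../c/d"))   # http://host/a/c/d
print(resolve("http://host/a/b/..", "c/d"))    # http://host/a/b/c/d
print(resolve("http://host/a/b/..", ""))       # http://host/a/b/..
# Resolving the second BASE against the first removes its dot-segments:
base2 = resolve("http://host/", "/a/b/..")     # http://host/a/
print(resolve(base2, "c/d"))                   # http://host/a/c/d
```

The last two lines show the oddity: with the base resolved first, `<c/d>` yields `http://host/a/c/d`, whereas a raw base of `http://host/a/b/..` yields `http://host/a/b/c/d`.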

RubenVerborgh · 2015-10-21T12:56:27Z

My sense is that the case of base URI having .. was either not given much weight or it was considered to be sorted out as part of making the base URI in the first place

The thing is, RFC3986 says that “Normalization of the base URI, as described in Sections 6.2.2 and 6.2.3, is optional” but the rest of the algorithm seems to silently assume that this normalization has been performed (as there would be no obvious reason not to). That's also what many libraries that perform resolution just assume.

However, the Turtle spec says that we should “[use] only the basic algorithm in section 5.2”. While I insist that this wording is ambiguous, I think that the intention of “only the basic algorithm” was to say that we should not do anything optional, which thus means not normalizing the base URI. Perhaps @cygri can help us clarify the intention.

afs · 2015-10-21T14:57:51Z

Yes - sorting out the absolute base before even getting to the relative URI resolution seems highly likely. The text leading up to the quote from the Turtle spec makes it clear the text applies to relative IRIs so we're exposed to RFC processing on the base earlier (e.g. at the point of @base) so this isn't about the relative URI step.

Absolute URIs have other issues; this is why this came up for me. <file:data.ttl> is strictly illegal, as is <file:/path> (which is what Java's URL.toString() produces). Jena normalizes those to (legal) <file:///fullpath>.

And there is the URI scheme C: found on some operating systems :-).

So the problematic cases for http are:

A trailing /.. in an absolute base URI, which leads to its being ignored. Any other final component works, including a single dot. A base of @base <..> . works.
The special case of <> in 5.2.2, which bypasses remove_dot_segments and exposes the raw base URI, which might elsewhere be clean or raw. This is an important case and should be tested. At least the assumptions of the test data need to be captured.

Asking again: When does ".." appear in an absolute base URI in practice? We could avoid testing this one situation if it is a test-case corner case.

The text is the same in SPARQL 1.1 (@ericprud, can you remember the history?).

RubenVerborgh · 2015-10-21T15:05:44Z

When does ".." appear in an absolute base URI in practice?

In practice, I can't imagine why anyone would want to do that. Just like I hope nobody ever mints other very ugly URIs. In theory, however, it is possible, and the RDF spec allows it:

IRI normalization: Interoperability problems can be avoided by minting only IRIs that are normalized according to Section 5 of RFC3987. Non-normalized forms that are best avoided include:

[…]

“/./” or “/../” in the path component of an IRI

[…]

So we should avoid them, but at the same time, the spec acknowledges that they exist and are valid.

For me, however, the following part of the spec is just a disaster:

IRI equality: Two IRIs are equal if and only if they are equivalent under Simple String Comparison according to section 5.1 of RFC3987. Further normalization must not be performed when comparing IRIs for equality.

Because this means that dereferencing is utterly broken, since intermediaries are allowed to do such normalization. We should just always have to normalize. But that's another discussion altogether 😉
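
A small illustration of what that equality rule implies, as a hedged Python sketch: plain string equality stands in for character-by-character comparison, and the IRIs and the `iri_equal` helper are made up for the example.

```python
# Two spellings that a normalizing intermediary could treat as the same resource.
raw = "http://host/a/b/../c"
clean = "http://host/a/c"


def iri_equal(x: str, y: str) -> bool:
    """Compare character by character, with no normalization step."""
    return x == y


# A toy "graph": a set of (subject, predicate, object) string triples.
triples = {
    (raw, "urn:ex:p", "urn:ex:o"),
    (clean, "urn:ex:p", "urn:ex:o"),
}

print(iri_equal(raw, clean))  # False
print(len(triples))           # 2
```

Under this comparison the two subjects are distinct nodes, even though dot-segment removal would map the first spelling onto the second.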

afs · 2015-10-21T19:37:03Z

Good point. If the spec calls out "/../" (trailing /), which is the case where "everything just works", then maybe the tests should mostly use that, not "/.." (no trailing /), and have just one or two tests of "/..".

RubenVerborgh · 2015-10-21T19:38:35Z

I think it's safer to test that behavior as well. If /../ can occur, then /.. can as well.

gkellogg · 2015-11-04T01:05:20Z

@RubenVerborgh what's the next step for this? Have we reached consensus on which tests to include and what to do with the .. segments in the base? It would be good to complete this issue.

RubenVerborgh · 2015-11-04T08:53:03Z

I think this is mostly up to @afs. I could live with removing the /.. tests, but I'm not convinced this is necessary. However, if this speeds up the issue, we can indeed remove those and commit what we have already (and add the rest later if needed).

afs · 2015-11-04T14:34:06Z

Focusing on the area where there is no disagreement seems like the way forward.

Is that the test sets with no dot-segments in absolute IRIs, which are 01, 02, 07 and 08?

RubenVerborgh · 2015-11-05T08:53:48Z

I think we have a larger subset we agree on. I would propose everything except 5 and 6. Would that work?

afs · 2015-11-05T10:22:08Z

03 and 04 have dot-segments in @base URIs and lead to different results.

See the external feedback which quotes RFC3987 that reference resolution is necessary when the reference is already an IRI and my testing with Redland.

RubenVerborgh · 2015-11-05T10:29:28Z

Ah, I thought only trailing dot segments were an issue.

So what do we do with the other cases? We just accept that different parsers have different outcomes there, and thus we don't include them? Or we ask the spec authors what they intended?

afs · 2015-11-05T11:22:33Z

The case of <> makes them different. <> bypasses the dot-segment removal step in relative URI resolution.

One Turtle editor has responded and said that saying they are "absolute IRIs" did mean to him that RFC 3987 applies, and not that absolute IRIs are untouched.

RubenVerborgh · 2015-11-05T12:11:08Z

Any other opinions on this? Are 01, 02, 07 and 08 the only ones we agree on?

gkellogg · 2016-01-04T16:57:54Z

@RubenVerborgh can you drive this to conclusion and simply propose a change where we can ask for objections? I'd like to get these tests integrated so we can move on.

RubenVerborgh · 2016-01-05T08:27:10Z

Yes, asked for confirmation on the mailing list.

gkellogg · 2016-01-07T16:41:47Z

Turtle and TriG tests completed via PR #30. Still need similar tests for SPARQL and RDF/XML. JSON-LD and RDFa can be done elsewhere.

leipert · 2017-03-21T20:55:12Z

I just stumbled over this issue. I have two questions:

The Turtle and TriG tests have been merged in "Add IRI resolution tests (subset)" #30. The tests currently have the rdft:Proposed approval status. Is there any resource on how this approval process works?
What would be the best way to add SPARQL tests for this? CONSTRUCT/SELECT/INSERT tests with a copy of the turtle tests as data?

gkellogg · 2017-03-22T23:30:16Z

@leipert Until a new Working Group is chartered to update RDF and/or SPARQL, there is really no way to move from rdft:Proposed to rdft:Approved. All the CG can really do is propose new tests to be considered at a future date. But, for all practical purposes, if they have been merged into this repo, the community has had a chance to vet the tests, and they are fairly stable.

There is, of course, a whole sub-tree for SPARQL tests, but the suites are independent, so new tests would need to be created referencing local queries and data files. Ideally, both queries and data files are the minimum necessary to test the particular feature. (chair hat off) I don't think it's appropriate to create a SPARQL test for each Turtle test in general, but you can certainly propose some specific tests for the community to consider. (chair hat back on)

The way this has been done in the past is to propose a set of tests, on the rdf-tests mailing list along with either or both of public-rdf-comments or public-sparql-dev to gain consensus for the need for such tests; create a PR with changes necessary for those tests, after which we (me usually) send out a call for consensus to merge these tests into the main (gh-pages) branch of this repo.

I expect, at some point, that there will be new tests to check for a consensus change to EXISTS that's been in the works for a while.

afs · 2023-01-22T09:44:10Z

This is done by #30.

#87 has just tweaked the tests of #30.

In #30, the "subset" is the subset of #6 that we discussed (although reading old issues is never a case of being completely sure!)

gkellogg added the enhancement label Sep 9, 2015

RubenVerborgh self-assigned this Sep 10, 2015

gkellogg added Turtle SPARQL labels Oct 10, 2015

gkellogg mentioned this issue Oct 19, 2015

Create IRI Resolution tests similar to those used for Turtle json-ld/json-ld.org#394

Closed

gkellogg added a commit to json-ld/json-ld.org that referenced this issue Oct 21, 2015

These IRI normalization tests directly correspond to those for w3c/rd…

c975e75

…f-tests#6. Fixes #394.

gkellogg mentioned this issue Oct 21, 2015

These IRI normalization tests directly correspond to those for json-ld/json-ld.org#395

Merged

RubenVerborgh mentioned this issue Jan 5, 2016

Add IRI resolution tests (subset) #30

Merged

lanthaler pushed a commit to json-ld/tests that referenced this issue Oct 4, 2016

These IRI normalization tests directly correspond to those for w3c/rd…

8e0ad27

…f-tests#6. Fixes #394.

harlantwood pushed a commit to CoMakery/json-ld.org that referenced this issue Nov 24, 2016

These IRI normalization tests directly correspond to those for w3c/rd…

1aa74a2

…f-tests#6. Fixes json-ld#394.

harlantwood pushed a commit to CoMakery/json-ld.org that referenced this issue Nov 24, 2016

These IRI normalization tests directly correspond to those for w3c/rd…

38db88d

…f-tests#6. Fixes json-ld#394.

leipert mentioned this issue Jun 8, 2017

SPARQL Tests for REGEX function #46

Open

gkellogg mentioned this issue Jul 21, 2017

Remove base dot segments in RFC3986 tests. json-ld/json-ld.org#525

Closed

gkellogg removed the Turtle label Nov 1, 2023
