SPARQL test suite missing coverage of `RAND` #70

kasei · 2020-11-25T17:28:03Z

The current SPARQL test suite has a single test for the RAND function:

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
ASK {
	BIND(RAND() AS ?r)
	FILTER(DATATYPE(?r) = xsd:double && ?r >= 0.0 && ?r < 1.0)
}

I seem to recall that the testing was limited because it's hard to test randomness in the manifest-based approach of this test suite (and due to WG time pressure). However, I recently ran across a case where a SPARQL user found several implementations that all seem to implement RAND in a way that I think is contrary to the spec's intention (evaluating RAND once per syntactic call site). It would be nice to find a way to improve the tests to cover the expectations that:

A single syntactic RAND produces different results for each intermediate solution mapping (each result row should get a different random value)
Two different syntactic RAND calls should produce different results for the same solution mapping

lisp · 2020-11-25T18:04:15Z

> A single syntactic RAND produces different results for each intermediate solution mapping (each result row should get a different random value)

does a count distinct not suffice?

select (count (distinct ?x) as ?count)
where {
  values ?y { 1 2 }
  bind(rand() as ?x)
}

https://dydra.com/james/test/@query#rand-must-be-distinct

kasei · 2020-11-25T18:08:03Z

That would work almost always. But if you had a poor PRNG, it might stochastically fail if two consecutive RAND calls returned the same value. I think I recall WG discussions about not wanting to make assumptions about the quality of the PRNG, but I don't know if that's a reasonable thing to worry about. For whatever small chance you might come across such a situation, you could reduce it further by increasing the number of random values you're operating over and just verifying that the distinct count is greater than 1.
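
A minimal sketch of that idea, phrased as an ASK query so the expected result in a manifest stays a fixed `true`. The ten-row VALUES block and the variable names ?y, ?x and ?count are illustrative choices, not taken from the existing test suite. An implementation that evaluates RAND once per call site would produce a distinct count of 1 and answer `false`. An implementation that draws a fresh value per solution could still fail, but only if all ten draws collided.

```sparql
ASK {
  {
    # Draw one random value per row and count how many distinct values appear.
    SELECT (COUNT(DISTINCT ?x) AS ?count)
    WHERE {
      VALUES ?y { 1 2 3 4 5 6 7 8 9 10 }
      BIND(RAND() AS ?x)
    }
  }
  # Only require "more than one", not "all ten differ", to tolerate weak PRNGs.
  FILTER(?count > 1)
}
```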

lisp · 2020-11-25T18:08:13Z

> Two different syntactic RAND calls should produce different results for the same solution mapping

does it not suffice just to test?

select ?notequal ?x ?y
where {
  bind(rand() as ?x)
  bind(rand() as ?y)
  bind((?x != ?y) as ?notequal)
}

https://dydra.com/james/test/@query#rand-in-solution-must-be-distinct

afs · 2020-11-27T12:43:09Z

Two calls to RAND can return the same value by chance (normally, a very small chance!)

lisp · 2020-11-27T12:53:42Z

does why it happens matter?

afs · 2020-11-27T13:08:11Z

Yes - the tests check the results: ?notequal will be different and the test will fail.
In fact, exposing ?x ?y makes it untestable in the framework we have.
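
For comparison, a sketch of a variant that keeps ?x and ?y out of the result set, so the only observable output is a boolean. The row count of ten and the names ?row and ?differing are assumptions made for illustration. It still depends on chance (all ten pairs could coincide), so it narrows the problem rather than removing it.

```sparql
ASK {
  {
    # Count the rows in which the two RAND call sites disagree.
    SELECT (COUNT(*) AS ?differing)
    WHERE {
      VALUES ?row { 1 2 3 4 5 6 7 8 9 10 }
      BIND(RAND() AS ?x)
      BIND(RAND() AS ?y)
      FILTER(?x != ?y)
    }
  }
  # If both call sites always return the same value, the count is 0 and ASK is false.
  FILTER(?differing > 0)
}
```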

afs · 2020-11-27T13:11:38Z

I don't see how we can test for "bad implementation" and provide utility.

If we have tests that "sometimes fail", the whole test suite is called into question.

It would be better to document expected behaviours (e.g. not eval by call sites) in the spec or some other documentation.

lisp · 2020-11-27T15:05:47Z

i do not follow this logic.
either it matters, or it doesn't.
if it matters, then test it.
if it does not matter, do not mention it.

kasei · 2020-11-27T15:31:11Z

> i do not follow this logic.
> either it matters, or it doesn't.
> if it matters, then test it.
> if it does not matter, do not mention it.

It is both important and may be impossible to test correctly every time given the current testing framework. I think making this perfectly testable would likely require either large changes to the testing framework or enormously burdensome requirements on the PRNG used by implementations.

@afs agreed. Maybe this should be either a 1.1 erratum or a sparql-1.2 issue about improving the spec text.

lisp · 2020-11-27T15:59:05Z

> It is both important and may be impossible to test correctly every time given the current testing framework.

or, by its nature.
is this a situation where it is the test's responsibility to prove something about the test subject or the test subject's responsibility to prove something about itself?

gkellogg added the SPARQL label Nov 1, 2023
