SPARQL test suite missing coverage of RAND #70

Open
kasei opened this issue Nov 25, 2020 · 10 comments
@kasei
Contributor

kasei commented Nov 25, 2020

The current SPARQL test suite has a single test for the RAND function:

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
ASK {
	BIND(RAND() AS ?r)
	FILTER(DATATYPE(?r) = xsd:double && ?r >= 0.0 && ?r < 1.0)
}

I seem to recall that the testing was limited because it's hard to test randomness in the manifest-based approach of this test suite (and due to WG time pressure). However, I recently ran across a case where a SPARQL user found several implementations that all seem to implement RAND in a way that I think is contrary to the spec's intention (evaluating RAND once per syntactic call site). It would be nice to find a way to improve the tests to cover the expectations that:

  • A single syntactic RAND produces different results for each intermediate solution mapping (each result row should get a different random value)
  • Two different syntactic RAND calls should produce different results for the same solution mapping
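
To make the two expectations concrete, here is a minimal Python sketch (not SPARQL, and not taken from any engine) that models the difference between evaluating RAND once per syntactic call site and evaluating it for every solution mapping. The function names and the list-of-dicts representation of solution mappings are assumptions made purely for illustration.

```python
import random

def eval_per_call_site(rows, rng):
    """Model of the behaviour described as contrary to the spec's intention:
    each syntactic RAND() is evaluated once and reused for every row."""
    x = rng.random()  # first call site, evaluated once
    y = rng.random()  # second call site, evaluated once
    return [{**row, "x": x, "y": y} for row in rows]

def eval_per_solution(rows, rng):
    """Model of the expected behaviour: every call site is re-evaluated
    for every solution mapping."""
    return [{**row, "x": rng.random(), "y": rng.random()} for row in rows]

rows = [{"n": i} for i in range(5)]  # five hypothetical solution mappings
rng = random.Random(70)              # fixed seed so the sketch is reproducible

cached = eval_per_call_site(rows, rng)
fresh = eval_per_solution(rows, rng)

# Expectation 1: a single call site should vary across rows.
distinct_x_cached = len({r["x"] for r in cached})
distinct_x_fresh = len({r["x"] for r in fresh})
print(distinct_x_cached, distinct_x_fresh)
```

In this model the per-call-site strategy always yields exactly one distinct value for ?x, however many rows there are, which is the behaviour a test for the first expectation would need to catch.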
@lisp

lisp commented Nov 25, 2020

> A single syntactic RAND produces different results for each intermediate solution mapping (each result row should get a different random value)

does a count distinct not suffice?

select (count (distinct ?x) as ?count)
where {
  values ?y { 1 2 }
  bind(rand() as ?x)
}

https://dydra.com/james/test/@query#rand-must-be-distinct

@kasei
Contributor Author

kasei commented Nov 25, 2020

That would work almost always. But if you had a poor PRNG, it might stochastically fail if two consecutive RAND calls returned the same value. I think I recall WG discussions about not wanting to make assumptions about the quality of the PRNG, but I don't know if that's a reasonable thing to worry about. For whatever small chance you might come across such a situation, you could reduce it further by using DISTINCT while increasing the number of random values you're operating over, and just verifying that the distinct count is greater than 1.
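
As a rough illustration of why "distinct count greater than 1" over many values is more robust than expecting every value to differ, the following Python sketch computes both failure probabilities exactly. It assumes a hypothetical PRNG whose draws are independent and uniform over only `k` possible outputs; a genuinely poor PRNG may violate both assumptions, so this is a back-of-the-envelope model rather than a guarantee.

```python
from fractions import Fraction
from math import perm

def p_all_equal(k, n):
    """Probability that n independent uniform draws from k values are all
    identical, i.e. that a 'distinct count > 1' check fails by chance."""
    return Fraction(1, k) ** (n - 1)

def p_some_repeat(k, n):
    """Probability that at least two of n draws coincide (birthday problem),
    i.e. that a 'distinct count == n' check fails by chance."""
    return 1 - Fraction(perm(k, n), k ** n)

k = 16  # hypothetical, deliberately tiny output space
for n in (2, 10, 100):
    print(n, float(p_some_repeat(k, n)), float(p_all_equal(k, n)))
```

Under these assumptions, adding rows makes the exact-count check more likely to fail spuriously, while the chance of a spurious failure of the "greater than 1" check shrinks geometrically with the number of rows.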

@lisp

lisp commented Nov 25, 2020

> Two different syntactic RAND calls should produce different results for the same solution mapping

does it not suffice just to test?

select ?notequal ?x ?y
where {
  bind(rand() as ?x)
  bind(rand() as ?y)
  bind((?x != ?y) as ?notequal)
}

https://dydra.com/james/test/@query#rand-in-solution-must-be-distinct

@afs
Contributor

afs commented Nov 27, 2020

Two calls to RAND can return the same value by chance (normally, a very small chance!)

@lisp

lisp commented Nov 27, 2020

does why it happens matter?

@afs
Contributor

afs commented Nov 27, 2020

Yes - the tests compare against fixed expected results, so whenever ?notequal comes out different the test fails.
In fact, exposing ?x and ?y makes it untestable in the framework we have.
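
A small Python model may help show why projecting the random values themselves defeats a manifest-style test. It is only a sketch: `run_query` is a hypothetical stand-in for the two-BIND SELECT earlier in the thread, `manifest_test` is an assumed simplification of how stored expected results are compared with actual ones, and the two seeds merely model two independent executions.

```python
import random

def run_query(rng, project):
    """Stand-in for the two-BIND SELECT: draw two values, compare, project."""
    x, y = rng.random(), rng.random()
    row = {"notequal": x != y, "x": x, "y": y}
    return [{name: row[name] for name in project}]

def manifest_test(expected, actual):
    """Manifest-style check: pass only if the results match the stored ones."""
    return expected == actual

full = ["notequal", "x", "y"]  # projection used in the query above
flag_only = ["notequal"]       # projection that hides the random values

# Stored expected results come from one run, actual results from another.
passes_with_values = manifest_test(run_query(random.Random(1), full),
                                   run_query(random.Random(2), full))
passes_flag_only = manifest_test(run_query(random.Random(1), flag_only),
                                 run_query(random.Random(2), flag_only))
print(passes_with_values, passes_flag_only)
```

With ?x and ?y projected, the comparison fails on essentially every run because the stored values never recur. With only ?notequal projected it passes, except in the rare case where the two draws collide.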

@afs
Contributor

afs commented Nov 27, 2020

I don't see how we can test for "bad implementation" and provide utility.

If we have tests that "sometimes fail", the whole test suite is called into question.

It would be better to document expected behaviours (e.g. not evaluating by call site) in the spec or some other documentation.

@lisp

lisp commented Nov 27, 2020

i do not follow this logic.
either it matters, or it doesn't.
if it matters, then test it.
if it does not matter, do not mention it.

@kasei
Contributor Author

kasei commented Nov 27, 2020

> i do not follow this logic.
> either it matters, or it doesn't.
> if it matters, then test it.
> if it does not matter, do not mention it.

It is both important and may be impossible to test correctly every time given the current testing framework. I think making this perfectly testable would likely require either large changes to the testing framework or enormously burdensome requirements on the PRNG used by implementations.

@afs agreed. Maybe this should be either a 1.1 errata or a sparql-1.2 issue about improving the spec text.

@lisp

lisp commented Nov 27, 2020

> It is both important and may be impossible to test correctly every time given the current testing framework.

or, by its nature.
is this a situation where it is the test's responsibility to prove something about the test subject or the test subject's responsibility to prove something about itself?

@gkellogg gkellogg added the SPARQL label Nov 1, 2023