SPARQL Tests for REGEX function #46

Open
leipert opened this issue Jun 8, 2017 · 10 comments

leipert commented Jun 8, 2017

Recently I had a look into the pattern definition of REGEX (and also REPLACE) to find out what "regex flavor" SPARQL is using. The regex definition of XQuery 1.0 and XPath 2.0 Functions and its base XML Schema Part 2 are quite interesting reads (and the latter is actually fun to read).

Unfortunately, it seems that none of the SPARQL 1.1 libraries I tested (ARQ and RDFLib) use those definitions. They use the regex implementations of the underlying programming languages: Java's java.util.regex and Python's re. Who would blame the developers? This leads to a mismatch between the SPARQL 1.1 specification and the implementations.
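
To make the mismatch concrete, here is a minimal sketch that probes whether Python's re module even accepts a few XML Schema escapes. The helper name host_engine_accepts and the choice of probe patterns are illustrative assumptions, not part of RDFLib or any other SPARQL library.

```python
import re


def host_engine_accepts(pattern: str) -> bool:
    """Return True if Python's `re` compiles the pattern, False if it rejects it."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# \c and \i are multi-character escapes in XML Schema regexes (name characters
# and initial name characters). Python's `re` defines no such escapes.
for probe in [r"^\w", r"^\c", r"^\i"]:
    print(probe, host_engine_accepts(probe))
```

A library that hands the pattern straight to the host engine inherits whatever that engine accepts or rejects, which is the gap a test suite would need to expose.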

Apache Jena ARQ has 100% in the latest implementation report, while not supporting the REGEX function correctly. In my opinion tests for this function should be added in the next iteration of the test suite.

A few of the queries I tested (all taken from XQuery/XPath regex description):

If I were to implement the tests, I would test that ...

  • ... all flags (s, m, i, x) work correctly.
  • ... faulty flags lead to errors.
  • ... character classes work as intended.
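
As a rough illustration of the flag checks listed above, the sketch below validates a flag string against the four flags named in this issue and maps them onto Python's re flags. The names ALLOWED_FLAGS, RegexFlagError and translate_flags are hypothetical, and the mapping of x to re.VERBOSE is only an approximation (Python additionally treats # as a comment marker).

```python
import re

# Flags named in this issue; anything else is treated as err:FORX0001 (assumption).
ALLOWED_FLAGS = {
    "s": re.DOTALL,
    "m": re.MULTILINE,
    "i": re.IGNORECASE,
    "x": re.VERBOSE,  # approximation only, see the note above
}


class RegexFlagError(ValueError):
    """Hypothetical stand-in for err:FORX0001 (invalid regular expression flags)."""


def translate_flags(flags: str) -> int:
    """Combine the host-engine flags, rejecting any character outside s, m, i, x."""
    result = 0
    for ch in flags:
        if ch not in ALLOWED_FLAGS:
            raise RegexFlagError(f"err:FORX0001: unsupported flag {ch!r}")
        result |= ALLOWED_FLAGS[ch]
    return result
```

A test for "faulty flags lead to errors" would then assert that translate_flags("u") raises, while translate_flags("si") succeeds.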

lisp commented Jun 8, 2017

if you would like to see them included, the best way is to implement them.

Author

leipert commented Jun 8, 2017

I know. Unfortunately I do not have the time at the moment. I just wanted to document it here, so that I can come back to it later and have something to start with, or so that someone else can have a crack at them.

Furthermore I have these issues (no critique, just mentioning) that prevent me from starting right away:

  1. I do not know how to run the tests (https://www.w3.org/2009/sparql/protocol_validator seems to return this Perl script rather than provide a service. From a quick glance at the repo I cannot figure out how to test a locally running server). And I definitely do not want to implement a test runner (right now).
  2. Even if I propose new tests and they are merged into this repository, it's not clear whether and when they will be moved from a proposal to approved "real" tests. @gkellogg explained the process to me. FWIW it is quite "depressing" to work on something if you do not know what comes first: a) a review of your work or b) a new version of SPARQL.
  3. Almost no implementation I care about (e.g. Stardog, OpenLink Virtuoso) seems to care about generating those reports. Of the 20 known implementations, only 8 even have a generated report.
  4. Regex is hard. And I believe that the maintainers of the few existing tools have more important problems to solve than writing their own regex engine from scratch or bending existing ones to the extent that they conform to the SPARQL spec.

I could have written more than 5 tests by now instead of writing this. Please take it with a grain of salt; maybe I will have time in the future and

  1. find out how to run the tests
  2. be more patient
  3. not only bug the maintainers of this repo, but also maintainers of SPARQL stores
  4. write an XPath regex engine for Java, JS, Python and C. (Just kidding on that last point)

PS: I definitely do not want to insult the maintainers of tools related to RDF or thinkers of the specs. I have the utmost respect for them.

Contributor

afs commented Jun 9, 2017

ARQ passes both those tests when run in strict mode. --strict from the command line; ARQ.setStrictMode() in code.

Strict mode switches the regex engine to Apache Xerces.

Author

leipert commented Jun 9, 2017

Thanks for the info :)
However: org.apache.xerces.impl.xpath.regex.RegularExpression seems to implement more flags (e.g. u), which should throw an err:FORX0001, and it doesn't implement the character class \c.

  • Should throw, but doesn't:
    ASK {
      BIND("helloworld" AS ?s) .
      FILTER(REGEX(?s, "hello", "u")) .
    }
    
  • Throws, but should return yes:
    ASK {
      BIND("helloworld" AS ?s) .
      FILTER(REGEX(?s, "^\\c")) .
    }
    

Regarding the last query: I even suspect that the escaping of the backslash is "wrong", so that it should be, for example, REGEX(?s, "\w") instead of REGEX(?s, "\\w").

Contributor

afs commented Jun 9, 2017

It is \\: the extra \ is the SPARQL escape, so the string itself contains a single \.
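
The escaping step can be sketched as follows. This is a minimal illustration of how the text between the quotes of a SPARQL string literal turns into the string value the regex engine sees; the function name sparql_string_value is hypothetical, and \uXXXX codepoint escapes are deliberately left out.

```python
# Escape sequences allowed inside SPARQL string literals (the ECHAR grammar rule).
_ECHAR = {
    "t": "\t", "b": "\b", "n": "\n", "r": "\r", "f": "\f",
    '"': '"', "'": "'", "\\": "\\",
}


def sparql_string_value(lexical: str) -> str:
    """Turn the text between the quotes of a SPARQL literal into its string value."""
    out = []
    i = 0
    while i < len(lexical):
        ch = lexical[i]
        if ch == "\\":
            # A backslash must be followed by one of the ECHAR characters.
            if i + 1 >= len(lexical) or lexical[i + 1] not in _ECHAR:
                raise ValueError(f"illegal escape at position {i}")
            out.append(_ECHAR[lexical[i + 1]])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(sparql_string_value(r"^\\c"))  # prints ^\c
```

In this sketch a literal written as "\w" is rejected before the regex engine is ever involved, because \w is not one of the escapes in ECHAR.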

Contributor

afs commented Jun 9, 2017

FILTER(REGEX(?s, "hello", "u")) is false, not an error. If an expression throws an error, the filter is false. I checked with direct evaluation and it does indeed throw a runtime error.

Contributor

afs commented Jun 9, 2017

FILTER(REGEX(?s, "^\\c")): ARQ does ahead-of-time compilation of static patterns. If that were switched off (it is not configurable), you would get "no" because "^\c" is not a legal regex: \c must be followed by a character to indicate the control code.
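
The difference between the two outcomes can be modelled with a small sketch. It is a hypothetical model, not ARQ's code: Python's re stands in for the regex engine (it also rejects "^\c", though for a different reason than Xerces), and the class name RegexFilter and its eager parameter are assumptions made for illustration.

```python
import re


class RegexFilter:
    """Hypothetical model of FILTER(REGEX(?s, pattern)); not ARQ's actual code."""

    def __init__(self, pattern: str, eager: bool):
        self.pattern = pattern
        # Eager mode compiles a static pattern once, before any solution is
        # evaluated, so an illegal pattern fails the whole query up front.
        self.compiled = re.compile(pattern) if eager else None

    def keeps(self, value: str) -> bool:
        """Return True if the solution with this value survives the filter."""
        try:
            regex = self.compiled if self.compiled is not None else re.compile(self.pattern)
            return regex.search(value) is not None
        except re.error:
            # An error raised while evaluating a FILTER expression
            # eliminates the solution, so ASK answers "no".
            return False
```

With eager=False an illegal pattern quietly filters every solution out; with eager=True constructing the filter raises, which corresponds to the query throwing instead of returning "no".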

Contributor

afs commented Jun 9, 2017

ARQ does not support flag "q".

Author

leipert commented Jun 9, 2017

  1. \\ vs \. Yes, I saw it in the grammar. Thanks for clarifying.

  2. \c is defined as:

    Set of name characters, those ·match·ed by NameChar
    [src]

    So the \c of XML Schema is definitely not the same as Apache Xerces regular expression \c.

  3. If the example with the u flag returns "no", shouldn't the other example also return no instead of throwing?
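
To illustrate point 2, here is a minimal sketch of how a host engine without \c could approximate the XML Schema meaning by expanding the escape into an explicit character class. It assumes the NameChar production of XML 1.0 (fifth edition), which may differ from the edition a given XML Schema version cites. The function name expand_name_char_escape is hypothetical, and the sketch does not handle \c inside a bracket expression.

```python
import re

# XML 1.0 (fifth edition) NameStartChar ranges, written as the inside of a
# character class. Assumption: this edition is the relevant one.
_NAME_START = (
    ":A-Z_a-z"
    "\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    "\u0370-\u037D\u037F-\u1FFF\u200C-\u200D"
    "\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    "\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
# NameChar adds "-", ".", digits, the middle dot and two further ranges.
_NAME_CHAR_CLASS = "[" + _NAME_START + "\\-.0-9\u00B7\u0300-\u036F\u203F-\u2040" + "]"


def expand_name_char_escape(pattern: str) -> str:
    """Replace each \\c escape with an explicit NameChar class."""

    def replace(match: "re.Match[str]") -> str:
        # Escapes are consumed in pairs, so an escaped backslash followed
        # by a plain "c" is left untouched.
        return _NAME_CHAR_CLASS if match.group(1) == "c" else match.group(0)

    return re.sub(r"\\(.)", replace, pattern)


print(re.search(expand_name_char_escape(r"^\c"), "helloworld") is not None)  # prints True
```

Under this reading the second query above should answer yes, since "h" is a name character.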

Author

leipert commented Jun 9, 2017

FWIW: I hope you agree that a more granular regex test suite for SPARQL is needed. The discussion is already drifting into an implementation-specific one. Instead of arguing about the details of ARQ's implementation, we should argue about test cases.

I am happy to start implementing test cases after July 18th, if someone could provide me guidance on how to run the suite (#47).

gkellogg added the SPARQL label Nov 1, 2023