Implementation of SPARQL protocol tests : questions #51

Open
BorderCloud opened this issue Aug 10, 2018 · 14 comments

@BorderCloud
Contributor

I have started implementing the SPARQL protocol tests with JMeter.
https://github.com/w3c/rdf-tests/blob/gh-pages/sparql11/data-sparql11/protocol/manifest.ttl

I will add an action property in the ttl that references the JMeter jmx file.

Each test takes 3 input variables (for the moment):

  • hostname
  • port
  • root path before "sparql/update/..." (i.e. without "sparql", "query" or anything else at the end)

The command line will be:
jmeter -n -t .../data-sparql11/protocol/test1.jmx -JHOSTNAME=172.17.0.3 -JPORT=8890 -JPATH=/
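
For a test runner that launches these plans programmatically, the command line above could be assembled along the following lines. This is only a sketch: the function name `jmeter_command` and the bare file name `test1.jmx` are placeholders, not part of the test suite. The flags are the ones already used above (`-n` for non-GUI mode, `-t` for the test plan, `-J` to define a JMeter property).

```python
import shlex


def jmeter_command(test_plan, hostname, port, path="/"):
    """Build the argument list for one non-GUI JMeter run (hypothetical helper)."""
    return [
        "jmeter",
        "-n",                      # run without the GUI
        "-t", test_plan,           # JMeter test plan (.jmx) to execute
        f"-JHOSTNAME={hostname}",  # -J defines a JMeter property
        f"-JPORT={port}",
        f"-JPATH={path}",
    ]


if __name__ == "__main__":
    # "test1.jmx" is a placeholder path; pass the real location of the plan.
    cmd = jmeter_command("test1.jmx", "172.17.0.3", 8890, "/")
    print(shlex.join(cmd))
```

Returning a list rather than a single string lets the runner hand the arguments to `subprocess.run` without shell quoting issues.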

Is everyone OK with these variables?

My first questions:

  • /sparql/?... or /sparql?... With or without the slash? Both?
  • Is the request header "Content-Type: application/sparql-query" optional?
  • Is the request header "User-agent: sparql-client/0.1" optional?

It's not clear from the comments in the tests.

Thanks.

BorderCloud added a commit to BorderCloud/rdf-tests that referenced this issue Aug 10, 2018
BorderCloud added a commit to BorderCloud/rdf-tests that referenced this issue Aug 11, 2018
BorderCloud added a commit to BorderCloud/rdf-tests that referenced this issue Aug 11, 2018
@afs
Contributor

afs commented Aug 11, 2018

> /sparql/?... or /sparql?... With or without the slash? Both?

The endpoint can have any name: both of those are possible, and so is "/dataset". The endpoint for update and the endpoint for query may be the same one or different ones. It'll probably need two variables.

> Is the request header "Content-Type: application/sparql-query" optional?

The correct Content-Type depends on the test: it could be nothing (for GET with a query string), or for POSTs application/sparql-query or application/x-www-form-urlencoded (HTML form). So if you are POSTing the query in the body, the Content-Type is not optional in general, though I would not be surprised if some endpoints ignore it and assume one of those content types.
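
To make the three cases concrete, here is a minimal Python sketch that builds (but does not send) one request of each kind with the standard library. The function names and the endpoint URL used in the demo are illustrative assumptions, and the GET helper assumes the endpoint URL has no query string of its own.

```python
from urllib.parse import urlencode
from urllib.request import Request


def query_via_get(endpoint, query):
    """GET: the query travels in the URL, so there is no body and no Content-Type."""
    return Request(f"{endpoint}?{urlencode({'query': query})}", method="GET")


def query_via_post_form(endpoint, query):
    """POST with URL-encoded parameters in the body (the HTML form style)."""
    body = urlencode({"query": query}).encode("utf-8")
    return Request(
        endpoint,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def query_via_post_direct(endpoint, query):
    """POST with the unencoded query string as the request body."""
    return Request(
        endpoint,
        data=query.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/sparql-query"},
    )


if __name__ == "__main__":
    # Hypothetical endpoint; nothing is sent over the network here.
    for build in (query_via_get, query_via_post_form, query_via_post_direct):
        req = build("http://example.org/sparql", "ASK {}")
        print(req.get_method(), req.full_url, req.header_items())
```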

> Is the request header "User-agent: sparql-client/0.1" optional?

The spec does not require "User-Agent". Anything that is regular HTTP is allowed. You might want to add a User-Agent with the name of your test runner.

@BorderCloud
Contributor Author

BorderCloud commented Aug 11, 2018

Maybe it's necessary to distinguish between the features of different databases and the bare minimum for SPARQL.

In my humble opinion, the bare minimum is a single endpoint for both query and update, with the word "sparql" at the end.
POST or GET http://example.org/XXXX/sparql?query= or http://example.org/XXXX/sparql?update=
(personally, I don't understand why several databases insert a slash after sparql and others do not... or use a slash only when POSTing the query in the body... it's not clear for the moment)

The DCAT and SPARQL Service Description ontologies have only one endpoint property for both read and update (there is no separate property for an update endpoint and another for a query endpoint). (dcat:accessURL [2], sd:endpoint [1])

Of course, in the database configuration the administrator can do anything he wants, but the tests are the bare minimum and also serve to simplify communication between nodes in the Linked Data.
If having two endpoints is a necessity for security reasons, we need to refresh the protocol.

I'm implementing these tests to help establish a common minimum across SPARQL services, so that developers can implement a single client for all SPARQL services at once...

Minimum number of endpoints for one SPARQL service? One or 2?

[1] https://www.w3.org/TR/sparql11-service-description/#sd-endpoint
[2] https://www.w3.org/TR/vocab-dcat/

@afs
Contributor

afs commented Aug 11, 2018

We don't need to get into making choices: what does the protocol spec say about naming?

(sd:endpoint - there can be multiple declarations for one service)

@BorderCloud
Contributor Author

BorderCloud commented Aug 12, 2018

> (sd:endpoint - there can be multiple declarations for one service)

How can an agent choose the right endpoint if there are multiple endpoints (with any name)? Does it have to test each endpoint before reading or writing?

> We don't need to get into making choices: what does the protocol spec say about naming?

The spec is one thing... but the differing implementations across SPARQL services prove that the specification is not clear enough, or is too broad, to implement the same SPARQL client for everybody. How to implement federated queries in a database if the API is different for each vendor?

To answer your question about the specification: it uses the verb "may" around this problem without ever really addressing it. For me, the protocol is unfinished. The lack of defined error messages is the proof (and in the SPARQL results there is also no place for error messages).
For me, it's a mistake not to mention the number of endpoints for one SPARQL service. It might have seemed obvious or optional to the writer of the document.
The result: in 2018, there is still no consensus. I implemented a basic SPARQL client for only 4-6 databases; I already have 4 parameters (2 endpoints and 2 parameters) and I know that's not enough to work with the other databases (Accept is sometimes required, sometimes not...).

Is there no way to converge on a common minimum between the different implementations? A minimum core?

@lisp

lisp commented Aug 12, 2018

your comments have described the minimum core which covers the recommendations, yet you observe,

> The spec is one thing... but the differing implementations across SPARQL services prove that the specification is not clear enough, or is too broad, to implement the same SPARQL client for everybody.

for one approach, see: https://doc.yasgui.org/doc/

the issue with federation is not clear:

> How [is one to] implement federated queries in a database if the API is different for each vendor?

please expand on it. the remark on deficiencies as to sparql results appears to concern goals for your test framework which relate to errors:

> For me, the protocol is unfinished. The lack of defined error messages is the proof (and in the SPARQL results there is also no place for error messages).

what are those goals?
do you intend to accomplish them within the limitations of http?

@BorderCloud
Contributor Author

BorderCloud commented Aug 12, 2018

Yasgui and I use one endpoint for reading (with sparql at the end or not), but too often that is not enough.
Look at the Yasgui interface for describing a single read-only endpoint (I do the same). There are too many options for a beginner:
[screenshot: screenshot_20180812_123505]

> Not clear: How [is one to] implement federated queries in a database if the API is different for each vendor?

In the spec: The query operation MUST be invoked with either the HTTP GET OR HTTP POST method.
So, if one vendor implements query only via GET and another SPARQL service uses POST for queries by default, the specification allows it, but these databases cannot communicate to resolve a federated query. How can the optional parameters of the different SPARQL services be written in a SPARQL query?

> please expand on it. the remark on deficiencies as to sparql results appears to concern goals for your test framework which relate to errors:
>
> For me, the protocol is unfinished. The lack of defined error messages is the proof (and in the SPARQL results there is also no place for error messages).
> what are those goals?
> do you intend to accomplish them within the limitations of http?

I'm developing Sgvizler2. I use it to debug SPARQL queries. With a federated query, the error messages get lost on the way between the different SPARQL services. It's a big problem for SPARQL developers.
All my work stays within the limitations of http.

@lisp

lisp commented Aug 12, 2018

re.

> Look at the Yasgui interface for describing a single read-only endpoint, ...

the reference was to the documentation, wherein the "client configuration documentation" is pertinent. that is, there can be many and the client must include provisions to disambiguate.

re:

> ... if one vendor implements query only via GET and another SPARQL service uses POST for queries by default, the specification allows it, but these databases cannot communicate to resolve a federated query.

if the client presumes the incorrect method, it will fail.
on the other hand, http does support a degree of introspection through the OPTIONS method and/or the interpretation of a 501, whereby, if the endpoint does not include sufficient information in those responses, one might judge that to be a deficiency of the implementation, not just either the http or the sparql protocol.
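
As a rough illustration of that kind of introspection, a client could inspect the Allow header that a server may return in response to OPTIONS and pick a method from it. The sketch below only handles the header value, however it was obtained; the function names and the POST-before-GET preference are assumptions made for the example, and method names are upper-cased for leniency.

```python
def parse_allow(allow_header):
    """Return the set of method names listed in an HTTP Allow header value."""
    return {m.strip().upper() for m in allow_header.split(",") if m.strip()}


def choose_query_method(allow_header, preferred=("POST", "GET")):
    """Pick the first preferred method the endpoint advertises, or None.

    None means the response carried no usable information, so the caller
    has to fall back to trying a method and interpreting the status code.
    """
    allowed = parse_allow(allow_header or "")
    for method in preferred:
        if method in allowed:
            return method
    return None


if __name__ == "__main__":
    print(choose_query_method("GET, HEAD, OPTIONS"))
    print(choose_query_method(None))
```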

@afs
Contributor

afs commented Aug 12, 2018

Yasgui shows it is possible to implement a client for any endpoint.

> Does it have to test each endpoint before reading or writing?

It needs the URL (hostname, port, path etc) to "/sparql" at the server anyway, so I don't see this as any different. The UI allows Yasgui to be pointed at an endpoint.

The usual way is to serve Yasgui from the server (see Apache Jena Fuseki), and then the defaults for the fields are filled in. It is one of the features of Yasgui.

If the purpose here is to write a JMeter test suite for the SPARQL protocol tests, then have two parameters, query and update endpoints, or if only given one, use that for both.
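
The "two parameters, or one used for both" rule is small enough to sketch directly. The helper name `resolve_endpoints` and the URLs in the demo are hypothetical; only the fallback behaviour comes from the suggestion above.

```python
def resolve_endpoints(query_endpoint=None, update_endpoint=None):
    """Return (query_url, update_url), reusing the single URL when only one is given."""
    if not query_endpoint and not update_endpoint:
        raise ValueError("at least one endpoint URL is required")
    return (query_endpoint or update_endpoint,
            update_endpoint or query_endpoint)


if __name__ == "__main__":
    # One endpoint for both operations.
    print(resolve_endpoints("http://example.org/ds/sparql"))
    # Separate endpoints for query and update.
    print(resolve_endpoints("http://example.org/ds/query",
                            "http://example.org/ds/update"))
```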

If the purpose here is to promote a particular way of setting up public services, that is a fine but different purpose.

> The query operation MUST be invoked with either the HTTP GET OR HTTP POST method.

That is not an accurate quote - the spec has MUST (RFC 2119) emphasized, not the "or". It's either-or for a single operation: it has to go over one or the other, not both, and not another way (SOAP, json-rpc, CORBA, ...).

@BorderCloud
Contributor Author

BorderCloud commented Aug 12, 2018

I have finished integrating JMeter into TFT. You can see the first results in this draft of an automatic report for SPARQL 1.1 and reproduce them.

This document and the TFT workflow are experimental; they aim to visualize the actual interoperability between several vendors of SPARQL services. It uses a new method to execute the SPARQL test suite (freely reproducible).

For the moment, the protocol tests are too permissive (some succeed with a 404 code). I'll let you look at it when you have time.

If you want "two parameters, query and update endpoints", why not add other parameters, such as the names of the parameters for the query and update endpoints (they sometimes differ)? We can do it...
But is it the best way to simplify SPARQL?

@afs
Contributor

afs commented Aug 12, 2018

The report still contains many mistakes that have been there for several years. It does not reflect previous feedback provided with corrections.

@BorderCloud
Contributor Author

BorderCloud commented Aug 13, 2018

@afs an example? I now use the latest version of the w3c/rdf-tests project (because there are a lot of new tests). If it's a bug in TFT, can you open an issue in the TFT project? If it's a mistake in this project, you can also open an issue here.

Concerning the protocol, to have the same API for all solutions, I have an idea: Varnish can realign the different APIs.
POST or GET http://example.org/XXXX/sparql?query= >>>[ Varnish >>> Any API (Docker image)]
POST or GET http://example.org/XXXX/sparql?update= >>>[ Varnish >>> Any API (Docker image)]
With this solution, you (or I) can put all the necessary parameters into Varnish to pass the tests for the bare minimum. I don't have the time to implement the whole protocol in Varnish for 4 databases. I will propose JMeter tests only for this minimal API.

I prefer a single common API with few features over many features without a common API. I can insert Varnish into the workflow next week.

Update: another advantage of Varnish is that I can trace the transactions for federated queries between databases.

@afs
Contributor

afs commented Aug 15, 2018

One example: xsd:string is not required in RDF 1.1, but test failures are marked when it is missing, and also the other way round.
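
One way a result comparison could avoid this class of false failure is to normalize literals before comparing them, since in RDF 1.1 a literal with no datatype and no language tag has datatype xsd:string. The sketch below works on single terms in the SPARQL JSON results layout; it is not how TFT compares results, the function names are made up for the example, and blank nodes are compared naively by label.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def normalize_term(term):
    """Map one JSON-results term to a comparable (type, value, datatype, lang) tuple."""
    kind = term["type"]
    if kind == "typed-literal":  # older spelling still emitted by some services
        kind = "literal"
    lang = term.get("xml:lang")
    datatype = term.get("datatype")
    if kind == "literal" and lang is None and datatype is None:
        # RDF 1.1: a plain literal is an xsd:string literal.
        datatype = XSD_STRING
    # Language tags are compared case-insensitively.
    return (kind, term["value"], datatype, lang.lower() if lang else None)


def same_term(a, b):
    """True when two terms denote the same RDF 1.1 term after normalization."""
    return normalize_term(a) == normalize_term(b)


if __name__ == "__main__":
    plain = {"type": "literal", "value": "abc"}
    typed = {"type": "literal", "value": "abc", "datatype": XSD_STRING}
    print(same_term(plain, typed))
```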

> POST or GET http://example.org/XXXX/sparql?query=

Be careful: There is no requirement that ?query is the first query string argument.
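
In practice this means a server, proxy or test should parse the whole query string instead of matching a "?query=" prefix. A minimal Python sketch, with a hypothetical helper name and example URL:

```python
from urllib.parse import parse_qs, urlsplit


def extract_query(url):
    """Return the value of the 'query' parameter wherever it appears, else None.

    None is also returned when the parameter is repeated, since a single
    unambiguous value cannot be chosen in that case.
    """
    params = parse_qs(urlsplit(url).query)
    values = params.get("query", [])
    return values[0] if len(values) == 1 else None


if __name__ == "__main__":
    url = ("http://example.org/XXXX/sparql"
           "?default-graph-uri=http%3A%2F%2Fexample.org%2Fg&query=ASK+%7B%7D")
    print(extract_query(url))
```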

> POST or GET http://example.org/XXXX/sparql?update=

GET+update is outside the spec (it is very bad HTTP practice to send changes via GET).

@BorderCloud
Contributor Author

BorderCloud commented Aug 15, 2018

> One example: xsd:string is not required in RDF 1.1, but test failures are marked when it is missing, and also the other way round.

A link?
If a test has to support all the "not required" possibilities, it may be necessary to replace the example results with XML schemas for the XML.

> POST or GET http://example.org/XXXX/sparql?query=
> Be careful: There is no requirement that ?query is the first query string argument.

In the examples and tests (and the majority of implementations), query is a mandatory parameter. A developer can switch easily between GET and POST. It's simpler to develop a SPARQL client with the same parameters for GET and POST.
If it's not in the requirements, that's too bad. I'm talking about the minimum needed to use and easily implement a minimal SPARQL client.

> POST or GET http://example.org/XXXX/sparql?update=
> GET+update is outside the spec (it is very bad HTTP practice to send changes via GET).

I know it's bad via GET, but it is needed during development: it's very useful to sometimes test a small update query via a browser.
In my opinion, in production, a SPARQL client should work only with POST requests (and with credentials... not specified in this protocol).

BorderCloud added a commit to BorderCloud/rdf-tests that referenced this issue Aug 18, 2018
…edMustToHave

In manifest, replace /sparql/ by /sparql and insert new status dawgt:ApprovedMustToHave for 3 tests.
fix 3 tests jmx
w3c#51
BorderCloud added a commit to BorderCloud/tft-jena-fuseki that referenced this issue Aug 18, 2018
@BorderCloud
Contributor Author

BorderCloud commented Aug 18, 2018

I tested Varnish. I cannot change the body of the HTTP message, but I can see the parameters in the body and do a redirection if necessary.

So I implemented only 3 tests (with a new status in the manifest, dawgt:ApprovedMustToHave), and now, if a database doesn't support this minimum, I can quickly realign its API with Varnish:

GET http://example.org/XXXX/sparql?query=                       => XML
POST http://example.org/XXXX/sparql     Body : query=       => XML
POST http://example.org/XXXX/sparql     Body : update=... 

With this method, you can run all the SPARQL 1.1 tests against a database even if it does not implement the minimum of this protocol.

Updated report: http://tft-reports.bordercloud.com/
It's the latest report, with the protocol tests at 100% (with Varnish for 3/4 of the databases).
Examples of Varnish configuration to realign a SPARQL API : realign the endpoint and realign the default content-type.
