Literal reworking
Between version 3.4.0 and 4.0, a backwards incompatible change was made in RDFLib to how datatyped literals are handled.
First of all, this all looks very complicated, but rest assured, the changes are actually quite subtle and you are unlikely to notice unless you do something specialised.
Hopefully, the changes will not affect very many users of RDFLib, but this page collects the details of what was changed and has a list of any changes required in code using RDFLib.
By far the biggest problem with the pre-4.0 handling of datatyped literals was that __hash__ and __eq__ were not consistent, i.e.
>>> Literal(2.5) == Literal("2.50",datatype=XSD.float)
True
>>> hash(Literal(2.5)) == hash(Literal("2.50",datatype=XSD.float))
False
This is very bad, and would lead to literals not working properly with data structures such as sets and dicts.
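Why this matters can be shown without RDFLib at all. The sketch below uses two hypothetical classes (neither is part of RDFLib): LooseNumber compares by numeric value but hashes its lexical string, mirroring the mismatch above, while ConsistentNumber hashes the same value it compares.

```python
class LooseNumber:
    """Hypothetical stand-in for a pre-4.0 literal: __eq__ and __hash__ disagree."""

    def __init__(self, lexical):
        self.lexical = lexical

    def __eq__(self, other):
        # Equality is decided in value space: "2.5" and "2.50" are the same number.
        return isinstance(other, LooseNumber) and float(self.lexical) == float(other.lexical)

    def __hash__(self):
        # The hash is computed from the lexical form, so equal objects
        # will normally end up with different hashes.
        return hash(self.lexical)


class ConsistentNumber(LooseNumber):
    """Same equality as above, but the hash is derived from the compared value."""

    def __hash__(self):
        return hash(float(self.lexical))


a, b = LooseNumber("2.5"), LooseNumber("2.50")
loose_equal = a == b        # True: same value
loose_found = b in {a}      # set lookup checks the hash first, so this only
                            # succeeds if the two string hashes happen to collide

c, d = ConsistentNumber("2.5"), ConsistentNumber("2.50")
consistent_found = d in {c}  # True: equal objects now share a hash

print(loose_equal, loose_found, consistent_found)
```

With the inconsistent class, a set or dict can silently hold two "equal" keys or fail to find one that is present, which is the failure mode described above.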
Also, the old way tried to support equality and comparisons between typed Literals and Python objects directly, which was convenient in some cases, but inconsistent and confusing in others.
All comparison methods for literals have been reworked to be in line with the SPARQL 1.1 spec, which in turn builds on XPath and XML Schema. The nitty-gritty details:
- Node equality with __eq__ / == is done according to the SPARQL sameTerm function, which refers to Section 6.5.1 of the RDF Abstract Syntax. __hash__ is naturally consistent with equals.
- A new method, Node.eq, does comparison according to SPARQL RDF-Term equal (=), i.e. value-based comparison, as defined in Section 6.5.2 of the RDF Abstract Syntax.
- Relative comparisons (the >, <, >=, <= operators, i.e. the __lt__, __gt__, __ge__, __le__ methods), and therefore the sort order of Nodes, follow SPARQL ORDER BY and the <, > operators. Sorting is also done in value space, so all numerically typed literals sort by value; other literals are sorted by language tag or by datatype URI. Nodes in general are sorted as None, BNode, Variable, URIRef, Literal.
- Datatyped literals are optionally normalised at creation time, i.e. if a lexical form corresponds to a valid value in the value space of a datatype, that value is serialised back to a string and this serialisation is used as the lexical form. This is easier to explain through an example:
>>> Literal("0000001", datatype=XSD.integer)
Literal("1", datatype=XSD.integer)
>>> Literal("0.00000", datatype=XSD.double)
Literal("0.0", datatype=XSD.double)
The flag is set either globally as rdflib.NORMALIZE_LITERALS or as a keyword argument to Literal.__new__. Normalization is enabled by default.
- Only semi-related: the Literal class also defines operators for arithmetic (+, -, /, *, ~, ...). These now return Literals, rather than whatever Python feels like, allowing us to do:
age=graph.value(bob, myschema.age)
graph.set(bob, myschema.age, age+1)
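The normalisation described in the list above amounts to a round trip from lexical form to value and back. A minimal sketch in plain Python, assuming a hypothetical helper normalise_lexical and using int and float as stand-ins for the xsd:integer and xsd:double value spaces. This is not RDFLib's own conversion code, which may differ in detail; leaving invalid lexical forms untouched is likewise an assumption of the sketch.

```python
def normalise_lexical(lexical, parse, serialise=str):
    """Round-trip a lexical form through its value space (illustrative only)."""
    try:
        value = parse(lexical)
    except ValueError:
        # Assumption: a lexical form with no valid value is left as it is.
        return lexical
    return serialise(value)


as_integer = normalise_lexical("0000001", int)      # "1"
as_double = normalise_lexical("0.00000", float)     # "0.0"
untouched = normalise_lexical("not a number", int)  # "not a number"

print(as_integer, as_double, untouched)
```

The first two results match the normalised lexical forms shown in the Literal examples above.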
Most things now work in a fairly sane and sensible way. If you do not have existing stores/intermediate stored sorted lists, or hash-dependent something-or-other, you should be good to go.
In particular, literals with different datatypes no longer compare equal with ==, i.e.
>>> Literal(2, datatype=XSD.int) == Literal(2, datatype=XSD.float)
False
But a new method eq on all Nodes has been introduced, which does semantic equality checking, i.e.:
>>> Literal(2, datatype=XSD.int).eq(Literal(2, datatype=XSD.float))
True
The eq method is still limited to datatypes that map to the same value space: all numeric types map to numbers and compare with each other, and xsd:string and plain literals both map to strings and compare fine, but:
>>> Literal(2, datatype=XSD.int).eq(Literal('2'))
False
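The split between term identity (==) and value equality (eq) can be modelled in a few lines of plain Python. The sketch below is an illustration under stated assumptions, not RDFLib's implementation: the Term class, the shortened datatype names and the two value-space sets are all hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional

# Hypothetical, shortened datatype names grouped by value space.
NUMERIC = {"xsd:int", "xsd:integer", "xsd:float", "xsd:double", "xsd:decimal"}
STRINGS = {None, "xsd:string"}  # None stands for a plain literal


@dataclass(frozen=True)
class Term:
    """frozen=True generates __eq__ and __hash__ over (lexical, datatype),
    i.e. a sameTerm-style identity where hash and equality always agree."""

    lexical: str
    datatype: Optional[str] = None

    def eq(self, other):
        """Value-based comparison, only meaningful within one value space."""
        if self.datatype in NUMERIC and other.datatype in NUMERIC:
            return Decimal(self.lexical) == Decimal(other.lexical)
        if self.datatype in STRINGS and other.datatype in STRINGS:
            return self.lexical == other.lexical
        # Different value spaces: fall back to term identity.
        return self == other


two_int = Term("2", "xsd:int")
two_float = Term("2", "xsd:float")

same_term = two_int == two_float               # False: datatypes differ
same_value = two_int.eq(two_float)             # True: both are the number 2
number_vs_string = two_int.eq(Term("2"))       # False: number vs. string
plain_vs_xsd_string = Term("a").eq(Term("a", "xsd:string"))  # True

print(same_term, same_value, number_vs_string, plain_vs_xsd_string)
```

Keeping == strict is what lets such terms be used safely as set members and dict keys, while eq remains available where value semantics are wanted.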
If you care about the exact lexical representation of a literal, and not just the value, either set rdflib.NORMALIZE_LITERALS to False before creating your literal, or pass normalize=False to the Literal constructor.
...
You can add new mappings of datatype URIs to Python objects using the rdflib.term.bind function.
This also allows you to specify a constructor for constructing objects from the lexical string representation, and a serialisation method for generating a lexical string representation from an object.