fix serialiser makes invalid turtle curis #3364

lisat-dstg · 2026-01-09T05:44:00Z

Fixes issue #1395

So far just the breaking test (1 new test artefact generates 4 breaking tests) -- one each for formats: trig, ttl, turtle and n3

lisat-dstg · 2026-01-12T00:10:09Z

Suggested location for the fix: at this point the entire local name string should be backslash-escaped for any character that would otherwise fail to parse later (this includes the % character).

https://github.com/RDFLib/rdflib/blob/7.x/rdflib/plugins/serializers/turtle.py#L331

The list of characters that need backslash escaping is at https://www.w3.org/TR/turtle/#grammar-production-PN_LOCAL_ESC
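
As a rough illustration of what escaping the whole local name could look like, the sketch below backslash-escapes the punctuation listed in the PN_LOCAL_ESC production. The names `PN_LOCAL_ESC_CHARS` and `escape_local_name` are hypothetical and not taken from rdflib. `_`, `-` and `.` are left out on the assumption that they are usually legal unescaped inside a local name (edge positions are ignored here).

```python
import re

# Punctuation from the Turtle PN_LOCAL_ESC production, minus "_", "-" and "."
# (assumption: this sketch leaves those three alone).
PN_LOCAL_ESC_CHARS = "~!$&'()*+,;=/?#@%"

# Hypothetical name; compiled once at import time.
_NEEDS_ESCAPE = re.compile("[" + re.escape(PN_LOCAL_ESC_CHARS) + "]")


def escape_local_name(local: str) -> str:
    """Prefix every reserved character in a local name with a backslash."""
    # A function replacement sidesteps the template-escaping rules of re.sub.
    return _NEEDS_ESCAPE.sub(lambda m: "\\" + m.group(0), local)
```

Note that a blanket approach like this also escapes the % of a percent-encoded sequence such as %41, which may or may not be what is wanted.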

lisat-dstg · 2026-01-12T04:32:35Z

pre-commit.ci autofix

for more information, see https://pre-commit.ci

WhiteGobo · 2026-01-13T09:27:27Z

This looks fine to me. I have compared it with the discussion in the corresponding issue.
I think getQName should also be renamed to getPName, as one of niklasl's comments suggests.

Can you test every character that needs to be escaped?
You could also add the link https://www.w3.org/TR/turtle/#grammar-production-PN_LOCAL_ESC as a comment on getQName.

lisat-dstg · 2026-01-14T01:04:55Z

> This looks fine to me. I have compared it with the discussion in the corresponding issue. I think getQName should also be renamed to getPName, as one of niklasl's comments suggests.
>
> Can you test every character that needs to be escaped? You could also add the link https://www.w3.org/TR/turtle/#grammar-production-PN_LOCAL_ESC as a comment on getQName.

I will address these points on Friday. I have a feeling that some other magic code already deals with every character except percent, but I haven't had a chance to get back to my dev environment to test thoroughly. I will split each test into its own file if needed.

ioggstream · 2026-01-14T10:50:35Z

rdflib/plugins/serializers/turtle.py

        prefix, namespace, local = parts

-        local = local.replace(r"(", r"\(").replace(r")", r"\)")
+        local = re.sub(r"[\"'~!$&\(\)*+,;=/\?#@%]", r"\\\0", local)


Thanks! Should this RE be compiled somewhere to make it faster?
I don't know whether there are perf tests anywhere, though.

e.g.

        # After imports
        ...
        RE_ESCAPE_CHARS = re.compile(r"[\"'~!$&*+,;=/\?#@%]")
        ...
        # in the function
        ...
        local = RE_ESCAPE_CHARS.sub(r"\\\0", local)

Yes absolutely should :)
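
A minimal sketch of the precompiled variant discussed above, assuming only Python's standard `re` module. One detail worth checking: in a Python replacement template, `\0` is read as an octal escape (a NUL character) rather than as the whole match, so the sketch uses `\g<0>` instead. `RE_ESCAPE_CHARS` follows the name suggested in the review; `escape_chars` is a hypothetical helper, not rdflib code.

```python
import re

# Compiled once at module import rather than on every call.
RE_ESCAPE_CHARS = re.compile(r"[\"'~!$&()*+,;=/?#@%]")


def escape_chars(local: str) -> str:
    # r"\\\g<0>" means: a literal backslash, then the entire match.
    return RE_ESCAPE_CHARS.sub(r"\\\g<0>", local)


# With r"\\\0" the matched character is replaced by a backslash plus NUL,
# so the original character is lost.
broken = RE_ESCAPE_CHARS.sub(r"\\\0", "a(b")
```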

lisat-dstg · 2026-01-19T05:35:47Z

These are the canonical references that I need to check when I get back to this:

Turtle grammar (see https://www.w3.org/TR/turtle/#sec-grammar-grammar) defines LOCALNAME in the PN_LOCAL production (see https://www.w3.org/TR/turtle/#grammar-production-PN_LOCAL).
TriG grammar (see https://www.w3.org/TR/rdf12-trig/#grammar-ebnf) defines LOCALNAME in the PN_LOCAL production (see https://www.w3.org/TR/rdf12-trig/#grammar-production-PN_LOCAL).
N-Triples grammar (see https://www.w3.org/TR/rdf12-n-triples/#sec-grammar-grammar) does not define LOCALNAME because N-Triples doesn't permit prefixed names. It permits something similar for blank nodes, but not for named nodes, so it is irrelevant to the discussion/specification of LOCALNAME.
Same for N-Quads.
RDF/XML does include the concept of LOCALNAME. IRIs can be formed in three ways, one of which is via qualified names (namespace-qualified element or attribute names). Qualified names (QNames) are basically the XML version of prefixed names. Just like a prefixed name, a QName has a namespace prefix followed by a colon followed by a localname. The QName grammar (see https://www.w3.org/TR/REC-xml-names/#ns-qualnames) describes the localname-equivalent grammar in the LocalPart production (see https://www.w3.org/TR/REC-xml-names/#NT-LocalPart), which in turn is described in the NCName production (see https://www.w3.org/TR/REC-xml-names/#NT-NCName), which states that the localname comprises all characters in the Name production minus colon (see https://www.w3.org/TR/REC-xml/#NT-Name).

lisat-dstg · 2026-01-23T07:15:15Z

Removing the following from the test file as none of these triples failed roundtrip and I assume there is probably test coverage elsewhere for these. It was just the percent sign being escaped that failed.

:foo_\'_bar :prop "test iri including escaped char '" .
:foo_\~_bar :prop "test iri including escaped char ~" .
:foo_\!_bar :prop "test iri including escaped char !" .
:foo_\$_bar :prop "test iri including escaped char $" .
:foo_\&_bar :prop "test iri including escaped char &" .
:foo_\(_bar :prop "test iri including escaped char (" .
:foo_\)_bar :prop "test iri including escaped char )" .
:foo_\*_bar :prop "test iri including escaped char *" .
:foo_\+_bar :prop "test iri including escaped char +" .
:foo_\,_bar :prop "test iri including escaped char ," .
:foo_\;_bar :prop "test iri including escaped char ;" .
:foo_\=_bar :prop "test iri including escaped char =" .
:foo_\/_bar :prop "test iri including escaped char /" .
:foo_\?_bar :prop "test iri including escaped char ?" .
:foo_\#_bar :prop "test iri including escaped char #" .
:foo_\@_bar :prop "test iri including escaped char @" .
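
For reference, lines of this shape can be generated rather than written by hand. The sketch below is only an illustration: the function name is hypothetical, and the character list is copied from the triples above, so % is deliberately absent.

```python
# Characters covered by the removed triples above (percent is not among them).
ESCAPED_CHARS = "'~!$&()*+,;=/?#@"


def make_test_triples(chars: str = ESCAPED_CHARS) -> list:
    """Build one Turtle triple per character, escaping it in the local name."""
    return [
        f':foo_\\{c}_bar :prop "test iri including escaped char {c}" .'
        for c in chars
    ]
```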

lisat-dstg · 2026-01-23T09:33:20Z

Ready again.

The test and fix are now targeted very precisely at an escaped percent character in a local name. This works just fine in the Jena Turtle serialiser/parser. RDFLib must have missed it because the percent character appears in the grammar specifically to percent-encode other/unprintable characters as a 2-digit hexadecimal sequence.

The fix uses a precompiled regex to detect percent (%) characters not followed by a 2-digit hex sequence. Such characters are replaced by a backslash plus the percent character.

As requested, I also took the opportunity to rename the getQName function to get_pname wherever it was in fact getting a prefixed name. This touched four serialisers: Turtle, Long Turtle, TriG and N3.
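
The percent handling described above can be sketched roughly as follows. This is an assumption about the shape of the regex, not the exact code in the PR; `UNESCAPED_PERCENT` and `escape_bare_percent` are illustrative names.

```python
import re

# Assumed pattern: "%" that is NOT followed by two hexadecimal digits.
# The constant name is illustrative, not the one used in the PR.
UNESCAPED_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def escape_bare_percent(local: str) -> str:
    """Backslash-escape bare percent signs, leaving %XX sequences intact."""
    return UNESCAPED_PERCENT.sub(r"\\%", local)
```

The negative lookahead keeps the match to the single % character, so a percent-encoded sequence such as %20 passes through unchanged while a bare % gains a leading backslash.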

lisat-dstg · 2026-01-23T09:34:14Z

Am I expected to rebase and selectively squash to clean up commits or will they get squashed on merge?

lisat-dstg · 2026-02-02T00:05:26Z

Bump - anyone out there?

edmondchuc · 2026-02-02T02:17:36Z

Hi @lisat-dstg, thanks for providing a fix. I haven't reviewed your code just yet but I have approved the running of the validate workflows on this PR.

There appears to be a mypy error. Do you mind taking a look? If you're unsure of anything, please reach out and I'll try to respond promptly.

  py38: commands[3]> poetry run python -m mypy --show-error-context --show-error-codes --junit-xml=test_reports/3.8-macos-latest-mypy-junit.xml
  rdflib/plugins/serializers/longturtle.py: note: In member "get_pname" of class "LongTurtleSerializer":
  rdflib/plugins/serializers/longturtle.py:198: error: "LongTurtleSerializer" has no attribute "LOCALNAME_PECRENT_CHARACTER_REQUIRING_ESCAPE_REGEX"  [attr-defined]
  Found 1 error in 1 file (checked 464 source files)

lisat-dstg · 2026-02-02T06:22:11Z

> Hi @lisat-dstg, thanks for providing a fix. I haven't reviewed your code just yet but I have approved the running of the validate workflows on this PR.
>
> There appears to be a mypy error. Do you mind taking a look? If you're unsure of anything, please reach out and I'll try to respond promptly.
>
>     py38: commands[3]> poetry run python -m mypy --show-error-context --show-error-codes --junit-xml=test_reports/3.8-macos-latest-mypy-junit.xml
>     rdflib/plugins/serializers/longturtle.py: note: In member "get_pname" of class "LongTurtleSerializer":
>     rdflib/plugins/serializers/longturtle.py:198: error: "LongTurtleSerializer" has no attribute "LOCALNAME_PECRENT_CHARACTER_REQUIRING_ESCAPE_REGEX"  [attr-defined]
>     Found 1 error in 1 file (checked 464 source files)

Thanks @edmondchuc, I think I've fixed it, but I'm having trouble with my dev environment. Can you please rerun the CI pipelines? Thanks.
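
The error above is the kind mypy reports when a class reads an attribute that neither it nor any of its base classes declares. A minimal sketch of the usual remedy, defining the shared regex once on a common parent, is shown below. All class and attribute names are hypothetical stand-ins, not rdflib's actual hierarchy.

```python
import re


class BaseSerializer:
    # Hypothetical shared parent: declaring the regex here makes it visible
    # to every subclass, which is what an attr-defined error asks for.
    PERCENT_REGEX = re.compile(r"%(?![0-9A-Fa-f]{2})")

    def escape_percent(self, local: str) -> str:
        return self.PERCENT_REGEX.sub(r"\\%", local)


class TurtleLike(BaseSerializer):
    pass


class LongTurtleLike(BaseSerializer):
    pass
```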

edmondchuc · 2026-02-04T00:55:01Z

@lisat-dstg I've triggered the CI again. Sorry for the delay.

Your changes look great. They correctly escape % when it is not followed by two hex digits, as demonstrated in your test file, so percent-encoded values remain untouched. Nice work!

RDFLib#1395 add breaking test artefact for makes invalid turtle cURIs

0c02602

lisat-dstg changed the base branch from main to 7.x January 9, 2026 05:44

lisat-dstg mentioned this pull request Jan 9, 2026

RDFlib makes invalid Turtle cURIs #1395

Open

RDFLib#1395 fix turtle serialisation to backslash-escape required special characters in local names of IRIs

c1783c4

lisat-dstg marked this pull request as ready for review January 12, 2026 04:25

pre-commit-ci bot and others added 3 commits January 12, 2026 04:32

[pre-commit.ci] auto fixes from pre-commit.com hooks

bf362e4

for more information, see https://pre-commit.ci

RDFLib#1395 update test cases

decadf5

fix: Update regex for special characters in IRI localname that need backslash escaping

82e6ad9

ioggstream reviewed Jan 14, 2026

lisat-dstg added 3 commits January 23, 2026 09:13

fix: More focused test; more focused fix; Also renamed getQName fn to get_pname incl docs RDFLib#1395

8bd74e5

style: format with ruff RDFLib#1395

7272656

style: another ruff fix RDFLib#1395

6cb2425

lisat-dstg commented Feb 2, 2026

fix: Move 'percent' regex from turtle serializer to parent class RDFLib#1395

aad5a53

Merge branch '7.x' into 1395-fix-serialiser-makes-invalid-turtle-curis

0367f03

5 participants
