matrixbot changed the title from "Dummy issue" to "Sqlite can end up with valid utf8 sequences which describe invalid codepoints, which break synapse_port_db" on Dec 21, 2023.
This issue has been migrated from #3538.
Error ends up looking like:
in the event_search logic.
Turns out that 0xed 0xb3 0xb6 is structurally valid utf8, but it encodes \uDCF6, a lone UTF-16 surrogate rather than a valid Unicode codepoint, and postgres barfs on it when you try to insert it.
Python 2, however, doesn't recognise anything invalid about it.
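A quick way to see the problem from Python 3 (a sketch only; the issue itself concerns Python 2 and postgres, neither of which is exercised here): the three bytes decode to U+DCF6, which falls in the UTF-16 surrogate range U+D800..U+DFFF. Python 3's strict `utf-8` codec rejects them, while the `surrogatepass` error handler lets them through, roughly mirroring Python 2's lenience. `has_surrogates` is a hypothetical helper, not part of Synapse.

```python
raw = b"\xed\xb3\xb6"

# Python 3's utf-8 codec follows the Unicode standard, which forbids
# encoded surrogates, so a strict decode raises.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# The "surrogatepass" error handler accepts the sequence anyway, which is
# roughly how Python 2's lenient codec treated it.
text = raw.decode("utf-8", "surrogatepass")


def has_surrogates(s: str) -> bool:
    """Return True if s contains a code point in U+D800..U+DFFF (hypothetical helper)."""
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in s)


print(strict_ok)             # False
print(hex(ord(text)))        # 0xdcf6
print(has_surrogates(text))  # True
```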
The workaround in the end was to use `iconv_codecs` to run the string through iconv and strip the invalid codepoints before handing it to postgres, with something like:

```python
row["value"].encode("iconv:utf8", "ignore").decode("utf8")
```

This seemed to work on Linux, but fails on macOS.
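Since the iconv route reportedly fails on macOS, one portable alternative is sketched below, assuming the code runs on Python 3 and that lone surrogates are the only bad codepoints to remove. Python 3's `utf-8` codec refuses to encode surrogates, so encoding with `errors="ignore"` drops them and the round trip leaves every other character untouched. This does not work on Python 2, whose codec encodes surrogates without complaint. `strip_surrogates` and the sample `row` are hypothetical.

```python
def strip_surrogates(value: str) -> str:
    """Drop lone UTF-16 surrogates (U+D800..U+DFFF) from a Python 3 str.

    Assumes Python 3: its utf-8 encoder rejects surrogates, and with
    errors="ignore" it silently skips them instead of raising.
    """
    return value.encode("utf-8", "ignore").decode("utf-8")


# Hypothetical row, standing in for one read from sqlite before porting.
row = {"value": "hello \udcf6world"}
row["value"] = strip_surrogates(row["value"])
print(row["value"])  # hello world
```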
Thanks to @flux:matrix.org for reporting and debugging this!

The original cause of the bad data is matrix-org/synapse#3537.