As already mentioned in Slack, the support for wide-character strings needs a rather complete overhaul and/or explicit documentation of its features. As far as I understand, it is broken on macOS/Linux.
- First of all, usage of `sqlite3_*_text16()` vs. `wstring_convert<codecvt_utf8_utf16<wchar_t>>` is intermixed.
- Unit tests cover only a single code path: binding to a statement and extracting from a result set via `codecvt_utf8_utf16<wchar_t>`. There are none for converting from a column value, calling a function, or returning from one. [see test case]
- Returning a string from a function is broken [`statement_binder<>::result()`]:
  - `sqlite3_result_text16()` expects the number of bytes, not characters. [see 3rd parameter]
  - `sqlite3_result_text16()` should be instructed to copy the string using `SQLITE_TRANSIENT`, otherwise the memory it points to goes out of scope. [see 4th parameter]
- Expecting UTF-16 encoded strings is correct on Windows, but not on other operating systems like macOS/Linux:
  - On Windows, everything works fine: `sizeof(wchar_t) == 2` (16-bit), and the encoding is UTF-16.
  - On macOS/Linux, `sizeof(wchar_t) == 4` (32-bit):
    - Using `sqlite3_*_16()` functions is outright wrong.
    - Using `codecvt_utf8_utf16<>` is bad:
      - While it's not prohibited to use `wchar_t` for UTF-16, it easily leads to subtly unexpected behaviour: because `wchar_t` is 32-bit, it usually carries UTF-32, not UTF-16.
      - Passed UTF-32 strings are suddenly treated as UTF-16 by sqlite_orm/sqlite.
      - Returned UTF-16 strings are suddenly treated as UTF-32 by the program.
  - In any case, [`codecvt_utf8_utf16<>`](https://en.cppreference.com/w/cpp/locale/codecvt_utf8_utf16) expects UTF-16, no matter the size of `wchar_t`: "If Elem is a 32-bit type, one UTF-16 code unit will be stored in each 32-bit character of the output sequence." I emphasize again that this isn't the regular expectation on macOS/Linux.
- While we are at it, I'd like to see wide-string support separated from `SQLITE_ORM_OMITS_CODECVT`, if possible: one might want to be able to pass or return wide strings from Windows API functions, even if not being able to serialize the statement.
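
The points about the byte count and about `codecvt_utf8_utf16<wchar_t>` can be reproduced without linking SQLite. The following is a minimal sketch, not sqlite_orm's code: `utf16_byte_count` and `to_utf16_units` are hypothetical helper names introduced only for illustration.

```cpp
#include <cassert>
#include <climits>
#include <codecvt>  // deprecated since C++17
#include <cstddef>
#include <locale>
#include <string>

// Hypothetical helper: the length to pass as the 3rd argument of
// sqlite3_result_text16(), which is measured in bytes, not in characters.
inline int utf16_byte_count(const std::u16string& text) {
    // The SQLite parameter is an int, so guard against overflow.
    assert(text.size() <= static_cast<std::size_t>(INT_MAX) / sizeof(char16_t));
    return static_cast<int>(text.size() * sizeof(char16_t));
}

// Hypothetical helper: converts UTF-8 to wchar_t via codecvt_utf8_utf16.
// Each resulting wchar_t holds one UTF-16 code unit, even if wchar_t is 32-bit.
inline std::wstring to_utf16_units(const std::string& utf8) {
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
    return converter.from_bytes(utf8);
}
```

For U+1F600 (UTF-8 bytes `F0 9F 98 80`), `to_utf16_units` should return two elements, the surrogates `0xD83D` and `0xDE00`, regardless of `sizeof(wchar_t)`. Code on macOS/Linux that expects a single UTF-32 element `0x1F600` will misread that. In a real result call, the value from `utf16_byte_count` would go into the 3rd argument and `SQLITE_TRANSIENT` into the 4th.
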

One quick way of fixing the UTF-16 issue on macOS/Linux is to disable UTF-16 support altogether when not on Windows. This might not even have any impact, given that UTF-8 is prevalent on those systems.
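
A rough sketch of what such a switch could look like, assuming that a 16-bit `wchar_t` is a good enough proxy for "Windows-style UTF-16 wide strings". The names `wchar_is_utf16_unit` and `is_bindable_string` are hypothetical and do not exist in sqlite_orm:

```cpp
#include <string>
#include <type_traits>

// Assumption: a 16-bit wchar_t implies UTF-16 wide strings, as on Windows.
constexpr bool wchar_is_utf16_unit = sizeof(wchar_t) == 2;

// Hypothetical trait: which string types a binder would accept.
template<class T>
struct is_bindable_string : std::false_type {};

// Narrow (UTF-8) strings are always accepted.
template<>
struct is_bindable_string<std::string> : std::true_type {};

// std::wstring is accepted only where wchar_t holds UTF-16 code units.
template<>
struct is_bindable_string<std::wstring>
    : std::integral_constant<bool, wchar_is_utf16_unit> {};
```

With a trait like this, an attempt to bind a `std::wstring` on macOS/Linux could be rejected at compile time via `static_assert` instead of silently mixing UTF-16 and UTF-32.
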