
utf8_arith.h - utf8_decode_step() doesn't decode all valid sequences correctly #1

@gulrak

Description


utf8_decode_step() in utf8_arith.h fails on various valid sequences. For example, "\xED\x81\x80" should decode to codepoint U+D040, but the function wrongly decodes it to U+D000 (tested on macOS with clang from Xcode 9.2, unoptimized debug build).
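For reference, a minimal standalone sketch (independent of either header) that decodes the three-byte sequence by hand per RFC 3629 and confirms the expected codepoint is U+D040; the reported U+D000 is, incidentally, what the lead byte alone contributes, as if the continuation bits were dropped:

    /* Decode "\xED\x81\x80" by hand: three-byte form
     * 1110xxxx 10yyyyyy 10zzzzzz -> xxxxyyyyyyzzzzzz */
    #include <stdio.h>

    int main(void)
    {
        const unsigned char s[] = "\xED\x81\x80";

        unsigned long cp = ((unsigned long)(s[0] & 0x0F) << 12)  /* 0xD000 */
                         | ((unsigned long)(s[1] & 0x3F) << 6)   /* 0x0040 */
                         |  (unsigned long)(s[2] & 0x3F);        /* 0x0000 */

        printf("U+%04lX\n", cp);  /* prints U+D040 */
        return 0;
    }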

Failing UTF-8 sequences start with 0xE0, 0xED, 0xF1, 0xF2 and 0xF3.
I couldn't easily find the reason, but as it stands the function shouldn't be used, or only with care.

The utf8_branch.h version, while using the same tables, works flawlessly in my tests.
