UnicodeUtil updates: TryUTF8toUTF16, ReadOnlySpan methods, #1024 #1057

Open: wants to merge 10 commits into master

paulirwin (Contributor) commented Dec 4, 2024

  • You've read the Contributor Guide and Code of Conduct.
  • You've included unit or integration tests for your change, where applicable.
  • You've included inline docs for your change, where applicable.
  • There's an open issue for the PR that you are making. If you'd like to propose a change, please open an issue to discuss the change or find an existing issue.

Various improvements to the UnicodeUtil APIs

Fixes #1024

Description

This PR includes several changes to the UnicodeUtil APIs:

  • New TryUTF8toUTF16 method that follows the Try-pattern common in .NET: it returns false, rather than throwing, if the input ends with an incomplete UTF-8 byte sequence
  • UTF8toUTF16 now throws a ParseException instead of an IndexOutOfRangeException if the input ends with an incomplete UTF-8 byte sequence
  • New UTF8toUTF16WithFallback method that behaves like UTF8toUTF16, but appends U+FFFD (the Unicode replacement character) to the output instead of throwing if the input ends with an incomplete UTF-8 byte sequence
  • Moved ToCharArray into a partial class in the ObsoleteAPI folder and marked it for removal in the 4.8.0 RC
  • Added ReadOnlySpan overloads where applicable
  • Added TryUtf8ToString and Utf8ToStringWithFallback methods to BytesRef, which call the corresponding UnicodeUtil methods
  • Updated callers to use these methods where appropriate, such as ToString and logging/exception-message formatting, where throwing on invalid UTF-8 would not make sense
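The three decoding behaviors listed above (Try-pattern, strict, and fallback) can be illustrated with a small, language-neutral sketch. It is written in Python only for brevity and is not the Lucene.NET C# API: the names `try_utf8_to_utf16`, `utf8_to_utf16`, and `utf8_to_utf16_with_fallback` are hypothetical, `ValueError` stands in for ParseException, and the output is a list of UTF-16 code units rather than a char array. The sketch assumes otherwise well-formed input: it does not validate continuation bytes and only checks whether the final sequence is truncated.

```python
from typing import List, Tuple

REPLACEMENT_CHAR = 0xFFFD  # U+FFFD REPLACEMENT CHARACTER


def _sequence_length(lead: int) -> int:
    """Bytes in the UTF-8 sequence that starts with `lead` (lenient: no validation)."""
    if lead < 0xC0:
        return 1
    if lead < 0xE0:
        return 2
    if lead < 0xF0:
        return 3
    return 4


def _decode(data: bytes) -> Tuple[List[int], bool]:
    """Decode UTF-8 into UTF-16 code units.

    Returns (units, complete). `complete` is False when the input ends in the
    middle of a multi-byte sequence; `units` then holds everything decoded
    before that truncated tail.
    """
    units: List[int] = []
    i = 0
    while i < len(data):
        b = data[i]
        n = _sequence_length(b)
        if i + n > len(data):
            return units, False  # incomplete trailing sequence
        if n == 1:
            cp = b
        elif n == 2:
            cp = ((b & 0x1F) << 6) | (data[i + 1] & 0x3F)
        elif n == 3:
            cp = ((b & 0x0F) << 12) | ((data[i + 1] & 0x3F) << 6) | (data[i + 2] & 0x3F)
        else:
            cp = (((b & 0x07) << 18) | ((data[i + 1] & 0x3F) << 12)
                  | ((data[i + 2] & 0x3F) << 6) | (data[i + 3] & 0x3F))
        i += n
        if cp <= 0xFFFF:
            units.append(cp)
        else:
            # Supplementary code point: emit a UTF-16 surrogate pair.
            cp -= 0x10000
            units.append(0xD800 + (cp >> 10))
            units.append(0xDC00 + (cp & 0x3FF))
    return units, True


def try_utf8_to_utf16(data: bytes) -> Tuple[bool, List[int]]:
    """Hypothetical Try-pattern analogue: report failure instead of raising.

    On failure this sketch returns the units decoded so far; the real
    method's output on failure may differ.
    """
    units, complete = _decode(data)
    return complete, units


def utf8_to_utf16(data: bytes) -> List[int]:
    """Hypothetical strict analogue: raise on an incomplete trailing sequence."""
    units, complete = _decode(data)
    if not complete:
        raise ValueError("incomplete UTF-8 byte sequence at end of input")
    return units


def utf8_to_utf16_with_fallback(data: bytes) -> List[int]:
    """Hypothetical fallback analogue: append U+FFFD for a truncated tail."""
    units, complete = _decode(data)
    if not complete:
        units.append(REPLACEMENT_CHAR)
    return units
```

For example, `utf8_to_utf16_with_fallback(b"A\xe2\x82")` returns `[0x41, 0xFFFD]`, because the bytes E2 82 begin a three-byte sequence that never completes, while `utf8_to_utf16` raises on the same input. Sharing one `_decode` helper keeps the three entry points consistent: they differ only in how they report a truncated final sequence.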

@paulirwin paulirwin added the notes:improvement An enhancement to an existing feature label Dec 4, 2024
@paulirwin paulirwin requested a review from NightOwl888 December 4, 2024 23:55
@paulirwin paulirwin marked this pull request as ready for review December 15, 2024 20:22
paulirwin (Contributor, Author) commented:

Marking this as ready for review, since I'm not aware of any other usages that need to be updated. Let me know if you find any.

@paulirwin paulirwin marked this pull request as draft December 17, 2024 20:39
@paulirwin paulirwin marked this pull request as ready for review December 18, 2024 04:28
Labels: notes:improvement (An enhancement to an existing feature)

Successfully merging this pull request may close these issues:
Convert UTF8toUTF16 to TryUTF8toUTF16