Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement Str.from_utf8/16/32 with lossy variants #7514

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

shua
Copy link
Contributor

@shua shua commented Jan 14, 2025

This change adds the following new functions to the builtin Str module: from_utf8_lossy, from_utf16, from_utf16_lossy, from_utf32, from_utf32_lossy. Only from_utf8_lossy was implemented as a call to zig bitcode, all the rest are implemented as roc code.

I originally implemented them all as zig bitcode, following the existing implementation of from_utf8. I ported it to roc code because it makes for a smaller change, and it feels like a good precedent to implement what I can in roc, and only port it to bitcode as necessary.

There's an issue with the wasm backend where decoding surrogate pairs throws an integer multiplication overflow exception. If I replace the single multiply in the implementation with an equivalent Num.shift_left_by then it panics with integer subtraction overflow exception. I wasn't able to figure out why, so I added a should_panic to the test so it continues to run the test and any magic fixes get flagged as such.

closes #7390

@shua shua force-pushed the fromutf-roc branch 2 times, most recently from 9ad7fe3 to 517ac0a Compare January 14, 2025 21:50
Copy link
Collaborator

@lukewilliamboswell lukewilliamboswell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thank you 😃

I think this kind of change would be a perfect candidate for @bhansconnect's https://github.com/bhansconnect/roc-fuzz though we haven't looked at that in a while.

@imclerran
Copy link
Contributor

Ha! Just started writing from_utf16 this afternoon! Guess I don't need to finish that!

@smores56
Copy link
Collaborator

In case you didn't notice @shua, you've got failures in your tests. Consider running the following commands suggested by the failing test module:

cargo test -p test_mono -p uitest --no-fail-fast
git add -u
git commit -S -m "update mono tests"
git push origin YOUR_BRANCH_NAME

@shua
Copy link
Contributor Author

shua commented Jan 15, 2025

@smores56 Unfortunately, there seems to be some disagreement on mono output between osx aarch64 and every other target. So "fixing" the mono output for one target will break it for the other.

You can see before @lukewilliamboswell 's change, the tests all passed for everything except osx aarch64, after his change only osx aarch64 passes and everything else fails.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unicode conversion helpers for Str
4 participants