Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

no domain error checks #4

Open
jerch opened this issue Feb 9, 2023 · 9 comments
Open

no domain error checks #4

jerch opened this issue Feb 9, 2023 · 9 comments

Comments

@jerch
Copy link

jerch commented Feb 9, 2023

It seems the SIMD version does no domain error checks while decoding?

Imho you should at least mention that in the readme, so ppl are aware of this limitation and dont run into follow-up bugs by decoded bytes from broken base64 data.

On a sidenote - proper base64 decoding with SIMD is a remarkably tough issue - see the works of Lemire and Mula. Furthermore none of their speedy solutions works with wasm-simd, as they rely on 256 or 512 bit width.

@mitschabaude
Copy link
Owner

When you say "domain error checks", are you referring to something like erroring on invalid characters in the input?

@jerch
Copy link
Author

jerch commented Feb 10, 2023

Yepp, anything beyond the base64 alphabet chars.

@mitschabaude
Copy link
Owner

Interesting, yeah I have to check out Lemire. You're right, I guess my implementation just applies some piece wise linear mapping to the input characters after they've been interpreted as bytes by the built-in JS text encoder, to map them to the 0-63 range

@mitschabaude
Copy link
Owner

Which means characters out of the expected range will produce some random garbage

@mitschabaude
Copy link
Owner

mitschabaude commented Feb 10, 2023

However, I think I could just do a few more lanewise gt / lt instructions to check the bounds, and then error if any of them are true. Shouldn't add much overhead

@jerch
Copy link
Author

jerch commented Feb 10, 2023

Yeah, you prolly can simply aggregate error conditions in another v128 variable, which only needs to be eval'ed once at the end. Would be enough for a good vs. bad data distinction (unless you want to return the position of the first faulty byte).

I dont recommend to introduce branching in the data loop (means early error exit), this is really a perf killer in SIMD realms.

@jerch
Copy link
Author

jerch commented Feb 12, 2023

Hmm I get almost no speed benefit with SIMD + error check compared to a uint32 LUT variant (its ~1.5 GB/s SIMD vs ~1.4 GB/s LUT). Still investigating whether SIMD can be further optimized (and I cannot go w'o error check, as the data comes untested from outside).

@jerch
Copy link
Author

jerch commented Feb 13, 2023

Update:
With Mulas pshufb variant I get it to >2.5 GB/s (correct error detection still untested). All other variants are much worse. The higher bit variants by Lemire are not applicable due to a lack of 256/512 SIMD.

@jerch
Copy link
Author

jerch commented Feb 13, 2023

Here is my adoption of Mulas ideas with error checking: https://github.com/jerch/xterm-addon-image/blob/de66af18e1b06e0d244ae3632c4d308950c69b15/src/base64.wasm.ts#L102 (Its commented out, since I wont use it until Safari gets wasm-simd sorted out.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants