This repository has been archived by the owner on Oct 17, 2022. It is now read-only.

Consider adopting lookup4 #3

Open
lemire opened this issue Jun 24, 2020 · 5 comments

Comments

@lemire

lemire commented Jun 24, 2020

The simdjson library has a new UTF-8 validator called lookup4, which is simpler and faster than most alternatives.

See

https://github.com/simdjson/simdjson/blob/master/src/generic/stage1/utf8_lookup4_algorithm.h

It is really not a lot of code!!!

@lemire changed the title from "Consider adopting lookup3" to "Consider adopting lookup4" on Jun 27, 2020
@milkey-mouse

First fastvalidate-utf-8, then lookup2, 3, and 4... you're writing new validators faster than I can port them!

@lemire
Author

lemire commented Jun 30, 2020

I think that lookup4 is going to be hard to beat. It is really down to the metal. Have a look.

@milkey-mouse

milkey-mouse commented Jun 30, 2020

I just benchmarked lookup4 (as implemented in simdjson) against this implementation and against the standard library's validator. lookup4 comes out on top in all but a couple of cases. It's also much more consistent, since it doesn't rely on the branch predictor as much.
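
For anyone who wants to try a comparison like this, here is a minimal sketch of a timing harness. It is not the benchmark used above (that setup isn't shown in this thread); `make_input`, `best_time`, and `std_validate` are hypothetical names, and only the standard library's `std::str::from_utf8` is measured. Any other validator could be plugged in as a closure with the same `&[u8] -> bool` shape.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical helper: builds valid UTF-8 input by repeating `pattern`
/// a whole number of times, so no multi-byte character is ever cut in half.
fn make_input(pattern: &str, repeats: usize) -> Vec<u8> {
    pattern.repeat(repeats).into_bytes()
}

/// Hypothetical helper: runs `validate` over `input` `rounds` times and
/// returns the fastest single run. Taking the minimum reduces timing noise.
fn best_time<F>(input: &[u8], rounds: usize, validate: F) -> Duration
where
    F: Fn(&[u8]) -> bool,
{
    let mut best = Duration::MAX;
    for _ in 0..rounds {
        let start = Instant::now();
        // black_box keeps the optimizer from eliding the validation call.
        let ok = validate(black_box(input));
        let elapsed = start.elapsed();
        black_box(ok);
        best = best.min(elapsed);
    }
    best
}

/// Baseline: the standard library's UTF-8 validation.
fn std_validate(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok()
}
```

Calling `best_time(&make_input("hello world ", 100_000), 20, std_validate)` on pure ASCII and then on a pattern with multi-byte characters (for example `"héllo wörld ✓ "`) gives a rough idea of how much a validator's speed depends on the input mix, which is where a branch-heavy implementation tends to show more variance.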

@lemire
Author

lemire commented Jul 1, 2020

+1

@pickfire

This is done in https://github.com/rusticstuff/simdutf8
