[WIP] Adds function for encoding flags along with strings. #543

tcr · 2014-10-07T01:45:50Z

This does not work on the JIT branch (yet).

natevw · 2014-10-07T21:18:06Z

Heh, nice. I could use this to store one or both of these flags:

is_ascii (fast-path for tm_str_lookup_*, fast-path for tm_str_to_utf8)
is_utf8 (fast-path for tm_str_to_utf8)
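For illustration, here is a minimal sketch of how those two flags could be computed in a single pass over the bytes. The names `STR_FLAG_ASCII`, `STR_FLAG_UTF8` and `classify_str` are hypothetical and not part of this PR. The sketch assumes the string is stored as well-formed CESU-8, so the only thing that stops it from being valid UTF-8 is an encoded surrogate.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits; the PR does not define these names or values. */
enum {
  STR_FLAG_ASCII = 1 << 0, /* every byte is below 0x80 */
  STR_FLAG_UTF8  = 1 << 1  /* no encoded surrogates, so the bytes are valid UTF-8 */
};

/* Scan the buffer once and return the flag byte.
 * Assumption: the input is well-formed CESU-8. */
static uint8_t classify_str(const uint8_t *buf, size_t len)
{
  uint8_t flags = STR_FLAG_ASCII | STR_FLAG_UTF8;
  for (size_t i = 0; i < len; i++) {
    if (buf[i] >= 0x80) {
      flags &= (uint8_t)~STR_FLAG_ASCII;
    }
    /* Surrogates U+D800..U+DFFF are encoded as ED A0..BF xx. */
    if (buf[i] == 0xED && i + 1 < len && buf[i + 1] >= 0xA0) {
      flags &= (uint8_t)~STR_FLAG_UTF8;
    }
  }
  return flags;
}
```

An ASCII string gets both flags in this sketch, since ASCII is also valid UTF-8; a caller would check the ASCII bit first.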

Not sure how/if I could use the other 6 or 7 bits to speed up tm_str_lookup_* in the general case though. For that I'd like a GC-associated userdata value where I can store the offset of each multibyte sequence, then I only have to iterate through that rather than the original string. Will that be a possibility later, under LuaJIT?

If a byte is all we'll ever get, perhaps the bits could be used something like this:

| Bit | Meaning if set |
| --- | --- |
| 7 | Whole string is UTF-8 (but not ASCII) |
| 6 | First half of string is ASCII |
| 5 | First half of second half of string is ASCII |
| 4 | First half of second half of second half of string is ASCII |
| 3 | …turtles… |
| 0 | Zeno's paradox is ASCII |

I might have the recursive division swapped from what would be most useful (trying to especially speed up iteration through/past the beginning of the string as that would be the most visited portion in a for loop) or maybe just a simple proportional division, or maybe trying to use this bitfield will make the code a horrid mess and there's better uses…

Another thing nice to directly optimize would be str.length, but not sure how valuable a 6-bit attempt at that would be without also optimizing the lookup. (How often is str.length used on its own?)

And while I'm hijacking this PR anyway for random rambling: we need to somehow come up with real-worldish benchmarks before any optimization! I tried one optimization that could save a bunch of CPU and avoid a malloc/free in some cases, and it slowed down the test suite a little, so I backed it out. Perhaps optimizations made possible by this will also slow the test suite down…how will we judge if they've actually helped real-world performance in [what preferred?] use cases?

natevw · 2014-10-07T21:31:57Z

Oooh, another [roughly drafted] idea for the byte:

| Value | Meaning |
| --- | --- |
| 0 | Must be ASCII |
| 1–254 | Lua length - this = JS length |
| 255 | goto slow path |

…or perhaps steal the msb to flag "has supplementary characters, i.e. can't treat this CESU-8 as valid UTF-8" and allow the Lua length vs. JS length to differ only by 127 before slow path.

This could optimize str.length if "ASCII text with a few BMP characters sprinkled here and there" is a common case, but doesn't optimize the actual lookup much most of the time. It's more appealing than trying to mark regions of the string, but maybe code simplicity is not as valuable as speeding up iteration through large strings…BENCHMARKS!
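A rough sketch of the value table above (without the msb variant), assuming the Lua length is the byte length of well-formed CESU-8 and the JS length is the number of UCS-2 code units. All names here (`LEN_DELTA_SLOW`, `ucs2_length_slow`, `encode_len_delta`, `ucs2_length`) are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define LEN_DELTA_SLOW 255u /* sentinel from the table: fall back to a full scan */

/* Count UCS-2 code units in well-formed CESU-8: every byte that is not a
 * continuation byte (10xxxxxx) starts exactly one code unit. */
static size_t ucs2_length_slow(const uint8_t *buf, size_t len)
{
  size_t units = 0;
  for (size_t i = 0; i < len; i++) {
    if ((buf[i] & 0xC0) != 0x80) units++;
  }
  return units;
}

/* Compute the byte to store next to the string:
 * 0 means ASCII, 1..254 is (byte length - JS length), 255 means "too big". */
static uint8_t encode_len_delta(const uint8_t *buf, size_t len)
{
  size_t delta = len - ucs2_length_slow(buf, len);
  return delta < LEN_DELTA_SLOW ? (uint8_t)delta : (uint8_t)LEN_DELTA_SLOW;
}

/* str.length using the stored byte; scans only when the sentinel is set. */
static size_t ucs2_length(const uint8_t *buf, size_t len, uint8_t delta)
{
  if (delta == LEN_DELTA_SLOW) return ucs2_length_slow(buf, len);
  return len - delta;
}
```

The stored value is just the count of continuation bytes, so a supplementary character (two 3-byte sequences in CESU-8) costs 4 of the 254 available steps.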

tcr · 2014-10-07T21:32:00Z

Let's use the flag for classification for now. I can extend this to, frankly, as many bits as necessary; but one byte lets me not patch colony-lua before I'm ready to.

I'm unsold on the need for caching arbitrary string character access, and mostly put off by the memory constraints. (Unsold on the short-term need, rather; arbitrary character access probably has bigger pitfalls in __index in the short term.) For this particular piece of the puzzle, let's assume it will come in the next string-related PR.

I can get to work on storing ucs2_length (and hell, utf8_length?) directly in the struct as 32-bit ints. Those have the largest and most obvious benefit, and I think they are straightforward.

Performance test suites are needed. I would like to emulate something like Rust's, where they actually calculate speed tradeoffs over time (they even commit-guard against changes). Right now, while so much functionality is being added, I'm wary of enforcing it, obviously, but if the JIT component ever happens (making progress...) then it's almost imperative that there exist benchmarking of memory and code use. Right now, maybe rudimentary tests for runtime that we can just tack on as evidence to PRs? process.hrtime() and running code is the best case right now.

natevw · 2014-10-07T21:34:56Z

Another random thought, I promise I am trying to close this tab and move on.

We could crowdsource the benchmark! Ship the unoptimized lookup code and see who complains 😜

kevinmehall · 2014-10-07T22:57:58Z

If we determine we want caching for access by index, I like the simplicity of @raffecat's idea from #137 (comment) -- cache the (JS index, byte pos) of the last lookup, and search from there. I'd imagine that most string indexing is going to be largely sequential, so we don't need a large cache (and another memory allocation) for a full table for every multibyte character.
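To make the idea concrete, here is a small sketch of a last-lookup cursor. The type `str_cursor_t` and the function `str_byte_offset` are hypothetical names, not anything from this branch or from #137, and the code assumes well-formed CESU-8 where each UCS-2 code unit is one lead byte followed by zero or more continuation bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cursor remembering where the previous lookup ended. */
typedef struct {
  const uint8_t *buf; /* string the cached position belongs to */
  size_t index;       /* JS (UCS-2) index reached by the last lookup */
  size_t pos;         /* byte offset of that code unit */
} str_cursor_t;

/* Return the byte offset of code unit `index`, or `len` if it is out of range.
 * Assumption: `buf` holds well-formed CESU-8. */
static size_t str_byte_offset(str_cursor_t *c, const uint8_t *buf,
                              size_t len, size_t index)
{
  size_t i = 0, pos = 0;
  /* Resume from the cache only for the same string and a target at or
   * after the cached index; anything else restarts from the beginning. */
  if (c->buf == buf && c->index <= index && c->pos <= len) {
    i = c->index;
    pos = c->pos;
  }
  while (i < index && pos < len) {
    pos++; /* step over the lead byte */
    while (pos < len && (buf[pos] & 0xC0) == 0x80) pos++; /* and its continuation bytes */
    i++;
  }
  c->buf = buf;
  c->index = i;
  c->pos = pos;
  return pos;
}
```

Sequential access then costs one step per lookup instead of a scan from byte 0. Keying the cache on the raw pointer is only safe while the string stays alive and unmoved; a real implementation would need some way to invalidate it.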

natevw · 2014-10-07T23:07:40Z

Ah, yes, thanks for the reminder; that's a much better idea than odd bitfield tricks for iteration. It might be worth extending it to also cover the case where people are accessing str.length each loop iteration, but it seems simple enough and probably covers a lot of cases.

So basically the best approach might be something like:

use this spare byte to tag broad things like ASCII/UTF-8 opportunities (I wonder how often the flag would be _re_used after first calculation though…)
use static/global vars (I'm assuming C code need not be re-entrant?) to note the string hash (or mere pointer?) and basic state needed for quicker re-entry at the next offset

tcr · 2014-10-10T02:42:42Z

In all fairness, with compiler defines we can probably attach an arbitrary amount of state to each string. Let me ensure this will be future-proof with the JIT branch and I'll update this branch.

This branch should also be rebased over the tcr-utf8 branch so those changes can be made.

natevw · 2014-10-10T03:46:24Z

Before you go changin', let's wait to at least stub out a start on this and see what we need. Changing the byte to a size_t (or a byte and a size_t, or two…oh boy!) would reduce the need for static vars (although it wouldn't necessarily be re-entrant yet).

The Objective-C runtime guy gave me some good advice on benchmarking, btw. We should keep an eye out for libraries that benchmark themselves (in appropriate ways, perhaps a JS raytracer isn't going to help us optimize string performance…) and gather 'em up.

Adds function for encoding flags along with strings.

6fe33a9

tcr mentioned this pull request Oct 7, 2014

Strings now exposed externally as array of UCS-2 codepoints #542

Merged

tcr changed the title ~~Adds function for encoding flags along with strings.~~ [NRY] Adds function for encoding flags along with strings. Oct 10, 2014

tcr force-pushed the master branch from dfae813 to 814e161 Compare October 10, 2014 17:47

natevw mentioned this pull request Oct 10, 2014

Optimize string access/iteration #559

Open

tcr changed the title ~~[NRY] Adds function for encoding flags along with strings.~~ [WIP] Adds function for encoding flags along with strings. Oct 15, 2014
