Conversation

@tcr
Member

@tcr tcr commented Oct 7, 2014

This does not work on the JIT branch (yet).

@natevw
Contributor

natevw commented Oct 7, 2014

Heh, nice. I could use this to store one or both of these flags:

  • is_ascii (fast-path for tm_str_lookup_*, fast-path for tm_str_to_utf8)
  • is_utf8 (fast-path for tm_str_to_utf8)
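A minimal sketch of how those two flags might be computed in one pass. The flag names and bit values are hypothetical, and it assumes strings are stored as well-formed CESU-8, so "is_utf8" here means "contains no surrogate halves and can be handed out as UTF-8 unchanged":

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit assignments; only the flag names come from the list above. */
#define STR_FLAG_IS_ASCII 0x01u
#define STR_FLAG_IS_UTF8  0x02u

/* Scans the buffer once and returns the flag byte to store with the string.
 * Assumes well-formed CESU-8 input (an assumption, not checked here). */
static uint8_t classify_string(const uint8_t *s, size_t len)
{
    uint8_t flags = STR_FLAG_IS_ASCII | STR_FLAG_IS_UTF8;
    for (size_t i = 0; i < len; i++) {
        /* Any byte with the high bit set rules out pure ASCII. */
        if (s[i] >= 0x80) {
            flags &= (uint8_t)~STR_FLAG_IS_ASCII;
        }
        /* 0xED followed by 0xA0..0xBF encodes a surrogate half (U+D800..U+DFFF),
         * which is legal in CESU-8 but not in UTF-8. */
        if (s[i] == 0xED && i + 1 < len && s[i + 1] >= 0xA0 && s[i + 1] <= 0xBF) {
            flags &= (uint8_t)~STR_FLAG_IS_UTF8;
        }
    }
    return flags;
}
```

An ASCII string ends up with both bits set, so either fast path can be taken from a single flag check.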

Not sure how/if I could use the other 6 or 7 bits to speed up tm_str_lookup_* in the general case though. For that I'd like a GC-associated userdata value where I can store the offset of each multibyte sequence, then I only have to iterate through that rather than the original string. Will that be a possibility later, under LuaJIT?

If a byte is all we'll ever get, perhaps the bits could be used something like this:

Bit  Meaning if set
7    Whole string is UTF-8 (but not ASCII)
6    First half of string is ASCII
5    First half of second half of string is ASCII
4    First half of second half of second half of string is ASCII
3    …turtles…
0    Zeno's paradox is ASCII

I might have the recursive division swapped from what would be most useful (I'm especially trying to speed up iteration through and past the beginning of the string, since that would be the most visited portion in a for loop). Or maybe a simple proportional division would be better, or maybe trying to use this bitfield will make the code a horrid mess and there are better uses…

Another thing nice to directly optimize would be str.length, but not sure how valuable a 6-bit attempt at that would be without also optimizing the lookup. (How often is str.length used on its own?)

And while I'm hijacking this PR anyway for random rambling: we need to somehow come up with real-worldish benchmarks before any optimization! I tried one optimization that could save a bunch of CPU and avoid a malloc/free in some cases, and it slowed down the test suite a little, so I backed it out. Perhaps optimizations made possible by this will also slow the test suite down…how will we judge whether they've actually helped real-world performance in [what preferred?] use cases?

@natevw
Contributor

natevw commented Oct 7, 2014

Oooh, another [roughly drafted] idea for the byte:

Value  Meaning
0      Must be ASCII
1–254  Lua length - this = JS length
255    goto slow path

…or perhaps steal the msb to flag "has supplementary characters, i.e. can't treat this CESU-8 as valid UTF-8" and allow the Lua length vs. JS length to differ only by 127 before slow path.

This could optimize str.length if "ASCII text with a few BMP characters sprinkled here and there" is a common case, but doesn't optimize the actual lookup much most of the time. It's more appealing than trying to mark regions of the string, but maybe code simplicity is not as valuable as speeding up iteration through large strings…BENCHMARKS!
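Roughly, the basic variant of that byte (without stealing the msb) could look like the sketch below. All names are hypothetical, "Lua length" is taken to mean the byte length, and the input is assumed to be well-formed UTF-8 or CESU-8:

```c
#include <stddef.h>
#include <stdint.h>

#define LEN_HINT_SLOW_PATH 255u  /* hypothetical sentinel: hint does not fit */

/* Slow path: counts UTF-16 code units (the JS length) by scanning.
 * Assumes well-formed input; malformed bytes would give a wrong count. */
static size_t js_length_slow(const uint8_t *s, size_t len)
{
    size_t units = 0;
    for (size_t i = 0; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) units++;  /* each lead byte starts a code unit */
        if ((s[i] & 0xF8) == 0xF0) units++;  /* 4-byte UTF-8 lead maps to a surrogate pair */
    }
    return units;
}

/* Computes the hint byte once, e.g. when the string is created. */
static uint8_t make_length_hint(const uint8_t *s, size_t len)
{
    size_t delta = len - js_length_slow(s, len);
    return delta < LEN_HINT_SLOW_PATH ? (uint8_t)delta : (uint8_t)LEN_HINT_SLOW_PATH;
}

/* str.length with the hint: a subtraction unless the hint says "slow path". */
static size_t js_length(const uint8_t *s, size_t len, uint8_t hint)
{
    if (hint != LEN_HINT_SLOW_PATH) {
        return len - hint;  /* hint == 0 also means the string is pure ASCII */
    }
    return js_length_slow(s, len);
}
```

Each 2-byte character costs one unit of the budget and each 3-byte character costs two, so a 255-value hint covers at most a hundred or so non-ASCII BMP characters before falling back to the scan.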

@tcr
Member Author

tcr commented Oct 7, 2014

Let's use the flag for classification for now. I can extend this to, frankly, as many bits as necessary; but one byte lets me not patch colony-lua before I'm ready to.

I'm unsold on the need for caching arbitrary string character access, and mostly put off by the memory constraints. (Unsold on the short-term need, rather; arbitrary character access probably has bigger pitfalls in __index in the short term.) For this particular piece of the puzzle, let's assume it will come in the next string-related PR.

I can get to work on storing ucs2_length (and hell, utf8_length?) directly in the struct as 32-bit ints. Those have the largest and most obvious benefit, and I think they are straightforward.
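As a rough illustration only, the two lengths could be filled in with a single scan at creation time. The struct and function names below are hypothetical (this is not the colony-lua string struct), and the sketch assumes the stored bytes are well-formed CESU-8 with properly paired surrogates:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical side data kept with each string. */
typedef struct {
    uint32_t byte_length;  /* length as stored (CESU-8 bytes) */
    uint32_t ucs2_length;  /* UTF-16 code units, i.e. JS str.length */
    uint32_t utf8_length;  /* bytes needed once converted to real UTF-8 */
} str_lengths_t;

static str_lengths_t measure_string(const uint8_t *s, uint32_t len)
{
    str_lengths_t out = { len, 0, len };
    for (uint32_t i = 0; i < len; i++) {
        /* In CESU-8 every lead byte corresponds to exactly one code unit. */
        if ((s[i] & 0xC0) != 0x80) {
            out.ucs2_length++;
        }
        /* A high surrogate (0xED, 0xA0..0xAF) opens a 6-byte pair that
         * shrinks to a 4-byte sequence in UTF-8, saving 2 bytes. */
        if (s[i] == 0xED && i + 1 < len && s[i + 1] >= 0xA0 && s[i + 1] <= 0xAF) {
            out.utf8_length -= 2;
        }
    }
    return out;
}
```

With both values cached, str.length and the buffer size for a UTF-8 conversion become plain field reads.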

Performance test suites are needed. I would like to emulate something like Rust's, where they actually calculate speed tradeoffs over time (they even guard commits against changes). Right now, while so much functionality is being added, I'm wary of enforcing it, obviously, but if the JIT component ever happens (making progress...) then it's almost imperative that there exist benchmarking of memory and code use. For now, maybe rudimentary runtime tests that we can just attach as evidence to PRs? process.hrtime() and running code is the best option right now.

@natevw
Contributor

natevw commented Oct 7, 2014

Another random thought, I promise I am trying to close this tab and move on.

We could crowdsource the benchmark! Ship the unoptimized lookup code and see who complains 😜

@kevinmehall
Member

If we determine we want caching for access by index, I like the simplicity of @raffecat's idea from #137 (comment) -- cache the (JS index, byte pos) of the last lookup, and search from there. I'd imagine that most string indexing is going to be largely sequential, so we don't need a large cache (and another memory allocation) for a full table for every multibyte character.
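A sketch of that last-lookup cache, under some assumptions: the type and function names are made up for illustration, the string is well-formed CESU-8 (so each lead byte is one JS index), and a backwards seek simply restarts from the beginning rather than walking back:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cache of the most recent lookup for one string. */
typedef struct {
    size_t js_index;  /* JS (UTF-16 code unit) index of the last lookup */
    size_t byte_pos;  /* byte offset where that code unit starts */
} str_cursor_t;

/* Returns the byte offset of JS index `target`, or `len` if it is out of range.
 * Sequential access only scans the bytes between the previous hit and the new one. */
static size_t str_seek(const uint8_t *s, size_t len, str_cursor_t *c, size_t target)
{
    if (target < c->js_index) {
        /* Going backwards: simplest option is to restart from the beginning. */
        c->js_index = 0;
        c->byte_pos = 0;
    }
    size_t idx = c->js_index;
    size_t pos = c->byte_pos;
    while (idx < target && pos < len) {
        pos++;  /* step over the lead byte */
        while (pos < len && (s[pos] & 0xC0) == 0x80) {
            pos++;  /* step over continuation bytes */
        }
        idx++;
    }
    if (idx == target) {
        /* Remember where we landed so the next lookup can resume here. */
        c->js_index = idx;
        c->byte_pos = pos;
    }
    return pos;
}
```

A loop like `for (i = 0; i < n; i++) str[i]` then costs one short forward step per iteration instead of a scan from byte 0, and the only extra memory is two size_t values.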

@natevw
Contributor

natevw commented Oct 7, 2014

Ah, yes, thanks for the reminder; that's a much better idea than odd bitfield tricks for iteration. It might be worth extending it to also cover the case where people access str.length on each loop iteration, but it seems simple enough and probably covers a lot of cases.

So basically the best approach might be something like:

  • use this spare byte to tag broad things like ASCII/UTF-8 opportunities (I wonder how often the flag would be _re_used after first calculation though…)
  • use static/global vars (I'm assuming C code need not be re-entrant?) to note the string hash (or mere pointer?) and basic state needed for quicker re-entry at the next offset

@tcr
Member Author

tcr commented Oct 10, 2014

In all fairness, with compiler defines we can probably attach an arbitrary amount of state to each string. Let me ensure this will be future-proof with the JIT branch and I'll update this branch.

This branch should also be rebased over the tcr-utf8 branch so those changes can be made.

@tcr tcr changed the title from "Adds function for encoding flags along with strings." to "[NRY] Adds function for encoding flags along with strings." Oct 10, 2014
@natevw
Contributor

natevw commented Oct 10, 2014

Before you go changin', let's wait to at least stub out a start on this and see what we need. Changing the byte to a size_t (or a byte and a size_t, or two…oh boy!) would reduce the need for static vars (although it wouldn't necessarily be re-entrant yet).

The Objective-C runtime guy gave me some good advice on benchmarking, btw. We should keep an eye out for libraries that benchmark themselves (in appropriate ways, perhaps a JS raytracer isn't going to help us optimize string performance…) and gather 'em up.

@tcr tcr changed the title from "[NRY] Adds function for encoding flags along with strings." to "[WIP] Adds function for encoding flags along with strings." Oct 15, 2014
4 participants