Unaligned bit arrays on the JavaScript target #3946

Open
wants to merge 1 commit into
base: main

richard-viney
Contributor

@richard-viney richard-viney commented Dec 3, 2024

Summary 📘

This PR adds support for unaligned bit arrays on the JavaScript target. Specifically:

In expressions:

  • Arbitrary sized integer segments:
    <<1:4>>
    <<12:9-little>>
    <<12:29-big>>
    <<1234:100-little>>
  • Arbitrary sized bits segments:
    <<<<0xABCD:15>>:bits-10>>
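To make the layout of such segments concrete, here is a rough sketch of how big-endian integer segments of arbitrary size could be packed into bytes. It is only an illustration, not the code the compiler generates: `buildBigEndianSegments` is a hypothetical name, little-endian segments are left out, and the unused low bits of the final byte are simply zero here.

```javascript
// Hypothetical helper, not part of the prelude or the generated code.
// `segments` is a list of [value, sizeInBits] pairs, all big-endian.
// Returns { bytes, bitSize }; unused low bits of the last byte stay zero.
function buildBigEndianSegments(segments) {
  const totalBits = segments.reduce((sum, [, size]) => sum + size, 0);
  const bytes = new Uint8Array(Math.ceil(totalBits / 8));
  let offset = 0; // current bit offset; bit 0 is the high bit of byte 0
  for (const [value, size] of segments) {
    // Keep only the low `size` bits (two's complement for negative values).
    const v = BigInt.asUintN(size, BigInt(value));
    for (let i = size - 1; i >= 0; i--) {
      const bit = Number((v >> BigInt(i)) & 1n);
      bytes[offset >> 3] |= bit << (7 - (offset & 7));
      offset++;
    }
  }
  return { bytes, bitSize: totalBits };
}

// <<1:4>> occupies the high four bits of a single byte.
const nibble = buildBigEndianSegments([[1, 4]]);
console.log(nibble.bytes, nibble.bitSize);
```

Under these assumptions `<<1:4>>` is stored as the single byte `0x10` with a bit size of 4.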

In patterns:

  • Arbitrary sized int segments:
    let assert <<_:7, i:19-little-signed>> = <<0xABCDEF12:26>>
  • Sized and unsized bits segments:
    let assert <<_:7, a:bits-3, b:size(14)-bits, c:bits>> = <<0xABCDEF:24, 0x1234:16>>

There is a warning if the above features are used when gleam.toml specifies a version < v1.7.0.


Implementation Details 🛠️

  • The BitArray class in the prelude now has both bitSize and byteSize fields.
  • The values of any unused low bits in the final byte are undefined. They will be zero in many common use cases, but making them undefined allows for some additional optimisations when slicing.
  • The BitArray class in the prelude has been reworked in a few ways:
    • Public API for use in FFI code is now: get rawBuffer(), get bitSize(), get byteSize(), get isWholeBytes(), byteAt().
    • Deprecated APIs that are frequently used by existing FFI code: get buffer(), get length(). Using these emits a deprecation warning at runtime.
    • Various internal APIs have been removed/replaced.
    • JSDoc annotations have been added to all functions allowing type-checking by adding // @ts-check to the top of the file.
    • BitArray.sliceToInt() has internal variants for aligned and unaligned access, as well as variants for both number and BigInt. The number variant is used when the size is <= 53 bits. BigInt is typically 5-10x slower, hence the decision to support both paths.
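
As a rough illustration of the number/BigInt split described for sliceToInt(), the sketch below reads an unsigned big-endian integer from an arbitrary bit offset. It is not the prelude's code: the function names are hypothetical, it works bit by bit for clarity rather than speed, and it ignores signedness and little-endian.

```javascript
// Hypothetical helpers, not the prelude's implementation.
// `buffer` is a Uint8Array; bit 0 is the high bit of byte 0.

// Number path: exact while the result fits in 53 bits.
function readBitsNumber(buffer, start, size) {
  let value = 0;
  for (let i = start; i < start + size; i++) {
    const bit = (buffer[i >> 3] >> (7 - (i & 7))) & 1;
    // Multiply rather than use `<<`, which would truncate to 32 bits.
    value = value * 2 + bit;
  }
  return value;
}

// BigInt path: any size, at the cost of slower arithmetic.
function readBitsBigInt(buffer, start, size) {
  let value = 0n;
  for (let i = start; i < start + size; i++) {
    const bit = (buffer[i >> 3] >> (7 - (i & 7))) & 1;
    value = value * 2n + BigInt(bit);
  }
  return value;
}

function readUnsignedBigEndian(buffer, start, size) {
  if (start < 0 || size < 0 || start + size > buffer.length * 8) {
    throw new RangeError("slice is out of bounds");
  }
  return size <= 53
    ? readBitsNumber(buffer, start, size)
    : readBitsBigInt(buffer, start, size);
}

const bytes = new Uint8Array([0xab, 0xcd, 0xef, 0x12]);
console.log(readUnsignedBigEndian(bytes, 4, 8)); // low nibble of 0xAB, high nibble of 0xCD
```

For simplicity the sketch returns a bigint on the wide path and a number otherwise; how the real prelude converts the result is not shown here.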

Implications for @external JavaScript code 🌍

  • Existing JavaScript FFI code that operates on bit arrays needs to be updated. Until this is done such code will emit deprecation warnings at runtime due to use of deprecated BitArray.length and BitArray.buffer APIs.
  • If such code is called with an unaligned bit array it will round the bitSize up to a multiple of 8, and operate on the undefined low bits in the final byte, which will probably lead to the wrong output.
  • No existing code breaks because unaligned bit arrays on JavaScript weren't previously possible. Still, there could be code that is now valid on the JavaScript target, which wasn't valid previously, and which won't give the correct result for unaligned bit arrays.
  • I can make relevant updates to any affected packages to fix the deprecation warnings. Packages that only work with whole bytes should error in the case of an unaligned bit array.
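
For package authors, a whole-bytes-only FFI function might end up looking roughly like the sketch below once it stops using the deprecated `buffer` and `length` accessors. `BitArrayStandIn` is a hypothetical stand-in that only mirrors the accessor names listed above so the snippet runs without the real prelude, and `toHex` is a made-up FFI function.

```javascript
// Hypothetical stand-in exposing the accessor names from this PR.
// The real prelude class uses getters and differs internally.
class BitArrayStandIn {
  constructor(rawBuffer, bitSize = rawBuffer.length * 8) {
    this.rawBuffer = rawBuffer;
    this.bitSize = bitSize;
  }
  get byteSize() {
    return Math.ceil(this.bitSize / 8);
  }
  get isWholeBytes() {
    return this.bitSize % 8 === 0;
  }
  byteAt(index) {
    return this.rawBuffer[index];
  }
}

// Made-up FFI function that only makes sense for whole bytes.
function toHex(bitArray) {
  if (!bitArray.isWholeBytes) {
    // Fail loudly instead of reading the undefined low bits of the last byte.
    throw new Error("toHex: bit array must contain a whole number of bytes");
  }
  let hex = "";
  for (let i = 0; i < bitArray.byteSize; i++) {
    hex += bitArray.byteAt(i).toString(16).padStart(2, "0");
  }
  return hex;
}

console.log(toHex(new BitArrayStandIn(new Uint8Array([0xab, 0x0c]))));
```

Throwing up front keeps an unaligned input from silently producing output that includes the undefined trailing bits.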

Implications for gleam/stdlib 🤝

  • I have the updates for gleam/stdlib ready to go, mostly affecting gleam/bit_array. It can only be merged once this PR goes in as its tests don't run on Gleam 1.6.3. It may be necessary to run the new stdlib tests on nightly for a short period, with them segregated into their own file so they can be included/excluded depending on the active Gleam version. I'll sort that out once this PR makes it through review.
  • Future stdlib versions that support unaligned bit arrays on JavaScript will work fine on Gleam versions < 1.7.0, so there are no compatibility concerns there.
  • We could print a warning if unaligned bit arrays are used on JavaScript and the package's stdlib version doesn't support them. Should we do this? If so, I'd prefer to implement it in a follow-up PR if that's ok.

Testing 🧪

There's certainly some complexity and tricky bitwise operations here, mostly in the JavaScript prelude. The following has been done to ensure correctness:

  • Many new tests added to language_tests.gleam, and test/javascript_prelude.
  • Every path and branch through BitArray.slice(), BitArray.sliceToInt(), BitArray.sliceToFloat() is covered by at least one test.
  • Extensive fuzzing has been performed on bit array construction, slicing, and slicing to ints and floats.
    • This validated millions of combinations of bit array contents, segment sizes, offsets, endianness, signedness, etc. on JavaScript against the result on the Erlang target.
    • Issues found by this testing were fixed and added to the language tests and prelude tests.
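
The fuzzing described above compared the JavaScript target against Erlang. A much smaller sketch of the same differential idea is shown below: a byte-oriented reader is checked against a naive bit-by-bit reference on random inputs. Every name here is hypothetical, and the seeded PRNG (mulberry32) is only there so a failure can be reproduced.

```javascript
// Small deterministic PRNG (mulberry32) returning values in [0, 1).
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Reference: obviously-correct bit-by-bit unsigned big-endian read.
function readReference(buffer, start, size) {
  let value = 0;
  for (let i = start; i < start + size; i++) {
    value = value * 2 + ((buffer[i >> 3] >> (7 - (i & 7))) & 1);
  }
  return value;
}

// Under test: combine whole bytes, then trim. Sizes are capped at 32 bits,
// so at most 5 bytes (40 bits) are combined, within the 53-bit safe range.
function readFast(buffer, start, size) {
  if (size === 0) return 0;
  const lastBit = start + size - 1;
  let value = 0;
  for (let i = start >> 3; i <= lastBit >> 3; i++) {
    value = value * 256 + buffer[i];
  }
  value = Math.floor(value / 2 ** (7 - (lastBit & 7))); // drop trailing bits
  return value % 2 ** size; // drop leading bits
}

function fuzz(seed, iterations) {
  const random = mulberry32(seed);
  const randomInt = (max) => Math.floor(random() * (max + 1));
  for (let n = 0; n < iterations; n++) {
    const buffer = Uint8Array.from({ length: 1 + randomInt(15) }, () => randomInt(255));
    const totalBits = buffer.length * 8;
    const start = randomInt(totalBits);
    const size = randomInt(Math.min(32, totalBits - start));
    const expected = readReference(buffer, start, size);
    const actual = readFast(buffer, start, size);
    if (actual !== expected) {
      throw new Error(`mismatch: seed=${seed} n=${n} start=${start} size=${size}`);
    }
  }
  return iterations;
}

fuzz(1, 2000);
```

Any mismatch reports the seed and iteration, so the failing case can be replayed and turned into a regular test.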

Limitations 🤔

The main limitation is that there is no allowance for unused high bits in the first byte of a bit array's buffer.

The motivation for allowing this would be to make bit array slices O(1) in all cases. Currently a slice is O(1) only if its start offset is byte-aligned (the end offset doesn't matter). If the start offset isn't byte-aligned then a slice is O(N) due to requiring a copy.

This makes the following O(N²) on JavaScript, but O(N) on Erlang:

import gleam/int
import gleam/io

pub fn print_bits(bits: BitArray) -> Nil {
  case bits {
    <<b:1, rest:bits>> -> {
      b |> int.to_string |> io.print
      print_bits(rest)
    }
    _ -> io.println("")
  }
}

This could be addressed at a later date, albeit with another round of impact on JavaScript FFI code that would need updating. So maybe it's better to bite the bullet now? Or maybe it's not important enough to warrant the additional complexity. There's also a reasonably good chance that any folks affected by this would be able to rework their code to avoid the performance issue (if they realise what the problem is).
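
The cost difference between aligned and unaligned slices can be sketched as follows. This is an assumption-laden illustration rather than the prelude's BitArray.slice(): `sliceBits` and its return shape are hypothetical.

```javascript
// Hypothetical sketch, not the prelude's BitArray.slice().
// `buffer` is a Uint8Array; `start` and `end` are bit offsets (end exclusive).
function sliceBits(buffer, start, end) {
  if (start < 0 || end < start || end > buffer.length * 8) {
    throw new RangeError("slice is out of bounds");
  }
  const bitSize = end - start;
  const byteSize = Math.ceil(bitSize / 8);
  const firstByte = start >> 3;
  const shift = start & 7;

  if (shift === 0) {
    // Byte-aligned start: O(1), a view onto the same memory. Bits past `end`
    // in the final byte are left untouched, so consumers must ignore them.
    const bytes = buffer.subarray(firstByte, firstByte + byteSize);
    return { bytes, bitSize, copied: false };
  }

  // Unaligned start: O(N), each output byte is built from two input bytes.
  const bytes = new Uint8Array(byteSize);
  for (let i = 0; i < byteSize; i++) {
    const high = buffer[firstByte + i] << shift;
    const low = (buffer[firstByte + i + 1] ?? 0) >> (8 - shift);
    bytes[i] = (high | low) & 0xff;
  }
  return { bytes, bitSize, copied: true };
}

const source = new Uint8Array([0xab, 0xcd]);
console.log(sliceBits(source, 8, 13).copied); // aligned start, no copy
console.log(sliceBits(source, 4, 16).copied); // unaligned start, copies
```

In a loop like print_bits, every step slices from bit offset 1 of the current array, so under this scheme each step takes the copying branch, which is where the O(N²) behaviour comes from.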


✨✨✨

@richard-viney richard-viney marked this pull request as ready for review December 3, 2024 12:22
@richard-viney richard-viney force-pushed the js-unaligned-bit-arrays branch 3 times, most recently from 2496b81 to bb69dd9 Compare December 5, 2024 01:45
@richard-viney richard-viney force-pushed the js-unaligned-bit-arrays branch 3 times, most recently from b347126 to 9aa5446 Compare December 13, 2024 09:32
Member

@lpil lpil left a comment

Wow, what a fantastic bit of work! Thank you!

I've not digested these changes properly yet, but I have some initial questions from my first review.

There's a lot of public functions in the API of the bit array class now, but generally the aim is to have none. What is the motivation for having these?

Similar for the deprecated functions, we can remove them.

There's quite a lot of code in the class. Could we move them to free-standing functions and have the generated code only use them if absolutely necessary? That would help with JavaScript bundlers performing dead code elimination.

Some existing tests have been removed or changed, why is this? New features shouldn't alter existing tests, it makes it harder to review and to prevent regressions.

Thanks again!

@richard-viney richard-viney force-pushed the js-unaligned-bit-arrays branch 6 times, most recently from 544c327 to 0f6fbdb Compare December 23, 2024 00:38
@richard-viney
Contributor Author

Thanks for taking a look!

> There's a lot of public functions in the API of the bit array class now, but generally the aim is to have none. What is the motivation for having these?

These have been reduced down to the following: get rawBuffer(), get bitSize(), get byteSize(), get isWholeBytes(), and byteAt().

byteAt() could potentially be moved to a free function too, but it was pre-existing and I've seen it used in JS FFI code, so it's still there for now and I haven't deprecated it.

> Similar for the deprecated functions, we can remove them.

I've removed all except the BitArray.length and BitArray.buffer accessors. Removing these would break most/all JS FFI code that operates on BitArrays, so they currently emit a deprecation warning at runtime if they're used.

> There's quite a lot of code in the class. Could we move them to free-standing functions and have the generated code only use them if absolutely necessary? That would help with JavaScript bundlers performing dead code elimination.

Done.

> Some existing tests have been removed or changed, why is this? New features shouldn't alter existing tests, it makes it harder to review and to prevent regressions.

The diff on the tests looked a bit more complex in places than it actually was so I've moved some things around to make it easier to digest. Some existing tests do need to be tweaked or removed, e.g. those that were testing for compilation errors if unaligned bit arrays were used on the JS target.

@richard-viney richard-viney force-pushed the js-unaligned-bit-arrays branch from 0f6fbdb to 25d0be4 Compare December 23, 2024 00:44
@richard-viney
Contributor Author

Also, if you could weigh in on the question at the end of the initial writeup about whether we should make bit array slices O(1) in all cases that would be helpful, because if that's a yes then more work is needed prior to being ready for final review.

@richard-viney richard-viney force-pushed the js-unaligned-bit-arrays branch from 25d0be4 to f7034b6 Compare December 24, 2024 04:30