
[file_packager] Split data files when file size exceeds ArrayBuffer limit #24802

Open

arsnyder16 wants to merge 32 commits into emscripten-core:main from arsnyder16:asnyder/split_large_packages

Conversation

@arsnyder16
Contributor

@arsnyder16 arsnyder16 commented Jul 29, 2025

Currently an ArrayBuffer cannot be larger than 2046 MB in Chrome, so when bundling a large amount of files using file_packager.py, the resulting .data file will not be allowed to load. Breaking the output up into multiple data files bypasses this issue:

    error while handling : http://localhost:3040/Gtest.data Error: Unexpected error while handling : http://localhost:3040/Gtest.data
    RangeError: Array buffer allocation failed

Note: this was a solution proposed for the issues described in #24691.

@arsnyder16
Contributor Author

arsnyder16 commented Jul 29, 2025

@sbc100 Is this aligned with what you were thinking?

Collaborator

@sbc100 sbc100 left a comment


Can we add some tests for this? What are the specific limits you are running into? Are those limits not fixed? i.e. does it need to be configurable?

@arsnyder16
Contributor Author

> Can we add some tests for this? What are the specific limits you are running into? Are those limits not fixed? i.e. does it need to be configurable?

I wanted to make sure I was on the right track before adding tests.
The limits are described in the conversation on the issue I mentioned. We can hard-code the limit to 2 GiB as a likely guess at the true limit; if that is what you prefer, it is simpler from a testing perspective. It seemed more flexible to make it configurable for anyone calling the utility, but I get the simplicity side of it.
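Making the limit configurable could be as simple as an optional flag whose default is the hard-coded value. The `--max-chunk-size` name below is hypothetical, not file_packager's actual interface:

```python
import argparse

# Hypothetical flag sketch; file_packager's real option (if one is added) may
# be named and typed differently. The default matches Chrome's 2046 MB cap.
parser = argparse.ArgumentParser()
parser.add_argument('--max-chunk-size', type=int,
                    default=2046 * 1024 * 1024,
                    help='split output .data into chunks of at most this many bytes')

# Callers that know their target environment can raise or lower the cap:
args = parser.parse_args(['--max-chunk-size', str(4 * 1024 * 1024 * 1024)])
```

Leaving the default at the hard-coded value keeps the simple case simple while still letting callers target environments with larger ArrayBuffer limits.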

@sbc100
Collaborator

sbc100 commented Jul 29, 2025

This does seem like a reasonable approach.

I'm still a little fuzzy on exactly why and when this might be needed in the real world, I think I need to go re-read our original discussion, but I also think including more information in the PR (i.e. in the description, or in comments) would be good.

@arsnyder16
Contributor Author

> This does seem like a reasonable approach.
>
> I'm still a little fuzzy on exactly why and when this might be needed in the real world, I think I need to go re-read our original discussion, but I also think including more information in the PR (i.e. in the description, or in comments) would be good.

Updated the description. Based on that, if you want me to hard-code the limit into the packager, I can take that approach.

@arsnyder16 arsnyder16 changed the title from "[file_packager] split data files when files exceeds configured limit" to "[file_packager] split data files when file size exceeds 2Gi ArrayBuffer limit" on Jul 31, 2025
@arsnyder16
Contributor Author

@sbc100 Going to look at adding tests for this. Do you have any recommendations? Would you like me to generate temporary file(s) so that I can reach the 2 GiB limit?

@arsnyder16 arsnyder16 marked this pull request as ready for review August 10, 2025 18:48
@arsnyder16
Contributor Author

@sbc100 This should be ready for review. I'm not sure about the test failures, whether they are flaky or related to my change; I don't see them fail locally.

@juj
Collaborator

juj commented Aug 18, 2025

Peculiarly, I find that in Chrome the max ArrayBuffer size is not 2 GB (2048 MB) but 2046 MB. Firefox supports up to 16 GB ArrayBuffers, and in Node.js I get up to 4 GB ArrayBuffers.

What is your target environment?

@arsnyder16
Contributor Author

@juj Thanks! My target is Chrome. I incorrectly associated [Number.MAX_SAFE_INTEGER](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/ArrayBuffer/ArrayBuffer#exceptions) from the ArrayBuffer documentation with `MAX_INT`. I didn't specifically test the exact limit; I was just trying to follow the documentation, but clearly misinterpreted it.

Looks like Chrome has the lowest limit; I will change the code to use that limit.

Safari appears to be 4 GiB.
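Chrome's 2046 MB cap is the smallest of the limits discussed here, so it is the one the packager has to respect. As a quick illustration of the resulting chunk math (the 5 GiB package size below is hypothetical):

```python
import math

# Smallest observed per-ArrayBuffer cap across the browsers discussed above.
CHROME_MAX_MB = 2046

# Hypothetical 5 GiB package; real sizes come from file_packager's input.
package_mb = 5 * 1024

num_chunks = math.ceil(package_mb / CHROME_MAX_MB)
print(num_chunks)  # 3
```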

@arsnyder16 arsnyder16 changed the title from "[file_packager] split data files when file size exceeds 2Gi ArrayBuffer limit" to "[file_packager] split data files when file size exceeds ArrayBuffer limit" on Aug 22, 2025
@arsnyder16
Contributor Author

@juj @sbc100 Anything else on this?

    proc = self.run_process([FILE_PACKAGER, 'test.data', '--preload', 'huge.dat'], check=False, stdout=PIPE, stderr=PIPE)
    self.assertEqual(proc.returncode, 1)
    self.assertContained('error: cannot package file greater than 2046 MB does not exist', proc.stderr)
    self.clear()
Collaborator


It looks like the above tests verify that file packager does split up files, though there is no functional test to verify that e.g. the bytes were split appropriately, and that the final end result loads up properly? Would that be important to test?

Contributor Author


I am not sure how critical it is. It looks like all the file_packager logic is covered by these unit tests, and it seems out of scope for this PR to add tests that verify loading of the file_packager results.

Collaborator


There are many tests that verify load of file packager results. Basically any test that uses emcc with --preload-file is verifying this.

There are also tests that call FILE_PACKAGER directly and then execute the result. See test_file_packager_separate_metadata for example.
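The end-to-end property such a test would check can be sketched in a self-contained way: each chunk stays under the limit, and the chunks reassemble byte-for-byte. The inline chunking, file names, and tiny stand-in limit below are illustrative, not file_packager's actual code.

```python
import hashlib
import os
import tempfile

LIMIT = 1024  # small stand-in for the real 2046 MB cap

tmp = tempfile.mkdtemp()
original = os.urandom(3000)
src = os.path.join(tmp, 'pkg.data')
with open(src, 'wb') as f:
    f.write(original)

# Split the package into LIMIT-sized chunk files.
chunk_paths = []
with open(src, 'rb') as f:
    i = 0
    while True:
        piece = f.read(LIMIT)
        if not piece:
            break
        path = '%s.%d' % (src, i)
        with open(path, 'wb') as out:
            out.write(piece)
        chunk_paths.append(path)
        i += 1

# Verify: every chunk is within the limit, and reassembly matches the input.
assert all(os.path.getsize(p) <= LIMIT for p in chunk_paths)
reassembled = b''.join(open(p, 'rb').read() for p in chunk_paths)
assert hashlib.sha256(reassembled).digest() == hashlib.sha256(original).digest()
```

A real emscripten test would instead run FILE_PACKAGER and execute the generated loader, as test_file_packager_separate_metadata does.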

Contributor Author


@sbc100 Thanks for pointing out those tests; test_file_packager_separate_metadata was a good example to follow to test this logic.

Contributor Author


@sbc100 @juj I updated the tests to verify the package(s) load, but on CircleCI they seem to be crashing immediately; I suspect they are OOM-killed. They run fine for me locally.

Contributor Author


@juj @sbc100 Any guidance here?

Contributor Author


@juj @sbc100 I could use some guidance here to get this over the finish line. Now that I have addressed the feedback to actually load the results in these tests, they cause the CircleCI processes to run out of memory.

@sbc100 sbc100 changed the title from "[file_packager] split data files when file size exceeds ArrayBuffer limit" to "[file_packager] Split data files when file size exceeds ArrayBuffer limit" on Feb 12, 2026