
[file_packager] Split data files when file size exceeds ArrayBuffer limit #24802

Open

arsnyder16 wants to merge 32 commits into emscripten-core:main from arsnyder16:asnyder/split_large_packages

Conversation

@arsnyder16
Contributor

@arsnyder16 arsnyder16 commented Jul 29, 2025

Currently an ArrayBuffer cannot be larger than 2046 MB in Chrome, so when bundling a large amount of files using file_packager.py, the resulting .data file will not be allowed to load. Breaking the output up into multiple data files bypasses this issue:

    error while handling : http://localhost:3040/Gtest.data Error: Unexpected error while handling : http://localhost:3040/Gtest.data
    RangeError: Array buffer allocation failed

Note: this was a solution proposed for the issues described in #24691.

@arsnyder16
Contributor Author

arsnyder16 commented Jul 29, 2025

@sbc100 Is this aligned with what you were thinking?

Collaborator

@sbc100 sbc100 left a comment


Can we add some tests for this? What are the specific limits you are running into? Are those limits not fixed? i.e. does it need to be configurable?

@arsnyder16
Contributor Author

> Can we add some tests for this? What are the specific limits you are running into? Are those limits not fixed? i.e. does it need to be configurable?

I wanted to make sure I was on the right track before adding tests.
The limits are described in the conversation on the issue I mentioned. We can hard-code the limit to 2 GiB as a likely guess at the true limit; if that is what you prefer, it is simpler from a testing perspective. It seemed more flexible to make it configurable for anyone calling the utility, but I get the simplicity side of it.
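Making the limit configurable could be as simple as an optional flag whose default is the hard-coded value. The `--max-chunk-size` name below is hypothetical, not file_packager's actual interface:

```python
import argparse

# Hypothetical flag sketch; file_packager's real option (if one is added) may
# be named and typed differently. The default matches Chrome's 2046 MB cap.
parser = argparse.ArgumentParser()
parser.add_argument('--max-chunk-size', type=int,
                    default=2046 * 1024 * 1024,
                    help='split output .data into chunks of at most this many bytes')

# Callers that know their target environment can raise or lower the cap:
args = parser.parse_args(['--max-chunk-size', str(4 * 1024 * 1024 * 1024)])
```

Leaving the default at the hard-coded value keeps the simple case simple while still letting callers target environments with larger ArrayBuffer limits.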

@sbc100
Collaborator

sbc100 commented Jul 29, 2025

This does seem like a reasonable approach.

I'm still a little fuzzy on exactly why and when this might be needed in the real world, I think I need to go re-read our original discussion, but I also think including more information in the PR (i.e. in the description, or in comments) would be good.

@arsnyder16
Contributor Author

> This does seem like a reasonable approach.
>
> I'm still a little fuzzy on exactly why and when this might be needed in the real world, I think I need to go re-read our original discussion, but I also think including more information in the PR (i.e. in the description, or in comments) would be good.

Updated the description. Based on that, if you want me to hard-code the limit into the packager, I can take that approach.

@arsnyder16 arsnyder16 changed the title from "[file_packager] split data files when files exceeds configured limit" to "[file_packager] split data files when file size exceeds 2Gi ArrayBuffer limit" on Jul 31, 2025
@arsnyder16
Contributor Author

@sbc100 Going to look at adding tests for this. Do you have any recommendations? Would you like me to generate temporary file(s) so that I can reach the 2 GiB limit?

@arsnyder16 arsnyder16 marked this pull request as ready for review August 10, 2025 18:48
@arsnyder16
Contributor Author

@sbc100 This should be ready for review. I'm not sure about the test failures, whether they are flaky or related to my change; I don't see them fail locally.

@juj
Collaborator

juj commented Aug 18, 2025

Peculiarly, I find that in Chrome the max ArrayBuffer size is not 2 GB (2048 MB) but 2046 MB. Firefox supports up to 16 GB ArrayBuffers, and in Node.js I get up to 4 GB ArrayBuffers.

What is your target environment?

@arsnyder16
Contributor Author

@juj Thanks! My target is Chrome. I incorrectly associated [Number.MAX_SAFE_INTEGER](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/ArrayBuffer/ArrayBuffer#exceptions) from the ArrayBuffer documentation with `MAX_INT`. I didn't specifically test the exact limit; I was just trying to follow the documentation, but clearly misinterpreted it.

Looks like Chrome has the lowest limit; I will change the code to use that limit.

Safari appears to be 4 GiB.
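Chrome's 2046 MB cap is the smallest of the limits discussed here, so it is the one the packager has to respect. As a quick illustration of the resulting chunk math (the 5 GiB package size below is hypothetical):

```python
import math

# Smallest observed per-ArrayBuffer cap across the browsers discussed above.
CHROME_MAX_MB = 2046

# Hypothetical 5 GiB package; real sizes come from file_packager's input.
package_mb = 5 * 1024

num_chunks = math.ceil(package_mb / CHROME_MAX_MB)
print(num_chunks)  # 3
```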

@arsnyder16 arsnyder16 changed the title from "[file_packager] split data files when file size exceeds 2Gi ArrayBuffer limit" to "[file_packager] split data files when file size exceeds ArrayBuffer limit" on Aug 22, 2025
@arsnyder16
Contributor Author

@juj @sbc100 Anything else on this?

    proc = self.run_process([FILE_PACKAGER, 'test.data', '--preload', 'huge.dat'], check=False, stdout=PIPE, stderr=PIPE)
    self.assertEqual(proc.returncode, 1)
    self.assertContained('error: cannot package file greater than 2046 MB does not exist', proc.stderr)
    self.clear()
Collaborator


It looks like the above tests verify that file packager does split up files, though there is no functional test to verify that e.g. the bytes were split appropriately, and that the final end result loads up properly? Would that be important to test?

Contributor Author


I am not sure how critical it is. It looks like all the file_packager logic is covered by these unit tests, and it seems out of scope for this PR to add tests that verify loading of the file_packager results.

Collaborator


There are many tests that verify load of file packager results. Basically any test that uses emcc with --preload-file is verifying this.

There are also tests that call FILE_PACKAGER directly and then execute the result. See test_file_packager_separate_metadata for example.
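The end-to-end property such a test would check can be sketched in a self-contained way: each chunk stays under the limit, and the chunks reassemble byte-for-byte. The inline chunking, file names, and tiny stand-in limit below are illustrative, not file_packager's actual code.

```python
import hashlib
import os
import tempfile

LIMIT = 1024  # small stand-in for the real 2046 MB cap

tmp = tempfile.mkdtemp()
original = os.urandom(3000)
src = os.path.join(tmp, 'pkg.data')
with open(src, 'wb') as f:
    f.write(original)

# Split the package into LIMIT-sized chunk files.
chunk_paths = []
with open(src, 'rb') as f:
    i = 0
    while True:
        piece = f.read(LIMIT)
        if not piece:
            break
        path = '%s.%d' % (src, i)
        with open(path, 'wb') as out:
            out.write(piece)
        chunk_paths.append(path)
        i += 1

# Verify: every chunk is within the limit, and reassembly matches the input.
assert all(os.path.getsize(p) <= LIMIT for p in chunk_paths)
reassembled = b''.join(open(p, 'rb').read() for p in chunk_paths)
assert hashlib.sha256(reassembled).digest() == hashlib.sha256(original).digest()
```

A real emscripten test would instead run FILE_PACKAGER and execute the generated loader, as test_file_packager_separate_metadata does.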

Contributor Author


@sbc100 Thanks for pointing out those tests; test_file_packager_separate_metadata was a good example to follow to test this logic.

Contributor Author


@sbc100 @juj I updated the tests to verify the package(s) load, but on CircleCI they seem to be crashing immediately; I suspect they are OOM-killed. They run fine for me locally.

Contributor Author


@juj @sbc100 Any guidance here?

Contributor Author


@juj @sbc100 I could use some guidance here to get this over the finish line. Now that I have addressed the feedback to actually load the results in these tests, they cause the CircleCI processes to run out of memory.

@sbc100 sbc100 changed the title from "[file_packager] split data files when file size exceeds ArrayBuffer limit" to "[file_packager] Split data files when file size exceeds ArrayBuffer limit" on Feb 12, 2026