
xlm-roberta and Mistral-7B take significant amounts of memory during compilation #2821

Open
cjvolzka opened this issue May 9, 2024 · 2 comments


@cjvolzka
Collaborator

cjvolzka commented May 9, 2024

While compiling Hugging Face models such as protectai/xlm-roberta-base-language-detection-onnx or mistralai/Mistral-7B-v0.1, I noticed that compilation uses significantly more memory than the size of the entire model.

For example, the xlm-roberta-base-language-detection-onnx model is about 1.11 GB, but during compilation I see memory peaks of up to 9 GB across onnx-mlir, opt, and llc when compiling with --O3 --EmitLib --mtriple=s390x-ibm-loz --mcpu=z14 --onnx-op-stats TXT.

The Mistral-7B-v0.1 model is about 29 GB, but during compilation I see peaks of 70+ GB and sustained usage of 58 GB when compiling with --O3 --EmitLib --mtriple=s390x-ibm-loz --mcpu=z14 --store-constants-to-file --onnx-op-stats TXT.

Is there anything that can be done to reduce the compile-time memory required for these kinds of models?

@cjvolzka changed the title from "Models take significant amounts of memory to compile" to "xlm-roberta and Mistral-7B take significant amounts of memory during compilation" on May 9, 2024.
@imaihal
Collaborator

imaihal commented May 10, 2024

@cjvolzka How can we get the ONNX model for Mistral-7B-v0.1?

@cjvolzka
Collaborator Author

@imaihal Sorry, I missed your question. Below is how I generated the Mistral ONNX model.

Notes:

  • I exported the model on my Mac because the tools don't support s390x. Afterward, I transferred the folder it created (containing the ONNX file and the constants) to the s390x host to compile the model.
  • The huggingface-cli login command will ask a couple of questions.
```shell
pip install "huggingface_hub[cli]" "optimum[exporters]"
huggingface-cli login
optimum-cli export onnx --model mistralai/Mistral-7B-v0.1 --framework pt --atol 0.001 --task text-generation Mistral-7B-v0.1-text-generation
```

imaihal added a commit that referenced this issue Jul 22, 2024
* Write constant values to a single file without buffering to remove spikes in memory consumption.

This PR fixes the memory consumption issue reported in #2821.
We found a spike in memory consumption when writing constants into a file (model.constants.bin). This is because all constants were first copied into a single buffer, which was then written to the file at once. This PR changes the code to write each constant to the file directly, without buffering, which removes the spike in memory consumption.

---------

Signed-off-by: Haruki Imai <imaihal@jp.ibm.com>
Co-authored-by: Tung D. Le <tung@jp.ibm.com>
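The change described in this commit can be illustrated with a minimal shell sketch. It is not the onnx-mlir implementation: it assumes, hypothetically, that each constant already sits in its own raw file in a directory, and the function name write_constants and the offsets index file are invented for illustration. Each constant is appended to one output file in turn while its byte offset is recorded, so no step ever holds a concatenated copy of all constants in memory, which is the kind of buffering that produced the spike.

```shell
# Hypothetical helper (not part of onnx-mlir):
#   write_constants DIR OUT INDEX
# Appends every DIR/*.bin file to OUT one at a time and writes one
# "name offset size" line per constant to INDEX.
write_constants() {
  dir=$1 out=$2 index=$3
  : > "$out"      # create or truncate the combined constants file
  : > "$index"    # create or truncate the offset index
  offset=0
  for f in "$dir"/*.bin; do
    [ -e "$f" ] || continue                  # glob matched nothing
    size=$(wc -c < "$f" | tr -d ' ')         # byte size of this constant
    # Record where this constant starts so a loader could seek to it later.
    printf '%s %s %s\n' "$(basename "$f")" "$offset" "$size" >> "$index"
    # cat copies the file through a small fixed-size buffer, so only a
    # chunk of the current constant is in memory at any time.
    cat "$f" >> "$out"
    offset=$((offset + size))
  done
}
```

For example, write_constants constants model.constants.bin offsets.txt would produce one concatenated file plus a name/offset/size line per constant. Streaming keeps peak memory roughly flat regardless of total model size, at the cost of many small appends instead of one large write.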