xlm-roberta and Mistral-7B take significant amounts of memory during compilation #2821
Comments
cjvolzka changed the title from "Models take significant amounts of memory to compile" to "xlm-roberta and Mistral-7B take significant amounts of memory during compilation" on May 9, 2024
@cjvolzka How can we get onnx model for |
@imaihal Sorry, I missed your question. Below is how I generated the Mistral onnx model. Notes:
imaihal added a commit that referenced this issue on Jul 22, 2024
* Write a constant value to a single file without buffering to remove spikes in memory consumption.

This PR addresses the memory consumption issue reported in #2821. We found a spike in memory consumption when writing constants to a file (model.constants.bin). The spike occurs because all constants are first collected into a buffer and then written to the file at once. This PR changes the code to write each constant to the file without buffering, which removes the spike in memory consumption.

Signed-off-by: Haruki Imai <imaihal@jp.ibm.com>
Co-authored-by: Tung D. Le <tung@jp.ibm.com>
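The difference between the two write strategies described in the commit message can be sketched as follows. This is only an illustration in Python, not the actual onnx-mlir change (which lives in the compiler's C++ code); the function names and the `fake_constants` generator are hypothetical stand-ins for model constants.

```python
import os
import tempfile


def write_buffered(constants, path):
    """Collect every constant into one in-memory buffer, then write it once."""
    buffer = bytearray()
    for blob in constants:
        # Extra memory grows until it holds the total size of all constants.
        buffer += blob
    with open(path, "wb") as f:
        f.write(buffer)


def write_streaming(constants, path):
    """Write each constant as soon as it is produced; nothing is accumulated."""
    with open(path, "wb") as f:
        for blob in constants:
            # Extra memory stays close to the size of a single constant
            # (plus Python's small internal file buffer).
            f.write(blob)


def fake_constants(count=4, size=1024):
    """Hypothetical stand-in for model constants: `count` blobs of `size` bytes."""
    for i in range(count):
        yield bytes([i % 256]) * size


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        buffered = os.path.join(tmp, "buffered.bin")
        streamed = os.path.join(tmp, "streamed.bin")
        write_buffered(fake_constants(), buffered)
        write_streaming(fake_constants(), streamed)
        with open(buffered, "rb") as a, open(streamed, "rb") as b:
            print("identical output:", a.read() == b.read())
```

Both functions produce the same file; only the peak amount of data held in memory at once differs, which is the effect the commit message attributes to removing the buffering step.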
While compiling models such as HuggingFace protectai/xlm-roberta-base-language-detection-onnx or mistralai/Mistral-7B-v0.1, I notice that compilation uses significantly more memory than the size of the model itself.
For example, xlm-roberta-base-language-detection-onnx is about 1.11GB, but during compilation I see peaks of up to 9GB of memory used by onnx-mlir, opt and llc when compiling with
--O3 --EmitLib --mtriple=s390x-ibm-loz --mcpu=z14 --onnx-op-stats TXT
The Mistral-7B-v0.1 model is about 29GB, but during compilation I see peaks of 70+GB and sustained usage of 58GB when compiling with
--O3 --EmitLib --mtriple=s390x-ibm-loz --mcpu=z14 --store-constants-to-file --onnx-op-stats TXT
Is there anything that can be done to reduce the compile-time memory required for these kinds of models?
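For anyone who wants to reproduce peak-memory figures like the ones above, here is a minimal sketch of one way to measure them on Linux. It is an assumption, not the method used for the numbers in this issue: it relies on Python's `resource.getrusage(RUSAGE_CHILDREN)`, which reports the largest resident set size among waited-for descendant processes (not the sum across onnx-mlir, opt and llc) and uses KiB on Linux. The `model.onnx` path in the comment is a hypothetical placeholder.

```python
import resource
import subprocess
import sys


def peak_child_rss_kib(cmd):
    """Run `cmd` to completion and return the largest peak RSS (KiB on Linux)
    recorded so far among this process's waited-for descendants."""
    subprocess.run(cmd, check=True)
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss


def kib_to_gib(kib):
    """Convert KiB to GiB."""
    return kib / (1024 ** 2)


if __name__ == "__main__":
    # Hypothetical real invocation, using the flags quoted in this issue:
    # cmd = ["onnx-mlir", "--O3", "--EmitLib", "--mtriple=s390x-ibm-loz",
    #        "--mcpu=z14", "--onnx-op-stats", "TXT", "model.onnx"]
    cmd = [sys.executable, "-c", "pass"]  # harmless stand-in command
    peak = peak_child_rss_kib(cmd)
    print(f"peak child RSS: {kib_to_gib(peak):.3f} GiB")
```

Because the value is a per-process maximum, it may understate total usage if several child processes hold large amounts of memory at the same time.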