I'm trying llamafile on an ARM-based AWS Graviton4 instance, specifically an r8g.12xlarge (48 vCPUs, 384 GiB RAM).
Running the llama3.1:70b model, llamafile is noticeably slower than Ollama.
Ollama reaches 4.62 TPS at 4734% CPU usage (roughly 47 of the 48 vCPUs, as measured via top), whereas llamafile reaches 3.7 TPS at only 1987% CPU (roughly 20 vCPUs).
I've tried llamafile's -t argument to raise the thread count, but it yields no measurable improvement (see the sweep sketch below).
Is this expected at the moment, or are there other configuration options I should explore?
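For concreteness, a thread-count sweep along these lines is what I have in mind. This is a sketch only: the GGUF filename is a placeholder for whichever quantization is in use, --cli forces command-line mode, and the grep assumes llama.cpp-style timing output on stderr.

```sh
# Sweep -t to see whether the thread count moves throughput at all.
# The model path below is a placeholder, not the exact file I'm running.
for t in 12 24 48; do
  ./llamafile --cli -m Meta-Llama-3.1-70B-Instruct.Q4_0.gguf \
    -t "$t" -n 128 -p "Write a short summary of the ARM architecture." 2>&1 |
    grep -i 'tokens per second'
done
```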
Version
llamafile v0.8.11
What operating system are you seeing the problem on?
Linux
Relevant log output
{"function":"print_timings","level":"INFO","line":327,"msg":"generation eval time = 88129.03 ms / 319 runs ( 276.27 ms per token, 3.62 tokens per second)","n_decoded":319,"n_tokens_second":3.619692707832883,"slot_id":0,"t_token":276.26654545454545,"t_token_generation":88129.028,"task_id":321,"tid":"34364393952","timestamp":1721802661}