Finetuning an LLM (7B parameters or less) with a Mac M4 Mini #1647
-
I am an LLM developer deciding between a MacBook Air (M2, 16 GB) and a Mac M4 Mini (16 GB). I want to fine-tune small LLMs with QLoRA for experiment or research purposes. Since both machines have 16 GB of unified memory, I understand both can train and run these models; I am more interested in speed, both fine-tuning speed and inference speed. Can anyone suggest which device I should buy, and what kind of speed improvement I can expect from the M4 Mini? (P.S.: I have never used any Apple products before, so any review of how these devices actually perform for fine-tuning LLMs would be really helpful.)
-
You should be able to QLoRA fine-tune a 7B model on either machine. The speed depends on several factors.
In terms of inference speed, on the base M4 Mini you can expect roughly 25 tokens/sec or more with a 4-bit 7B model, and perhaps 15-20 tokens/sec on the M2 Air.
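To get a feel for what those throughput numbers mean in practice, here is a minimal back-of-envelope sketch. The tokens/sec figures are the rough estimates from this thread, not measurements, and the 512-token reply length is an arbitrary illustration:

```python
# Back-of-envelope: wall-clock time to generate a reply at a given
# steady decode throughput. Throughput figures are the rough estimates
# from this thread (base M4 Mini vs. M2 Air, 4-bit 7B), not benchmarks.

def seconds_to_generate(num_tokens: int, toks_per_sec: float) -> float:
    """Seconds to decode `num_tokens` at a constant `toks_per_sec` rate."""
    return num_tokens / toks_per_sec

m4_mini = seconds_to_generate(512, 25.0)  # ~25 tok/s estimate
m2_air = seconds_to_generate(512, 17.5)   # midpoint of the 15-20 tok/s range
print(f"512-token reply: M4 Mini ~{m4_mini:.0f}s, M2 Air ~{m2_air:.0f}s")
```

So the gap between the two machines is noticeable but not dramatic for short interactive generations; it compounds more for long fine-tuning runs.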
I would look around on the internet for estimates of the fp32/fp16 FLOPs of the machines you are interested in. The speed difference for LoRA/QLoRA training tends to follow the difference in peak FLOPs, since it is a very compute-bound workload.
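That scaling rule can be applied directly once you have one measured data point. A minimal sketch, where all the numbers (measured throughput and peak TFLOPs) are hypothetical placeholders you would replace with a real measurement and published chip specs:

```python
# Rough scaling rule from the reply above: for a compute-bound
# LoRA/QLoRA workload, training throughput scales roughly with the
# ratio of peak FLOPs. All numbers below are illustrative placeholders.

def scaled_throughput(known_toks_per_sec: float,
                      known_tflops: float,
                      target_tflops: float) -> float:
    """Estimate training throughput on a target chip by peak-FLOPs ratio."""
    return known_toks_per_sec * (target_tflops / known_tflops)

# Suppose you measured 120 tok/s of QLoRA training on a chip with a
# 3.5 TFLOPs fp16 peak, and the target chip peaks at 4.6 TFLOPs
# (both figures hypothetical):
est = scaled_throughput(120.0, 3.5, 4.6)
print(f"Estimated training throughput on the faster chip: ~{est:.0f} tok/s")
```

This is only a first-order estimate: memory bandwidth, thermals (the fanless Air throttles under sustained load), and software maturity can all push real numbers off the FLOPs ratio.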