Optimizing 14GB Model on 4GB VRAM #16

agentifyanchor wants to merge 2 commits into FlashLabs-AI-Corp:main from
Conversation
Hi @agentifyanchor, thanks for this excellent contribution! 🎉 Note: torch >= 2.7.0. If you encounter any issues with transformers 5.0.0, please let us know the specific errors so we can work on compatibility fixes together.
agentifyanchor left a comment:
Hi @kaishen-Dotc, thank you for the feedback! I have verified compatibility with transformers 5.0.0 as requested. I ran a test using those versions:

The model loads correctly with 4-bit quantization, and inference works. I ran the full loop (ASR -> Text Generation -> Audio Generation). I've also updated the requirements.txt in the PR to reflect the tested versions.
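The "full loop" test above can be sketched as a minimal pipeline with per-turn timing. The three stage functions below are hypothetical stand-ins (the real script calls the quantized Chroma-4B model and its audio front/back ends); only the timing and real-time-factor bookkeeping is the point here.

```python
import time

# Hypothetical stubs for the three stages; in the actual script these
# would invoke the quantized model's ASR, text, and audio components.
def asr(audio_seconds: float) -> str:
    return "transcribed text"      # placeholder transcript

def generate_text(prompt: str) -> str:
    return "model reply"           # placeholder text output

def generate_audio(text: str) -> float:
    return 2.0                     # placeholder: 2 s of synthesized audio

def run_turn(input_audio_seconds: float) -> dict:
    """Run one ASR -> text generation -> audio generation turn and time it."""
    start = time.perf_counter()
    transcript = asr(input_audio_seconds)
    reply = generate_text(transcript)
    output_audio_seconds = generate_audio(reply)
    elapsed = time.perf_counter() - start
    # Real-time factor: processing time per second of audio produced.
    # RTF < 1 means the pipeline keeps up with real time.
    rtf = elapsed / output_audio_seconds
    return {"reply": reply, "elapsed_s": elapsed, "rtf": rtf}

metrics = run_turn(3.0)
print(metrics["rtf"] < 1.0)
```

With real model calls substituted in, `elapsed_s` and `rtf` give the same kind of numbers the telemetry variant of the script reports.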

Proposal: Low-VRAM Inference Script (4GB GPU Support)
Hi everyone! 👋
I managed to run Chroma-4B successfully on a consumer laptop GPU (RTX 3050 Ti, 4GB) using 4-bit quantization (bitsandbytes) and careful memory offloading.

Performance:
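A back-of-envelope check of why 4-bit quantization makes this feasible: going from 16-bit to 4-bit weights cuts the weight memory by roughly 4x. The 1.1 overhead factor below is an assumption (quantization scales plus non-quantized layers like embeddings and norms), not a measured value.

```python
def quantized_weight_gb(full_gb: float, full_bits: int = 16, quant_bits: int = 4,
                        overhead: float = 1.1) -> float:
    """Rough weight-memory estimate after quantization.

    overhead (assumed 1.1) covers quantization constants and layers
    that stay in higher precision.
    """
    return full_gb * (quant_bits / full_bits) * overhead

# A 14 GB (16-bit) checkpoint quantized to 4-bit:
print(round(quantized_weight_gb(14.0), 2))  # → 3.85
```

Roughly 3.85 GB of weights barely fits a 4 GB card, which is why the activations and KV cache still need the offloading described above.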
I realized many developers might be struggling with the 14GB VRAM requirement, so I created a clean, minimal "Walkie-Talkie" script to demonstrate how to run this locally.
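The "careful memory offloading" can be sketched as a greedy device map: fill the GPU up to a budget (minus a reserve for activations and the KV cache), and place the remaining layers on CPU. The layer sizes and the 0.7 GB reserve are illustrative assumptions; in practice Hugging Face Accelerate's `device_map="auto"` computes a placement like this automatically.

```python
def plan_device_map(layer_sizes_gb, gpu_budget_gb=4.0, reserve_gb=0.7):
    """Greedily place layers on the GPU until the budget (minus a
    reserve for activations/KV cache) is exhausted; the rest go to CPU."""
    device_map, used = {}, 0.0
    budget = gpu_budget_gb - reserve_gb
    for i, size in enumerate(layer_sizes_gb):
        if used + size <= budget:
            device_map[i] = "cuda:0"
            used += size
        else:
            device_map[i] = "cpu"
    return device_map

# 32 identical 0.12 GB quantized layers against a 4 GB card:
dm = plan_device_map([0.12] * 32)
print(sum(v == "cuda:0" for v in dm.values()))  # → 27
```

Layers mapped to `cpu` are streamed to the GPU on demand, trading speed for fitting in 4 GB.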
Included Files:
- local_voice_chat.py: a clean, light script for "Talk & Listen" interaction.
- local_voice_chat_with_telemetry.py: adds performance metrics (TF, RTF, input/output latency).

I would love to contribute these as examples under local_run to help the community access this amazing model on lower-end hardware.

Best regards,
Ilyes .M
A Fan of Chroma! 🚀