
# ComfyUI ExLlamaV2 Nodes

A simple text generator for ComfyUI using ExLlamaV2.

## Installation

Navigate to the root ComfyUI directory and clone the repository to `custom_nodes`:

```
git clone https://github.com/Zuellni/ComfyUI-ExLlama-Nodes custom_nodes/ComfyUI-ExLlamaV2-Nodes
```

Install the requirements depending on your system:

```
pip install -r custom_nodes/ComfyUI-ExLlamaV2-Nodes/requirements-VERSION.txt
```

| File | Notes |
| --- | --- |
| `requirements-no-wheels.txt` | ExLlamaV2 and FlashAttention, no wheels. |
| `requirements-torch-21.txt` | Windows wheels for Python 3.11, Torch 2.1, CUDA 12.1. |
| `requirements-torch-22.txt` | Windows wheels for Python 3.11, Torch 2.2, CUDA 12.1. |

Check which version you need with:

```
python -c "import platform; import torch; print(f'Python {platform.python_version()}, Torch {torch.__version__}, CUDA {torch.version.cuda}')"
```
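As a rough illustration of how the reported Torch version maps to a requirements file, consider the sketch below. The `pick_requirements` helper is hypothetical and not part of the nodes; remember that the wheel files additionally assume Windows and Python 3.11.

```python
def pick_requirements(torch_version: str) -> str:
    """Map a Torch version string (e.g. '2.2.1+cu121') to a requirements file.

    Hypothetical helper for illustration only, not part of the nodes.
    """
    # Strip any local build suffix like '+cu121' before parsing major.minor.
    major, minor = (int(p) for p in torch_version.split("+")[0].split(".")[:2])
    if (major, minor) >= (2, 2):
        return "requirements-torch-22.txt"
    if (major, minor) == (2, 1):
        return "requirements-torch-21.txt"
    # Older Torch: no prebuilt wheels, build ExLlamaV2 and FlashAttention yourself.
    return "requirements-no-wheels.txt"

print(pick_requirements("2.2.1+cu121"))  # requirements-torch-22.txt
```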

> [!CAUTION]
> If none of the wheels work for you, or you see ExLlamaV2-related errors while the nodes load, try installing it manually following the official instructions. Keep in mind that wheels >= 0.0.13 require Torch 2.2.

## Usage

Only EXL2 and 4-bit GPTQ models are supported. Many are available on Hugging Face; refer to the model card in each repository for details about quant differences and instruction formats.

To use a model with the nodes, clone its repository with `git` or manually download all the files and place them in `models/llm`. For example, to download Mistral-7B, use the following command:

```
git clone https://huggingface.co/LoneStriker/Mistral-7B-Instruct-v0.2-5.0bpw-h6-exl2-2 models/llm/mistral-7b-exl2-b5
```

> [!TIP]
> You can add your own `llm` path to the `extra_model_paths.yaml` file and place the models there instead.
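For example, an `extra_model_paths.yaml` entry pointing the `llm` folder at another location might look like the sketch below. The entry name and paths are placeholders; adjust them to your setup, and note that the `llm` key is assumed to be what these nodes read.

```yaml
# Hypothetical extra_model_paths.yaml entry — paths are placeholders.
my_models:
  base_path: D:/models
  llm: llm
```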

## Nodes

- **Loader** loads models from the `llm` directory.
  - `gpu_split`: comma-separated VRAM in GB per GPU, e.g. `6.9, 8`.
  - `cache_8bit`: lowers VRAM usage, but also lowers speed.
  - `max_seq_len`: max context; higher values use more VRAM. `0` defaults to the model config.
- **Generator** generates text based on the given prompt. Refer to text-generation-webui for the parameters.
  - `unload`: unloads the model after each generation.
  - `single_line`: stops the generation on a newline.
  - `max_tokens`: max new tokens; `0` uses the available context.
- **Preview** displays generated text in the UI.
- **Replace** replaces variable names enclosed in brackets, e.g. `[a]`, with their values.
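The bracket substitution performed by the Replace node can be approximated in plain Python as below. This is an illustrative sketch, not the node's actual implementation; unknown names are assumed to be left untouched.

```python
import re

def replace_variables(text: str, variables: dict[str, str]) -> str:
    """Replace [name] placeholders with values from `variables`.

    Sketch of the Replace node's behavior; unrecognized names are kept as-is.
    """
    return re.sub(
        r"\[(\w+)\]",
        lambda m: variables.get(m.group(1), m.group(0)),
        text,
    )

print(replace_variables("Write a story about [a].", {"a": "a robot"}))
# Write a story about a robot.
```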

## Workflow

The example workflow is embedded in the image below and can be opened in ComfyUI.

workflow