
# ComfyUI ExLlamaV2 Nodes

A simple text generator for ComfyUI using ExLlamaV2.

## Installation

Navigate to the root ComfyUI directory and clone the repository to `custom_nodes`:

```
git clone https://github.com/Zuellni/ComfyUI-ExLlama-Nodes custom_nodes/ComfyUI-ExLlamaV2-Nodes
```

Install the requirements depending on your system:

```
pip install -r custom_nodes/ComfyUI-ExLlamaV2-Nodes/requirements-VERSION.txt
```

| File | Notes |
| --- | --- |
| `requirements-no-wheels.txt` | ExLlamaV2 and FlashAttention, no wheels. |
| `requirements-torch-21.txt` | Windows wheels for Python 3.11, Torch 2.1, CUDA 12.1. |
| `requirements-torch-22.txt` | Windows wheels for Python 3.11, Torch 2.2, CUDA 12.1. |

Check which version you need with:

```
python -c "import platform; import torch; print(f'Python {platform.python_version()}, Torch {torch.__version__}, CUDA {torch.version.cuda}')"
```
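As a rough illustration of how the reported Torch version maps to a requirements file, consider the sketch below. The `pick_requirements` helper is hypothetical and not part of the nodes; remember that the wheel files additionally assume Windows and Python 3.11.

```python
def pick_requirements(torch_version: str) -> str:
    """Map a Torch version string (e.g. '2.2.1+cu121') to a requirements file.

    Hypothetical helper for illustration only, not part of the nodes.
    """
    # Strip any local build suffix like '+cu121' before parsing major.minor.
    major, minor = (int(p) for p in torch_version.split("+")[0].split(".")[:2])
    if (major, minor) >= (2, 2):
        return "requirements-torch-22.txt"
    if (major, minor) == (2, 1):
        return "requirements-torch-21.txt"
    # Older Torch: no prebuilt wheels, build ExLlamaV2 and FlashAttention yourself.
    return "requirements-no-wheels.txt"

print(pick_requirements("2.2.1+cu121"))  # requirements-torch-22.txt
```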

> [!CAUTION]
> If none of the wheels work for you, or you see ExLlamaV2-related errors while the nodes load, try installing it manually following the official instructions. Keep in mind that wheels >= 0.0.13 require Torch 2.2.

## Usage

Only EXL2 and 4-bit GPTQ models are supported. Many are available on Hugging Face; refer to the model card in each repository for details about quant differences and instruction formats.

To use a model with the nodes, clone its repository with `git` or manually download all the files and place them in `models/llm`. For example, to download Mistral-7B, use the following command:

```
git clone https://huggingface.co/LoneStriker/Mistral-7B-Instruct-v0.2-5.0bpw-h6-exl2-2 models/llm/mistral-7b-exl2-b5
```

> [!TIP]
> You can add your own `llm` path to the `extra_model_paths.yaml` file and place the models there instead.
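For example, an `extra_model_paths.yaml` entry pointing the `llm` folder at another location might look like the sketch below. The entry name and paths are placeholders; adjust them to your setup, and note that the `llm` key is assumed to be what these nodes read.

```yaml
# Hypothetical extra_model_paths.yaml entry — paths are placeholders.
my_models:
  base_path: D:/models
  llm: llm
```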

## Nodes

- **Loader** loads models from the `llm` directory.
  - `gpu_split`: comma-separated VRAM in GB per GPU, e.g. `6.9, 8`.
  - `cache_8bit`: lowers VRAM usage, but also lowers speed.
  - `max_seq_len`: max context; higher values use more VRAM. `0` defaults to the model config.
- **Generator** generates text based on the given prompt. Refer to text-generation-webui for the parameters.
  - `unload`: unloads the model after each generation.
  - `single_line`: stops the generation on a newline.
  - `max_tokens`: max new tokens; `0` uses the available context.
- **Preview** displays generated text in the UI.
- **Replace** replaces variable names enclosed in brackets, e.g. `[a]`, with their values.
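The bracket substitution performed by the Replace node can be approximated in plain Python as below. This is an illustrative sketch, not the node's actual implementation; unknown names are assumed to be left untouched.

```python
import re

def replace_variables(text: str, variables: dict[str, str]) -> str:
    """Replace [name] placeholders with values from `variables`.

    Sketch of the Replace node's behavior; unrecognized names are kept as-is.
    """
    return re.sub(
        r"\[(\w+)\]",
        lambda m: variables.get(m.group(1), m.group(0)),
        text,
    )

print(replace_variables("Write a story about [a].", {"a": "a robot"}))
# Write a story about a robot.
```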

## Workflow

The example workflow is embedded in the image below and can be opened in ComfyUI.

workflow