
ComfyUI ExLlamaV2 Nodes

A simple text generator for ComfyUI utilizing ExLlamaV2.

Installation

Navigate to the root ComfyUI directory and clone the repository to custom_nodes:

git clone https://github.com/Zuellni/ComfyUI-ExLlama-Nodes custom_nodes/ComfyUI-ExLlamaV2-Nodes

Install the requirements depending on your system:

pip install -r custom_nodes/ComfyUI-ExLlamaV2-Nodes/requirements-VERSION.txt

• requirements-no-wheels.txt: ExLlamaV2 and FlashAttention, no prebuilt wheels.
• requirements-torch-21.txt: Windows wheels for Python 3.11, Torch 2.1, CUDA 12.1.
• requirements-torch-22.txt: Windows wheels for Python 3.11, Torch 2.2, CUDA 12.1.

Check which versions you have installed to pick the right file:

python -c "import platform; import torch; print(f'Python {platform.python_version()}, Torch {torch.__version__}, CUDA {torch.version.cuda}')"
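For example, if the command prints Python 3.11.x, Torch 2.2.x, and CUDA 12.1, use the matching file:

pip install -r custom_nodes/ComfyUI-ExLlamaV2-Nodes/requirements-torch-22.txt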

Caution

If none of the wheels work for you, or ExLlamaV2-related errors appear while the nodes are loading, try installing ExLlamaV2 manually by following the official instructions. Keep in mind that wheels >= 0.0.13 require Torch 2.2.

Usage

Only EXL2 and 4-bit GPTQ models are supported. You can find a lot of them on Hugging Face. Refer to the model card in each repository for details about quant differences and instruction formats.

To use a model with the nodes, clone its repository with git or download all of its files manually, and place them in models/llm. For example, if you'd like to download Mistral-7B, use the following command:

git clone https://huggingface.co/LoneStriker/Mistral-7B-Instruct-v0.2-5.0bpw-h6-exl2-2 models/llm/mistral-7b-exl2-b5
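If you'd rather not use git and already have the huggingface_hub CLI installed, the same files can be downloaded directly; the --local-dir flag belongs to huggingface-cli download, not to this repository:

huggingface-cli download LoneStriker/Mistral-7B-Instruct-v0.2-5.0bpw-h6-exl2-2 --local-dir models/llm/mistral-7b-exl2-b5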

Tip

You can add your own llm path to the extra_model_paths.yaml file and place the models there instead.
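A minimal sketch of such an entry, assuming the nodes resolve the llm folder through ComfyUI's standard extra_model_paths.yaml mechanism; the section name my_llm_models and the paths are placeholders:

my_llm_models:
    base_path: /path/to/your/models
    llm: llm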

Nodes

Loader: Loads models from the llm directory.
  • gpu_split: Comma-separated VRAM in GB per GPU, e.g. 6.9, 8.
  • cache_8bit: Lower VRAM usage but also lower speed.
  • max_seq_len: Max context length; higher values use more VRAM. 0 falls back to the model config.
Generator: Generates text based on the given prompt. Refer to text-generation-webui for parameter details.
  • unload: Unloads the model after each generation.
  • single_line: Stops generation on a newline.
  • max_tokens: Max new tokens; 0 uses all available context.
Preview: Displays the generated text in the UI.
Replace: Replaces variable names enclosed in brackets, e.g. [a], with their values, as sketched below.
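As a rough illustration of what Replace does, here is a minimal Python sketch; the function name replace_vars and the regex-based handling are assumptions, not the node's actual implementation:

import re

def replace_vars(text: str, variables: dict[str, str]) -> str:
    # Substitute [name] with variables["name"]; unknown names are left as-is.
    return re.sub(r"\[(\w+)\]", lambda m: variables.get(m.group(1), m.group(0)), text)

print(replace_vars("Write a story about [a].", {"a": "a robot"}))
# Write a story about a robot.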

Workflow

The example workflow is embedded in the image below and can be opened in ComfyUI.

[example workflow image]
