Volapuk

Volapuk (pronounced Vo La Puke) is a convenient Docker image that provides a ready-made development environment to experiment with Large Language Models (LLMs) and fine-tune Llama-style LLMs locally on a Windows PC equipped with an Nvidia GeForce RTX 30 series GPU.

The name Volapuk is a homage to the world's first auxlang. Partly inspired by Kolo, another LLM fine-tuning environment, Volapuk is composed of, and named after, the most popular tools (as of 2025) for serving and fine-tuning LLMs locally:

  • V - vLLM, a high-performance inference engine for serving and interacting with LLMs
  • O - Ollama for serving and interacting with LLMs
  • L - llama.cpp and its Python binding llama-cpp-python, for serving and interacting with GGUF-format LLMs and for converting and quantizing model checkpoints to GGUF
  • A - Ampere microarchitecture Nvidia GPU (GeForce RTX 30 series), such as the RTX 3050/3060/3070/3080/3090, with CUDA compute capability 8.6
  • P - PyTorch, the deep learning framework that vLLM and Unsloth rely upon
  • U - Unsloth, a VRAM-efficient, opinionated LLM fine-tuning framework
  • K - KoboldCpp, a local UI front-end for serving and chatting with LLMs

In addition, Volapuk includes:

  • Jupyter Lab - An interactive Python notebook environment
  • Synthetic Data Kit - A library from Meta for generating synthetic Q&A pairs to be used as a training dataset
  • uv - A Rust-based Python package resolver, installer and virtual environment manager
  • CUDA 12.8 - Toolkit containing the CUDA libraries, runtime and nvcc (the Nvidia CUDA Compiler)

Recommended System Requirements

  • Operating System: Windows 11 with WSL2 running Ubuntu >= 20.04 LTS
  • Containerization Software: Docker Desktop for Windows
  • Graphics Card: Nvidia GeForce RTX 30 series GPU with at least 12GB of VRAM and a driver supporting CUDA 12.8 or later
  • Memory: 16GB or more of system RAM
  • Storage: 100GB or more

Getting Started

1️⃣ Install Dependencies

Ensure WSL 2 is installed.

Ensure Ubuntu 20.04 LTS or later has been installed as the distro in WSL2.

Ensure Docker Desktop is installed. Go to Docker Desktop's Settings → Resources → WSL integration and enable integration with your Ubuntu distro.

Ensure Nvidia GeForce Game Ready Driver is installed.
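The prerequisites above can be spot-checked from a terminal. These are standard verification commands, not part of Volapuk itself (run `wsl --status` from Windows PowerShell and the rest inside the WSL2 Ubuntu shell):

```shell
wsl --status       # from PowerShell: confirms WSL 2 is installed and the default version
lsb_release -a     # inside WSL: shows the Ubuntu release (should be 20.04 LTS or later)
docker --version   # confirms the Docker CLI is reachable from the WSL distro
nvidia-smi         # confirms the Nvidia driver is visible inside WSL2
```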

2️⃣ Check CUDA Version Supported by Driver

Launch a WSL2 Ubuntu terminal, run the following command, and verify that the reported CUDA version is >= 12.8.


$ nvidia-smi
Fri Nov 21 15:22:16 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.02              Driver Version: 581.42         CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060        On  |   00000000:01:00.0  On |                  N/A |
| 36%   28C    P8              7W /  170W |     627MiB /  12288MiB |      4%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A              23      G   /Xwayland                             N/A      |
+-----------------------------------------------------------------------------------------+
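The same check can be scripted. The sketch below is a minimal, pure-Python example that parses the `CUDA Version` field out of an nvidia-smi header line (here a hard-coded sample string rather than live output) and compares it against the required minimum:

```python
import re

# Sample header line as printed by nvidia-smi in the step above.
SAMPLE = "| NVIDIA-SMI 580.95.02    Driver Version: 581.42    CUDA Version: 13.0 |"

def cuda_version(smi_line: str) -> tuple[int, int]:
    """Extract the 'CUDA Version' field as a (major, minor) tuple."""
    m = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_line)
    if m is None:
        raise ValueError("no CUDA version found in nvidia-smi output")
    return int(m.group(1)), int(m.group(2))

REQUIRED = (12, 8)  # minimum CUDA version the Volapuk image expects
print(cuda_version(SAMPLE))              # (13, 0)
print(cuda_version(SAMPLE) >= REQUIRED)  # True
```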

3️⃣ Build the Docker Image

To build the Docker image, run:


$ ./build_volapuk.sh
[+] Building 2496.7s (16/16) FINISHED                                                                    
.....
.....

Check that the Docker image has been generated:


$ docker image ls
REPOSITORY   TAG       IMAGE ID       CREATED       SIZE
volapuk      latest    e520e5ee3300   4 hours ago   38.8GB

4️⃣ Run the Docker Container

To start the container, run:


$ ./run_volapuk.sh

The workspace subdirectory of the current working directory in WSL2 Ubuntu (where the above command is run) is mounted as /workspace inside the container.
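The script's exact contents are not reproduced here, but an invocation matching the behavior described above would look roughly like the following sketch (the flags shown are assumptions, not the script's verified contents):

```shell
# Hypothetical sketch of what run_volapuk.sh roughly does:
docker run -it --rm \
  --gpus all \                        # expose the Nvidia GPU to the container
  -v "$PWD/workspace:/workspace" \    # mount ./workspace as /workspace
  -p 8888:8888 -p 5001:5001 \         # Jupyter Lab and KoboldCpp ports
  volapuk:latest
```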

Application

🤖💬 A Llama chatbot fine-tuned on the Tiny Tapeout SKY 25a datasheet

1️⃣ Jupyter notebook approach

Launch the demo Jupyter notebook TT_finetune_Llama3_2_1B.ipynb by running the command below and opening http://127.0.0.1:8888/lab in a web browser:


$ ./run_jupyter.sh

Step through the cells in the Jupyter notebook, which will:

  • Download the Llama-3.2-1B Instruct model
  • Generate synthetic data from the Tiny Tapeout SKY 25a shuttle's datasheet
  • Fine-tune the instruct model on the synthetic data
  • Run inference with the fine-tuned model
  • Save the fine-tuned model

The resulting Llama 3.2 1B chatbot will be able to answer questions about the Tiny Tapeout SKY 25a shuttle's datasheet.
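Between the second and third notebook steps, the synthetic Q&A pairs are shaped into chat-formatted training records. The sketch below (pure Python, with made-up field names — the notebook's actual schema may differ) shows the general idea:

```python
import json

# Hypothetical synthetic Q&A pairs, roughly as Synthetic Data Kit might emit them.
qa_pairs = [
    {"question": "What process node does the SKY 25a shuttle use?",
     "answer": "It uses the SkyWater SKY130 process."},
]

def to_chat_record(pair: dict) -> dict:
    """Wrap one Q&A pair in the chat-message structure most trainers expect."""
    return {"messages": [
        {"role": "user", "content": pair["question"]},
        {"role": "assistant", "content": pair["answer"]},
    ]}

records = [to_chat_record(p) for p in qa_pairs]

# Write one JSON object per line (JSONL), a common training-dataset format.
with open("train.jsonl", "w") as f:
    for r in records:
        f.write(json.dumps(r) + "\n")
```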

2️⃣ Python script approach

Once the synthetic data has been generated, fine-tune an LLM on it non-interactively with the command below:


$ ./run_unsloth_lora_peft.py

Testing LLM inference engines inside Docker container

The tests below load the Llama 3.2 1B Instruct model and run inference using the respective engines.

1️⃣ Check using llama.cpp


$ ./tests/llamacpp_inference.py

2️⃣ Check using vLLM


$ ./tests/vllm_inference.py

3️⃣ Check using Ollama

Open a new terminal in WSL2 Ubuntu and run the command below, which connects to the running Docker container and starts the ollama server in it:


$ workspace/tests/ollama_run_server.sh

In the original terminal, where the Docker container's shell is accessible, run the command below to load an LLM into the ollama server and chat with it:


$ ./tests/ollama_chat.sh

Use /exit to exit the chat and return to the shell prompt.

4️⃣ Check using KoboldCpp

Run the command below and open http://localhost:5001 in a web browser to chat using the KoboldCpp UI:


$ ./tests/koboldcpp_inference.sh

Known Issues

When saving the trained model to GGUF for the first time, Unsloth clones the llama.cpp repo into the current working directory, builds it, and uses the resulting llama-quantize binary together with the model-conversion Python scripts to perform the GGUF conversion. If this step errors out or freezes, manually run the commands below to build the llama-quantize binary:

$ MAX_JOBS=6       # number of parallel build jobs
$ CUDA_ARCH="86"   # compute capability of Ampere (RTX 30 series) GPUs
$ cd llama.cpp
$ cmake -B build -DGGML_CUDA=ON -DLLAMA_CURL=ON -DBUILD_SHARED_LIBS=ON -DCMAKE_CUDA_ARCHITECTURES=${CUDA_ARCH}
$ LD_LIBRARY_PATH="/usr/local/cuda/compat:$LD_LIBRARY_PATH" cmake --build build --config Release -j ${MAX_JOBS}
$ ln -sf build/bin/llama-quantize llama-quantize   # expose the binary where Unsloth looks for it
$ cd ..

Then rerun the LLM fine-tuning script; the GGUF conversion should now finish without erroring out.
