Volapuk (pronounced Vo La Puke) is a convenient Docker image that provides a ready-made development environment for experimenting with Large Language Models (LLMs) and fine-tuning Llama-style LLMs locally on a Windows PC equipped with an Nvidia GeForce RTX 30 series GPU.
The name Volapuk is a homage to the world's first auxlang. Partly inspired by Kolo, another LLM fine-tuning environment, Volapuk is composed of, and named after, the most popular tools as of 2025 for serving and fine-tuning LLMs locally:
- V - vLLM, a high-performance engine for serving and interacting with LLMs
- O - Ollama for serving and interacting with LLMs
- L - llama.cpp and its Python binding llama-cpp-python, for serving and interacting with LLMs in the GGUF format, and for quantizing and converting model checkpoints to GGUF
- A - Ampere-microarchitecture Nvidia GPU (GeForce RTX 30 series), such as the RTX 3050/3060/3070/3080/3090, with CUDA compute capability 8.6
- P - PyTorch, a Deep Learning library and Machine Learning (ML) framework that vLLM and Unsloth rely upon
- U - Unsloth, a VRAM-efficient, opinionated LLM fine-tuning framework
- K - KoboldCpp, a local UI front-end for serving and chatting with LLMs
In addition, Volapuk includes:
- Jupyter Lab - An interactive Python notebook environment
- Synthetic Data Kit - A library from Meta that generates synthetic Q&A pairs for use as a training dataset
- uv - A Rust-based Python package resolver, installer and virtual environment manager
- CUDA 12.8 - Toolkit containing the CUDA libraries, runtime and nvcc (the Nvidia CUDA compiler)
- Operating System: Windows 11 with WSL2 running Ubuntu 20.04 LTS or later
- Containerization Software: Docker Desktop for Windows
- Graphics Card: Nvidia GeForce RTX 30 series GPU with at least 12GB of VRAM and a driver supporting CUDA 12.8 or later
- Memory: 16GB or more of system RAM
- Storage: 100GB or more of free disk space
Ensure WSL 2 is installed.
Ensure Ubuntu 20.04 LTS or later has been installed as the distro in WSL2.
Ensure Docker Desktop is installed. In Docker Desktop, go to Settings → Resources → WSL integration and check the option to enable integration with the Ubuntu WSL distro.
Ensure Nvidia GeForce Game Ready Driver is installed.
Launch WSL2 Ubuntu terminal, run the following command and verify that the CUDA version reported is >= 12.8.
$ nvidia-smi
Fri Nov 21 15:22:16 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.02 Driver Version: 581.42 CUDA Version: 13.0 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3060 On | 00000000:01:00.0 On | N/A |
| 36% 28C P8 7W / 170W | 627MiB / 12288MiB | 4% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 23 G /Xwayland N/A |
+-----------------------------------------------------------------------------------------+
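To verify the minimum version programmatically rather than by eye, a small parser over the `nvidia-smi` header text can help. This is a convenience sketch, not part of Volapuk itself:

```python
import re

def cuda_version(nvidia_smi_output: str) -> tuple[int, int]:
    """Extract the (major, minor) CUDA version from nvidia-smi header text."""
    m = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_output)
    if m is None:
        raise ValueError("CUDA version not found in nvidia-smi output")
    return int(m.group(1)), int(m.group(2))

# Checked against the header line shown above:
header = "| NVIDIA-SMI 580.95.02  Driver Version: 581.42  CUDA Version: 13.0 |"
assert cuda_version(header) >= (12, 8)  # meets the 12.8 minimum
```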
To build the Docker image, run:
$ ./build_volapuk.sh
[+] Building 2496.7s (16/16) FINISHED
.....
.....
Check that the Docker image has been generated:
$ docker image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
volapuk latest e520e5ee3300 4 hours ago 38.8GB
To start the container, run:
$ ./run_volapuk.sh
The subdirectory named workspace under the current working directory in WSL2 Ubuntu, where the above command is run, will be mounted as /workspace in the container.
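The exact flags inside run_volapuk.sh are script-specific, but a GPU-enabled launch with the workspace bind mount typically resembles the command built below. The flag choices here are assumptions about the script, not a copy of it:

```python
import os

def volapuk_run_cmd(workdir: str) -> list[str]:
    """Build a docker run command like the one run_volapuk.sh likely issues
    (assumed flags: GPU passthrough, interactive shell, workspace bind mount)."""
    return [
        "docker", "run", "--rm", "-it",
        "--gpus", "all",  # expose the RTX 30-series GPU to the container
        "-v", f"{os.path.join(workdir, 'workspace')}:/workspace",  # bind mount
        "volapuk:latest",
    ]

print(" ".join(volapuk_run_cmd("/home/user/project")))
```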
🤖💬 A Llama chatbot trained on Tiny Tapeout SKY 25a datasheet
Launch the demo Jupyter notebook TT_finetune_Llama3_2_1B.ipynb by running the below command and opening http://127.0.0.1:8888/lab in a web browser:
$ ./run_jupyter.sh
Step through the cells in the Jupyter notebook, which will:
- Download the Llama-3.2-1B Instruct model
- Generate synthetic data from Tiny Tapeout SKY 25a shuttle's datasheet
- Fine-tune the instruct model by training it on the synthetic data
- Run inference using the fine-tuned model
- Save the fine-tuned model
The resulting Llama 3.2 1B LLM chatbot will be able to answer questions pertaining to Tiny Tapeout SKY 25a shuttle's datasheet.
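Fine-tuning an instruct model works best when each synthetic Q&A pair is rendered in the model's chat template. For Llama 3-family models the template uses header and end-of-turn tokens, as in the minimal sketch below; in practice the tokenizer's apply_chat_template method handles this, and the sample strings are placeholders:

```python
def to_llama3_chat(question: str, answer: str) -> str:
    """Render one synthetic Q&A pair in the Llama 3 chat template."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{question}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
        f"{answer}<|eot_id|>"
    )

# Placeholder pair; real answers come from the generated synthetic dataset.
sample = to_llama3_chat("What is Tiny Tapeout?", "An answer drawn from the datasheet.")
```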
Once the synthetic data has been generated, fine-tune an LLM on that data non-interactively using the below command:
$ ./run_unsloth_lora_peft.py
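A LoRA run like this trains only small low-rank adapters rather than the full model: for a weight matrix of shape (d_out, d_in), a rank-r adapter adds r * (d_in + d_out) trainable parameters, which is a large part of why a 12GB card suffices. A back-of-the-envelope calculation (the rank and the 2048 hidden size of Llama-3.2-1B here are illustrative assumptions, not the script's settings):

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters a rank-r LoRA adapter adds to one (d_out x d_in) weight."""
    return r * (d_in + d_out)

# One 2048x2048 attention projection at rank 16:
print(lora_param_count(2048, 2048, 16))  # → 65536, vs 4194304 frozen weights
```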
The below tests load the Llama 3.2 1B Instruct model and run inference using the respective engines.
$ ./tests/llamacpp_inference.py
$ ./tests/vllm_inference.py
Open a new terminal in WSL2 Ubuntu and run the below command, which connects to the running Docker container and starts the Ollama server in it:
$ workspace/tests/ollama_run_server.sh
In the original terminal, where the Docker container's shell is accessible, run the below command to load an LLM model into the Ollama server and chat with it:
$ ./tests/ollama_chat.sh
Use /exit to exit the chat and return to the shell prompt.
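Besides the interactive chat script, the running Ollama server can also be queried over its REST API on port 11434 via the /api/generate endpoint. A minimal sketch (the model name is a placeholder for whichever model the chat script loaded):

```python
import json
import urllib.request

def build_generate_payload(prompt: str, model: str = "llama3.2:1b") -> bytes:
    """JSON body for Ollama's /api/generate endpoint (non-streaming)."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def ollama_generate(prompt: str, host: str = "http://localhost:11434") -> str:
    """POST the prompt to the running Ollama server and return the reply text."""
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=build_generate_payload(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```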
Run the below command and open http://localhost:5001 in a web browser to chat using the KoboldCpp UI:
$ ./tests/koboldcpp_inference.sh
When saving the trained model to GGUF for the first time, Unsloth will clone the llama.cpp repo into the current working directory, build it, and use the resulting llama-quantize binary along with the model conversion Python scripts to perform the GGUF conversion.
If this step errors out or freezes, manually execute the below commands to build the llama-quantize binary:
$ MAX_JOBS=6
$ CUDA_ARCH="86"
$ cd llama.cpp
$ cmake -B build -DGGML_CUDA=ON -DLLAMA_CURL=ON -DBUILD_SHARED_LIBS=ON -DCMAKE_CUDA_ARCHITECTURES=${CUDA_ARCH}
$ LD_LIBRARY_PATH="/usr/local/cuda/compat:$LD_LIBRARY_PATH" cmake --build build --config Release -j ${MAX_JOBS}
$ ln -sf build/bin/llama-quantize llama-quantize
$ cd ..
Then rerun the LLM fine-tuning script and the GGUF conversion will finish without erroring out.
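Should the automatic export still fail after the manual build, the same conversion can be driven by hand with llama.cpp's own tools: first convert the Hugging Face checkpoint to a 16-bit GGUF with convert_hf_to_gguf.py, then quantize it with llama-quantize. A sketch of the two commands (the paths and the Q4_K_M quantization type are assumptions):

```python
def gguf_convert_cmds(model_dir: str, out_prefix: str,
                      quant: str = "Q4_K_M") -> list[list[str]]:
    """Commands to convert an HF checkpoint to GGUF, then quantize it."""
    f16_gguf = f"{out_prefix}-f16.gguf"
    return [
        # 1. HF safetensors -> 16-bit GGUF
        ["python", "llama.cpp/convert_hf_to_gguf.py", model_dir,
         "--outfile", f16_gguf, "--outtype", "f16"],
        # 2. 16-bit GGUF -> quantized GGUF
        ["llama.cpp/llama-quantize", f16_gguf, f"{out_prefix}-{quant}.gguf", quant],
    ]

for cmd in gguf_convert_cmds("finetuned_model", "finetuned"):
    print(" ".join(cmd))
```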