This project helps you get started deploying Generative AI models locally using LLaMA C++. LLaMA C++ enables LLM inference with minimal setup and state-of-the-art performance on a wide variety of hardware, locally and in the cloud.
If you haven't already done so, install Miniforge. Miniforge provides minimal installers for Conda and Mamba specific to conda-forge, with the following features pre-configured:

- Packages in the base environment are obtained from the `conda-forge` channel.
- The `conda-forge` channel is set as the default (and only) channel.
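Concretely, those preconfigured defaults correspond to a `.condarc` along these lines (illustrative; the exact configuration Miniforge writes may differ slightly):

```
channels:
  - conda-forge
channel_priority: strict
```

You can confirm the active channel configuration at any time with `conda config --show channels`.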
Conda/Mamba will be the primary package managers used to install the required Python dependencies. For convenience, a script is included that downloads and installs Miniforge (which includes Conda and Mamba). You can run the script using the following command.

```shell
./bin/install-miniforge.sh
```
After adding any dependencies that should be installed via Conda/Mamba to the `environment.yml` file, and any dependencies that should be installed via Pip to the `requirements.txt` file, you can create the Conda environment in a sub-directory `./env` of your project directory by running the following shell script.

```shell
./bin/create-conda-env.sh
```
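To make this concrete, an `environment.yml` for this workflow might look like the following. The file names match the project, but the package list is a hypothetical example:

```
channels:
  - conda-forge

dependencies:
  - python=3.11
  - pip
  - pip:
    - -r requirements.txt
```

The `- -r requirements.txt` entry tells Conda to hand the Pip requirements file off to Pip after the Conda packages are installed, so both files are resolved by a single `create-conda-env.sh` run.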
In order to support Metal GPU acceleration you need to install a few extra dependencies. These dependencies are added to the `environment-metal-gpu.yml` and `requirements-metal-gpu.txt` files. Create the Conda environment in a sub-directory `./env` of your project directory by running the following shell script.

```shell
./bin/create-conda-env.sh environment-metal-gpu.yml
```
In order to support NVIDIA GPU acceleration you need to install a few extra dependencies. These dependencies are added to the `environment-nvidia-gpu.yml` and `requirements-nvidia-gpu.txt` files. Create the Conda environment in a sub-directory `./env` of your project directory by running the following shell script.

```shell
./bin/create-conda-env.sh environment-nvidia-gpu.yml
```
For convenience there is an installer script which can be used to download pre-compiled LLaMA C++ binaries for various OS and CPU architectures and install the binaries into the `bin/` directory of the Conda environment. You can find the latest release of LLaMA C++ on GitHub and pass the link to the zip archive for your specific release to the script as a command line argument.
For macOS on Apple Silicon (ARM64):

```shell
DOWNLOAD_URL=https://github.com/ggerganov/llama.cpp/releases/download/
TAG=b3868
RELEASE_ARCHIVE=llama-b3868-bin-macos-arm64.zip
./bin/install-llama-cpp.sh "$DOWNLOAD_URL"/"$TAG"/"$RELEASE_ARCHIVE"
```
For macOS on Intel (x64):

```shell
DOWNLOAD_URL=https://github.com/ggerganov/llama.cpp/releases/download/
TAG=b3868
RELEASE_ARCHIVE=llama-b3868-bin-macos-x64.zip
./bin/install-llama-cpp.sh "$DOWNLOAD_URL"/"$TAG"/"$RELEASE_ARCHIVE"
```
For Ubuntu x64:

```shell
DOWNLOAD_URL=https://github.com/ggerganov/llama.cpp/releases/download/
TAG=b3868
RELEASE_ARCHIVE=llama-b3868-bin-ubuntu-x64.zip
./bin/install-llama-cpp.sh "$DOWNLOAD_URL"/"$TAG"/"$RELEASE_ARCHIVE"
```
For Windows x64 with AVX-512 support:

```shell
DOWNLOAD_URL=https://github.com/ggerganov/llama.cpp/releases/download/
TAG=b3868
RELEASE_ARCHIVE=llama-b3868-bin-win-avx512-x64.zip
./bin/install-llama-cpp.sh "$DOWNLOAD_URL"/"$TAG"/"$RELEASE_ARCHIVE"
```
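The four invocations above differ only in the archive name. A small helper along these lines (hypothetical, not part of this project's scripts) makes the mapping between platform and archive explicit for a given release tag:

```shell
# Hypothetical helper: map OS/arch to the matching pre-built llama.cpp
# archive name for a given release tag.
TAG=b3868

archive_for() {  # usage: archive_for <uname -s> <uname -m>
    case "$1-$2" in
        Darwin-arm64)  echo "llama-$TAG-bin-macos-arm64.zip" ;;
        Darwin-x86_64) echo "llama-$TAG-bin-macos-x64.zip" ;;
        Linux-x86_64)  echo "llama-$TAG-bin-ubuntu-x64.zip" ;;
        *)             echo "unsupported" ;;
    esac
}

# Pick the archive for the machine this is running on.
RELEASE_ARCHIVE="$(archive_for "$(uname -s)" "$(uname -m)")"
echo "$RELEASE_ARCHIVE"
```

If the helper prints `unsupported` (e.g. on Windows, where `uname` output varies by shell), fall back to choosing the archive by hand from the release page.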
Install Xcode. Then run the following command to install the Xcode Command Line Tools.

```shell
xcode-select --install
```
After creating the Conda environment you can build LLaMA C++ by running the following command.

```shell
conda run --prefix ./env --live-stream ./bin/build-llama-cpp.sh
```
This command does the following.

- Properly configures the Conda environment.
- Clones [LLaMA C++](https://github.com/ggerganov/llama.cpp) into `./src/llama-cpp`.
- Builds LLaMA C++ with support for CPU acceleration using OpenBLAS in `./build/llama-cpp`.
- Installs the binaries into the `bin/` directory of the Conda environment.
- Removes the `./src/llama-cpp` directory as it is no longer needed.
- Removes the `./build/llama-cpp` directory as it is no longer needed.
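The steps above wrap a standard CMake workflow. As a rough sketch of the commands such a script plausibly runs (an assumption; the actual script may differ), with `GGML_BLAS`/`GGML_BLAS_VENDOR` being llama.cpp's CMake switches for OpenBLAS. The steps are echoed rather than executed so the sketch is safe to run anywhere:

```shell
# Sketch only: echo the plausible build steps instead of running them.
SRC=./src/llama-cpp
BUILD=./build/llama-cpp

CLONE="git clone https://github.com/ggerganov/llama.cpp $SRC"
CONFIGURE="cmake -S $SRC -B $BUILD -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS"
COMPILE="cmake --build $BUILD --config Release"

printf '%s\n' "$CLONE" "$CONFIGURE" "$COMPILE"
```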
After creating the Conda environment you can build LLaMA C++ with support for Metal GPU acceleration by running the following command.

```shell
conda run --prefix ./env --live-stream ./bin/build-llama-cpp-metal-gpu.sh
```
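For reference, the CMake switch behind Metal acceleration in llama.cpp is `GGML_METAL` (enabled by default on Apple Silicon). Whether this project's script sets it explicitly is an assumption; the configure step is echoed here as an illustration rather than executed:

```shell
# Sketch only: the Metal toggle for a llama.cpp CMake configure step.
METAL_CONFIGURE="cmake -S ./src/llama-cpp -B ./build/llama-cpp -DGGML_METAL=ON"
echo "$METAL_CONFIGURE"
```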
After creating the Conda environment you can build LLaMA C++ with support for NVIDIA GPU acceleration by running the following command.

```shell
conda run --prefix ./env --live-stream ./bin/build-llama-cpp-nvidia-gpu.sh
```
For a detailed discussion of additional NVIDIA GPU compilation options that might improve performance on particular GPU architectures see the LLaMA C++ build documentation.
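For example, the CUDA backend is toggled with llama.cpp's `GGML_CUDA` switch, and pinning `CMAKE_CUDA_ARCHITECTURES` to your card's compute capability (e.g. `86` for RTX 30-series GPUs; an illustrative value, not this project's default) avoids compiling kernels for every architecture. Echoed here as a sketch rather than executed:

```shell
# Sketch only: echoed CUDA configure step; adjust the architecture for your GPU.
CUDA_CONFIGURE="cmake -S ./src/llama-cpp -B ./build/llama-cpp -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=86"
echo "$CUDA_CONFIGURE"
```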
Once the new environment has been created you can activate the environment with the following command.

```shell
conda activate ./env
```

Note that the `./env` directory is not under version control as it can always be re-created as necessary.
This project is supported by funding from King Abdullah University of Science and Technology (KAUST) - Center of Excellence for Generative AI, under award number 5940.