LLAMA Engine is a simple C++/CUDA library designed to run LLAMA2 and LLAMA3.
This project is a fork of the following projects:
- LLAMA2 implementation
- LLAMA3 implementation
- CUDA kernel - The CUDA code has been copied as is.
- 🚀 C++/CUDA Integration: Seamlessly integrates C++ and CUDA for high-performance computations.
- 🦙 Support for LLAMA2 and LLAMA3: Specifically developed for running LLAMA2 and LLAMA3 models.
- 👌 Easy to Use: Simple API for quick setup and execution.
To install LLAMA Engine, follow these steps:
Note: This is a header-only project, so you can adopt it by adding the include/ directory to your project's build.
- Clone the repository:
    git clone git@github.com:hcyang1012/llama_engine.git
- Add the include/ directory to your project's build system.
- Build the project:
    cd build
    cmake .. -DUSE_CUDA=OFF # Build without CUDA
    cmake .. -DUSE_CUDA=ON # Build with CUDA
    make
Convert Meta's model to the required binary format. Refer to these projects:
Tested models:
- LLAMA2: stories15M model from llama2.c.
- LLAMA3: llama3.1 8B-Instruct
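Before loading a converted checkpoint, it can help to confirm that the file looks sane. The sketch below assumes the checkpoint starts with the legacy llama2.c header (seven 32-bit integers, as in the stories15M file); whether LLAMA Engine expects exactly this layout for every model is an assumption, and `CheckpointHeader` and `ReadCheckpointHeader` are hypothetical names that are not part of the library.

```cpp
#include <cstdint>
#include <fstream>
#include <optional>
#include <string>

// Field order follows the legacy llama2.c checkpoint header.
// ASSUMPTION: LLAMA Engine's binary format begins with the same header.
struct CheckpointHeader {
    std::int32_t dim;         // transformer embedding dimension
    std::int32_t hidden_dim;  // feed-forward hidden dimension
    std::int32_t n_layers;    // number of transformer layers
    std::int32_t n_heads;     // number of query heads
    std::int32_t n_kv_heads;  // number of key/value heads
    std::int32_t vocab_size;  // negative in llama2.c when classifier weights are unshared
    std::int32_t seq_len;     // maximum sequence length
};

// Hypothetical helper: returns the header, or std::nullopt if the file is
// missing, shorter than the header, or contains implausible values.
inline std::optional<CheckpointHeader> ReadCheckpointHeader(const std::string& path) {
    std::ifstream file(path, std::ios::binary);
    if (!file) {
        return std::nullopt;
    }
    CheckpointHeader header{};
    file.read(reinterpret_cast<char*>(&header), sizeof(header));
    if (file.gcount() != static_cast<std::streamsize>(sizeof(header))) {
        return std::nullopt;
    }
    const bool plausible = header.dim > 0 && header.hidden_dim > 0 &&
                           header.n_layers > 0 && header.n_heads > 0 &&
                           header.n_kv_heads > 0 && header.vocab_size != 0 &&
                           header.seq_len > 0 && header.dim % header.n_heads == 0;
    if (!plausible) {
        return std::nullopt;
    }
    return header;
}
```

A failed check usually points to a wrong path or a file that was not converted, which is cheaper to discover here than during model construction.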
Here is a basic example of how to use LLAMA Engine:
    #include <llama.hpp>
    #include <iostream>

    // Note: the designated initializers (.field = value) below require C++20.
    int main() {
        // Set up the configuration for LLAMA2
        llama::LlamaConfig llama2_config = {
            .checkpoint_path = "path/to/llama2/checkpoint",
            .tokenizer_path = "path/to/llama2/tokenizer.bin",
            .device_type = llama::DeviceType::CPU  // or llama::DeviceType::CUDA
        };
        llama::Llama2<float> llama2(llama2_config);

        // Generate text using LLAMA2
        const char* prompt = "Hello, how are you?";
        llama::RunConfig run_config = {
            .temperature = 1.0f,
            .topp = 0.9f,
            .rng_seed = 42
        };
        llama2.Generate(prompt, 256, run_config);

        // Set up the configuration for LLAMA3
        llama::LlamaConfig llama3_config = {
            .checkpoint_path = "path/to/llama3/checkpoint",
            .tokenizer_path = "path/to/llama3/tokenizer_llama3.bin",
            .device_type = llama::DeviceType::CPU  // or llama::DeviceType::CUDA
        };
        llama::Llama3<float> llama3(llama3_config);

        // Generate text using LLAMA3
        llama3.Generate(prompt, 256, run_config);
        return 0;
    }

Special thanks to the authors of the original projects:
- karpathy
- jameswdelancey
- likejazz
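As background for the temperature, topp, and rng_seed fields of RunConfig in the usage example, the sketch below shows one common way such parameters are used: temperature-scaled softmax followed by top-p (nucleus) sampling with a seeded generator. It is an illustration written for this README and is not taken from LLAMA Engine's sources; `SampleTopP` is a hypothetical name, and the library's own sampler may differ in its details.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Illustrative top-p sampler (ASSUMPTION: not LLAMA Engine's actual code).
// Returns the index of the sampled token.
inline int SampleTopP(const std::vector<float>& logits, float temperature,
                      float topp, std::mt19937& rng) {
    assert(!logits.empty());
    const auto max_it = std::max_element(logits.begin(), logits.end());

    // A temperature of zero (or below) is treated as greedy decoding.
    if (temperature <= 0.0f) {
        return static_cast<int>(max_it - logits.begin());
    }

    // Softmax over temperature-scaled logits, shifted by the max for stability.
    std::vector<float> probs(logits.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        probs[i] = std::exp((logits[i] - *max_it) / temperature);
        sum += probs[i];
    }
    for (float& p : probs) {
        p /= sum;
    }

    // Sort token ids by descending probability.
    std::vector<int> order(probs.size());
    std::iota(order.begin(), order.end(), 0);
    std::sort(order.begin(), order.end(),
              [&probs](int a, int b) { return probs[a] > probs[b]; });

    // Keep the smallest prefix whose cumulative probability reaches topp.
    float cumulative = 0.0f;
    std::size_t kept = 0;
    while (kept < order.size()) {
        cumulative += probs[order[kept]];
        ++kept;
        if (cumulative >= topp) {
            break;
        }
    }

    // Draw from the kept tokens, scaled to their total probability mass.
    std::uniform_real_distribution<float> uniform(0.0f, cumulative);
    const float r = uniform(rng);
    float running = 0.0f;
    for (std::size_t i = 0; i < kept; ++i) {
        running += probs[order[i]];
        if (r < running) {
            return order[i];
        }
    }
    return order[kept - 1];  // fallback for floating-point rounding
}
```

In this sketch, lower temperatures sharpen the distribution, a smaller topp discards more of the low-probability tail, and constructing `std::mt19937` from a fixed seed makes the sequence of draws repeatable.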
For any questions or suggestions, feel free to open an issue or contact us at heecheol.yang@outlook.com.
Happy coding! 🎉