Commit 6e10eef

Merge pull request SciSharp#742 from Lamothe/readme
Updated READMEs.
2 parents: 4192bea + 4c4cc54 · commit 6e10eef

2 files changed, +25 -25 lines changed


LLama.SemanticKernel/README.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
# LLamaSharp.SemanticKernel

- LLamaSharp.SemanticKernel are connections for [SemanticKernel](https://github.com/microsoft/semantic-kernel): an SDK for integrating various LLM interfaces into a single implementation. With this, you can add local LLaMa queries as another connection point with your existing connections.
+ LLamaSharp.SemanticKernel are connections for [SemanticKernel](https://github.com/microsoft/semantic-kernel): an SDK for integrating various LLM interfaces into a single implementation. With this, you can add local LLaMA queries as another connection point with your existing connections.

For reference on how to implement it, view the following examples:

README.md

Lines changed: 24 additions & 24 deletions
@@ -1,4 +1,4 @@
- ![logo](Assets/LLamaSharpLogo.png)
+ ![logo](Assets/LLamaSharpLogo.png)

[![Discord](https://img.shields.io/discord/1106946823282761851?label=Discord)](https://discord.gg/7wNVU65ZDY)
[![QQ Group](https://img.shields.io/static/v1?label=QQ&message=加入QQ群&color=brightgreen)](http://qm.qq.com/cgi-bin/qm/qr?_wv=1027&k=sN9VVMwbWjs5L0ATpizKKxOcZdEPMrp8&authKey=RLDw41bLTrEyEgZZi%2FzT4pYk%2BwmEFgFcrhs8ZbkiVY7a4JFckzJefaYNW6Lk4yPX&noverify=0&group_code=985366726)

@@ -11,7 +11,7 @@
[![LLamaSharp Badge](https://img.shields.io/nuget/v/LLamaSharp.Backend.OpenCL?label=LLamaSharp.Backend.OpenCL)](https://www.nuget.org/packages/LLamaSharp.Backend.OpenCL)

- **LLamaSharp is a cross-platform library to run 🦙LLaMA/LLaVA model (and others) on your local device. Based on [llama.cpp](https://github.com/ggerganov/llama.cpp), inference with LLamaSharp is efficient on both CPU and GPU. With the higher-level APIs and RAG support, it's convenient to deploy LLM (Large Language Model) in your application with LLamaSharp.**
+ **LLamaSharp is a cross-platform library to run 🦙LLaMA/LLaVA model (and others) on your local device. Based on [llama.cpp](https://github.com/ggerganov/llama.cpp), inference with LLamaSharp is efficient on both CPU and GPU. With the higher-level APIs and RAG support, it's convenient to deploy LLMs (Large Language Models) in your application with LLamaSharp.**

**Please star the repo to show your support for this project!🤗**

@@ -59,9 +59,9 @@

## 🔗Integrations & Examples

- There are integrations for the following libraries, making it easier to develop your APP. Integrations for semantic-kernel and kernel-memory are developed in LLamaSharp repository, while others are developed in their own repositories.
+ There are integrations for the following libraries, making it easier to develop your APP. Integrations for semantic-kernel and kernel-memory are developed in the LLamaSharp repository, while others are developed in their own repositories.

- - [semantic-kernel](https://github.com/microsoft/semantic-kernel): an SDK that integrates LLM like OpenAI, Azure OpenAI, and Hugging Face.
+ - [semantic-kernel](https://github.com/microsoft/semantic-kernel): an SDK that integrates LLMs like OpenAI, Azure OpenAI, and Hugging Face.
- [kernel-memory](https://github.com/microsoft/kernel-memory): a multi-modal AI Service specialized in the efficient indexing of datasets through custom continuous data hybrid pipelines, with support for RAG ([Retrieval Augmented Generation](https://en.wikipedia.org/wiki/Prompt_engineering#Retrieval-augmented_generation)), synthetic memory, prompt engineering, and custom semantic memory processing.
- [BotSharp](https://github.com/SciSharp/BotSharp): an open source machine learning framework for AI Bot platform builder.
- [Langchain](https://github.com/tryAGI/LangChain): a framework for developing applications powered by language models.

@@ -82,20 +82,20 @@ The following examples show how to build APPs with LLamaSharp.

### Installation

- To gain high performance, LLamaSharp interacts with a native library compiled from c++, which is called `backend`. We provide backend packages for Windows, Linux and MAC with CPU, Cuda, Metal and OpenCL. You **don't** need to handle anything about c++ but just install the backend packages.
+ To gain high performance, LLamaSharp interacts with native libraries compiled from c++, these are called `backends`. We provide backend packages for Windows, Linux and Mac with CPU, CUDA, Metal and OpenCL. You **don't** need to compile any c++, just install the backend packages.

- If no published backend match your device, please open an issue to let us know. If compiling c++ code is not difficult for you, you could also follow [this guide](./docs/ContributingGuide.md) to compile a backend and run LLamaSharp with it.
+ If no published backend matches your device, please open an issue to let us know. If compiling c++ code is not difficult for you, you could also follow [this guide](./docs/ContributingGuide.md) to compile a backend and run LLamaSharp with it.

1. Install [LLamaSharp](https://www.nuget.org/packages/LLamaSharp) package on NuGet:

```
PM> Install-Package LLamaSharp
```

- 2. Install one or more of these backends, or use self-compiled backend.
+ 2. Install one or more of these backends, or use a self-compiled backend.

- - [`LLamaSharp.Backend.Cpu`](https://www.nuget.org/packages/LLamaSharp.Backend.Cpu): Pure CPU for Windows & Linux & MAC. Metal (GPU) support for MAC.
- - [`LLamaSharp.Backend.Cuda11`](https://www.nuget.org/packages/LLamaSharp.Backend.Cuda11): CUDA11 for Windows & Linux.
+ - [`LLamaSharp.Backend.Cpu`](https://www.nuget.org/packages/LLamaSharp.Backend.Cpu): Pure CPU for Windows, Linux & Mac. Metal (GPU) support for Mac.
+ - [`LLamaSharp.Backend.Cuda11`](https://www.nuget.org/packages/LLamaSharp.Backend.Cuda11): CUDA 11 for Windows & Linux.
- [`LLamaSharp.Backend.Cuda12`](https://www.nuget.org/packages/LLamaSharp.Backend.Cuda12): CUDA 12 for Windows & Linux.
- [`LLamaSharp.Backend.OpenCL`](https://www.nuget.org/packages/LLamaSharp.Backend.OpenCL): OpenCL for Windows & Linux.
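
All of the backend packages listed in the hunk above install with the same NuGet command as the main package. For example, installing the CPU backend named above looks like this (the other backends install the same way; pick the one matching your hardware):

```
PM> Install-Package LLamaSharp.Backend.Cpu
```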

@@ -104,18 +104,18 @@ PM> Install-Package LLamaSharp

### Model preparation

- There are two popular format of model file of LLM now, which are PyTorch format (.pth) and Huggingface format (.bin). LLamaSharp uses `GGUF` format file, which could be converted from these two formats. To get `GGUF` file, there are two options:
+ There are two popular formats of model file of LLMs, these are PyTorch format (.pth) and Huggingface format (.bin). LLamaSharp uses a `GGUF` format file, which can be converted from these two formats. To get a `GGUF` file, there are two options:

- 1. Search model name + 'gguf' in [Huggingface](https://huggingface.co), you will find lots of model files that have already been converted to GGUF format. Please take care of the publishing time of them because some old ones could only work with old version of LLamaSharp.
+ 1. Search model name + 'gguf' in [Huggingface](https://huggingface.co), you will find lots of model files that have already been converted to GGUF format. Please take note of the publishing time of them because some old ones may only work with older versions of LLamaSharp.

- 2. Convert PyTorch or Huggingface format to GGUF format yourself. Please follow the instructions of [this part of llama.cpp readme](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#prepare-and-quantize) to convert them with the python scripts.
+ 2. Convert PyTorch or Huggingface format to GGUF format yourself. Please follow the instructions from [this part of llama.cpp readme](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#prepare-and-quantize) to convert them with python scripts.

- Generally, we recommend downloading models with quantization rather than fp16, because it significantly reduce the required memory size while only slightly impact on its generation quality.
+ Generally, we recommend downloading models with quantization rather than fp16, because it significantly reduces the required memory size while only slightly impacting the generation quality.

### Example of LLaMA chat session

- Here is a simple example to chat with bot based on LLM in LLamaSharp. Please replace the model path with yours.
+ Here is a simple example to chat with a bot based on a LLM in LLamaSharp. Please replace the model path with yours.

```cs
using LLama.Common;

@@ -174,24 +174,24 @@ For more examples, please refer to [LLamaSharp.Examples](./LLama.Examples).

#### Why GPU is not used when I have installed CUDA

- 1. If you are using backend packages, please make sure you have installed the cuda backend package which matches the cuda version of your device. Please note that before LLamaSharp v0.10.0, only one backend package should be installed.
- 2. Add `NativeLibraryConfig.Instance.WithLogs(LLamaLogLevel.Info)` to the very beginning of your code. The log will show which native library file is loaded. If the CPU library is loaded, please try to compile the native library yourself and open an issue for that. If the CUDA library is loaded, please check if `GpuLayerCount > 0` when loading the model weight.
+ 1. If you are using backend packages, please make sure you have installed the CUDA backend package which matches the CUDA version install on your system. Please note that before LLamaSharp v0.10.0, only one backend package should be installed at a time.
+ 2. Add `NativeLibraryConfig.Instance.WithLogCallback(delegate (LLamaLogLevel level, string message) { Console.Write($"{level}: {message}"); } )` to the very beginning of your code. The log will show which native library file is loaded. If the CPU library is loaded, please try to compile the native library yourself and open an issue for that. If the CUDA library is loaded, please check if `GpuLayerCount > 0` when loading the model weight.

#### Why the inference is slow

- Firstly, due to the large size of LLM models, it requires more time to generate outputs than other models, especially when you are using models larger than 30B.
+ Firstly, due to the large size of LLM models, it requires more time to generate output than other models, especially when you are using models larger than 30B parameters.

To see if that's a LLamaSharp performance issue, please follow the two tips below.

1. If you are using CUDA, Metal or OpenCL, please set `GpuLayerCount` as large as possible.
- 2. If it's still slower than you expect it to be, please try to run the same model with same setting in [llama.cpp examples](https://github.com/ggerganov/llama.cpp/tree/master/examples). If llama.cpp outperforms LLamaSharp significantly, it's likely a LLamaSharp BUG and please report us for that.
+ 2. If it's still slower than you expect it to be, please try to run the same model with same setting in [llama.cpp examples](https://github.com/ggerganov/llama.cpp/tree/master/examples). If llama.cpp outperforms LLamaSharp significantly, it's likely a LLamaSharp BUG and please report that to us.

- #### Why the program crashes before any output is generated
+ #### Why is the program crashing before any output is generated

Generally, there are two possible cases for this problem:

- 1. The native library (backend) you are using is not compatible with the LLamaSharp version. If you compiled the native library yourself, please make sure you have checkouted llama.cpp to the corresponding commit of LLamaSharp, which could be found at the bottom of README.
+ 1. The native library (backend) you are using is not compatible with the LLamaSharp version. If you compiled the native library yourself, please make sure you have checked-out llama.cpp to the corresponding commit of LLamaSharp, which can be found at the bottom of README.
2. The model file you are using is not compatible with the backend. If you are using a GGUF file downloaded from huggingface, please check its publishing time.

#### Why my model is generating output infinitely
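
For readers following the FAQ changes in the hunk above, here is a minimal sketch that combines its two tips: the log callback from the rewritten answer, plus loading a model with `GpuLayerCount` set. The class names `ModelParams` and `LLamaWeights`, the layer count, and the model path are assumptions based on the LLamaSharp API referenced elsewhere in this README and may differ between versions.

```cs
using System;
using LLama;
using LLama.Common;
using LLama.Native;

// Log which native library (CPU or CUDA) gets loaded, as suggested in the FAQ above.
// This must run at the very beginning, before any other LLamaSharp call.
NativeLibraryConfig.Instance.WithLogCallback(
    delegate (LLamaLogLevel level, string message) { Console.Write($"{level}: {message}"); });

// Offload layers to the GPU; if GpuLayerCount stays 0, the CUDA backend is installed but unused.
var parameters = new ModelParams("path/to/your/model.gguf")   // placeholder path
{
    GpuLayerCount = 32   // raise this as far as your VRAM allows
};

using var model = LLamaWeights.LoadFromFile(parameters);
using var context = model.CreateContext(parameters);
```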

@@ -201,15 +201,15 @@ Please set anti-prompt or max-length when executing the inference.

## 🙌Contributing

- Any contribution is welcomed! There's a TODO list in [LLamaSharp Dev Project](https://github.com/orgs/SciSharp/projects/5) and you could pick an interesting one to start. Please read the [contributing guide](./CONTRIBUTING.md) for more information.
+ All contributions are welcome! There's a TODO list in [LLamaSharp Dev Project](https://github.com/orgs/SciSharp/projects/5) and you can pick an interesting one to start. Please read the [contributing guide](./CONTRIBUTING.md) for more information.

- You can also do one of the followings to help us make LLamaSharp better:
+ You can also do one of the following to help us make LLamaSharp better:

- Submit a feature request.
- - Star and share LLamaSharp to let others know it.
+ - Star and share LLamaSharp to let others know about it.
- Write a blog or demo about LLamaSharp.
- Help to develop Web API and UI integration.
- - Just open an issue about the problem you met!
+ - Just open an issue about the problem you've found!

## Join the community
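
The header of the hunk above also surfaces the README's answer to the "generating output infinitely" question: set an anti-prompt or a max length when executing the inference. A minimal sketch of that, assuming the `InferenceParams` properties `MaxTokens` and `AntiPrompts` and an `InteractiveExecutor` from the LLamaSharp API; the model path, prompt, and stop string are placeholders:

```cs
using System;
using System.Collections.Generic;
using LLama;
using LLama.Common;

var parameters = new ModelParams("path/to/your/model.gguf");   // placeholder path
using var model = LLamaWeights.LoadFromFile(parameters);
using var context = model.CreateContext(parameters);
var executor = new InteractiveExecutor(context);

// Two stop conditions so generation cannot run forever:
// a hard cap on generated tokens and an anti-prompt string.
var inferenceParams = new InferenceParams
{
    MaxTokens = 256,
    AntiPrompts = new List<string> { "User:" }
};

await foreach (var text in executor.InferAsync("User: What is a llama?\nAssistant:", inferenceParams))
{
    Console.Write(text);
}
```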