# Large Language Model (LLM) Retrieval-Augmented Generation (RAG) Development Guide

## Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG)

Large Language Models (LLMs) are AI models designed to understand and generate human language.

The term typically refers to language models with hundreds of billions (or more) of parameters, trained on vast amounts of text data to gain a deep understanding of language. Well-known international LLMs include GPT-3.5, GPT-4, PaLM, Claude, and LLaMA; well-known Chinese models include Wenxin Yiyan, Xunfei Spark, Tongyi Qianwen, ChatGLM, and Baichuan.

To explore performance limits, many researchers have trained increasingly large language models, such as GPT-3 with 175 billion parameters and PaLM with 540 billion parameters. Although these models use architectures and pre-training tasks similar to those of smaller models (such as BERT with 330 million parameters and GPT-2 with 1.5 billion parameters), they exhibit strikingly different capabilities, especially on complex tasks, a phenomenon known as "emergent abilities." Taking GPT-3 and GPT-2 as examples, GPT-3 can solve few-shot tasks through in-context learning, while GPT-2 performs poorly at this. The research community has therefore named these models "Large Language Models (LLMs)." A notable LLM application is ChatGPT, a bold attempt to apply the GPT-series LLMs to human-like conversation, which performs remarkably smoothly and naturally.

Although LLMs are far more capable than traditional language models, they may still fail to provide accurate answers in some cases. To address the challenges LLMs face in text generation and to improve output quality, researchers proposed a new architecture: Retrieval-Augmented Generation (RAG). RAG retrieves relevant information from a knowledge base and uses it to guide the LLM in generating more accurate answers, significantly improving the accuracy and depth of responses. RAG has been applied successfully in many areas, including question-answering systems, dialogue systems, document summarization, and document generation.
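In essence, the RAG flow is: retrieve the documents most relevant to the question, then prepend them to the prompt so the model answers from that context. Below is a minimal, self-contained sketch of this flow; the word-overlap retriever and the sample documents are illustrative placeholders, not a real embedding search or LLM call.

```python
# Minimal RAG sketch: retrieve relevant context, then build an augmented prompt.
# Word-overlap scoring stands in for real vector-similarity search.
knowledge_base = [
    "iPhone sales records are stored in the sales_orders table.",
    "Each salesperson has a unique id in the employees table.",
    "Monthly revenue is aggregated in the revenue_summary view.",
]

def retrieve(question: str, docs: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by naive word overlap with the question."""
    q_words = set(question.lower().split())
    return sorted(docs,
                  key=lambda d: len(q_words & set(d.lower().split())),
                  reverse=True)[:top_k]

def build_prompt(question: str, context: list[str]) -> str:
    """Prepend the retrieved context so the LLM answers from it."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"

question = "Who sold the most iPhones last month?"
print(build_prompt(question, retrieve(question, knowledge_base)))
```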

## Building RAG Applications in KDP

In KDP, it is easy to combine local or online large models with user-specific data to build RAG applications. Below, we illustrate how to build one using Text to SQL as the example scenario.

SQL is widely used in data analysis. Although it is relatively close to natural language, it still presents barriers for business users:

- They must learn SQL syntax.
- They must understand the table structures, knowing exactly which tables hold which business data.

With large models, can we leverage their capabilities, combined with private table-structure information, to assist with data analysis? For example, by asking directly, "Who sold the most iPhones last month?"

The answer is yes.

To keep the development details simple, we use [Vanna](https://github.com/vanna-ai/vanna) to implement Text to SQL. For more flexibility, consider other tools such as [LangChain](https://github.com/langchain-ai/langchain).

### Component Dependencies

Please install the following components in KDP:

- ollama
- milvus
- jupyterlab

Ollama runs large models locally, Milvus stores the vectorized data, and JupyterLab serves as the development environment.

### Running Large Models Locally

We will use [phi3](https://ollama.com/library/phi3) as the example model.

```shell
kubectl exec -it $(kubectl get pods -l app.kubernetes.io/name=ollama -n kdp-data -o jsonpath='{.items[0].metadata.name}') -n kdp-data -- ollama pull phi3:3.8b
```

Once the pull completes, the phi3 model is available at `http://ollama:11434` inside the cluster.
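To confirm the model is reachable, you can query the Ollama server for its model list. Here is a quick sanity check using the official `ollama` Python client (installed as part of the environment prepared in the next section); the exact response shape varies across client versions, so we simply print it.

```python
from ollama import Client

# Connect to the in-cluster Ollama service and list the available models;
# phi3:3.8b should appear in the output once the pull has finished.
client = Client(host='http://ollama:11434')
print(client.list())
```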

### Preparing the Development Environment

Execute the following commands in a Terminal in JupyterLab:

```bash
# Enter the current user's home directory; it should be mounted on a PV so the environment survives pod restarts
cd ~
# Create a Python virtual environment named vanna
python -m venv vanna
# Activate the virtual environment
source vanna/bin/activate
# Install the necessary pip packages
pip install -i https://mirrors.aliyun.com/pypi/simple/ 'vanna[milvus,ollama]' pyhive thrift ipykernel ipywidgets
# Register the virtual environment as a JupyterLab kernel
python -m ipykernel install --user --name=vanna
```

After execution, a kernel named `vanna` should appear in the JupyterLab Launcher after a short while.

### Building Text to SQL with Vanna

#### Extending BaseEmbeddingFunction

Since Vanna does not ship a built-in embedding function for Milvus, we need to extend `BaseEmbeddingFunction`. Create a Notebook with the `vanna` kernel and enter the following code:

```python
from vanna.ollama import Ollama
from vanna.milvus import Milvus_VectorStore
from milvus_model.base import BaseEmbeddingFunction
from pymilvus import MilvusClient
from ollama import Client
from typing import List, Optional
import numpy as np

class OllamaEmbeddingFunction(BaseEmbeddingFunction):
    """Embedding function that delegates to an Ollama-served model."""
    def __init__(
        self,
        model_name: str,
        host: Optional[str] = None
    ):
        self.model_name = model_name
        self.client = Client(host=host)

    def encode_queries(self, queries: List[str]) -> List[np.ndarray]:
        return self._encode(queries)

    def __call__(self, texts: List[str]) -> List[np.ndarray]:
        return self._encode(texts)

    def encode_documents(self, documents: List[str]) -> List[np.ndarray]:
        return self._encode(documents)

    def _encode(self, texts: List[str]) -> List[np.ndarray]:
        # One embedding request per text; each response carries an 'embedding' vector
        return [np.array(self.client.embeddings(model=self.model_name, prompt=text)['embedding']) for text in texts]

class MyVanna(Milvus_VectorStore, Ollama):
    """Vanna instance backed by Milvus for vector storage and Ollama for generation."""
    def __init__(self, config=None):
        fn = OllamaEmbeddingFunction(model_name=config['embedding_model'], host=config['ollama_host'])
        milvus = MilvusClient(uri=config['milvus_host'])
        config['embedding_function'] = fn
        config['milvus_client'] = milvus
        Milvus_VectorStore.__init__(self, config=config)
        Ollama.__init__(self, config=config)

vn = MyVanna(config={'ollama_host': 'http://ollama:11434', 'model': 'phi3:3.8b', 'embedding_model': 'phi3:3.8b', 'milvus_host': 'http://milvus:19530'})
```
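Note that this configuration reuses `phi3:3.8b` as the embedding model purely for simplicity. A dedicated embedding model from the Ollama library (for example, `nomic-embed-text`) may yield better retrieval quality; like phi3, it would first need to be pulled with `ollama pull`.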

#### Scenario with a Single Table

Assume there is already a table in Hive with the following create-table statement:

```sql
CREATE TABLE IF NOT EXISTS test_table (id bigint, data string);
```

In the Notebook, we continue with the following code:

```python
# Connect to HiveServer2 inside the cluster
vn.connect_to_hive(host='hive-server2-0.hive-server2.kdp-data.svc.cluster.local',
                   dbname='default',
                   port=10000,
                   auth='NOSASL',
                   user='root')

# Train Vanna on the table's DDL so it knows the schema
vn.train(ddl='CREATE TABLE IF NOT EXISTS test_table (id bigint, data string)')

# Ask a question.
# You will see output similar to this SQL:
# SELECT id
# FROM test_table
# ORDER BY data DESC
# LIMIT 3
# along with a chart display.
vn.ask("What are the top 3 ids of test_table?")
```
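Beyond DDL, Vanna's `train` method also accepts free-text documentation and question-SQL pairs, which typically improves the generated SQL. A brief sketch follows; the documentation text and the example pair below are illustrative assumptions for this table.

```python
# Add business context as free-text documentation
vn.train(documentation="test_table stores raw events; the 'data' column holds the event payload.")

# Add a worked question-SQL pair as a few-shot example
vn.train(
    question="How many rows does test_table contain?",
    sql="SELECT COUNT(*) FROM test_table"
)
```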

#### Scenario with Multiple Tables

Vanna provides an example SQLite database, [Chinook.sqlite](https://vanna.ai/Chinook.sqlite). After downloading it, upload it to the same directory as the Notebook in JupyterLab, then write the following code:

```python
vn.connect_to_sqlite('Chinook.sqlite')

# Iterate over all DDL statements to train Vanna on the table structures
df_ddl = vn.run_sql("SELECT type, sql FROM sqlite_master WHERE sql is not null")
for ddl in df_ddl['sql'].to_list():
    vn.train(ddl=ddl)

# Ask a question
vn.ask(question="What are the top 10 billing countries by total billing?", allow_llm_to_see_data=True)
```

If you use another database, adjust the code to train on the relevant create-table statements.
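To review or prune what has been trained so far, Vanna also provides training-data management methods. A minimal sketch (assuming at least one training entry exists):

```python
# List everything currently in the training store
training_data = vn.get_training_data()
print(training_data)

# Remove a specific entry by its id
vn.remove_training_data(id=training_data['id'].iloc[0])
```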

For more examples, please refer to the [official documentation](https://vanna.ai/docs/).