|
12 | 12 | },
|
13 | 13 | {
|
14 | 14 | "cell_type": "markdown",
|
15 | | - "id": "e49f1e0d", |
| 15 | + "id": "9a3d6f34", |
16 | 16 | "metadata": {},
|
17 | 17 | "source": [
|
18 | 18 | "# NomicEmbeddings\n",
|
19 | 19 | "\n",
|
20 | | - "This notebook covers how to get started with Nomic embedding models.\n", |
| 20 | + "This will help you get started with Nomic embedding models using LangChain. For detailed documentation on `NomicEmbeddings` features and configuration options, please refer to the [API reference](https://api.python.langchain.com/en/latest/embeddings/langchain_nomic.embeddings.NomicEmbeddings.html).\n", |
21 | 21 | "\n",
|
22 | | - "## Installation" |
| 22 | + "## Overview\n", |
| 23 | + "### Integration details\n", |
| 24 | + "\n", |
| 25 | + "import { ItemTable } from \"@theme/FeatureTables\";\n", |
| 26 | + "\n", |
| 27 | + "<ItemTable category=\"text_embedding\" item=\"Nomic\" />\n", |
| 28 | + "\n", |
| 29 | + "## Setup\n", |
| 30 | + "\n", |
| 31 | + "To access Nomic embedding models you'll need to create a Nomic account, get an API key, and install the `langchain-nomic` integration package.\n", |
| 32 | + "\n", |
| 33 | + "### Credentials\n", |
| 34 | + "\n", |
| 35 | + "Head to [https://atlas.nomic.ai/](https://atlas.nomic.ai/) to sign up for Nomic and generate an API key. Once you've done this, set the `NOMIC_API_KEY` environment variable:" |
23 | 36 | ]
|
24 | 37 | },
|
25 | 38 | {
|
26 | 39 | "cell_type": "code",
|
27 | | - "execution_count": null, |
28 | | - "id": "4c3bef91", |
| 40 | + "execution_count": 2, |
| 41 | + "id": "36521c2a", |
29 | 42 | "metadata": {},
|
30 | 43 | "outputs": [],
|
31 | 44 | "source": [
|
32 | | - "# install package\n", |
33 | | - "!pip install -U langchain-nomic" |
| 45 | + "import getpass\n", |
| 46 | + "import os\n", |
| 47 | + "\n", |
| 48 | + "if not os.getenv(\"NOMIC_API_KEY\"):\n", |
| 49 | + " os.environ[\"NOMIC_API_KEY\"] = getpass.getpass(\"Enter your Nomic API key: \")" |
34 | 50 | ]
|
35 | 51 | },
|
36 | 52 | {
|
37 | 53 | "cell_type": "markdown",
|
38 | | - "id": "2b4f3e15", |
| 54 | + "id": "c84fb993", |
39 | 55 | "metadata": {},
|
40 | 56 | "source": [
|
41 | | - "## Environment Setup\n", |
42 | | - "\n", |
43 | | - "Make sure to set the following environment variables:\n", |
44 | | - "\n", |
45 | | - "- `NOMIC_API_KEY`\n", |
46 | | - "\n", |
47 | | - "## Usage" |
| 57 | + "If you want automated tracing of your model calls, you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting the lines below:" |
48 | 58 | ]
|
49 | 59 | },
|
50 | 60 | {
|
51 | 61 | "cell_type": "code",
|
52 | | - "execution_count": null, |
53 | | - "id": "62e0dbc3", |
54 | | - "metadata": { |
55 | | - "tags": [] |
56 | | - }, |
| 62 | + "execution_count": 3, |
| 63 | + "id": "39a4953b", |
| 64 | + "metadata": {}, |
57 | 65 | "outputs": [],
|
58 | 66 | "source": [
|
59 | | - "from langchain_nomic.embeddings import NomicEmbeddings\n", |
| 67 | + "# os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\"\n", |
| 68 | + "# os.environ[\"LANGCHAIN_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")" |
| 69 | + ] |
| 70 | + }, |
| 71 | + { |
| 72 | + "cell_type": "markdown", |
| 73 | + "id": "d9664366", |
| 74 | + "metadata": {}, |
| 75 | + "source": [ |
| 76 | + "### Installation\n", |
60 | 77 | "\n",
|
61 | | - "embeddings = NomicEmbeddings(model=\"nomic-embed-text-v1.5\")" |
| 78 | + "The LangChain Nomic integration lives in the `langchain-nomic` package:" |
62 | 79 | ]
|
63 | 80 | },
|
64 | 81 | {
|
65 | 82 | "cell_type": "code",
|
66 | | - "execution_count": null, |
67 | | - "id": "12fcfb4b", |
| 83 | + "execution_count": 2, |
| 84 | + "id": "64853226", |
68 | 85 | "metadata": {},
|
69 | | - "outputs": [], |
| 86 | + "outputs": [ |
| 87 | + { |
| 88 | + "name": "stdout", |
| 89 | + "output_type": "stream", |
| 90 | + "text": [ |
| 91 | + "Note: you may need to restart the kernel to use updated packages.\n" |
| 92 | + ] |
| 93 | + } |
| 94 | + ], |
70 | 95 | "source": [
|
71 | | - "embeddings.embed_query(\"My query to look up\")" |
| 96 | + "%pip install -qU langchain-nomic" |
| 97 | + ] |
| 98 | + }, |
| 99 | + { |
| 100 | + "cell_type": "markdown", |
| 101 | + "id": "45dd1724", |
| 102 | + "metadata": {}, |
| 103 | + "source": [ |
| 104 | + "## Instantiation\n", |
| 105 | + "\n", |
| 106 | + "Now we can instantiate our model object and generate embeddings:" |
72 | 107 | ]
|
73 | 108 | },
|
74 | 109 | {
|
75 | 110 | "cell_type": "code",
|
76 | | - "execution_count": null, |
77 | | - "id": "1f2e6104", |
| 111 | + "execution_count": 10, |
| 112 | + "id": "9ea7a09b", |
78 | 113 | "metadata": {},
|
79 | 114 | "outputs": [],
|
80 | 115 | "source": [
|
81 | | - "embeddings.embed_documents(\n", |
82 | | - " [\"This is a content of the document\", \"This is another document\"]\n", |
| 116 | + "from langchain_nomic import NomicEmbeddings\n", |
| 117 | + "\n", |
| 118 | + "embeddings = NomicEmbeddings(\n", |
| 119 | + " model=\"nomic-embed-text-v1.5\",\n", |
| 120 | + " # dimensionality=256,\n", |
| 121 | + " # Nomic's `nomic-embed-text-v1.5` model was [trained with Matryoshka learning](https://blog.nomic.ai/posts/nomic-embed-matryoshka)\n", |
| 122 | + " # to enable variable-length embeddings with a single model.\n", |
| 123 | + " # This means that you can specify the dimensionality of the embeddings at inference time.\n", |
| 124 | + " # The model supports dimensionality from 64 to 768.\n", |
| 125 | + " # inference_mode=\"remote\",\n", |
| 126 | + " # One of `remote`, `local` (Embed4All), or `dynamic` (automatic). Defaults to `remote`.\n", |
| 127 | + " # api_key=...,  # if using remote inference\n", |
| 128 | + " # device=\"cpu\",\n", |
| 129 | + " # The device to use for local embeddings. Choices include\n", |
| 130 | + " # `cpu`, `gpu`, `nvidia`, `amd`, or a specific device name. See\n", |
| 131 | + " # the docstring for `GPT4All.__init__` for more info. Typically\n", |
| 132 | + " # defaults to CPU. Do not use on macOS.\n", |
83 | 133 | ")"
|
84 | 134 | ]
|
85 | 135 | },
|
| 136 | + { |
| 137 | + "cell_type": "markdown", |
| 138 | + "id": "77d271b6", |
| 139 | + "metadata": {}, |
| 140 | + "source": [ |
| 141 | + "## Indexing and Retrieval\n", |
| 142 | + "\n", |
| 143 | + "Embedding models are often used in retrieval-augmented generation (RAG) flows, both as part of indexing data and as part of retrieving it later. For more detailed instructions, please see our RAG tutorials under the [working with external knowledge tutorials](/docs/tutorials/#working-with-external-knowledge).\n", |
| 144 | + "\n", |
| 145 | + "Below, see how to index and retrieve data using the `embeddings` object we initialized above. In this example, we will index and retrieve a sample document in the `InMemoryVectorStore`." |
| 146 | + ] |
| 147 | + }, |
86 | 148 | {
|
87 | 149 | "cell_type": "code",
|
88 | | - "execution_count": null, |
89 | | - "id": "46739f68", |
| 150 | + "execution_count": 5, |
| 151 | + "id": "d817716b", |
| 152 | + "metadata": {}, |
| 153 | + "outputs": [ |
| 154 | + { |
| 155 | + "data": { |
| 156 | + "text/plain": [ |
| 157 | + "'LangChain is the framework for building context-aware reasoning applications'" |
| 158 | + ] |
| 159 | + }, |
| 160 | + "execution_count": 5, |
| 161 | + "metadata": {}, |
| 162 | + "output_type": "execute_result" |
| 163 | + } |
| 164 | + ], |
| 165 | + "source": [ |
| 166 | + "# Create a vector store with a sample text\n", |
| 167 | + "from langchain_core.vectorstores import InMemoryVectorStore\n", |
| 168 | + "\n", |
| 169 | + "text = \"LangChain is the framework for building context-aware reasoning applications\"\n", |
| 170 | + "\n", |
| 171 | + "vectorstore = InMemoryVectorStore.from_texts(\n", |
| 172 | + " [text],\n", |
| 173 | + " embedding=embeddings,\n", |
| 174 | + ")\n", |
| 175 | + "\n", |
| 176 | + "# Use the vectorstore as a retriever\n", |
| 177 | + "retriever = vectorstore.as_retriever()\n", |
| 178 | + "\n", |
| 179 | + "# Retrieve the most similar text\n", |
| 180 | + "retrieved_documents = retriever.invoke(\"What is LangChain?\")\n", |
| 181 | + "\n", |
| 182 | + "# show the retrieved document's content\n", |
| 183 | + "retrieved_documents[0].page_content" |
| 184 | + ] |
| 185 | + }, |
| 186 | + { |
| 187 | + "cell_type": "markdown", |
| 188 | + "id": "e02b9855", |
90 | 189 | "metadata": {},
|
91 | | - "outputs": [], |
92 | 190 | "source": [
|
93 | | - "# async embed query\n", |
94 | | - "await embeddings.aembed_query(\"My query to look up\")" |
| 191 | + "## Direct Usage\n", |
| 192 | + "\n", |
| 193 | + "Under the hood, the vectorstore and retriever implementations call `embeddings.embed_documents(...)` and `embeddings.embed_query(...)` to create embeddings for the text(s) used in `from_texts` and retrieval `invoke` operations, respectively.\n", |
| 194 | + "\n", |
| 195 | + "You can directly call these methods to get embeddings for your own use cases.\n", |
| 196 | + "\n", |
| 197 | + "### Embed single texts\n", |
| 198 | + "\n", |
| 199 | + "You can embed single texts or documents with `embed_query`:" |
95 | 200 | ]
|
96 | 201 | },
|
97 | 202 | {
|
98 | 203 | "cell_type": "code",
|
99 | | - "execution_count": null, |
100 | | - "id": "e48632ea", |
| 204 | + "execution_count": 6, |
| 205 | + "id": "0d2befcd", |
101 | 206 | "metadata": {},
|
102 | | - "outputs": [], |
| 207 | + "outputs": [ |
| 208 | + { |
| 209 | + "name": "stdout", |
| 210 | + "output_type": "stream", |
| 211 | + "text": [ |
| 212 | + "[0.024642944, 0.029083252, -0.14013672, -0.09082031, 0.058898926, -0.07489014, -0.0138168335, 0.0037\n" |
| 213 | + ] |
| 214 | + } |
| 215 | + ], |
103 | 216 | "source": [
|
104 | | - "# async embed documents\n", |
105 | | - "await embeddings.aembed_documents(\n", |
106 | | - " [\"This is a content of the document\", \"This is another document\"]\n", |
107 | | - ")" |
| 217 | + "single_vector = embeddings.embed_query(text)\n", |
| 218 | + "print(str(single_vector)[:100]) # Show the first 100 characters of the vector" |
108 | 219 | ]
|
109 | 220 | },
|
110 | 221 | {
|
111 | 222 | "cell_type": "markdown",
|
112 | | - "id": "7a331dc3", |
| 223 | + "id": "1b5a7d03", |
113 | 224 | "metadata": {},
|
114 | 225 | "source": [
|
115 | | - "### Custom Dimensionality\n", |
| 226 | + "### Embed multiple texts\n", |
116 | 227 | "\n",
|
117 | | - "Nomic's `nomic-embed-text-v1.5` model was [trained with Matryoshka learning](https://blog.nomic.ai/posts/nomic-embed-matryoshka) to enable variable-length embeddings with a single model. This means that you can specify the dimensionality of the embeddings at inference time. The model supports dimensionality from 64 to 768." |
| 228 | + "You can embed multiple texts with `embed_documents`:" |
118 | 229 | ]
|
119 | 230 | },
|
120 | 231 | {
|
121 | 232 | "cell_type": "code",
|
122 | | - "execution_count": null, |
123 | | - "id": "993f65c8", |
| 233 | + "execution_count": 7, |
| 234 | + "id": "2f4d6e97", |
| 235 | + "metadata": {}, |
| 236 | + "outputs": [ |
| 237 | + { |
| 238 | + "name": "stdout", |
| 239 | + "output_type": "stream", |
| 240 | + "text": [ |
| 241 | + "[0.012771606, 0.023727417, -0.12365723, -0.083740234, 0.06530762, -0.07110596, -0.021896362, -0.0068\n", |
| 242 | + "[-0.019058228, 0.04058838, -0.15222168, -0.06842041, -0.012130737, -0.07128906, -0.04534912, 0.00522\n" |
| 243 | + ] |
| 244 | + } |
| 245 | + ], |
| 246 | + "source": [ |
| 247 | + "text2 = (\n", |
| 248 | + " \"LangGraph is a library for building stateful, multi-actor applications with LLMs\"\n", |
| 249 | + ")\n", |
| 250 | + "two_vectors = embeddings.embed_documents([text, text2])\n", |
| 251 | + "for vector in two_vectors:\n", |
| 252 | + " print(str(vector)[:100]) # Show the first 100 characters of the vector" |
| 253 | + ] |
| 254 | + }, |
| 255 | + { |
| 256 | + "cell_type": "markdown", |
| 257 | + "id": "98785c12", |
124 | 258 | "metadata": {},
|
125 | | - "outputs": [], |
126 | 259 | "source": [
|
127 | | - "embeddings = NomicEmbeddings(model=\"nomic-embed-text-v1.5\", dimensionality=256)\n", |
| 260 | + "## API Reference\n", |
128 | 261 | "\n",
|
129 | | - "embeddings.embed_query(\"My query to look up\")" |
| 262 | + "For detailed documentation on `NomicEmbeddings` features and configuration options, please refer to the [API reference](https://api.python.langchain.com/en/latest/embeddings/langchain_nomic.embeddings.NomicEmbeddings.html).\n" |
130 | 263 | ]
|
131 | 264 | }
|
132 | 265 | ],
|
|
146 | 279 | "name": "python",
|
147 | 280 | "nbconvert_exporter": "python",
|
148 | 281 | "pygments_lexer": "ipython3",
|
149 | | - "version": "3.10.5" |
| 282 | + "version": "3.9.6" |
150 | 283 | }
|
151 | 284 | },
|
152 | 285 | "nbformat": 4,
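The dimensionality comments added to the `NomicEmbeddings(...)` cell in the diff describe Matryoshka-style embeddings, where a long vector can be cut down to its leading components and re-normalized. As a toy, library-free illustration of that idea (this is not Nomic's or LangChain's implementation; `truncate_embedding` and the 4-component stand-in vector are invented for the sketch):

```python
import math

def truncate_embedding(vec, dim):
    """Toy Matryoshka-style truncation: keep the first `dim` components,
    then re-normalize to unit length. Assumes the model front-loads
    information into the leading dimensions, as Matryoshka training does."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

full = [1.0, 0.5, 0.25, 0.125]  # stand-in for a 768-d embedding
small = truncate_embedding(full, 2)
print(len(small))                           # -> 2
print(round(sum(x * x for x in small), 6))  # -> 1.0 (unit length)
```

With `NomicEmbeddings` itself, the same effect is requested by passing `dimensionality=256` to the constructor rather than truncating vectors by hand.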