Commit 54341b7

eyurtsev and isahers1 authored and committed
docs: Update nomic AI embeddings integration docs (langchain-ai#25308)
Issue: langchain-ai#24856 --------- Co-authored-by: Isaac Francisco <78627776+isahers1@users.noreply.github.com> Co-authored-by: isaac hershenson <ihershenson@hmc.edu>
1 parent 535281a commit 54341b7

File tree

2 files changed: +190 -51 lines

docs/docs/integrations/text_embedding/nomic.ipynb

Lines changed: 184 additions & 51 deletions
@@ -12,121 +12,254 @@
 },
 {
 "cell_type": "markdown",
-"id": "e49f1e0d",
+"id": "9a3d6f34",
 "metadata": {},
 "source": [
 "# NomicEmbeddings\n",
 "\n",
-"This notebook covers how to get started with Nomic embedding models.\n",
+"This will help you get started with Nomic embedding models using LangChain. For detailed documentation on `NomicEmbeddings` features and configuration options, please refer to the [API reference](https://api.python.langchain.com/en/latest/embeddings/langchain_nomic.embeddings.NomicEmbeddings.html).\n",
 "\n",
-"## Installation"
+"## Overview\n",
+"### Integration details\n",
+"\n",
+"import { ItemTable } from \"@theme/FeatureTables\";\n",
+"\n",
+"<ItemTable category=\"text_embedding\" item=\"Nomic\" />\n",
+"\n",
+"## Setup\n",
+"\n",
+"To access Nomic embedding models you'll need to create a Nomic account, get an API key, and install the `langchain-nomic` integration package.\n",
+"\n",
+"### Credentials\n",
+"\n",
+"Head to [https://atlas.nomic.ai/](https://atlas.nomic.ai/) to sign up to Nomic and generate an API key. Once you've done this set the `NOMIC_API_KEY` environment variable:"
 ]
 },
 {
 "cell_type": "code",
-"execution_count": null,
-"id": "4c3bef91",
+"execution_count": 2,
+"id": "36521c2a",
 "metadata": {},
 "outputs": [],
 "source": [
-"# install package\n",
-"!pip install -U langchain-nomic"
+"import getpass\n",
+"import os\n",
+"\n",
+"if not os.getenv(\"NOMIC_API_KEY\"):\n",
+"    os.environ[\"NOMIC_API_KEY\"] = getpass.getpass(\"Enter your Nomic API key: \")"
 ]
 },
 {
 "cell_type": "markdown",
-"id": "2b4f3e15",
+"id": "c84fb993",
 "metadata": {},
 "source": [
-"## Environment Setup\n",
-"\n",
-"Make sure to set the following environment variables:\n",
-"\n",
-"- `NOMIC_API_KEY`\n",
-"\n",
-"## Usage"
+"If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
 ]
 },
 {
 "cell_type": "code",
-"execution_count": null,
-"id": "62e0dbc3",
-"metadata": {
-"tags": []
-},
+"execution_count": 3,
+"id": "39a4953b",
+"metadata": {},
 "outputs": [],
 "source": [
-"from langchain_nomic.embeddings import NomicEmbeddings\n",
+"# os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\"\n",
+"# os.environ[\"LANGCHAIN_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")"
+]
+},
+{
+"cell_type": "markdown",
+"id": "d9664366",
+"metadata": {},
+"source": [
+"### Installation\n",
 "\n",
-"embeddings = NomicEmbeddings(model=\"nomic-embed-text-v1.5\")"
+"The LangChain Nomic integration lives in the `langchain-nomic` package:"
 ]
 },
 {
 "cell_type": "code",
-"execution_count": null,
-"id": "12fcfb4b",
+"execution_count": 2,
+"id": "64853226",
 "metadata": {},
-"outputs": [],
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"Note: you may need to restart the kernel to use updated packages.\n"
+]
+}
+],
 "source": [
-"embeddings.embed_query(\"My query to look up\")"
+"%pip install -qU langchain-nomic"
+]
+},
+{
+"cell_type": "markdown",
+"id": "45dd1724",
+"metadata": {},
+"source": [
+"## Instantiation\n",
+"\n",
+"Now we can instantiate our model object and generate embeddings:"
 ]
 },
 {
 "cell_type": "code",
-"execution_count": null,
-"id": "1f2e6104",
+"execution_count": 10,
+"id": "9ea7a09b",
 "metadata": {},
 "outputs": [],
 "source": [
-"embeddings.embed_documents(\n",
-"    [\"This is a content of the document\", \"This is another document\"]\n",
+"from langchain_nomic import NomicEmbeddings\n",
+"\n",
+"embeddings = NomicEmbeddings(\n",
+"    model=\"nomic-embed-text-v1.5\",\n",
+"    # dimensionality=256,\n",
+"    # Nomic's `nomic-embed-text-v1.5` model was [trained with Matryoshka learning](https://blog.nomic.ai/posts/nomic-embed-matryoshka)\n",
+"    # to enable variable-length embeddings with a single model.\n",
+"    # This means that you can specify the dimensionality of the embeddings at inference time.\n",
+"    # The model supports dimensionality from 64 to 768.\n",
+"    # inference_mode=\"remote\",\n",
+"    # One of `remote`, `local` (Embed4All), or `dynamic` (automatic). Defaults to `remote`.\n",
+"    # api_key=..., # if using remote inference,\n",
+"    # device=\"cpu\",\n",
+"    # The device to use for local embeddings. Choices include\n",
+"    # `cpu`, `gpu`, `nvidia`, `amd`, or a specific device name. See\n",
+"    # the docstring for `GPT4All.__init__` for more info. Typically\n",
+"    # defaults to CPU. Do not use on macOS.\n",
 ")"
 ]
 },
+{
+"cell_type": "markdown",
+"id": "77d271b6",
+"metadata": {},
+"source": [
+"## Indexing and Retrieval\n",
+"\n",
+"Embedding models are often used in retrieval-augmented generation (RAG) flows, both as part of indexing data as well as later retrieving it. For more detailed instructions, please see our RAG tutorials under the [working with external knowledge tutorials](/docs/tutorials/#working-with-external-knowledge).\n",
+"\n",
+"Below, see how to index and retrieve data using the `embeddings` object we initialized above. In this example, we will index and retrieve a sample document in the `InMemoryVectorStore`."
+]
+},
 {
 "cell_type": "code",
-"execution_count": null,
-"id": "46739f68",
+"execution_count": 5,
+"id": "d817716b",
+"metadata": {},
+"outputs": [
+{
+"data": {
+"text/plain": [
+"'LangChain is the framework for building context-aware reasoning applications'"
+]
+},
+"execution_count": 5,
+"metadata": {},
+"output_type": "execute_result"
+}
+],
+"source": [
+"# Create a vector store with a sample text\n",
+"from langchain_core.vectorstores import InMemoryVectorStore\n",
+"\n",
+"text = \"LangChain is the framework for building context-aware reasoning applications\"\n",
+"\n",
+"vectorstore = InMemoryVectorStore.from_texts(\n",
+"    [text],\n",
+"    embedding=embeddings,\n",
+")\n",
+"\n",
+"# Use the vectorstore as a retriever\n",
+"retriever = vectorstore.as_retriever()\n",
+"\n",
+"# Retrieve the most similar text\n",
+"retrieved_documents = retriever.invoke(\"What is LangChain?\")\n",
+"\n",
+"# show the retrieved document's content\n",
+"retrieved_documents[0].page_content"
+]
+},
+{
+"cell_type": "markdown",
+"id": "e02b9855",
 "metadata": {},
-"outputs": [],
 "source": [
-"# async embed query\n",
-"await embeddings.aembed_query(\"My query to look up\")"
+"## Direct Usage\n",
+"\n",
+"Under the hood, the vectorstore and retriever implementations are calling `embeddings.embed_documents(...)` and `embeddings.embed_query(...)` to create embeddings for the text(s) used in `from_texts` and retrieval `invoke` operations, respectively.\n",
+"\n",
+"You can directly call these methods to get embeddings for your own use cases.\n",
+"\n",
+"### Embed single texts\n",
+"\n",
+"You can embed single texts or documents with `embed_query`:"
 ]
 },
 {
 "cell_type": "code",
-"execution_count": null,
-"id": "e48632ea",
+"execution_count": 6,
+"id": "0d2befcd",
 "metadata": {},
-"outputs": [],
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"[0.024642944, 0.029083252, -0.14013672, -0.09082031, 0.058898926, -0.07489014, -0.0138168335, 0.0037\n"
+]
+}
+],
 "source": [
-"# async embed documents\n",
-"await embeddings.aembed_documents(\n",
-"    [\"This is a content of the document\", \"This is another document\"]\n",
-")"
+"single_vector = embeddings.embed_query(text)\n",
+"print(str(single_vector)[:100])  # Show the first 100 characters of the vector"
 ]
 },
 {
 "cell_type": "markdown",
-"id": "7a331dc3",
+"id": "1b5a7d03",
 "metadata": {},
 "source": [
-"### Custom Dimensionality\n",
+"### Embed multiple texts\n",
 "\n",
-"Nomic's `nomic-embed-text-v1.5` model was [trained with Matryoshka learning](https://blog.nomic.ai/posts/nomic-embed-matryoshka) to enable variable-length embeddings with a single model. This means that you can specify the dimensionality of the embeddings at inference time. The model supports dimensionality from 64 to 768."
+"You can embed multiple texts with `embed_documents`:"
 ]
 },
 {
 "cell_type": "code",
-"execution_count": null,
-"id": "993f65c8",
+"execution_count": 7,
+"id": "2f4d6e97",
+"metadata": {},
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"[0.012771606, 0.023727417, -0.12365723, -0.083740234, 0.06530762, -0.07110596, -0.021896362, -0.0068\n",
+"[-0.019058228, 0.04058838, -0.15222168, -0.06842041, -0.012130737, -0.07128906, -0.04534912, 0.00522\n"
+]
+}
+],
+"source": [
+"text2 = (\n",
+"    \"LangGraph is a library for building stateful, multi-actor applications with LLMs\"\n",
+")\n",
+"two_vectors = embeddings.embed_documents([text, text2])\n",
+"for vector in two_vectors:\n",
+"    print(str(vector)[:100])  # Show the first 100 characters of the vector"
+]
+},
+{
+"cell_type": "markdown",
+"id": "98785c12",
 "metadata": {},
-"outputs": [],
 "source": [
-"embeddings = NomicEmbeddings(model=\"nomic-embed-text-v1.5\", dimensionality=256)\n",
+"## API Reference\n",
 "\n",
-"embeddings.embed_query(\"My query to look up\")"
+"For detailed documentation on `NomicEmbeddings` features and configuration options, please refer to the [API reference](https://api.python.langchain.com/en/latest/embeddings/langchain_nomic.embeddings.NomicEmbeddings.html).\n"
 ]
 }
 ],
@@ -146,7 +279,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.10.5"
+"version": "3.9.6"
 }
 },
 "nbformat": 4,

docs/src/theme/FeatureTables.js

Lines changed: 6 additions & 0 deletions

@@ -340,6 +340,12 @@ const FEATURE_TABLES = {
 package: "langchain-cohere",
 apiLink: "https://api.python.langchain.com/en/latest/embeddings/langchain_cohere.embeddings.CohereEmbeddings.html#langchain_cohere.embeddings.CohereEmbeddings"
 },
+{
+name: "Nomic",
+link: "cohere",
+package: "langchain-nomic",
+apiLink: "https://api.python.langchain.com/en/latest/embeddings/langchain_nomic.embeddings.NomicEmbeddings.html#langchain_nomic.embeddings.NomicEmbeddings"
+},
 ]
 },
 document_retrievers: {
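The notebook's "Indexing and Retrieval" section rests on ranking stored vectors by their similarity to a query vector. A self-contained sketch of cosine-similarity ranking with hand-made mock vectors, independent of any embedding model or vector store:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Mock "embeddings": the query vector is deliberately closest to the
# first document. Real embeddings would come from embed_documents/embed_query.
query = [1.0, 0.0, 0.0]
docs = {
    "langchain": [0.9, 0.1, 0.0],
    "unrelated": [0.0, 1.0, 0.0],
}

best = max(docs, key=lambda name: cosine(query, docs[name]))
print(best)  # prints "langchain"
```

`InMemoryVectorStore` applies the same idea at scale: it embeds each stored text once, embeds the query at `invoke` time, and returns the documents whose vectors score highest.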

0 commit comments