NoIndexException: Index not found when initializing Chroma from a persisted directory #3030

murasz · 2023-04-17T20:14:33Z

I am running into a problem when using the Chroma vector store with a persisted index. I have already loaded a document, created embeddings for it, and saved those embeddings in Chroma. That script worked correctly with the LLM and also created the necessary files in the persistence directory (.chroma\index). The files include:

chroma-collections.parquet
chroma-embeddings.parquet
id_to_uuid_3508d87c-12d1-4bbe-ae7f-69a0ec3c6616.pkl
index_3508d87c-12d1-4bbe-ae7f-69a0ec3c6616.bin
index_metadata_3508d87c-12d1-4bbe-ae7f-69a0ec3c6616.pkl
uuid_to_id_3508d87c-12d1-4bbe-ae7f-69a0ec3c6616.pkl

However, when I initialize a new Chroma instance from the same persist_directory to reuse the previously saved embeddings, I get a NoIndexException with the message "Index not found, please create an instance before querying".
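
Since the files are on disk but the index is reported as missing, one thing worth ruling out is a path mismatch. The relative path .chroma\index is resolved against the notebook kernel's working directory, which may not be the directory the original script ran from, and the per-collection index files may sit in a different sub-folder from the parquet files. The stdlib-only sketch below is a hypothetical diagnostic helper (check_persist_directory is not part of Chroma or LangChain), and the filename pattern it matches is inferred from the listing above rather than from any Chroma documentation.

```python
import os
import re

# Pattern inferred from the file listing in this issue (an assumption, not a
# documented Chroma format): <kind>_<collection-uuid>.<pkl|bin>
INDEX_FILE = re.compile(
    r"^(id_to_uuid|uuid_to_id|index_metadata|index)_([0-9a-f-]{36})\.(?:pkl|bin)$"
)
EXPECTED_KINDS = {"id_to_uuid", "uuid_to_id", "index_metadata", "index"}
PARQUET_FILES = ("chroma-collections.parquet", "chroma-embeddings.parquet")


def check_persist_directory(persist_directory):
    """Hypothetical helper: report where the persisted files actually live."""
    result = {
        "abspath": os.path.abspath(persist_directory),  # path the kernel really opens
        "exists": os.path.isdir(persist_directory),
        "parquet": {},
        "collections": {},
    }
    if not result["exists"]:
        return result
    # Parquet files are looked for directly inside persist_directory.
    for name in PARQUET_FILES:
        result["parquet"][name] = os.path.isfile(os.path.join(persist_directory, name))
    # Walk sub-folders too, recording which folder holds each collection's files.
    found = {}
    for root, _dirs, files in os.walk(persist_directory):
        for name in files:
            match = INDEX_FILE.match(name)
            if match:
                kind, uuid = match.groups()
                entry = found.setdefault(uuid, {"dirs": set(), "kinds": set()})
                entry["dirs"].add(os.path.relpath(root, persist_directory))
                entry["kinds"].add(kind)
    for uuid, entry in found.items():
        result["collections"][uuid] = {
            "dirs": sorted(entry["dirs"]),
            "missing": sorted(EXPECTED_KINDS - entry["kinds"]),
        }
    return result


if __name__ == "__main__":
    # Same relative path as the notebook, resolved against the current working directory.
    print(check_persist_directory(".chroma\\index"))
```

Running it from the notebook and comparing abspath with the location of the files listed above would show whether the notebook is opening a different, possibly empty, directory.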

Here is a snippet of the code I am using in a Jupyter notebook:

```python
# Section 1
import os
from langchain.vectorstores import Chroma
from langchain.chat_models import ChatOpenAI
from langchain.chains.question_answering import load_qa_chain

# Load environment variables (IPython magics provided by python-dotenv; notebook only)
%reload_ext dotenv
%dotenv info.env
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

# Section 2 - Initialize Chroma without an embedding function
persist_directory = '.chroma\\index'
db = Chroma(persist_directory=persist_directory)

# Section 3
# Load chat model and question answering chain
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.5, openai_api_key=OPENAI_API_KEY)
chain = load_qa_chain(llm, chain_type="stuff")

# Section 4
# Run the chain on a sample query
query = "The Question - Can you also cite the information you give after your answer?"
docs = db.similarity_search(query)
response = chain.run(input_documents=docs, question=query)
print(response)
```

Please help me understand what might be causing this problem and suggest possible solutions. I would also like to know whether these existing embeddings can be reused without paying again to generate the Ada embeddings, since my documents have many pages. Thanks in advance!
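
On the reuse question, the underlying idea is that vectors persisted once are read back from disk, so after reloading only each new query needs an embedding call. The toy sketch below illustrates that idea with the standard library alone. It does not reproduce Chroma's storage format or LangChain's API, and CountingEmbedder is a hypothetical stand-in for a paid embedding model such as Ada. In the LangChain releases of that time, the documented reload pattern passed both persist_directory and the same embedding function used at build time (for example OpenAIEmbeddings) to the Chroma constructor.

```python
import json
import math
import os
import tempfile


class CountingEmbedder:
    """Hypothetical stand-in for a paid embedding API; counts texts embedded."""

    def __init__(self):
        self.calls = 0

    def embed(self, text):
        self.calls += 1
        # Toy 3-dimensional vector: counts of the letters a, e, o (illustration only).
        return [text.count("a"), text.count("e"), text.count("o")]


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def build_and_persist(texts, embedder, path):
    # Pay for each document embedding exactly once, then save the vectors to disk.
    records = [{"text": t, "vector": embedder.embed(t)} for t in texts]
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f)


def load_and_query(path, query, embedder, k=1):
    # Reload the saved vectors; only the query text is embedded here.
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    q = embedder.embed(query)
    ranked = sorted(records, key=lambda r: cosine(q, r["vector"]), reverse=True)
    return [r["text"] for r in ranked[:k]]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "store.json")
    build_and_persist(["banana bread", "one more tool", "green tree"], CountingEmbedder(), path)
    session = CountingEmbedder()  # e.g. a new notebook session
    print(load_and_query(path, "papaya", session), "embedding calls:", session.calls)
    # prints: ['banana bread'] embedding calls: 1
```

The point of the design is that the cost of embedding the documents is paid in build_and_persist only; every later session pays for one embedding per query.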
