[Question]: Embedding before index? #8411
Replies: 2 comments
-
🤖 Hello,

Based on your question, it seems you're interested in embedding a cache of documents before constructing an index on top. In LlamaIndex, you can indeed use the provided structures for all of those operations and plug in a vector database. LlamaIndex offers multiple integration points with vector stores and vector databases: it can use a vector store itself as an index, which stores documents and can be used to answer queries, and it can also load data from vector stores, much like any other data connector, for use within LlamaIndex data structures.

Here is an example of how to use a custom vector store for index construction and querying:

```python
from llama_index import VectorStoreIndex, SimpleDirectoryReader, StorageContext
from llama_index.vector_stores import EpsillaVectorStore

# Construct the vector store and customize the storage context.
# `epsilla_client` is assumed to be a previously created Epsilla client.
storage_context = StorageContext.from_defaults(
    vector_store=EpsillaVectorStore(client=epsilla_client)
)

# Load documents and build the index
documents = SimpleDirectoryReader("../paul_graham_essay/data").load_data()
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

# Query the index
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
```

LlamaIndex supports a wide range of vector stores, including but not limited to ElasticsearchStore, RedisVectorStore, RocksetVectorStore, FaissVectorStore, PineconeVectorStore, WeaviateVectorStore, QdrantVectorStore, CassandraVectorStore, ChromaVectorStore, MetalVectorStore, OpensearchVectorStore, MilvusVectorStore, DeepLakeVectorStore, MyScaleVectorStore, LanceDBVectorStore, TairVectorStore, DocArrayInMemoryVectorStore, DocArrayHnswVectorStore, SupabaseVectorStore, PGVectorStore, ZepVectorStore, AwaDBVectorStore, BagelVectorStore, Neo4jVectorStore, CognitiveSearchVectorStore, and EpsillaVectorStore. You can choose any of these based on your specific needs and requirements. For more information, refer to the LlamaIndex documentation.

I hope this helps! If you have any more questions, feel free to ask.
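As a complement, here is a minimal sketch of the second integration point mentioned above: building an index directly on top of a vector store that already contains embedded data, with no document loading or re-embedding. It assumes the same `epsilla_client` as in the example and uses the `VectorStoreIndex.from_vector_store` entry point:

```python
from llama_index import VectorStoreIndex
from llama_index.vector_stores import EpsillaVectorStore

# Point at a collection that already holds embedded documents;
# `epsilla_client` is assumed to be the same client as above.
vector_store = EpsillaVectorStore(client=epsilla_client)

# Build the index directly on the existing store; nothing is
# loaded from disk or re-embedded here.
index = VectorStoreIndex.from_vector_store(vector_store=vector_store)

query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
```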
-
You can construct the documents or nodes ahead of time, attach the embeddings to them, and then build the index from those nodes (see the sketch below).
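A minimal sketch of that approach, using the `TextNode` schema from the same `llama_index` package as the example above; `my_embed_fn` is a hypothetical placeholder for whatever external embedding pipeline you use:

```python
from llama_index import VectorStoreIndex
from llama_index.schema import TextNode

# Embed the texts externally ahead of time; `my_embed_fn` is a
# hypothetical stand-in for your own embedding function.
texts = ["The author grew up writing short stories.", "Later he studied painting."]
nodes = [TextNode(text=t, embedding=my_embed_fn(t)) for t in texts]

# Nodes that already carry an `embedding` are used as-is when the
# index is built, so nothing is re-embedded here.
index = VectorStoreIndex(nodes=nodes)
```

The same pre-embedded nodes can also be written into a vector-store-backed index by additionally passing a `storage_context`, as in the example above.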
-
Question
What is the process if I want to externally embed a cache of documents before constructing an index on top?
Or is it recommended to use llama_index structures to do all that and just plug in a VectorDB?