First, a big shout-out and thanks to LangChain for getting me here.
I'm working with two distinct types of data sets: 1) unstructured data, such as content from PDFs, and 2) structured data from a large Excel file. Given my limited experience in this domain, I'm grappling with how best to architect the handling and use of these data sets for efficient querying and task completion.
For the unstructured data, I've seen approaches where tools like LangChain and third-party integrations are used to vectorize the content into vector databases like Pinecone, or even directly into Neo4j through its LangChain vector index integration. This method seems promising for making unstructured data more interactive, allowing me to use tools like OpenAI to "chat" with my PDFs.
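For reference, here's roughly what my unstructured-data pipeline looks like. This is a minimal sketch, assuming the langchain-community / langchain-openai packages and a local Neo4j instance; the file path, credentials, and index name are placeholders:

```python
# Rough sketch: PDF -> chunks -> embeddings -> Neo4j vector index via LangChain.
# File path, credentials, and index name are placeholders.
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Neo4jVector

# Load and chunk the PDF so each chunk fits the embedding model comfortably.
docs = PyPDFLoader("my_document.pdf").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)

# Embed the chunks and store them in a Neo4j vector index.
vectorstore = Neo4jVector.from_documents(
    chunks,
    OpenAIEmbeddings(),
    url="bolt://localhost:7687",
    username="neo4j",
    password="password",
    index_name="pdf_chunks",
)

# A standard LangChain retriever for "chatting" with the PDFs.
retriever = vectorstore.as_retriever()
```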
As for the structured data, constructing a knowledge graph with Neo4j has involved importing the Excel data to create nodes and relationships that effectively represent my knowledge base. Intriguingly, there are also modules that enable interactive querying of this structured knowledge graph with OpenAI, similar to the unstructured data.
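To make that concrete, here's a hedged sketch of the structured side. The Excel file name, column names (`product`, `category`), and the graph model are made up purely for illustration, and newer LangChain versions may also require `allow_dangerous_requests=True` on the chain:

```python
# Hypothetical sketch: Excel rows -> Neo4j nodes/relationships, then
# natural-language querying via GraphCypherQAChain. Column names and
# credentials are illustrative placeholders.
import pandas as pd
from langchain_community.graphs import Neo4jGraph
from langchain.chains import GraphCypherQAChain
from langchain_openai import ChatOpenAI

graph = Neo4jGraph(url="bolt://localhost:7687", username="neo4j", password="password")

# Import each row as a (:Product)-[:IN_CATEGORY]->(:Category) pair.
df = pd.read_excel("knowledge_base.xlsx")
for _, row in df.iterrows():
    graph.query(
        "MERGE (p:Product {name: $product}) "
        "MERGE (c:Category {name: $category}) "
        "MERGE (p)-[:IN_CATEGORY]->(c)",
        params={"product": row["product"], "category": row["category"]},
    )

# The chain has the LLM write Cypher against the graph schema, runs it,
# and summarizes the results as a natural-language answer.
graph_chain = GraphCypherQAChain.from_llm(ChatOpenAI(temperature=0), graph=graph, verbose=True)
print(graph_chain.invoke({"query": "Which categories have the most products?"}))
```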
My architectural dilemma revolves around how best to combine or manage these data sets for optimal querying and task fulfillment. Specifically, I'm weighing two options:
1) Integrating both unstructured and structured data into a single graph database like Neo4j, which could simplify data management and querying.
2) Keeping a separate database for each data type, which might allow for more specialized processing and optimization.
Furthermore, I'm pondering the sequence in which to direct questions or tasks to these data sets: should a query flow from the vectorized unstructured data to the knowledge graph of structured data, or vice versa?
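To illustrate the first sequencing direction, a naive composition of the two pieces (assuming the `retriever` and `graph_chain` from my sketches above) could be as simple as:

```python
# Hypothetical "vector first, graph second" composition, reusing the
# `retriever` and `graph_chain` objects defined in the earlier sketches.
def ask(question: str) -> str:
    # Step 1: pull the most relevant PDF chunks for the question.
    context_docs = retriever.invoke(question)  # older versions: get_relevant_documents()
    context = "\n".join(doc.page_content for doc in context_docs)

    # Step 2: hand the context-enriched question to the structured-data chain.
    enriched = f"Context from documents:\n{context}\n\nQuestion: {question}"
    return graph_chain.invoke({"query": enriched})["result"]
```

Reversing the order (graph first, vector second) would just swap the two steps, which is exactly why I'm unsure which direction yields better answers.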
I've successfully managed to code and create datasets in different vector stores, experimenting with various LLMs and setups. This has given me a solid foundation to explore the integration and utilization of both structured and unstructured data. Layered on top of this technical groundwork, I'm now faced with a strategic decision: whether to combine these datasets and, if so, how best to leverage them together.
Moreover, I'm contemplating the method of querying these datasets through LLMs. The question is whether a simple sequential chain approach is most effective, where a query is passed from one LLM setup (and its associated dataset) to the next in a linear fashion, or if a more advanced strategy involving agents—each equipped with its own specialized body of knowledge—is preferable. This consideration includes evaluating the potential benefits of such agents working either independently or in concert to provide more nuanced, accurate, and comprehensive responses.
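As a concrete point of comparison, the agent variant I have in mind would wrap each data source as a tool and let the LLM route between them. Tool names and descriptions here are illustrative, and `initialize_agent` is the classic (now legacy) LangChain entry point:

```python
# Hedged sketch of the agent alternative: each data source becomes a tool,
# and a ReAct-style agent decides which to call (possibly both, in sequence)
# instead of following a fixed pipeline. Reuses `retriever` and `graph_chain`
# from the earlier sketches; tool names/descriptions are illustrative.
from langchain.agents import AgentType, Tool, initialize_agent
from langchain_openai import ChatOpenAI

tools = [
    Tool(
        name="pdf_search",
        func=lambda q: "\n".join(d.page_content for d in retriever.invoke(q)),
        description="Semantic search over the unstructured PDF content.",
    ),
    Tool(
        name="knowledge_graph",
        func=lambda q: graph_chain.invoke({"query": q})["result"],
        description="Answers questions about the structured Excel knowledge base.",
    ),
]

agent = initialize_agent(
    tools,
    ChatOpenAI(temperature=0),
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
)
print(agent.invoke({"input": "Which product category do the PDFs discuss most?"}))
```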
Given the complexity of these considerations, I'm keen to understand the high-level success scenarios others have achieved under similar circumstances. I'm looking for guidance on whether integrating the datasets and strategically orchestrating LLM queries can enhance the effectiveness of my project, and if so, how best to structure that integration. The choice between a sequential querying chain and the deployment of knowledgeable agents is particularly pressing, as it could significantly influence the project's architectural framework and its ultimate success.
Thanks in advance, and also thanks to the following articles:
- introduction-to-pydantic-and-langchain-document-tagging
- knowledge-graphs-from-text
- Deep dive into Neo4j Langchain vector Index