Opensearch Nested Hybrid Search POC

This repo contains a minimal reproduction of the issues with hybrid search over nested fields.

For the basic example, see basic_example.py

We are evaluating against the latest version:

docker run -d \
  --name danswer_opensearch \
  -p 9200:9200 -p 9600:9600 \
  -e "discovery.type=single-node" \
  -e "OPENSEARCH_INITIAL_ADMIN_PASSWORD=D@nswer_1ndex" \
  opensearchproject/opensearch:latest

The gist of the problem: to do RAG well, it is often useful to find specific sections of documents to pass to the LLM. Passing in whole documents has problems such as:

cost
less accurate retrieval at scale due to inability to do hybrid search on chunks
lost context since many embedding models don't handle massive context
higher probability of LLM hallucinating or picking up on wrong sections/documents
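To make the setup concrete, the following is a minimal sketch of the kind of index mapping and request body involved: each document carries a nested `chunks` field, and a `hybrid` query combines a keyword clause and a k-NN clause over those chunks. The field names (`title`, `chunks`, `content`, `embedding`) and the helper function names are assumptions made for illustration and are not taken from basic_example.py. Hybrid queries in OpenSearch are normally run through a search pipeline with a normalization processor; that setup is omitted here, and this sketch only shows the request shape, not a configuration known to work.

```python
from typing import Any


def nested_chunk_mapping(dim: int) -> dict[str, Any]:
    """Index body where one document holds many chunks (hypothetical field names)."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "title": {"type": "text"},
                "chunks": {
                    "type": "nested",
                    "properties": {
                        # Chunk text for keyword (BM25) matching.
                        "content": {"type": "text"},
                        # Chunk embedding for vector search.
                        "embedding": {"type": "knn_vector", "dimension": dim},
                    },
                },
            }
        },
    }


def nested_hybrid_query(text: str, vector: list[float], k: int = 10) -> dict[str, Any]:
    """Search body combining keyword and k-NN clauses over the nested chunks."""
    keyword = {
        "nested": {
            "path": "chunks",
            "query": {"match": {"chunks.content": text}},
        }
    }
    semantic = {
        "nested": {
            "path": "chunks",
            "score_mode": "max",  # score the parent by its best-matching chunk
            "query": {"knn": {"chunks.embedding": {"vector": vector, "k": k}}},
        }
    }
    return {"query": {"hybrid": {"queries": [keyword, semantic]}}}
```

Both functions only build dictionaries, so they can be inspected or printed without a running cluster before being passed to a client.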

A temporary workaround of indexing each chunk as its own document in OpenSearch has many other downsides:

latency/complexity: fetching a full document or adjacent chunks now takes multiple calls
retrieving 50 hits can yield fewer than 50 distinct documents, since several hits may be chunks of the same doc
some pipelines work at the document level and don't support this hack of treating chunks as documents
an update to any field of the actual document now has to be applied to multiple OpenSearch documents
implementation complexity and parallelization may cause unexpected issues that we don't fully see yet
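The hit-count downside can be illustrated with a small sketch. It assumes each chunk-level hit has already been reduced to a dictionary with hypothetical `doc_id`, `chunk_id`, and `score` keys (these names are not from the repo), and collapses the hits to the best-scoring chunk per parent document.

```python
from typing import Iterable


def collapse_chunk_hits(hits: Iterable[dict], limit: int) -> list[dict]:
    """Keep the best-scoring chunk per parent document, up to `limit` documents."""
    best: dict[str, dict] = {}
    for hit in hits:
        doc_id = hit["doc_id"]
        # Remember only the highest-scoring chunk seen for each parent document.
        if doc_id not in best or hit["score"] > best[doc_id]["score"]:
            best[doc_id] = hit
    ranked = sorted(best.values(), key=lambda h: h["score"], reverse=True)
    return ranked[:limit]


hits = [
    {"doc_id": "a", "chunk_id": 0, "score": 0.9},
    {"doc_id": "a", "chunk_id": 1, "score": 0.8},
    {"doc_id": "b", "chunk_id": 0, "score": 0.7},
]
unique = collapse_chunk_hits(hits, limit=3)
print(len(unique))  # 2: three chunk hits, but only two distinct documents
```

To reliably fill a page of 50 documents under this workaround, the caller would have to over-fetch chunk hits and collapse them client-side, which adds to the latency and complexity listed above.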

This seems to be a common request, mentioned in a couple of recent issues, including:

opensearch-project/neural-search#718

opensearch-project/ml-commons#2612
