mahimairaja/blog-migrate-elasticsearch-to-qdrant

Elasticsearch to Qdrant Migration Guide

A practical demonstration of migrating vector search workloads from Elasticsearch to Qdrant, including sample data, migration scripts, and validation tools.

Overview

If you're running vector search on Elasticsearch and starting to feel the limitations (high memory usage, increasing latencies, or scaling concerns), this repository shows you how to migrate to Qdrant, a purpose-built vector database.

This isn't just documentation. It's a working example you can run locally, experiment with, and adapt to your production needs.

Features

  • Docker Compose setup for running both Elasticsearch and Qdrant locally
  • Sample movie dataset with embeddings for testing
  • Python scripts for populating Elasticsearch and validating Qdrant
  • Migration workflow using Qdrant's official Docker-based migration tool

Prerequisites

  • Docker and Docker Compose
  • Python 3.8+
  • OpenAI API key (for generating embeddings)

Quick Start

1. Spin Up the Infrastructure [Optional]

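This step is optional. Use it when you want local Elasticsearch and Qdrant instances for development and testing; skip it if you already have both running.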

$ docker-compose up -d

This starts:

  • Elasticsearch on localhost:9200
  • Kibana on localhost:5601 (for viewing Elasticsearch data)
  • Qdrant on localhost:6333 (REST API) and localhost:6334 (gRPC)

2. Populate Elasticsearch with Sample Data

First, install the Python dependencies:

$ uv sync

Create a .env file with your OpenAI API key:

OPENAI_API_KEY=<your_openai_api_key>
ELASTICSEARCH_URL=http://localhost:9200
INDEX_NAME=movies
EMBEDDING_MODEL=text-embedding-3-small

Run the script that loads the sample data into the Elasticsearch index:

$ uv run python optional/write-to-elasticsearch.py

This creates a movies index in Elasticsearch with 5 sample movies, each with a 1536-dimensional embedding generated from the movie title.

3. Run the Migration

Pull the Qdrant migration tool:

$ docker pull registry.cloud.qdrant.io/library/qdrant-migration

Run the migration from Elasticsearch to Qdrant:

$ docker run --net=host --rm -it \
  registry.cloud.qdrant.io/library/qdrant-migration elasticsearch \
  --elasticsearch.url 'http://localhost:9200' \
  --elasticsearch.index 'movies' \
  --qdrant.url 'http://localhost:6334' \
  --qdrant.collection 'movies-migrated' \
  --migration.batch-size 64

4. Validate the Migration

Update your .env file with Qdrant details:

QDRANT_URL=http://localhost:6333
QDRANT_COLLECTION=movies-migrated

Run the validation script:

$ uv run python test-qdrant-client.py

This searches for "The Shawshank Redemption" in your migrated Qdrant collection and returns the top 3 similar movies.

Project Structure

.
├── docker-compose.yml             # Infrastructure setup
├── optional/
│   ├── sample_data.json           # 5 sample movies
│   └── write-to-elasticsearch.py  # Populate Elasticsearch
├── test-qdrant-client.py          # Validate Qdrant search
└── README.md

Understanding the Migration

Data Flow

  1. Source (Elasticsearch): Movies stored with metadata (title, genre, director, year) and dense vectors (1536 dimensions, cosine similarity)
  2. Migration Tool: Streams data in batches from Elasticsearch to Qdrant
  3. Target (Qdrant): The same vectors and metadata, now stored in a vector-native engine, with the metadata carried over as payload on each point

Key Configuration Points

Elasticsearch Mapping (write-to-elasticsearch.py):

  • Uses dense_vector type with cosine similarity
  • 1536 dimensions (OpenAI's text-embedding-3-small)
  • Indexed for vector search
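
A mapping along the following lines would satisfy the three points above. This is a minimal sketch, not the repository's actual script: the field names (title, genre, director, year, embedding) are assumptions based on the metadata listed under Data Flow, and the helper build_movies_mapping is hypothetical.

```python
import json

VECTOR_SIZE = 1536  # default output size of text-embedding-3-small


def build_movies_mapping(dims: int = VECTOR_SIZE) -> dict:
    """Return an index body with a cosine-similarity dense_vector field.

    Field names are illustrative assumptions, not taken from the repo script.
    """
    return {
        "mappings": {
            "properties": {
                "title": {"type": "text"},
                "genre": {"type": "keyword"},
                "director": {"type": "keyword"},
                "year": {"type": "integer"},
                "embedding": {
                    "type": "dense_vector",
                    "dims": dims,            # must equal the embedding length
                    "index": True,           # build the vector index for kNN search
                    "similarity": "cosine",
                },
            }
        }
    }


if __name__ == "__main__":
    # Print the body you would send when creating the index.
    print(json.dumps(build_movies_mapping(), indent=2))
```

The printed JSON is the body you would send when creating the index. If you change the embedding model, dims is the only value in the mapping that has to change.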

Migration Command:

  • --migration.batch-size 64: Adjust based on your memory and network
  • --elasticsearch.url: Your source Elasticsearch instance
  • --qdrant.url: Your target Qdrant instance (port 6334 for gRPC)
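
If you need to migrate several indices, or want the command under version control, it can help to assemble it in a script. The sketch below only uses the flags shown in this README. The helper name build_migration_command is hypothetical, and the -it flag is dropped on the assumption that a scripted run has no interactive terminal.

```python
import shlex
import subprocess

IMAGE = "registry.cloud.qdrant.io/library/qdrant-migration"


def build_migration_command(es_url, index, qdrant_url, collection, batch_size=64):
    """Build the docker argv for one Elasticsearch-to-Qdrant migration run."""
    return [
        "docker", "run", "--net=host", "--rm",
        IMAGE, "elasticsearch",
        "--elasticsearch.url", es_url,
        "--elasticsearch.index", index,
        "--qdrant.url", qdrant_url,          # gRPC port, 6334 in the local setup
        "--qdrant.collection", collection,
        "--migration.batch-size", str(batch_size),
    ]


def run_migration(command):
    """Run the migration and raise CalledProcessError on a non-zero exit."""
    subprocess.run(command, check=True)


if __name__ == "__main__":
    cmd = build_migration_command(
        "http://localhost:9200", "movies",
        "http://localhost:6334", "movies-migrated",
    )
    # Print the command for review; call run_migration(cmd) to execute it.
    print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems in URLs and index names.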

Qdrant Validation (test-qdrant-client.py):

  • Converts queries to embeddings using the same model
  • Searches using query_points with semantic similarity
  • Returns results with full payload metadata
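
The validation flow can be sketched roughly as follows. This is not the contents of test-qdrant-client.py. It assumes the openai and qdrant-client packages are installed, that the payload carries a title key, and that the collection uses a single unnamed vector (if the migrated vectors are stored under a name, query_points also needs its using argument).

```python
import os


def embed(text, model):
    """Embed text with the OpenAI API (reads OPENAI_API_KEY from the environment)."""
    from openai import OpenAI  # third-party package, imported lazily

    response = OpenAI().embeddings.create(model=model, input=text)
    return response.data[0].embedding


def search(vector, url, collection, limit=3):
    """Return the closest points, payload included, via query_points."""
    from qdrant_client import QdrantClient  # third-party package

    client = QdrantClient(url=url)
    response = client.query_points(
        collection_name=collection,
        query=vector,
        limit=limit,
        with_payload=True,
    )
    return response.points


def format_hits(points):
    """Render each hit as '<score>  <title>'; 'title' is an assumed payload key."""
    return [
        f"{p.score:.3f}  {(p.payload or {}).get('title', p.id)}" for p in points
    ]


def main():
    model = os.environ.get("EMBEDDING_MODEL", "text-embedding-3-small")
    url = os.environ.get("QDRANT_URL", "http://localhost:6333")
    collection = os.environ.get("QDRANT_COLLECTION", "movies-migrated")
    vector = embed("The Shawshank Redemption", model)
    for line in format_hits(search(vector, url, collection)):
        print(line)


if __name__ == "__main__":
    if os.environ.get("OPENAI_API_KEY"):
        main()
    else:
        print("Set OPENAI_API_KEY and start Qdrant before running the search.")
```

The query must be embedded with the same model that produced the stored vectors; otherwise the similarity scores are meaningless even when the dimensions happen to match.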

Adapting to Your Use Case

For Larger Datasets

If you're migrating millions of vectors:

  1. Increase batch size in the migration command (try 256 or 512)
  2. Run migration on a host with good network connectivity to both systems
  3. Monitor memory usage on both source and target during migration
  4. Consider quantization in Qdrant to reduce memory footprint
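
To get a feel for what quantization buys you, the arithmetic is simple: a float32 vector costs 4 bytes per dimension, an int8 scalar-quantized copy costs 1. The sketch below estimates raw vector storage only (index and payload overhead are excluded) and shows one way to enable int8 scalar quantization with qdrant-client. The function names are hypothetical, and the settings are a starting point under those assumptions, not a tuned recommendation.

```python
def raw_vector_bytes(num_vectors, dims, bytes_per_component):
    """Bytes needed for the vector components alone, without index overhead."""
    return num_vectors * dims * bytes_per_component


def enable_int8_quantization(url, collection):
    """Turn on int8 scalar quantization for an existing collection."""
    from qdrant_client import QdrantClient, models  # third-party package

    client = QdrantClient(url=url)
    client.update_collection(
        collection_name=collection,
        quantization_config=models.ScalarQuantization(
            scalar=models.ScalarQuantizationConfig(
                type=models.ScalarType.INT8,
                always_ram=True,  # keep the quantized copy in memory
            )
        ),
    )


if __name__ == "__main__":
    n, dims = 1_000_000, 1536
    full = raw_vector_bytes(n, dims, 4)    # float32 originals
    small = raw_vector_bytes(n, dims, 1)   # int8 quantized copy
    print(f"float32: {full / 1e9:.2f} GB, int8: {small / 1e9:.2f} GB")
```

For one million 1536-dimensional vectors that is roughly 6.1 GB of float32 data against about 1.5 GB for the int8 copy. Qdrant keeps the original vectors as well, so the memory saving depends on where you choose to store them.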

For Production Migrations

  1. Test with a subset first: Migrate a single index to validate the process
  2. Plan for deltas: Handle new writes that happen during migration
  3. Set up monitoring: Track query latency and error rates in Qdrant
  4. Keep fallback options: Maintain your Elasticsearch instance in read-only mode for a period after cutover
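
A cheap first check after a test migration, and again after replaying any deltas, is to compare document counts on both sides. The sketch below uses only the standard library. It assumes neither service requires authentication, as in the local Docker setup, and the function names are hypothetical.

```python
import json
import urllib.request


def elasticsearch_count(es_url, index):
    """Number of documents in an Elasticsearch index (GET /<index>/_count)."""
    with urllib.request.urlopen(f"{es_url}/{index}/_count") as resp:
        return json.load(resp)["count"]


def qdrant_count(qdrant_url, collection):
    """Exact number of points in a Qdrant collection (REST port, 6333 locally)."""
    request = urllib.request.Request(
        f"{qdrant_url}/collections/{collection}/points/count",
        data=json.dumps({"exact": True}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)["result"]["count"]


def compare_counts(source, target):
    """Summarize whether the source and target hold the same number of records."""
    if source == target:
        return f"OK: {source} documents on both sides"
    return (
        f"MISMATCH: Elasticsearch has {source}, Qdrant has {target} "
        f"(difference {source - target})"
    )


# Example, with both services running locally:
# print(compare_counts(
#     elasticsearch_count("http://localhost:9200", "movies"),
#     qdrant_count("http://localhost:6333", "movies-migrated"),
# ))
```

Matching counts do not prove the vectors are identical, so pair this with a few spot-check searches before cutting over.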

For Different Embedding Models

Update the EMBEDDING_MODEL and VECTOR_SIZE in your scripts:

EMBEDDING_MODEL = "text-embedding-ada-002"  # or your model
VECTOR_SIZE = 1536  # adjust to match your model's dimensions

Common Issues

Migration fails with connection errors:

  • Ensure both Elasticsearch and Qdrant are running: docker-compose ps
  • Check network connectivity: curl http://localhost:9200 and curl http://localhost:6333

Vector dimension mismatch:

  • Verify that your Elasticsearch vectors match the declared dimensions
  • Check that you're using the same embedding model for queries
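
A small guard can catch mismatches before they reach either database. The sketch below is illustrative: the lookup table lists the default output sizes of three OpenAI embedding models, and check_dimensions is a hypothetical helper you would call on a query vector or on a sample of stored vectors.

```python
# Default output sizes of some OpenAI embedding models.
KNOWN_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-ada-002": 1536,
    "text-embedding-3-large": 3072,
}


def expected_dimensions(model):
    """Look up the default vector size for a model, or fail loudly."""
    try:
        return KNOWN_DIMENSIONS[model]
    except KeyError:
        raise ValueError(f"Unknown model {model!r}; set VECTOR_SIZE by hand") from None


def check_dimensions(vector, expected):
    """Raise ValueError if a vector's length differs from the declared size."""
    if len(vector) != expected:
        raise ValueError(
            f"Vector has {len(vector)} dimensions, expected {expected}"
        )


if __name__ == "__main__":
    size = expected_dimensions("text-embedding-3-small")
    check_dimensions([0.0] * size, size)  # passes silently
    print(f"text-embedding-3-small vectors have {size} dimensions")
```

Note that two models with the same dimension count, such as text-embedding-3-small and text-embedding-ada-002, still produce incompatible vectors, so a passing length check is necessary but not sufficient.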

Slow migration:

  • Increase batch size in the migration command
  • Run on a host closer to your Elasticsearch instance
  • Check that neither system is resource-constrained

Related Resources

For the full article on why and how to migrate from Elasticsearch to Qdrant, check out the accompanying blog post: Moving from Elasticsearch to Qdrant: What You Need to Know

License

MIT License - feel free to use this as a starting point for your own migration projects.

Contributing

Found an issue or have a suggestion? Open an issue or submit a pull request. Real-world migration experiences are especially valuable.
