<h1>OpenSearch Project Roadmap 2024–2025</h1>
<p><em>Published September 12, 2024</em></p>
<p>OpenSearch is an open-source product suite comprising a search engine, an ingestion system, language clients, and a user interface for analytics. Our goal at the <a href="https://github.com/opensearch-project">OpenSearch Project</a> is to make OpenSearch the preferred open-source solution for search, vector databases, log analytics, and security analytics, and to establish it as the preferred backend for generative AI applications. OpenSearch contributors and maintainers are innovating in all these areas at a fast pace. With <a href="https://metrics.opensearch.org/_dashboards/app/dashboards#/view/f1ad21c0-e323-11ee-9a74-07cd3b4ff414">more than 1,400 unique contributors</a> working across <a href="https://github.com/orgs/opensearch-project/repositories?q=visibility%3Apublic+archived%3Afalse">110+ public GitHub repositories</a> on a daily basis, OpenSearch is a rapidly growing open-source project.</p>
<p>To steer the project’s development effectively, we have revamped the project roadmap to provide better transparency into both short- and long-term enhancements. This will help the community provide feedback more easily, assist with prioritization, foster collaboration, and ensure that contributor efforts align with the community’s needs. To achieve this, the OpenSearch Project recently <a href="https://github.com/opensearch-project/.github/issues/196">introduced a new public process</a> for developing a <strong>theme-based, community-driven</strong> <a href="https://github.com/orgs/opensearch-project/projects/206"><strong>OpenSearch roadmap board</strong></a>, which we are excited to share today. The roadmap board will provide the community with visibility into the project’s high-level technological direction and will facilitate the sharing of feedback.</p>
<p>In this blog post, we will outline the OpenSearch roadmap for 2024–2025, focusing on the key areas that foster innovation among OpenSearch contributors. These innovation areas are categorized into the following nine main themes:</p>
<ol>
<li><strong><a href="#roadmap-theme-1-vector-database-and-generative-ai">Vector Database and Generative AI</a></strong></li>
<li><strong><a href="#roadmap-theme-2-search">Search</a></strong></li>
<li><strong><a href="#roadmap-theme-3-ease-of-use">Ease of Use</a></strong></li>
<li><strong><a href="#roadmap-theme-4-observability-log-analytics-and-security-analytics">Observability, Log Analytics, and Security Analytics</a></strong></li>
<li><strong><a href="#roadmap-theme-5-cost-performance-and-scalability">Cost, Performance, and Scalability</a></strong></li>
<li><strong><a href="#roadmap-theme-6-stability-availability-and-resiliency">Stability, Availability, and Resiliency</a></strong></li>
<li><strong><a href="#roadmap-theme-7-security">Security</a></strong></li>
<li><strong><a href="#roadmap-theme-8-modular-architecture">Modular Architecture</a></strong></li>
<li><strong><a href="#roadmap-theme-9-releases-and-project-health">Releases and Project Health</a></strong></li>
</ol>
<p>In the rest of this post, we will first <a href="#roadmap-summary">summarize the key innovation areas</a> in the context of the roadmap themes. For readers interested in a comprehensive understanding, we have a <a href="#roadmap-details">section dedicated to each theme</a> containing information about key innovations and links to the relevant GitHub RFCs/METAs for the features.</p>
<h2 id="roadmap-summary">Roadmap summary</h2>
<p>As a technology, OpenSearch innovates in three main areas: search, streaming data, and vectors. Search use cases employ lexical and semantic means to match end user queries to the catalog of information, stored in indexes, that drives your application. <em>Streaming data</em> includes a wide range of real-time data types, such as raw log data, observability trace data, security event data, metric data, and other event data like Internet of Things (IoT) events. Vector data includes the outputs of embedding-generating large language models (LLMs), vectors produced by machine learning (ML) models, and encodings of media like audio and video.</p>
<p>OpenSearch’s roadmap is aligned vertically in some cases and horizontally in others, depending on the workloads it supports. Features relevant to <strong>search workloads</strong> are described in <a href="#roadmap-theme-1-vector-database-and-generative-ai">theme 1</a> and <a href="#roadmap-theme-2-search">theme 2</a>. Features relevant to <strong>vector workloads</strong> are described in <a href="#roadmap-theme-1-vector-database-and-generative-ai">theme 1</a>. Features relevant to <strong>streaming data workloads</strong> are described in <a href="#roadmap-theme-4-observability-log-analytics-and-security-analytics">theme 4</a>. Features relevant to <strong>all three workload types</strong> are described in <a href="#roadmap-theme-3-ease-of-use">theme 3</a> and <a href="#roadmap-theme-5-cost-performance-and-scalability">themes 5–9</a>.</p>
<p><strong>Theme 1 (Vector Database and Generative AI)</strong> is centered on price performance and ease of use for vector workloads, creating new features that help reduce costs through quantization, disk storage, and GPU utilization. Ease-of-use features will make it easier to get started with and use embedding vectors to improve search results. <strong>Theme 2 (Search)</strong> focuses on enhancing the query capabilities of core search, building a new query engine with query planning, tight integrations with Lucene innovations, improving search relevance, and searching across external data sources with Data Prepper. <strong>Theme 3 (Ease of Use)</strong> encompasses building a richer dashboard experience and serverless dashboards that feature simplified installation, migration, and multi-data-source support. <strong>Theme 4 (Observability, Log Analytics, and Security Analytics)</strong> emphasizes integrating with industry standards, such as OpenTelemetry, to unify workflows across metrics, logs, and traces; providing a richer SQL-PPL experience; positioning Discover as the main entry point for analytical workflows; improving Data Prepper for various analytics use cases; and developing well-integrated security analytics workflows. <strong>Theme 5 (Cost, Performance, and Scalability)</strong> includes improving core search engine performance, scaling shard management, providing context-aware templates for different workloads, moving to remote-store-backed tiered storage, and scaling cluster management. <strong>Theme 6 (Stability, Availability, and Resiliency)</strong> includes features involving query visibility, query resiliency, workload management, and cluster management resilience. <strong>Theme 7 (Security)</strong> centers on providing constructs that are secure by default and adopting a streamlined plugin security model as the plugin ecosystem grows. <strong>Theme 8 (Modular Architecture)</strong> involves modularizing the OpenSearch codebase to suit different deployments and moving to a decoupled, service-oriented architecture. <strong>Theme 9 (Releases and Project Health)</strong> dives into initiatives for faster automated releases, with streamlined continuous integration/continuous delivery (CI/CD) and metrics dashboards to measure community health and operations.</p>
<h2 id="roadmap-details">Roadmap details</h2>
<p>In the following sections, we cover each theme in detail. You can find the associated RFCs and METAs on the <a href="https://github.com/orgs/opensearch-project/projects/206/views/11">new roadmap board</a>. We would love for you to get involved with the OpenSearch community by contributing to innovation in these areas or by providing your feedback.</p>
<h3 id="roadmap-theme-1-vector-database-and-generative-ai">Roadmap Theme 1: Vector Database and Generative AI</h3>
<p>The OpenSearch roadmap includes several innovations to OpenSearch’s vector database and ML functionality. These innovations focus on enhancing vector search and making ML-powered applications and integrations more flexible and easier to build. AI advancements are transforming the search experience for end users of all skill levels. By integrating AI models, OpenSearch delivers more relevant search results to all users. Experienced builders can apply additional techniques such as query rewriting, result reranking, personalization, semantic search, summarization, and retrieval-augmented generation (RAG) in order to further enhance search result accuracy. Many of these techniques rely on a vector database. With the current rise of generative AI, OpenSearch is gaining traction as a vector database solution powered by k-NN indexes. Our planned innovations will make OpenSearch vector database features easy to use and more efficient while lowering operational costs.</p>
<p><strong>Vector search price performance</strong>: To further improve the price performance of vector search, we are planning several key initiatives, such as offering a <a href="https://github.com/opensearch-project/k-NN/issues/1779">disk-optimized approximate nearest neighbor (ANN) solution</a> that uses quantized vectors to provide up to 32x compression and a 70% cost reduction, while still maintaining recall and requiring no pretraining. We are reducing memory footprint using techniques like iterative product quantization (PQ) and data types like <a href="https://github.com/opensearch-project/k-NN/issues/1764">binary vectors</a>. Additionally, we are implementing smart routing capabilities that organize indexes by semantic similarity to double query throughput, enabling multi-tenancy and smart filtering for high-recall ANN search at the tenant level, and using GPUs to significantly accelerate index build times for k-NN indexes, with a 10–40x better price/performance ratio compared to CPU-based infrastructure. We also plan to further lower costs by storing full-precision vectors on cold storage systems like Amazon Simple Storage Service (Amazon S3). The smart routing capabilities will place neighboring embeddings on the same node, improving query efficiency. The multi-tenancy and smart filtering features will cater to use cases requiring granular filtering of large datasets with stringent recall targets, enhancing efficiency and cost effectiveness. OpenSearch already provides memory footprint reduction techniques, such as PQ (using HNSWPQ and IVFPQ) and scalar quantization (SQ) in byte and fp16 formats. We are now investing in additional techniques to further compress vectors while maintaining recall similar to that provided when using full-precision vectors. The upcoming innovations are expected to significantly improve the price performance of vector search, making it more accessible and cost effective for a wide range of applications.</p>
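<p>As a concrete reference point for the quantization techniques mentioned above, the following is a minimal sketch of creating a k-NN index that uses the existing faiss fp16 scalar quantization encoder. The index name, field name, dimension, and connection details are illustrative assumptions, and the disk-optimized and binary-vector modes remain roadmap items.</p>
<pre><code class="language-python">
# Minimal sketch: a k-NN index with fp16 scalar quantization (roughly halves vector memory).
# Index/field names, dimension, and connection details are illustrative assumptions.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

client.indices.create(
    index="product-embeddings",          # hypothetical index name
    body={
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "embedding": {
                    "type": "knn_vector",
                    "dimension": 768,     # must match the embedding model's output size
                    "method": {
                        "name": "hnsw",
                        "engine": "faiss",
                        "space_type": "l2",
                        "parameters": {
                            # fp16 scalar quantization: smaller vectors at similar recall
                            "encoder": {"name": "sq", "parameters": {"type": "fp16"}}
                        },
                    },
                }
            }
        },
    },
)
</code></pre>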
<p><strong>Out-of-the-box (OOB) experience</strong>: OpenSearch aims to enhance the OOB experience of vector search. While the community appreciates the wide variety of tools and algorithms provided for tuning clusters according to workloads, having too many options can make it challenging for users to choose the right configuration. To address this, OpenSearch’s AutoTune feature will recommend the optimal hyperparameter values for a given workload based on metrics such as recall, latency, and throughput. Additionally, we plan to introduce smarter defaults to automatically tune indexing threads and enable <a href="https://github.com/opensearch-project/OpenSearch/issues/6798">concurrent segment search</a> based on traffic patterns and hardware resources. By simplifying the tuning process and providing intelligent defaults, OpenSearch will make it easier for users to achieve optimal performance without the need for extensive manual configuration.</p>
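<p>Until AutoTune and smarter defaults land, concurrent segment search is enabled explicitly. The following is a minimal sketch, assuming a local cluster and a recent OpenSearch version in which the dynamic cluster setting is available.</p>
<pre><code class="language-python">
# Minimal sketch: enable concurrent segment search cluster-wide via its dynamic setting.
# The AutoTune behavior described above is a roadmap item, not part of this call.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

client.cluster.put_settings(
    body={"persistent": {"search.concurrent_segment_search.enabled": True}}
)
</code></pre>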
<p><strong>Neural search</strong>: Ingestion performance has been a significant barrier to the adoption of neural search, especially for users who work with large-scale datasets. To address this, in version 2.16 we introduced online batch inference support that reduces communication overhead. We will further enhance ingestion performance by supporting <a href="https://github.com/opensearch-project/ml-commons/issues/2891">offline batch inference</a>. By using the offline batch processing capabilities of inference services like Amazon SageMaker, Amazon Bedrock, OpenAI, and Cohere, users will be able to directly process batch requests from preferred storage locations such as Amazon S3. This will significantly boost ingestion throughput while simultaneously reducing costs. Offline batch inference eliminates real-time communication with remote services, unlocking the full potential of neural search. We want to allow users to efficiently process large datasets and use advanced search capabilities at scale without compromising performance or incurring excessive costs.</p>
<p><strong>Neural sparse search</strong>: Neural sparse search provides yet another semantic search option for builders. Sparse encoding models create a reduced token set in which related tokens have semantically similar weights. A neural sparse index uses Lucene’s inverted index to store tokens and weights, providing fast, token-based recall and fast scoring through dot products. The OpenSearch 2.13 release included self-pretrained <a href="https://huggingface.co/opensearch-project/opensearch-neural-sparse-encoding-v2-distill">sparse encoders</a> on Hugging Face. Further optimizations will enhance both model effectiveness and efficiency:</p>
<ul>
<li><strong>More powerful models</strong>: OpenSearch will continue tuning neural sparse models to boost both relevance and efficiency.</li>
<li><strong>Weight quantization</strong>: Compressing the payload of sparse term weights will considerably reduce index sizes, providing an economic solution comparable to BM25.</li>
<li><strong>Multilingual support</strong>: In addition to English, neural sparse models will support at least three more languages.</li>
</ul>
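<p>To make the mechanism described above concrete, here is a minimal sketch of querying a neural sparse field. The index name, field name, and model ID are placeholders, and the sparse encoding model and ingest pipeline are assumed to be deployed already.</p>
<pre><code class="language-python">
# Minimal sketch: a neural_sparse query against a rank_features field populated at ingest time.
# All names and IDs below are placeholders.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

response = client.search(
    index="articles-sparse",                      # hypothetical index
    body={
        "query": {
            "neural_sparse": {
                "passage_embedding": {            # field holding sparse token weights
                    "query_text": "how to reduce index storage costs",
                    "model_id": "YOUR_SPARSE_ENCODER_MODEL_ID",  # placeholder
                }
            }
        }
    },
)
print(response["hits"]["total"])
</code></pre>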
<p><strong>Development process for ML-powered search</strong>: Enhancing the builder experience and streamlining the development process for ML-powered search is our top priority. To achieve this, we will introduce a <a href="https://github.com/opensearch-project/OpenSearch-Dashboards/issues/4755">low-code search flow builder</a> within OpenSearch Dashboards, enabling the creation and customization of AI-enhanced search capabilities with minimal coding effort. Additionally, we will extend both the model-serving framework in ML Commons and its search pipeline functionality, allowing users to seamlessly integrate various third-party models, such as OpenAI or Cohere embedding models. This will provide greater flexibility and enable builders to use the most suitable solution for their specific use case.</p>
<p><strong>ML connector certification program</strong>: To keep up with the rapid evolution of ML and the emergence of new inference services, we are launching a self-service certification program through which the community and service providers can contribute blueprints for their preferred inference models. OpenSearch already provides OOB blueprints for popular services such as <a href="https://github.com/opensearch-project/ml-commons/blob/main/docs/remote_inference_blueprints/cohere_connector_embedding_blueprint.md">Cohere</a> and <a href="https://github.com/opensearch-project/ml-commons/blob/main/docs/remote_inference_blueprints/openai_connector_embedding_blueprint.md">OpenAI</a>. However, adding a new blueprint requires a manual code review and merging process, as shown in <a href="https://github.com/opensearch-project/ml-commons/pull/1991">this pull request for adding a blueprint for the Cohere chat model</a>. The new certification program encourages users to submit blueprints for their favorite models and have them verified and approved through automated pipelines. Once approved, these blueprints will be distributed alongside OpenSearch version releases, benefiting the entire community and ensuring that OpenSearch remains current with the latest advancements in the field.</p>
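<p>For readers unfamiliar with connector blueprints, the sketch below shows what registering a connector from a blueprint-style definition looks like through the ML Commons API. The body loosely follows the OpenAI embedding blueprint linked above, but treat the exact fields as assumptions and use the published blueprint as the authoritative reference.</p>
<pre><code class="language-python">
# Hedged sketch: create a remote-model connector from a blueprint-style definition.
# The API key value and model choice are placeholders.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

connector = client.transport.perform_request(
    "POST",
    "/_plugins/_ml/connectors/_create",
    body={
        "name": "openai-embeddings",
        "description": "Connector for the OpenAI embedding API",
        "version": 1,
        "protocol": "http",
        "parameters": {"model": "text-embedding-ada-002"},
        "credential": {"openAI_key": "YOUR_API_KEY"},   # placeholder secret
        "actions": [
            {
                "action_type": "predict",
                "method": "POST",
                "url": "https://api.openai.com/v1/embeddings",
                "headers": {"Authorization": "Bearer ${credential.openAI_key}"},
                "request_body": '{ "input": ${parameters.input}, "model": "${parameters.model}" }',
            }
        ],
    },
)
print(connector)  # the returned connector_id is used when registering the remote model
</code></pre>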
<p><strong>OpenSearch Assistant Toolkit</strong>: The <a href="https://github.com/opensearch-project/dashboards-assistant/issues/18">OpenSearch Assistant Toolkit</a> helps create AI-powered assistants for OpenSearch Dashboards. Its main goal is to simplify interactions with OpenSearch features and enhance their accessibility. For example, using natural language queries allows for interaction with OpenSearch without the need to learn a custom query language. The toolkit empowers OpenSearch users to build their own AI-powered applications tailored to their customized use cases. It contains built-in skills that will allow builders to use LLMs to create new visualizations based on their data, summarize their data, and help configure anomaly detectors. The OpenSearch Assistant will guide both novice and experienced users, simplifying complex tasks and making it easier to effectively navigate OpenSearch. For more information, see <a href="https://www.youtube.com/watch?v=VTiJtGI2Sr4">this video</a>.</p>
<h3 id="roadmap-theme-2-search">Roadmap Theme 2: Search</h3>
<p>OpenSearch is designed to offer a highly scalable, reliable, and fast search experience, built to handle large-scale data environments while delivering accurate and relevant results. The community is committed to evolving OpenSearch’s core search capabilities to meet modern workload standards and business needs. As part of our ongoing investments in the core search engine, the roadmap focuses on the following key advancements.</p>
<p><strong>Enhanced query capabilities</strong>: The OpenSearch community continues to push the boundaries of query capabilities. Features like <a href="https://github.com/opensearch-project/OpenSearch/issues/1133">derived fields</a>, <a href="https://github.com/opensearch-project/OpenSearch/issues/5639">wildcard fields</a>, and <a href="https://github.com/opensearch-project/OpenSearch/pull/14774">bitmap filtering</a> offer greater flexibility in search queries, allowing users to extract more precise insights from their data. The adoption of new ranking techniques and algorithms such as <a href="https://github.com/opensearch-project/OpenSearch/issues/3996">combined_fields (BM25F)</a> improves search result relevance, contributing to a more refined search experience. We plan to introduce query <a href="https://github.com/opensearch-project/OpenSearch/issues/10250">categorization</a> and <a href="https://github.com/opensearch-project/OpenSearch/issues/11429">insights</a>, providing fine-grained monitoring to identify problematic queries, diagnose bottlenecks, and optimize performance.</p>
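<p>As a small illustration of one of these capabilities, the sketch below creates an index with a wildcard field and runs a pattern query against it. The index and field names are illustrative, and availability depends on the OpenSearch version in use.</p>
<pre><code class="language-python">
# Minimal sketch: a wildcard field keeps wildcard/pattern queries efficient on
# high-cardinality strings such as URIs. Names are illustrative assumptions.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

client.indices.create(
    index="http-logs",
    body={"mappings": {"properties": {"request_uri": {"type": "wildcard"}}}},
)

hits = client.search(
    index="http-logs",
    body={"query": {"wildcard": {"request_uri": {"value": "*/checkout/*"}}}},
)
</code></pre>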
<p><strong>Sophisticated query engine</strong>: We are committed to further enhancing the core query engine, with plans to integrate advanced capabilities from the <a href="https://github.com/opensearch-project/sql">SQL plugin</a> directly into OpenSearch. This effort is aimed at unifying query planning and distributed execution across different query languages, bringing OpenSearch query domain-specific language (DSL), SQL, and <a href="https://github.com/opensearch-project/sql/tree/main/ppl">Piped Processing Language (PPL)</a> into closer parity. This integration will support more sophisticated query optimizations and distributed executions, unlocking more efficient data processing at scale. The introduction of <a href="https://github.com/opensearch-project/OpenSearch/issues/15185">join support</a> in the core engine will offer users a powerful method of combining and analyzing datasets. These capabilities are crucial for those dealing with relational-style data, enabling greater query complexity without sacrificing performance. A key step in improving the query engine is separating the search coordinator logic from the shard-level Lucene search logic. This separation will allow the search coordinator to focus on complex distributed logic (including joins) and process results from a variety of data sources (including future support for non-Lucene data sources like relational databases and Parquet files).</p>
<p><strong>Query performance</strong>: In terms of broader query engine speed and scale, OpenSearch is moving toward <a href="https://github.com/opensearch-project/OpenSearch/issues/15237">writer/searcher separation</a>, which will provide a more modular and adaptable framework for managing indexing and search processes. Efforts like <a href="https://github.com/opensearch-project/OpenSearch/issues/15257">Star Tree index</a> and the introduction of <a href="https://github.com/opensearch-project/OpenSearch/issues/10684">Protobuf</a> for search execution and communication further reduce costs and improve performance, enabling the platform to efficiently handle even larger data volumes. The roadmap includes several key advancements in query processing, such as improving <a href="https://github.com/opensearch-project/OpenSearch/issues/13566">range query performance</a> through <a href="https://github.com/opensearch-project/OpenSearch/pull/13788">approximation</a> techniques, accelerating aggregations such as date histograms, <a href="https://github.com/opensearch-project/OpenSearch/issues/15136">enhancing concurrent segment search</a>, developing multi-level request caching with <a href="https://github.com/opensearch-project/OpenSearch/issues/13566">tiered caching</a>, and integrating Rust and SIMD operations.</p>
<p><strong>Contributions to core dependencies</strong>: As part of our community-driven effort to optimize OpenSearch’s underlying architecture, we continue to contribute to the Lucene search library. A notable example includes ongoing work on <a href="https://github.com/apache/lucene/pull/13521">BKD doc ID encoding</a>, which will improve indexing and query performance. These contributions ensure that OpenSearch remains on the cutting edge of search technology, benefiting from the latest Lucene advancements.</p>
<p><strong>Hybrid search enhancements</strong>: OpenSearch continues to enhance search relevance through hybrid search, which combines text and vector queries. In addition to the existing score-based normalization and combination techniques, OpenSearch plans to launch a rank-based approach called <a href="https://github.com/opensearch-project/neural-search/issues/865"><em>reciprocal rank fusion</em></a>. This approach will combine search results based on their rank, allowing users to make informed choices by considering the score distribution. Moreover, hybrid search will be augmented with <a href="https://github.com/opensearch-project/neural-search/issues/280">pagination and profiling capabilities</a>, enabling users to debug scores at different stages of score normalization and combination. These enhancements will further improve the search experience, providing more accurate and insightful results while offering greater transparency into the ranking process.</p>
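<p>For context, the sketch below shows the existing score-based approach that reciprocal rank fusion will complement: a search pipeline with the normalization processor combines a lexical query and a neural query. The pipeline name, index, field names, weights, and model ID are illustrative assumptions.</p>
<pre><code class="language-python">
# Hedged sketch: score-based hybrid search with min-max normalization and a weighted
# arithmetic mean. Reciprocal rank fusion, described above, is a roadmap item.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

# Define a search pipeline that normalizes and combines sub-query scores.
client.transport.perform_request(
    "PUT",
    "/_search/pipeline/hybrid-minmax",
    body={
        "phase_results_processors": [
            {
                "normalization-processor": {
                    "normalization": {"technique": "min_max"},
                    "combination": {
                        "technique": "arithmetic_mean",
                        "parameters": {"weights": [0.3, 0.7]},
                    },
                }
            }
        ]
    },
)

# Run a hybrid query (lexical + neural) through the pipeline.
results = client.transport.perform_request(
    "POST",
    "/products/_search?search_pipeline=hybrid-minmax",
    body={
        "query": {
            "hybrid": {
                "queries": [
                    {"match": {"title": {"query": "waterproof hiking boots"}}},
                    {
                        "neural": {
                            "title_embedding": {
                                "query_text": "waterproof hiking boots",
                                "model_id": "YOUR_EMBEDDING_MODEL_ID",  # placeholder
                                "k": 50,
                            }
                        }
                    },
                ]
            }
        }
    },
)
</code></pre>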
<p><strong>User behavior insights</strong>: Search users are turning to AI to improve search relevance and reduce manual effort. However, it is challenging to train and tune opaque models without a data feedback loop. To help users gain search insights and build a tuning feedback loop, we are launching <a href="https://github.com/opensearch-project/OpenSearch/issues/12084">User Behavior Insights</a> (UBI). UBI consists of a standard data schema, server-side collection components, query-side collection components, and analytics dashboards. This will provide a standard way for users to record and analyze search behavior and train and fine-tune models.</p>
<p><strong>Ingestion from other databases</strong>: OpenSearch can ingest data from Amazon DynamoDB and Amazon DocumentDB databases using Data Prepper, which enables using OpenSearch as a search engine for these sources. Data Prepper is continuing to add support for new database types, with the immediate goal of supporting SQL databases. With this new source type, the community can search even more databases, <a href="https://github.com/opensearch-project/data-prepper/issues/4561">including Amazon Aurora and Amazon Relational Database Service (Amazon RDS)/MySQL databases</a>.</p>
<h3 id="roadmap-theme-3-ease-of-use">Roadmap Theme 3: Ease of Use</h3>
<p>OpenSearch Dashboards provides an intuitive interface and powerful visualization and analytics tools for OpenSearch users. Additionally, OpenSearch Dashboards contains a rich set of features and tools that enable advanced analytics use cases. These easy-to-use tools simplify data exploration, monitoring, and management for both OpenSearch administrators and end users.</p>
<p><strong>Richer dashboard experience</strong>: We are planning dynamic and interactive features to make data visualization more intuitive and powerful. Additionally, we aim to enable <a href="https://github.com/opensearch-project/OpenSearch-Dashboards/issues/1388">multiple data sources</a>, allowing seamless integration and operations, such as cross-source alerting, within a unified interface. As part of this effort, we plan to introduce a <em>dataset</em> concept, which extends the index pattern concept in OpenSearch Dashboards and enables working with different types of data sources, such as relational databases or Prometheus. This will allow users to seamlessly access and visualize data from a variety of sources within the OpenSearch Dashboards interface. We are also introducing a <a href="https://github.com/opensearch-project/OpenSearch-Dashboards/issues/4615"><em>workspace</em></a> concept in OpenSearch Dashboards. Workspaces will streamline user workflows by providing curated vertical experiences for search, observability, and security analytics. Additionally, workspaces will enhance collaboration on workspace assets and improve data connections.</p>
<p><strong>Serverless dashboards and migration</strong>: Our strategy for OpenSearch Dashboards also includes <a href="https://github.com/opensearch-project/OpenSearch-Dashboards/issues/5804">decoupling release and distribution</a> from the OpenSearch engine. We are aiming to allow OpenSearch Dashboards to run as a standalone application, independent from the OpenSearch installation. OpenSearch Dashboards will have its own authentication and access control based on workspaces, and we’ll provide options for using a dedicated database for OpenSearch Dashboards saved objects. To simplify configuration and customization, we envision implementing <a href="https://github.com/opensearch-project/OpenSearch-Dashboards/issues/7111">one-click installation and setup</a>, allowing users to get started quickly. We also plan to streamline <a href="https://github.com/opensearch-project/OpenSearch-Dashboards/issues/5877">plugin management</a> to enable users to extend OpenSearch Dashboards without restarting the application. We aim to develop a <a href="https://github.com/opensearch-project/OpenSearch-Dashboards/issues/5757">migration toolkit</a> to assist users in seamlessly transitioning data from older versions of OpenSearch Dashboards or other tools like Grafana. We’ll also implement an <a href="https://github.com/opensearch-project/OpenSearch-Dashboards/issues/7035">interactive onboarding experience</a> to guide new users through key features and setup steps. Additionally, we plan to integrate live help powered by generative AI, which will offer real-time assistance within the platform, and to enhance the platform’s resilience with improved health and status monitoring. We also plan to focus on improving the overall performance of OpenSearch Dashboards. This will include <a href="https://github.com/opensearch-project/OpenSearch-Dashboards/issues/4630">optimizing the loading times</a> of the application and visualizations, ensuring a smooth and responsive user experience. We will analyze the current performance bottlenecks and implement targeted optimizations to reduce latency and improve the responsiveness of OpenSearch Dashboards, especially when working with large or complex datasets.</p>
<h3 id="roadmap-theme-4-observability-log-analytics-and-security-analytics">Roadmap Theme 4: Observability, Log Analytics, and Security Analytics</h3>
<p>The OpenSearch Project continues to enhance its observability and security analytics capabilities. We are dedicated to creating a more cohesive and user-friendly experience while expanding functionality and improving performance. Our roadmap for 2024–2025 focuses on delivering a more unified, powerful, and intuitive experience while maintaining the cost effectiveness and scalability our users expect.</p>
<p><strong>OpenTelemetry support</strong>: OpenSearch has enhanced its observability features by incorporating support for the OpenTelemetry Protocol (OTLP), enabling the ingestion of metrics, logs, and traces. OTLP, a vendor-neutral protocol, standardizes telemetry data transmission, making it easier to send various types of observability data (traces, metrics, and logs) directly to OpenSearch. This integration with OpenTelemetry allows developers and operations teams to seamlessly ingest traces, metrics, and logs within a unified workflow, promoting a more efficient and standardized approach to collecting and analyzing observability data across complex, distributed systems. With robust support for OpenTelemetry and OTLP, OpenSearch offers a powerful platform for storing, analyzing, and visualizing essential observability data, simplifying system performance monitoring and issue troubleshooting across your entire infrastructure. To address the challenges of managing, monitoring, and analyzing traces, metrics, and logs, OpenSearch introduced a new <a href="https://github.com/opensearch-project/simple-schema">schema</a> compatible with OpenTelemetry. This schema supports predefined dashboards through an <a href="https://github.com/opensearch-project/opensearch-catalog/releases">OpenSearch catalog</a> for common systems like NGINX, HAProxy, and Kubernetes. Additionally, it enables cross-index querying of data containing shared structures from different telemetry data producers. OpenSearch is dedicated to continuously enhancing its schema to support emerging observability use cases and to develop more advanced correlation and alerting solutions. To further explore OpenSearch capabilities, see <a href="https://github.com/opensearch-project/opentelemetry-demo?tab=readme-ov-file#running-this-demo">this demo</a>.</p>
<p><strong>Cost-effective, scalable analytics using Apache Spark</strong>: Many community members are opting to store data on cost-optimized cloud storage outside of OpenSearch, either because it is cost prohibitive to store in OpenSearch or because the amount of data raises scalability concerns. To analyze data outside of OpenSearch, users have been forced to switch between tools or create one-off ingestion pipelines. <a href="https://github.com/opensearch-project/OpenSearch/issues/14524">OpenSearch’s integration with Apache Spark</a> allows you to analyze data outside of OpenSearch, potentially reducing storage costs by up to 90%. OpenSearch has added support for <a href="https://github.com/opensearch-project/opensearch-spark">indexing data on cloud storage using Spark Streaming</a>. Naturally, analysts want to join data across OpenSearch indexes and the cloud. Our upcoming Iceberg-compatible <a href="https://github.com/opensearch-project/OpenSearch/issues/8639">table format</a> will enable complex joins between OpenSearch indexes and cloud storage, enhancing your ability to analyze data across platforms. Additionally, this table format enhances Iceberg by incorporating index capabilities, enabling the creation of search indexes on text fields, vector indexes, and geographical indexes. During query execution, these indexes will be automatically used to optimize full-text, neural, and geographical searches. Initially, this feature may be based on a customized Iceberg version, named <em>OpenSearch Table</em>, that remains fully compatible with Iceberg. As this functionality is integrated into Iceberg itself, it will become available to all query engines.</p>
<p><strong>Unified query experience—bridging PPL and SQL</strong>: By the end of 2024, we’ll consolidate SQL and PPL into a common interface within Discover. This unification will allow analysts to work more efficiently, using their preferred language without switching between tools. We’re also including autocomplete and auto-suggest functionality to make query building easier. Looking ahead to 2025, we’re planning to significantly enhance both OpenSearch’s PPL and SQL capabilities. For <a href="https://github.com/orgs/opensearch-project/projects/214/views/2">PPL</a>, we’re introducing over 30 new PPL commands and functions, including <a href="https://github.com/opensearch-project/sql/issues/2913">JOINs</a>, lookups, and JSON search capabilities. These additions will empower you to perform more sophisticated analyses, especially in observability and security contexts. Our SQL engine is also undergoing a <a href="https://github.com/opensearch-project/sql/issues/2674">major upgrade</a>, with a focus on standardization and interoperability. You can look forward to support for vector search, geographical search, and advanced SQL queries, unlocking even more powerful analytics possibilities.</p>
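<p>As a point of reference for the PPL work described above, the following is a minimal sketch of running a PPL query through the SQL/PPL plugin endpoint. The index and field names are illustrative, and the query sticks to commands available today rather than the upcoming JOIN and lookup commands.</p>
<pre><code class="language-python">
# Minimal sketch: run a PPL query via the _plugins/_ppl endpoint.
# Index and field names are illustrative assumptions.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

result = client.transport.perform_request(
    "POST",
    "/_plugins/_ppl",
    body={
        "query": (
            "source = http-logs "
            "| where status >= 500 "
            "| stats count() as errors by host "
            "| sort - errors | head 10"
        )
    },
)
print(result["datarows"])
</code></pre>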
<p><strong>Discover—your central hub for analytics</strong>: We’re positioning <a href="https://github.com/opensearch-project/OpenSearch-Dashboards/issues/8069">Discover as the primary entry point</a> to your analytics workflows. Soon, you’ll be able to seamlessly transition from refining queries to creating visualizations, performing trace analytics, generating reports, or setting up alerts—all without leaving the Discover interface. This interconnected approach will streamline your workflow, saving time and reducing context switching. While we know the community is interested in the workflows we highlighted, we will build the functionality generically so that the community can easily plug in custom workflows that meet their needs.</p>
<p><strong>Enhanced observability tools</strong>: OpenSearch is working on several new observability features to enhance the existing capabilities and user experience. These include the development of a <a href="https://github.com/opensearch-project/opensearch-catalog/issues/123">correlation zones framework</a>, which aims to simplify and automate site reliability engineers’ (SREs) daily tasks by identifying critical issues more efficiently. The framework will categorize anomalies and incidents into correlation zones, reducing the need for constant monitoring and allowing SREs to focus on significant segments. Additionally, OpenSearch is optimizing its <a href="https://github.com/opensearch-project/dashboards-observability/issues/2141">Trace Analytics</a> plugin by adding improved storage capabilities, UI enhancements, better query performance, and seamless integration with other OpenSearch Dashboards plugins. This includes the ability to store configurations, support for custom indexes and cross-cluster queries, and better correlation between logs, traces, and metrics. OpenSearch is also working on adding support for <a href="https://github.com/opensearch-project/dashboards-observability/issues/2139">PromQL</a> in dashboards, enabling users to query Prometheus data sources directly and further expanding its observability capabilities and data integration options.</p>
<p><strong>Data Prepper</strong>: Data Prepper allows the community to ingest traces, logs, and metrics into OpenSearch. Currently, the primary means for ingesting these signals are through OpenTelemetry over gRPC, HTTP, and Apache Kafka and through loading from Amazon S3. The community has looked for other ways to ingest data into OpenSearch, and Data Prepper is planning to support those. First, an <a href="https://github.com/opensearch-project/data-prepper/issues/1082">Amazon Kinesis source</a> will allow the community to pull data from Amazon Kinesis, which is popular for streaming data. Second, Data Prepper is planning to <a href="https://github.com/opensearch-project/data-prepper/issues/4180">provide a new OpenSearch API source</a> for ingesting data using existing OpenSearch APIs. This API will initially accept requests made using the <a href="https://github.com/opensearch-project/data-prepper/issues/248">OpenSearch Bulk API</a> and will support other document update APIs in the future. Third, Data Prepper will <a href="https://github.com/opensearch-project/data-prepper/issues/1986">support Apache Kafka</a> as a sink. While users can currently read from Apache Kafka using Data Prepper, there is growing interest in using Data Prepper as an ingestion tool for Kafka clusters. One of Data Prepper’s major use cases is observability and analytics, and both the maintainers and community continue to improve upon Data Prepper capabilities for these important use cases.</p>
<p><strong>Security analytics</strong>: Our mission is to empower security and operations teams to quickly discover and isolate threats or operational issues, minimizing the impact on business operations and protecting confidential data. OpenSearch users ingest security and operations data into their clusters for real-time security threat detection and correlation, security event investigation, and operational trend visualization to generate meaningful insights. <a href="https://github.com/opensearch-project/security-analytics">Security Analytics</a> provides a prebuilt library of over 3,300 threat detection rules for common security event logs, a threat intelligence framework, a real-time detection rules engine, alerting capabilities for notifying incident response teams, and a correlation rules engine for identifying associations across events. In the coming year, we will create a unified experience so that users can move faster to find and address threats. We will support security insights without creating detectors, expand support for new security log types, add new threat intelligence feed integrations, and simplify the data mapping workflows. We will integrate generative AI features into existing workflows to enable users of all skill levels to easily configure threat detection, create security rules, and obtain security insights and remediation steps. In addition, we will improve investigation workflows that will enable users to query and analyze historical logs for compliance and investigation purposes. Native integrations with incident response and case management systems, such as ServiceNow and PagerDuty, will help users monitor updates from a centralized location.</p>
<h3 id="roadmap-theme-5-cost-performance-and-scalability">Roadmap Theme 5: Cost, Performance, and Scalability</h3>
<p><strong>Search performance and a new query engine</strong>: As data volumes increase in size and workloads become more complex, price performance remains a top priority for OpenSearch users. OpenSearch recently implemented significant engine performance enhancements, as highlighted in a <a href="https://opensearch.org/blog/opensearch-performance-2.14/">previous blog post</a>. Compared to OpenSearch 1.0, recent OpenSearch versions demonstrate a 50% improvement for text queries, a 40% improvement for multi-term queries, a 100x boost for term queries, and a 50x boost for date histograms. These advancements stem from the engine performance optimizations outlined in our <a href="https://github.com/orgs/opensearch-project/projects/153">performance roadmap</a>. The roadmap also includes future initiatives such as <a href="https://github.com/opensearch-project/OpenSearch/issues/12257">document reordering</a>, <a href="https://github.com/opensearch-project/OpenSearch/issues/12390">query rewriting</a>, <a href="https://github.com/opensearch-project/OpenSearch/issues/11959">dynamic pruning</a>, and count-only caching. Additionally, the OpenSearch community is now taking the initiative to evolve the core engine in order to embrace new technologies like custom engines, parallelization, and composable architectures—all within an open-source framework. This includes rearchitecting the engine toward <a href="https://github.com/opensearch-project/OpenSearch/issues/14596">indexing and search separation</a> and offering a more modular and adaptable system. Additionally, faster interconnections using an efficient binary format for client-server communication, such as <a href="https://github.com/opensearch-project/OpenSearch/issues/15190">gRPC</a>, and node-to-node messaging through <a href="https://github.com/opensearch-project/OpenSearch/issues/6844">Protobuf</a>, have yielded promising early results. While actively contributing to core Lucene, we’re also focused on building a <a href="https://github.com/opensearch-project/OpenSearch/issues/14637">cloud-native architecture</a> to further enhance engine performance at scale.</p>
<p><strong>Application-based context templates</strong>: <a href="https://github.com/opensearch-project/OpenSearch/issues/12683">Application-based context templates</a> provide predefined, use-case-specific templates that package the right configuration for the specific use case. For example, an index created based on the <a href="https://github.com/opensearch-project/opensearch-system-templates/blob/main/src/main/resources/org/opensearch/system/applicationtemplates/v1/logs.json">logs template</a> is configured with the Zstd compression codec and <code class="language-plaintext highlighter-rouge">log_byte_size</code> merge policy. This configuration helps reduce disk utilization and enhances overall performance. Multi-field indexes aim to provide constant query latency when a query searches across multiple fields. The first implementation of a multi-field index is available as a <a href="https://github.com/opensearch-project/OpenSearch/issues/12498">Star Tree index</a>. The roadmap includes plans to introduce additional context-specific templates, such as those for metrics, traces, and events. It also aims to enhance existing templates with specialized optimizations, including the Star Tree index.</p>
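<p>Assuming the experimental top-level <code class="language-plaintext highlighter-rouge">context</code> parameter from the linked proposal, creating an index from the logs template could look like the sketch below. The index name is illustrative, and the exact API shape and availability depend on the OpenSearch version and feature flags, so treat this as a sketch rather than a definitive reference.</p>
<pre><code class="language-python">
# Hedged sketch: create an index that attaches the predefined "logs" context, which
# applies settings such as the zstd codec and log_byte_size merge policy without
# listing them manually. The context parameter is experimental.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

client.indices.create(
    index="app-logs-2024.09",            # hypothetical index name
    body={"context": {"name": "logs"}},
)
</code></pre>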
<p><strong>Scaling shard management</strong>: <a href="https://github.com/opensearch-project/OpenSearch/issues/12918">Shard splitting</a> aims to provide the capability to scale shards based on size or throughput with zero downtime for read and write traffic. In search use cases, it can be difficult to predict the number of primary shards in advance. As a result, the OpenSearch cluster can become “hot,” impacting performance. A hot shard can exhaust resources on the node hosting it and can eventually hit Lucene’s hard limit of approximately 2 billion documents per Lucene index. Today, there are two options available to solve this problem: document reindexing or index splitting. With document reindexing, the entire index is reindexed into a new index with a larger number of primary shards. This is a very slow process that requires additional compute and I/O. With index splitting, the index is first marked as read-only, and then all its shards are split, causing write downtime for users. Additionally, the Split API does not provide the granularity of splitting at the shard level, so a single hot shard cannot be scaled independently. In-place shard splitting will address these limitations and provide a more holistic way to scale shards. One challenge of running a bigger cluster is optimally allocating a large number of shards while honoring a set of placement constraints. Because all placement decisions are executed sequentially, the cluster manager is unable to prioritize other critical operations, such as index creation and settings updates, which can eventually time out. To address this issue, all placement decisions are <a href="https://github.com/opensearch-project/OpenSearch/issues/15872">optimized</a> and <a href="https://github.com/opensearch-project/OpenSearch/pull/14848">bounded</a> so they finish early, preventing starvation of critical tasks.</p>
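<p>For comparison with the planned in-place shard splitting, the sketch below walks through today's index splitting workflow, including the write block that causes the downtime described above. Index names and shard counts are illustrative.</p>
<pre><code class="language-python">
# Minimal sketch: the current Split API workflow, which requires a write block.
# In-place shard splitting on the roadmap aims to remove this downtime.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

# 1. Block writes on the source index (this is the downtime the roadmap item removes).
client.indices.put_settings(index="events-v1", body={"index.blocks.write": True})

# 2. Split into a new index; the target shard count must be a multiple of the source count.
client.indices.split(
    index="events-v1",
    target="events-v2",
    body={"settings": {"index": {"number_of_shards": 6}}},
)

# 3. The write block is copied to the target, so clear it once the split completes.
client.indices.put_settings(index="events-v2", body={"index.blocks.write": None})
</code></pre>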
<p><strong>Remote-backed storage and automatic storage tiering</strong>: OpenSearch already offers remote store indexes, which improve durability and indexing performance. Building on this architecture, we plan to deliver an end-to-end <a href="https://github.com/opensearch-project/OpenSearch/issues/3739">multi-tier storage experience</a>, which will provide users with an optimal balance of cost and performance. The <a href="https://github.com/opensearch-project/OpenSearch/issues/12809">warm tier</a> will handle more storage per compute while maintaining the interactive <a href="https://github.com/opensearch-project/OpenSearch/issues/13806">search experience on the warm data</a> without requiring all data to be locally available. The on-demand cold tier experience will provide compute and storage separation, allowing users to store large amounts of data that can be made searchable when needed. Additionally, we’ll introduce new use-case-specific index templates to simplify index configuration for users.</p>
<p><strong>Pull-based ingestion</strong>: Native <a href="https://github.com/opensearch-project/OpenSearch/issues/10610">pull-based ingestion</a> that pulls events from an external event stream provides further benefits compared to the current push-based model. These benefits include better handling of ingestion throughput spikes and removing the need for the translog in the indexing nodes. OpenSearch can be extended to support pull-based indexing, which can also present the possibility of priority-based ingestion. Time-sensitive and critical updates can be isolated from lower-priority events, and ingestion spikes can be handled by throttling low-priority events.</p>
<p><strong>Next-generation snapshots for remote-backed clusters</strong>: <a href="https://github.com/opensearch-project/OpenSearch/issues/15057">Snapshots v2</a> aims to enhance the scalability of snapshots for remote-backed clusters and reduce dependence on per-shard state updates in the cluster manager. The new snapshots rely on a timestamp-based pinning strategy, where instead of resolving shard-level files at snapshot time, the timestamp for the snapshot is pinned and the resolution is deferred until restore time. This approach makes the snapshot process much faster, allowing snapshot operations to finish within a couple of minutes, even for larger clusters, while significantly reducing the computational load associated with data backup. Timestamp pinning serves as the fundamental building block for future features, such as <a href="https://github.com/opensearch-project/OpenSearch/issues/1147">Point-In-Time-Restore (PITR)</a>.</p>
<p><strong>Scaling admin APIs</strong>: For large cluster configurations, cluster manager nodes become scaling bottlenecks as multiple admin APIs obtain the cluster state from the active cluster manager node, even if the latest state is present locally or present in a remote store. With the ongoing optimizations, the coordinator node can <a href="https://github.com/opensearch-project/OpenSearch/pull/12252/">serve the admin APIs without relaying the request</a> to the cluster manager node in most cases. Also, for APIs like CAT Shards and CAT Snapshots, the response size increases as the cluster expands to 100K shards or more. We plan to introduce <a href="https://github.com/opensearch-project/OpenSearch/issues/14258">pagination</a> and <a href="https://github.com/opensearch-project/OpenSearch/issues/13908">cancellation</a> for these APIs to ensure that they continue to operate efficiently regardless of the metadata size. We are implementing multiple optimizations to the Stats and Cluster APIs that will eliminate redundant processing and perform <a href="https://github.com/opensearch-project/OpenSearch/pull/14426">pre-aggregation</a> on the data node before responding to the coordinator node receiving the user request.</p>
<h3 id="roadmap-theme-6-stability-availability-and-resiliency">Roadmap Theme 6: Stability, Availability, and Resiliency</h3>
<p>OpenSearch is designed to provide capabilities for search and analytics at scale by using the underlying Lucene search engine that also powers other distributed systems. The OpenSearch Project has dedicated time and effort to improving stability and resiliency and making the service highly available. The following are some of the planned key efforts.</p>
<p><a href="https://github.com/opensearch-project/OpenSearch/issues/7334"><strong>Coordinator-level latency visibility</strong></a>: This initiative provides users visibility into the different phases of search request execution in OpenSearch. This is particularly useful for statistically identifying possible changes in a workload by monitoring latency metrics across different phases. Coordinator slow logs were recently introduced to give users the ability to capture “slow” requests along with a breakdown of time spent in different search phases, something that was otherwise only available for the query and fetch phases.</p>
<p><a href="https://github.com/opensearch-project/OpenSearch/issues/11429"><strong>Query insights</strong></a>: We recently introduced the ability for users to access computationally expensive queries (top N queries). We plan to integrate OpenSearch with external metrics collectors, like OpenTelemetry, to deliver more comprehensive analytics. Currently, queries can be analyzed according to various metrics, such as latency, CPU, memory utilization, and even query structure. Support for visualizing the execution profile will help users easily identify bottlenecks in their workload execution. Given a sufficient level of insight data, we will use AI/ML to build recommendation systems, which will eventually be able to automatically manage cluster settings for users with minimal intervention on their part.</p>
<p><a href="https://github.com/opensearch-project/OpenSearch/issues/1329"><strong>Query resiliency</strong></a>: One significant risk to cluster stability is runaway queries that continuously consume memory, leading to out-of-memory states and potentially catastrophic outcomes. Search backpressure introduces a mechanism to automatically identify and terminate such problematic queries when an OpenSearch host is low on memory or CPU. Existing mechanisms like circuit breakers and thread pool size thresholds provide a generic solution, but they do not specifically target the problematic queries. New search backpressure and hard cancellation techniques are designed to address these limitations.</p>
<p><a href="https://github.com/opensearch-project/OpenSearch/issues/11061"><strong>Workload management</strong></a>: An OpenSearch installation often contains a large number of tenants, all of which experience the same quality of service (QoS). However, this potentially means that an inexperienced tenant can consume more than the desired amount of cluster resources, which can lead to a degraded experience for other tenants. Admission control and search backpressure provide a best-effort assurance for cluster stability but do not guarantee a consistent QoS. With the introduction of query groups, system administrators of OpenSearch clusters will be able to provide tenant-based performance isolation for search workloads, manage tenant-based query groups, and enforce resource-based limits on tenant workloads. This enhancement will allow system administrators to prioritize execution of some workloads over others, thereby further improving QoS guarantee levels.</p>
<p><a href="https://github.com/opensearch-project/OpenSearch/issues/13257"><strong>Cluster state management</strong></a>: The cluster manager node manages all admin operations in a cluster. These operations include creating and deleting indexes, updating fields in an existing index, taking snapshots, and adding and removing nodes. The metadata about indexes and other entities—data streams, templates, aliases, snapshots, and custom entities stored by plugins—is stored in a data structure called the cluster state. Any change to the cluster state is processed by the cluster manager node and persisted to that node’s local disk. Starting with version 2.12, OpenSearch added support for storing the cluster state remotely in order to provide durability guarantees. With the introduction of a remote cluster state, replacing all cluster manager nodes will not result in any data loss in remote store clusters. The cluster manager node processes any cluster state updates and then sends the updated state to all the follower nodes in the cluster. As the state and number of follower nodes grow, the overhead on the cluster manager node increases significantly because the cluster manager node is responsible for publishing the updated state to every node in the cluster. This impacts the cluster’s stability and availability. To reduce strain on the cluster manager node, we are proposing to use the remote store for cluster state publication. The cluster manager node will publish the entire cluster state to the remote store to be downloaded by each follower node. The published cluster state will include ephemeral entities like the <a href="https://github.com/opensearch-project/OpenSearch/issues/14164">shard routing table</a>, which stores the mapping of the shards assigned to each data node in the cluster. The cluster manager node will only communicate that a new state is available and provide the remote location of the new state, instead of publishing the entire cluster state. Publishing the state remotely will reduce memory, CPU, and transport thread overhead on the cluster manager node during cluster state changes. This approach will also allow on-demand downloading of entities on the data or coordinator nodes instead of requiring all nodes to maintain the full cluster state. This will align with our vision of a more cloud-native architecture. Remote publication will be generally available in OpenSearch 2.17 and is planned to be further enhanced in future version releases.</p>
<p><a href="https://github.com/opensearch-project/data-prepper/issues/3857"><strong>Data Prepper pipeline DLQ</strong></a>: Data Prepper provides resilience when OpenSearch is down by buffering data and eventually writing to a dead-letter queue (DLQ) if the cluster remains unavailable. Currently supported DLQ targets are local files and Amazon S3. One current limitation is that data is only sent to the DLQ if it fails to write to the sink. Other failures, such as during processing in the pipeline, do not case data to be sent to the DLQ. With the proposed pipeline DLQ, Data Prepper will be able to send failed events to the DLQ or continue to send them downstream, allowing the pipeline author to decide. This will improve the resiliency of data throughout the pipeline. Additionally, the pipeline DLQ will be a pipeline just like any other and will be able to write to any supported Data Prepper sink, such as Apache Kafka.</p>
<h3 id="roadmap-theme-7-security">Roadmap Theme 7: Security</h3>
<p>Security is a Tier 0 prerequisite for modern workloads. In OpenSearch, security features are primarily implemented by the Security plugin, which offers a rich set of capabilities. These include various authentication backends (SAML, JWT, LDAP), authorization primitives, fine-grained access control (document-level and field-level security, or DLS/FLS), and encryption in transit. OpenSearch has <a href="https://github.com/orgs/opensearch-project/projects/206/views/11?sliceBy%5Bvalue%5D=Security">rapidly developed new plugin capabilities</a>, attracting increased interest from the community. This growth also carries critical security implications. Importantly, security should not come at the cost of performance. To address these challenges, OpenSearch is focusing on the following initiatives to strengthen its security posture.</p>
<p><a href="https://github.com/opensearch-project/security/issues/4500"><strong>Plugin resource permissions</strong></a>: We are developing a mechanism for sharing plugin resources that supports existing use cases while allowing more granular control over resource sharing. Examples include model groups in the ML Commons plugin, anomaly detectors in the Time Series Analytics plugin, and detectors in the Alerting plugin.</p>
<p><a href="https://github.com/opensearch-project/security/issues/4439"><strong>Plugin isolation</strong></a>: OpenSearch is moving toward a zero-trust model for plugins. Cluster administrators will have full <a href="https://github.com/opensearch-project/security/issues/2860">visibility into all permissions</a> requested by a plugin before installation.</p>
<p><a href="https://github.com/opensearch-project/security/issues/3870"><strong>Optimized privilege evaluation</strong></a>: Performance is a key focus for OpenSearch. We’ve identified areas within the Security plugin that can yield significant performance improvements, especially for clusters with numerous indexes or roles mapped to users.</p>
<p><a href="https://github.com/opensearch-project/security/issues/4009"><strong>API tokens</strong></a>: API tokens introduce a new way to interact with OpenSearch clusters by associating permissions directly with a token. Cluster administrators will have full visibility into and control over the issued tokens and their usage.</p>
<p><a href="https://github.com/opensearch-project/security-dashboards-plugin/issues/2070"><strong>Ease of use</strong></a>: We aim to simplify security setup for cluster administrators. Many useful security features remain underused because they are not exposed through OpenSearch Dashboards. To address this, we will add security dashboard pages where administrators can configure rate limiters to protect clusters from unauthenticated actors.</p>
<p>Looking ahead, security primitives like <a href="https://github.com/opensearch-project/security/issues/4702">authorization could be extracted and made pluggable</a>, allowing integration with newer open standards for policy evaluation, such as Open Policy Agent (OPA) or Cedar.</p>
<h3 id="roadmap-theme-8-modular-architecture">Roadmap Theme 8: Modular Architecture</h3>
<p>OpenSearch is working toward well-supported modularity in order to enable <strong>rapid development of properly encapsulated features and flexible deployment architectures</strong> for cloud-native use cases. Historically, OpenSearch has been deployed and operated using a cluster model, in which all functions (such as replication and durability) were implemented within the cluster. While the project has grown organically, offering many extension points through plugins, it still relies on a monolithic server module at its core, with tight coupling across the architecture. As the project grows within a globally distributed community, this monolithic architecture will become an unsustainable bottleneck. Innovations such as the next-generation query engine are not possible with tightly coupled components. Additionally, the Java Security Manager is pending deprecation and removal from the Java runtime, and the recommended replacement technique (<a href="https://inside.java/2021/04/23/security-and-sandboxing-post-securitymanager/">shallow sandboxing</a>) relies on using <a href="https://github.com/opensearch-project/OpenSearch/issues/1588">newer language features that require properly modularized code</a>. The overall goal of the <a href="https://github.com/opensearch-project/OpenSearch/issues/5910">modularity effort</a> is to allow the same core OpenSearch code to run across all variants (for example, on-premises clusters and large managed serverless offerings) while providing strong encapsulation of cluster functions. This will facilitate more independent development and innovation across the project.</p>
<h3 id="roadmap-theme-9-releases-and-project-health">Roadmap Theme 9: Releases and Project Health</h3>
<p>With contributions ranging from code enhancements to feature requests across all roadmap themes, the OpenSearch community is working together to maintain the stability of the codebase while ensuring that CI/CD pipelines remain green across all active branches. This provides a reliable foundation for both new and existing contributors, reduces bugs, and safeguards feature integrity. Key repository health metrics are publicly available on the <a href="https://metrics.opensearch.org/_dashboards/app/dashboards#/view/f1ad21c0-e323-11ee-9a74-07cd3b4ff414?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-4y,to:now))&amp;_a=(description:'OpenSearch%20Ops%20Metrics',filters:!(),fullScreenMode:!f,options:(hidePanelTitles:!f,useMargins:!t),query:(language:kuery,query:''),timeRestore:!t,title:'OpenSearch%20Ops%20Metrics',viewMode:view)">Ops Dashboard</a>.</p>
<p>The OpenSearch <a href="https://github.com/opensearch-project/.github/blob/main/RELEASING.md">release process</a> is fully automated, including a one-click release system for products such as OpenSearch Benchmark. Each product adheres to <a href="https://opensearch.org/blog/what-is-semver/">semantic versioning (semver)</a>, ensuring that breaking changes only occur in major versions. Releases follow a structured <a href="https://opensearch.org/releases.html#release-schedule">schedule</a>, starting with a code freeze and release candidate generation, and are driven by automated workflows that eliminate the need for manual sign-offs. We’re also building a <a href="https://opensearch.org/release-dashboard">Central Release Dashboard</a> to streamline and provide visibility into the release pipeline from beginning to end.</p>
<h2 id="get-involved">Get involved</h2>
<p>We recognize that community engagement is crucial to the success of all the innovations mentioned in this post. We invite the open-source community to review our roadmap, provide feedback, and contribute to the OpenSearch Project. Your insights and contributions will be invaluable in helping us to achieve these goals and continue improving OpenSearch.</p>
<p>You can <a href="https://github.com/opensearch-project/.github/blob/main/FEATURES.md">propose new ideas and features</a> at any time by creating a GitHub issue and following our <a href="https://github.com/opensearch-project/.github/blob/main/.github/ISSUE_TEMPLATE/FEATURE_REQUEST_TEMPLATE.md">feature request template</a>. Once proposed, the feature can be included in the <a href="https://github.com/orgs/opensearch-project/projects/206/views/11">public roadmap</a> by adding corresponding labels (such as Meta, RFC, or Roadmap), which are automatically populated for all the repositories and are categorized by themes for clarity. If you have any questions or suggestions for improving our processes, please feel free to reach out or contribute directly through <a href="https://github.com/opensearch-project">GitHub</a>.</p>
<p>We encourage you to actively participate in our project because your involvement will help shape the future of OpenSearch. By engaging with our community, sharing your ideas, and contributing to development, you’ll play a crucial role in driving innovation and improving the project. Thank you for your continued support and commitment to open source!</p></content><author><name>pallp</name></author><category term="community-updates" /><summary type="html">OpenSearch is a rapidly growing open-source product suite comprising a search engine, an ingestion system, language clients, and a user interface for analytics. OpenSearch contributors and maintainers are innovating in all these areas at a fast pace. To steer the project's development effectively, we have revamped the project roadmap to provide better transparency into both short- and long-term enhancements. In this blog post, we are excited to share the new theme-based, community-driven OpenSearch Project Roadmap for 2024–2025.</summary></entry><entry><title type="html">Data Prepper 2.9.0 is ready for download</title><link href="https://kolchfa-aws.github.io/blog/Data-Prepper-2.9.0-is-ready-for-download/" rel="alternate" type="text/html" title="Data Prepper 2.9.0 is ready for download" /><published>2024-08-29T18:30:00+00:00</published><updated>2024-09-05T16:18:17+00:00</updated><id>https://kolchfa-aws.github.io/blog/Data-Prepper-2.9.0-is-ready-for-download</id><content type="html" xml:base="https://kolchfa-aws.github.io/blog/Data-Prepper-2.9.0-is-ready-for-download/"><h2 id="introduction">Introduction</h2>
<p>You can download Data Prepper 2.9.0 today.
This release includes a number of core improvements as well as improvements to many popular processors.</p>
<h2 id="expression-improvements">Expression improvements</h2>
<p>Data Prepper continues to improve support for expressions to allow you more control over conditions that you use for routing and conditional processing.
In this release, Data Prepper adds support for set operations.
These operations allow you to write conditions that check whether a value is in a set of possible values.
This can be especially useful for routing, where you need to route data depending on the originating system.</p>
<p>Additionally, Data Prepper has a new <code class="language-plaintext highlighter-rouge">startsWith</code> function that determines whether a string value starts with another string.</p>
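<p>As a rough illustration, the routes below combine the new set operator and the <code class="language-plaintext highlighter-rouge">startsWith</code> function. The field names and values are hypothetical, and the exact condition syntax (especially the function argument style) should be checked against the Data Prepper expression documentation; treat this as a sketch rather than a verified configuration:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>routes:
  # Match events whose service name is one of a fixed set of values.
  - backend-services: '/service in {"inventory", "billing", "orders"}'
  # Match events whose host name begins with a given prefix
  # (the startsWith argument style shown here is illustrative).
  - internal-hosts: 'startsWith("/host/name", "internal-")'
</code></pre></div></div>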
<h2 id="default-route">Default route</h2>
<p>Data Prepper has offered sink routing since version 2.0.
With this capability, pipeline authors can use Data Prepper expressions to route events to different sinks in order to meet their requirements.
One challenge experienced by pipeline authors has been how to handle events that do not match any existing routes.
A common solution to this challenge has been to create a route that is the inverse of other routes.
However, this required copying and inverting the other conditions, which could be difficult to handle and even more difficult to maintain.</p>
<p>Now Data Prepper supports a special route named <code class="language-plaintext highlighter-rouge">_default</code>.
By applying this route to a sink, pipeline authors can ensure that events that do not match any other routes will be sent to a default sink of their choosing.</p>
<p>For example, consider a simple situation in which you want to route frontend and backend events to different sinks.
You can define two sinks for these events and then define your routes.
But what if you receive events that do not match?
The following sample pipeline shows an approach to handling events that do not match either the frontend or backend routes:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>routes:
- frontend: '/service == "front-end"`
- backend: '/service == "back-end"`
sink:
- opensearch:
routes:
- front-end
- opensearch:
routes:
- back-end
- opensearch:
routes:
- _default
</code></pre></div></div>
<h2 id="performance">Performance</h2>
<p>The Data Prepper maintainers have been working toward improving the performance of Data Prepper.
This release includes a number of internal improvements that speed up processing for many processors.
You don’t need to do anything other than update your version to experience these improvements.</p>
<p>Data Prepper 2.9 also offers some new features that you can use to help reduce out-of-memory errors or circuit breaker trips.
Many pipelines involve extracting source data from a string into a structure.
Some examples are <code class="language-plaintext highlighter-rouge">grok</code> and <code class="language-plaintext highlighter-rouge">parse_json</code>.
When you use these processors, you can more than double the size of each event that you process.
Because the events flowing through the system consume the largest portion of memory usage, this will greatly increase your memory requirements.</p>
<p>Many pipeline authors may use these processors and then remove the source data in a second processor.
This is a good approach when you don’t need to store the original string in your sink.
But it doesn’t always make the memory used by the string available for garbage collection when you need it.
The reason for this is that Data Prepper pipelines operate on batches of data.
As these batches of data move through the pipeline, the pipeline will expand the memory usage in one processor and then attempt to reduce it in the next.
Because the memory expansion happens in batches, Data Prepper may expand many thousands of events before starting to remove the source data.</p>
<p>See the following example pipeline, which runs <code class="language-plaintext highlighter-rouge">grok</code> and then <code class="language-plaintext highlighter-rouge">delete_entries</code>.
With a configured <code class="language-plaintext highlighter-rouge">batch_size</code> of 100,000, Data Prepper will expand 100,000 events before deleting the messages.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>my-pipeline:
buffer:
bounded_blocking:
batch_size: 100000
processor:
- grok:
match:
message: ["..."]
- delete_entries:
with_keys: ["message"]
</code></pre></div></div>
<p>To help with this memory usage issue, Data Prepper now provides a <code class="language-plaintext highlighter-rouge">delete_source</code> flag on some of these processors, including <code class="language-plaintext highlighter-rouge">grok</code> and <code class="language-plaintext highlighter-rouge">parse_json</code>.</p>
<p>Returning to the preceding example, you could both simplify the pipeline and reduce the amount of memory used in between processors:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>my-pipeline:
buffer:
bounded_blocking:
batch_size: 100000
processor:
- grok:
match:
message: ["..."]
delete_source: true
</code></pre></div></div>
<p>If you observe this pattern of the source being deleted in a separate processor, configure your pipeline to use <code class="language-plaintext highlighter-rouge">delete_source</code> in order to improve your overall memory usage.</p>
<h2 id="getting-started">Getting started</h2>
<ul>
<li>To download Data Prepper, visit the <a href="https://opensearch.org/downloads.html">OpenSearch downloads</a> page.</li>
<li>For instructions on how to get started with Data Prepper, see <a href="https://opensearch.org/docs/latest/data-prepper/getting-started/">Getting started with Data Prepper</a>.</li>
<li>To learn more about the work in progress for Data Prepper 2.10 and other releases, see the <a href="https://github.com/orgs/opensearch-project/projects/221">Data Prepper roadmap</a>.</li>
</ul>
<h2 id="thanks-to-our-contributors">Thanks to our contributors!</h2>
<p>The following community members contributed to this release. Thank you!</p>
<ul>
<li><a href="https://github.com/chenqi0805">chenqi0805</a> – Qi Chen</li>
<li><a href="https://github.com/danhli">danhli</a> – Daniel Li</li>
<li><a href="https://github.com/dinujoh">dinujoh</a> – Dinu John</li>
<li><a href="https://github.com/dlvenable">dlvenable</a> – David Venable</li>
<li><a href="https://github.com/graytaylor0">graytaylor0</a> – Taylor Gray</li>
<li><a href="https://github.com/ivan-tse">ivan-tse</a> – Ivan Tse</li>
<li><a href="https://github.com/jayeshjeh">jayeshjeh</a> – Jayesh Parmar</li>
<li><a href="https://github.com/joelmarty">joelmarty</a> – Joël Marty</li>
<li><a href="https://github.com/kkondaka">kkondaka</a> – Krishna Kondaka</li>
<li><a href="https://github.com/mishavay-aws">mishavay-aws</a></li>
<li><a href="https://github.com/oeyh">oeyh</a> – Hai Yan</li>
<li><a href="https://github.com/san81">san81</a> – Santhosh Gandhe</li>
<li><a href="https://github.com/sb2k16">sb2k16</a> – Souvik Bose</li>
<li><a href="https://github.com/shenkw1">shenkw1</a> – Katherine Shen</li>
<li><a href="https://github.com/srikanthjg">srikanthjg</a> – Srikanth Govindarajan</li>
<li><a href="https://github.com/timo-mue">timo-mue</a></li>
</ul></content><author><name>dvenable</name></author><category term="releases" /><summary type="html">Data Prepper 2.9.0 contains core improvements to expressions, routing, performance, and more.</summary></entry><entry><title type="html">Boosting vector search performance with concurrent segment search</title><link href="https://kolchfa-aws.github.io/blog/boost-vector-search-with-css/" rel="alternate" type="text/html" title="Boosting vector search performance with concurrent segment search" /><published>2024-08-27T00:00:00+00:00</published><updated>2024-08-27T22:38:15+00:00</updated><id>https://kolchfa-aws.github.io/blog/boost-vector-search-with-css</id><content type="html" xml:base="https://kolchfa-aws.github.io/blog/boost-vector-search-with-css/"><p>In OpenSearch, data is stored in shards, which are further divided into segments. When you execute a search query, it runs sequentially across all segments of each shard involved in the query. As the number of segments increases, this sequential execution can increase <em>query latency</em> (the time it takes to retrieve the results) because the query has to wait for each segment run to complete before moving on to the next one. This delay becomes especially noticeable if some segments take longer to process queries than others.</p>
<style>
table {
font-size: 16px;
}
h3 {
font-size: 22px;
}
h4 {
font-size: 20px;
}
th {
background-color: #f5f7f7;
}
</style>
<p>Introduced in OpenSearch version 2.12, <em>concurrent segment search</em> addresses this issue by enabling parallel execution of queries across multiple segments within a shard. By using available computing resources, this feature reduces overall query latency, particularly for larger datasets with many segments. Concurrent segment search is designed to provide more consistent and predictable latencies. It achieves this consistency by reducing the impact of variations in segment performance or the number of segments on query execution time.</p>
<p>In this blog post, we’ll explore the impact of concurrent segment search on vector search workloads.</p>
<h2 id="enabling-concurrent-segment-search">Enabling concurrent segment search</h2>
<p>By default, concurrent segment search is disabled in OpenSearch. For our experiments, we enabled it for all indexes in the cluster by using the following dynamic cluster setting:</p>
<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="err">PUT</span><span class="w"> </span><span class="err">_cluster/settings</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="nl">"persistent"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="nl">"search.concurrent_segment_search.enabled"</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>
<p>To achieve concurrent segment searches, OpenSearch divides the segments within each shard into multiple slices, with each slice processed in parallel on a separate thread. The number of slices determines the degree of parallelism that OpenSearch can provide. You can either use Lucene’s default slicing mechanism or set the maximum slice count manually. For detailed instructions on updating the slice count, see <a href="https://opensearch.org/docs/latest/search-plugins/concurrent-segment-search/#slicing-mechanisms">Slicing mechanisms</a>.</p>
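<p>For example, to set the maximum slice count manually, you can update the dynamic cluster setting shown below. This is a minimal sketch based on the linked slicing documentation for OpenSearch 2.x; confirm the setting name and supported values for your version before applying it:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>PUT _cluster/settings
{
  "persistent": {
    "search.concurrent.max_slice_count": 2
  }
}
</code></pre></div></div>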
<h2 id="performance-results">Performance results</h2>
<p>We performed our tests on an <a href="https://opensearch.org/versions/opensearch-2-15-0.html">OpenSearch 2.15</a> cluster using the OpenSearch Benchmark <a href="https://github.com/opensearch-project/opensearch-benchmark-workloads/tree/main/vectorsearch">vector search workload</a>. We used the Cohere dataset with two different configurations to evaluate the performance improvements of vector search queries when running the workload with concurrent segment search disabled, enabled with default settings, and enabled with different max slice counts.</p>
<h3 id="cluster-setup">Cluster setup</h3>
<ul>
<li>3 data nodes (r5.4xlarge: 128 GB RAM, 16 vCPUs, 250 GB disk space)</li>
<li>3 cluster manager nodes (r5.xlarge: 32 GB RAM, 4 vCPUs, 50 GB disk space)</li>
<li>1 OpenSearch workload client (c5.4xlarge: 32 GB RAM, 16 vCPUs)</li>
<li>1 and 4 search clients</li>
<li><code class="language-plaintext highlighter-rouge">index_searcher</code> thread pool size: 32</li>
</ul>
<h4 id="index-settings">Index settings</h4>
<table>
<thead>
<tr>
<th><code class="language-plaintext highlighter-rouge">m</code></th>
<th><code class="language-plaintext highlighter-rouge">ef_construction</code></th>
<th><code class="language-plaintext highlighter-rouge">ef_search</code></th>
<th>Number of shards</th>
<th>Replica count</th>
<th>Space type</th>
</tr>
</thead>
<tbody>
<tr>
<td>16</td>
<td>100</td>
<td>100</td>
<td>6</td>
<td>1</td>
<td>inner product</td>
</tr>
</tbody>
</table>
<h4 id="configuration">Configuration</h4>
<table>
<thead>
<tr>
<th style="text-align: left">Dimension</th>
<th style="text-align: left">Vector count</th>
<th style="text-align: left">Search query count</th>
<th style="text-align: left">Refresh interval</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">768</td>
<td style="text-align: left">10M</td>
<td style="text-align: left">10K</td>
<td style="text-align: left">1s (default)</td>
</tr>
</tbody>
</table>
<h3 id="service-time-comparison">Service time comparison</h3>
<p>We conducted the following experiments:</p>
<ol>
<li><a href="#experiment-1-concurrent-search-disabled">Concurrent search disabled</a></li>
<li>Concurrent search enabled:
<ul>
<li><a href="#experiment-2-concurrent-search-enabled-max-slice-count--0-default">Max slice count = 0 (default)</a></li>
<li><a href="#experiment-3-concurrent-search-enabled-max-slice-count--2">Max slice count = 2</a></li>
<li><a href="#experiment-4-concurrent-search-enabled-max-slice-count--4">Max slice count = 4</a></li>
<li><a href="#experiment-5-concurrent-search-enabled-max-slice-count--8">Max slice count = 8</a></li>
</ul>
</li>
</ol>
<p>The following sections present the results of these experiments.</p>
<h4 id="experiment-1-concurrent-search-disabled">Experiment 1: Concurrent search disabled</h4>
<table border="1">
<thead>
<tr>
<th>k-NN engine</th>
<th>Segment count</th>
<th>Num search clients</th>
<th colspan="3">Service time (ms)</th>
<th>Max CPU %</th>
<th>% JVM heap used</th>
<th>Recall</th>
</tr>
<tr>
<th></th>
<th></th>
<th></th>
<th>p50</th>
<th>p90</th>
<th>p99</th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="2">Lucene</td>
<td rowspan="2">381</td>
<td>1</td>
<td>30</td>
<td>37</td>
<td>45</td>
<td>11</td>
<td>53.48</td>
<td>0.97</td>
</tr>
<tr>
<td>4</td>
<td>36</td>
<td>43</td>
<td>51</td>
<td>38</td>
<td>42</td>
<td>0.97</td>
</tr>
<tr>
<td rowspan="2">NMSLIB</td>
<td rowspan="2">383</td>
<td>1</td>
<td>28</td>
<td>35</td>
<td>41</td>
<td>10</td>
<td>47.5</td>
<td>0.97</td>
</tr>
<tr>
<td>4</td>
<td>35</td>
<td>41</td>
<td>46</td>
<td>36</td>
<td>48.06</td>
<td>0.97</td>
</tr>
<tr>
<td rowspan="2">Faiss</td>
<td rowspan="2">381</td>
<td>1</td>
<td>29</td>
<td>37</td>
<td>42</td>
<td>10</td>
<td>47.85</td>
<td>0.97</td>
</tr>
<tr>
<td>4</td>
<td>36</td>
<td>40</td>
<td>44</td>
<td>38</td>
<td>46.38</td>
<td>0.97</td>
</tr>
</tbody>
</table>
<h4 id="experiment-2-concurrent-search-enabled-max-slice-count--0-default">Experiment 2: Concurrent search enabled, max slice count = 0 (default)</h4>
<table border="1">
<thead>
<tr>
<th>k-NN engine</th>
<th>Segment count</th>
<th>Num search clients</th>
<th colspan="3">Service time (ms)</th>
<th>Max CPU %</th>
<th>% JVM heap used</th>
<th>Recall</th>
</tr>
<tr>
<th></th>
<th></th>
<th></th>
<th>p50</th>
<th>p90</th>
<th>p99</th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="2">Lucene</td>
<td rowspan="2">381</td>
<td>1</td>
<td>13</td>
<td>15</td>
<td>17</td>
<td>47</td>
<td>47.99</td>
<td>0.97</td>
</tr>
<tr>
<td>4</td>
<td>27</td>
<td>32</td>
<td>37</td>
<td>81</td>
<td>45.95</td>
<td>0.97</td>
</tr>
<tr>
<td rowspan="2">NMSLIB</td>
<td rowspan="2">383</td>
<td>1</td>
<td>13</td>
<td>14</td>
<td>16</td>
<td>38</td>
<td>47.28</td>
<td>0.97</td>
</tr>
<tr>
<td>4</td>
<td>24</td>
<td>27</td>
<td>32</td>
<td>75</td>
<td>44.76</td>
<td>0.97</td>
</tr>
<tr>
<td rowspan="2">Faiss</td>
<td rowspan="2">381</td>
<td>1</td>
<td>13</td>
<td>14</td>
<td>16</td>
<td>34</td>
<td>46.04</td>
<td>0.97</td>
</tr>
<tr>
<td>4</td>
<td>25</td>
<td>28</td>
<td>33</td>
<td>76</td>
<td>47.72</td>
<td>0.97</td>
</tr>
</tbody>
</table>
<h4 id="experiment-3-concurrent-search-enabled-max-slice-count--2">Experiment 3: Concurrent search enabled, max slice count = 2</h4>
<table border="1">
<thead>
<tr>
<th>k-NN engine</th>
<th>Segment count</th>
<th>Num search clients</th>
<th colspan="3">Service time (ms)</th>
<th>Max CPU %</th>
<th>% JVM heap used</th>
<th>Recall</th>
</tr>
<tr>
<th></th>
<th></th>
<th></th>
<th>p50</th>
<th>p90</th>
<th>p99</th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="2">Lucene</td>
<td rowspan="2">381</td>
<td>1</td>
<td>14</td>
<td>16</td>
<td>19</td>
<td>41</td>
<td>52.91</td>
<td>0.97</td>
</tr>
<tr>
<td>4</td>
<td>28</td>
<td>34</td>
<td>42</td>
<td>88</td>
<td>51.65</td>
<td>0.97</td>
</tr>
<tr>
<td rowspan="2">NMSLIB</td>
<td rowspan="2">383</td>
<td>1</td>
<td>20</td>
<td>23</td>
<td>25</td>
<td>16</td>
<td>44.97</td>
<td>0.97</td>
</tr>
<tr>
<td>4</td>
<td>23</td>
<td>27</td>
<td>33</td>
<td>60</td>
<td>41.06</td>
<td>0.97</td>
</tr>
<tr>
<td rowspan="2">Faiss</td>
<td rowspan="2">381</td>
<td>1</td>
<td>20</td>
<td>22</td>
<td>24</td>
<td>19</td>
<td>46.42</td>
<td>0.97</td>
</tr>
<tr>
<td>4</td>
<td>23</td>
<td>26</td>
<td>32</td>
<td>67</td>
<td>37.23</td>
<td>0.97</td>
</tr>
</tbody>
</table>
<h4 id="experiment-4-concurrent-search-enabled-max-slice-count--4">Experiment 4: Concurrent search enabled, max slice count = 4</h4>
<table border="1">
<thead>
<tr>
<th>k-NN engine</th>
<th>Segment count</th>
<th>Num search clients</th>
<th colspan="3">Service time (ms)</th>
<th>Max CPU %</th>
<th>% JVM heap used</th>
<th>Recall</th>
</tr>
<tr>
<th></th>
<th></th>
<th></th>
<th>p50</th>
<th>p90</th>
<th>p99</th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="2">Lucene</td>
<td rowspan="2">381</td>
<td>1</td>
<td>13.6</td>
<td>15.9</td>
<td>17.6</td>
<td>49</td>
<td>53.37</td>
<td>0.97</td>
</tr>
<tr>
<td>4</td>
<td>28</td>
<td>33</td>
<td>41</td>
<td>86</td>
<td>50.12</td>
<td>0.97</td>
</tr>
<tr>
<td rowspan="2">NMSLIB</td>
<td rowspan="2">383</td>
<td>1</td>
<td>14</td>
<td>15</td>
<td>16</td>
<td>29</td>
<td>51.12</td>
<td>0.97</td>
</tr>
<tr>
<td>4</td>
<td>21</td>
<td>25</td>
<td>31</td>
<td>72</td>
<td>42.63</td>
<td>0.97</td>
</tr>
<tr>
<td rowspan="2">Faiss</td>
<td rowspan="2">381</td>
<td>1</td>
<td>14</td>
<td>15</td>
<td>17</td>
<td>30</td>
<td>41.1</td>
<td>0.97</td>
</tr>
<tr>
<td>4</td>
<td>23</td>
<td>28</td>
<td>37</td>
<td>77</td>
<td>47.19</td>
<td>0.97</td>
</tr>
</tbody>
</table>
<h4 id="experiment-5-concurrent-search-enabled-max-slice-count--8">Experiment 5: Concurrent search enabled, max slice count = 8</h4>
<table border="1">
<thead>
<tr>
<th>k-NN engine</th>
<th>Segment count</th>
<th>Num search clients</th>
<th colspan="3">Service time (ms)</th>
<th>Max CPU %</th>
<th>% JVM heap used</th>
<th>Recall</th>
</tr>
<tr>
<th></th>
<th></th>
<th></th>
<th>p50</th>
<th>p90</th>
<th>p99</th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="2">Lucene</td>
<td rowspan="2">381</td>
<td>1</td>
<td>14</td>
<td>16</td>
<td>18</td>
<td>43</td>
<td>45.37</td>
<td>0.97</td>
</tr>
<tr>
<td>4</td>
<td>28</td>
<td>34</td>
<td>43</td>
<td>87</td>
<td>48.79</td>
<td>0.97</td>
</tr>
<tr>
<td rowspan="2">NMSLIB</td>
<td rowspan="2">383</td>
<td>1</td>
<td>10</td>
<td>12</td>
<td>14</td>
<td>41</td>
<td>45.21</td>
<td>0.97</td>
</tr>
<tr>
<td>4</td>
<td>23</td>
<td>25</td>
<td>29</td>
<td>75</td>
<td>45.87</td>
<td>0.97</td>
</tr>
<tr>
<td rowspan="2">Faiss</td>
<td rowspan="2">381</td>
<td>1</td>
<td>15</td>
<td>16</td>
<td>17</td>
<td>44</td>
<td>48.68</td>
<td>0.97</td>
</tr>
<tr>
<td>4</td>
<td>23</td>
<td>26</td>
<td>32</td>
<td>79</td>
<td>47.19</td>
<td>0.97</td>
</tr>
</tbody>
</table>
<h3 id="comparing-results">Comparing results</h3>
<p>For simplicity, we’ll focus on the p90 metric with a single search client because this metric captures the performance of long-running vector search queries.</p>
<h4 id="service-time-comparison-p90">Service time comparison (p90)</h4>
<table>
<thead>
<tr>
<th>k-NN engine</th>
<th>Concurrent segment search disabled</th>
<th>Concurrent segment search enabled (Lucene default number of slices)</th>
<th>% Improvement</th>
<th>Concurrent segment search with max slice count = 2</th>
<th>% Improvement</th>
<th>Concurrent segment search with max slice count = 4</th>
<th>% Improvement</th>
<th>Concurrent segment search with max slice count = 8</th>
<th>% Improvement</th>
</tr>
</thead>
<tbody>
<tr>
<td>Lucene</td>
<td>37</td>
<td>15</td>
<td>59.5</td>
<td>16</td>
<td>56.8</td>
<td>15.9</td>
<td>57</td>
<td>16</td>
<td>56.8</td>
</tr>
<tr>
<td>NMSLIB</td>
<td>35</td>
<td>14</td>
<td>60</td>
<td>23</td>
<td>34.3</td>
<td>15</td>
<td>57.1</td>
<td>12</td>
<td>65.7</td>
</tr>
<tr>
<td>Faiss</td>
<td>37</td>
<td>14</td>
<td>62.2</td>
<td>22</td>
<td>40.5</td>
<td>15</td>
<td>59.5</td>
<td>16</td>
<td>56.8</td>
</tr>
</tbody>
</table>
<h4 id="cpu-utilization-comparison">CPU utilization comparison</h4>
<table>
<thead>
<tr>
<th>k-NN engine</th>
<th>Concurrent segment search disabled</th>
<th>Concurrent segment search enabled (Lucene default number of slices)</th>
<th>% Additional CPU utilization</th>
<th>Concurrent segment search with max slice count = 2</th>
<th>% Additional CPU utilization</th>
<th>Concurrent segment search with max slice count = 4</th>
<th>% Additional CPU utilization</th>
<th>Concurrent segment search with max slice count = 8</th>
<th>% Additional CPU utilization</th>
</tr>
</thead>
<tbody>
<tr>
<td>Lucene</td>
<td>11</td>
<td>47</td>
<td>36</td>
<td>41</td>
<td>30</td>
<td>49</td>
<td>38</td>
<td>43</td>
<td>32</td>
</tr>
<tr>
<td>NMSLIB</td>
<td>10</td>
<td>38</td>
<td>28</td>
<td>16</td>
<td>6</td>
<td>29</td>
<td>19</td>
<td>41</td>
<td>31</td>
</tr>
<tr>
<td>Faiss</td>
<td>10</td>
<td>34</td>
<td>24</td>
<td>19</td>
<td>9</td>
<td>30</td>
<td>20</td>
<td>44</td>
<td>34</td>
</tr>
</tbody>
</table>
<p>As demonstrated by our performance benchmarks, enabling concurrent segment search with the default slice count delivers roughly a <strong>60% improvement</strong> (59.5–62.2%, depending on the engine) in vector search service time while requiring only <strong>24–36% more CPU</strong>. This increase in CPU utilization is expected because concurrent segment search runs on more CPU threads; the number of threads is equal to twice the number of CPU cores.</p>
<p>We observed a similar improvement in service time when using multiple concurrent search clients. However, maximum CPU utilization also doubled, as expected, because of the increased number of active search threads running concurrently.</p>
<h2 id="conclusion">Conclusion</h2>
<p>Our experiments clearly show that enabling concurrent segment search with the default slice count improves vector search query performance, albeit at the cost of higher CPU utilization. We recommend testing your workload to determine whether the additional parallelization achieved by increasing the slice count outweighs the additional processing overhead.</p>
<p>Before running concurrent segment search, we recommend force-merging segments into a single segment to achieve better performance. The major disadvantage of this approach is that the time required for force-merging increases as segments grow larger. Thus, we recommend reducing the number of segments in accordance with your use case.</p>
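<p>For example, you can trigger a force merge down to a single segment with the Force Merge API; the index name below is only a placeholder:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>POST /my-vector-index/_forcemerge?max_num_segments=1
</code></pre></div></div>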
<p>By combining vector search with concurrent segment search, you can improve query performance and optimize search operations. To get started with concurrent segment search, explore the <a href="https://opensearch.org/docs/latest/search-plugins/concurrent-segment-search/">documentation</a>.</p></content><author><name>vijay</name></author><category term="technical-posts" /><category term="search" /><summary type="html">In OpenSearch, data is stored in shards, which are further divided into segments. When you execute a search query, it runs sequentially across all segments of each shard involved in the query. As the number of segments increases, this sequential execution can increase query latency (the time it takes to retrieve the results) because the query has to wait for each segment run to complete before moving on to the next one. This delay becomes especially noticeable if some segments take longer to process queries than others.</summary></entry><entry><title type="html">Improving search efficiency and accuracy with the newest v2 neural sparse models</title><link href="https://kolchfa-aws.github.io/blog/neural-sparse-v2-models/" rel="alternate" type="text/html" title="Improving search efficiency and accuracy with the newest v2 neural sparse models" /><published>2024-08-21T00:00:00+00:00</published><updated>2024-08-21T20:01:58+00:00</updated><id>https://kolchfa-aws.github.io/blog/neural-sparse-v2-models</id><content type="html" xml:base="https://kolchfa-aws.github.io/blog/neural-sparse-v2-models/"><p>Neural sparse search is a novel and efficient method for semantic retrieval, <a href="https://opensearch.org/blog/improving-document-retrieval-with-sparse-semantic-encoders/">introduced in OpenSearch 2.11</a>. Sparse encoding models encode text into (token, weight) entries, allowing OpenSearch to build indexes and perform searches using Lucene’s inverted index. Neural sparse search is efficient and generalizes well in out-of-domain (OOD) scenarios. We are excited to announce the release of our v2 series neural sparse models:</p>
<ul>
<li><strong>v2-distill model</strong>: This model <strong>reduces model parameters by 50%</strong>, resulting in lower memory requirements and costs. It <strong>increases ingestion throughput by 1.39x on GPUs and 1.74x on CPUs</strong>. The v2-distill architecture supports both the doc-only and bi-encoder modes.</li>
<li><strong>v2-mini model</strong>: This model <strong>reduces model parameters by 75%</strong>, also reducing memory requirements and costs. It <strong>increases ingestion throughput by 1.74x on GPUs and 4.18x on CPUs</strong>. The v2-mini architecture supports the doc-only mode.</li>
</ul>
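<p>As a minimal sketch of how one of these pretrained models can be deployed, the following ML Commons request registers the doc-only v2-distill model. The model name and version strings are illustrative and should be verified against the pretrained models documentation linked below:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>POST /_plugins/_ml/models/_register
{
  "name": "amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v2-distill",
  "version": "1.0.0",
  "model_format": "TORCH_SCRIPT"
}
</code></pre></div></div>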
<p>Additionally, all v2 models achieve <strong>better search relevance</strong>. The following table compares search relevance between the v1 and v2 models. All v2 models are now available in both <a href="https://opensearch.org/docs/latest/ml-commons-plugin/pretrained-models/#sparse-encoding-models">OpenSearch</a> and <a href="https://huggingface.co/opensearch-project">Hugging Face</a>.</p>
<table>
<thead>
<tr>
<th>Model</th>
<th>Requires no inference for retrieval</th>
<th>Model parameters</th>
<th>AVG NDCG@10</th>
</tr>
</thead>
<tbody>
<tr>