Elasticsearch Index Querying Strategies

Barbara Hui edited this page Oct 21, 2022 · 2 revisions

Limitation of 10k result set

As explained in the official documentation, Elasticsearch by default does not allow you to page past 10,000 hits using the GET /_search API endpoint combined with the from and size parameters: from + size may not exceed the index.max_result_window setting, which defaults to 10,000. from/size paging is the method to use for "ordinary" queries that return no more than 10k hits.
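As a rough illustration of ordinary paging, the sketch below builds the body of a from/size search request and refuses pages that would cross the default window. The function name from_size_body and the zero-based page argument are assumptions made for this example, not part of any Elasticsearch client.

```python
# Default value of the index.max_result_window setting.
MAX_RESULT_WINDOW = 10_000


def from_size_body(query, page, page_size):
    """Build a from/size search body for a zero-based page number.

    Hypothetical helper: it only builds the JSON body, it does not send it.
    """
    start = page * page_size
    # Elasticsearch rejects requests where from + size exceeds the window.
    if start + page_size > MAX_RESULT_WINDOW:
        raise ValueError(
            f"from + size = {start + page_size} exceeds {MAX_RESULT_WINDOW}; "
            "use search_after with a PIT instead"
        )
    return {"query": query, "from": start, "size": page_size}
```

For example, from_size_body({"match_all": {}}, 2, 100) asks for hits 200 through 299, while page 100 at the same page size is rejected.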

Using search_after with PIT for deep querying

Starting with ES v7.10, which introduced the point in time (PIT) API, it is recommended that one use the search_after parameter in combination with a PIT (instead of from and size) to page through more than 10,000 hits. The PIT safeguards against getting inconsistent results across pages when the index changes between requests.

As described in the documentation, this kind of "deep" search request is quite a bit more involved than an ordinary from/size page request. Steps:

  1. Run a request to create a PIT and retrieve the PIT ID.
  2. Run the initial search request, using the sort parameter and the pit parameter.
  3. To retrieve the next page of results, take the sort values from the last hit of the previous page, insert those into the search_after array, and repeat the request.
  4. Repeat step 3, updating the search_after array every time you retrieve a new page of results, until a page comes back with no hits.
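The steps above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: request(method, path, body) is a hypothetical transport callable that sends a JSON request to the cluster and returns the decoded JSON response, and deep_search is a name invented for this example. The sort clause is supplied by the caller and is assumed to give every document a unique position (for example by including a tiebreaker field).

```python
def deep_search(request, index, query, sort, page_size=1000, keep_alive="1m"):
    """Yield every hit for `query`, paging with search_after inside a PIT.

    `request(method, path, body)` is a hypothetical transport function that
    returns the decoded JSON response as a dict.
    """
    # Step 1: open a point in time and keep its ID.
    pit_id = request("POST", f"/{index}/_pit?keep_alive={keep_alive}", None)["id"]
    try:
        search_after = None
        while True:
            # Step 2: the index is named by the PIT, so the path is /_search.
            body = {
                "size": page_size,
                "query": query,
                "sort": sort,
                "pit": {"id": pit_id, "keep_alive": keep_alive},
            }
            # Steps 3 and 4: resume after the last hit of the previous page.
            if search_after is not None:
                body["search_after"] = search_after
            response = request("POST", "/_search", body)
            # The response may carry an updated PIT ID; use the latest one.
            pit_id = response.get("pit_id", pit_id)
            hits = response["hits"]["hits"]
            if not hits:
                break
            yield from hits
            search_after = hits[-1]["sort"]
    finally:
        # Close the PIT rather than waiting for keep_alive to expire.
        request("DELETE", "/_pit", {"id": pit_id})
```

Writing this as a generator means the caller can stop early; the finally clause still closes the PIT when the generator is exhausted or closed.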

Proposed logic for querying ES

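Since ES returns a hit total with the first request, I think it makes sense to use the following steps when searching ES. (By default the total is only counted exactly up to 10,000; beyond that it is reported as 10,000 with relation "gte", which is still enough to tell the two cases apart.)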

  1. Run an initial search query to determine the number of hits in the result set.
  2. If the number of hits is <= 10k, page through the results using the from and size parameters.
  3. Otherwise (the number of hits is > 10k), use the steps outlined above to do a "deep" search.
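One way to sketch the decision in this proposal is to inspect the hits.total object of the first response, which in ES 7 and later has the form {"value": N, "relation": "eq" | "gte"}. The function name choose_paging_strategy and the returned labels are assumptions made for this example only.

```python
def choose_paging_strategy(first_response, limit=10_000):
    """Pick a paging method from the hit total of an initial search response.

    Hypothetical helper: returns "from_size" or "search_after_pit".
    """
    total = first_response["hits"]["total"]
    # relation "gte" means the count is a lower bound, so the true total
    # may exceed the limit even when value equals it.
    if total["relation"] == "eq" and total["value"] <= limit:
        return "from_size"
    return "search_after_pit"
```

With default settings a large result set shows up as {"value": 10000, "relation": "gte"}, so the check works without asking for an exact count. Passing "track_total_hits": true in the initial request gives an exact total if one is needed for other purposes.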