Elasticsearch Index Querying Strategies
As explained in the official documentation, Elasticsearch by default does not allow you to page through more than 10,000 hits using the `GET /_search` API endpoint combined with the `from` and `size` parameters. This is the method to use for "ordinary" queries whose result sets contain at most 10,000 hits.
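As a minimal sketch of ordinary paging, the helper below generates the `from`/`size` pairs needed to cover a result set. The function name `page_params` and the page size are illustrative assumptions; the 10,000 limit corresponds to the default value of the `index.max_result_window` index setting.

```python
from typing import Dict, Iterator

# Default value of the index.max_result_window setting.
MAX_RESULT_WINDOW = 10_000


def page_params(total_hits: int, size: int = 1000,
                max_window: int = MAX_RESULT_WINDOW) -> Iterator[Dict[str, int]]:
    """Yield {"from": ..., "size": ...} dicts that together cover total_hits.

    Hypothetical helper: it only computes request parameters and does not
    talk to Elasticsearch itself.
    """
    if total_hits > max_window:
        # from + size may not exceed the result window, so refuse early.
        raise ValueError(
            f"{total_hits} hits exceed the result window of {max_window}"
        )
    for start in range(0, total_hits, size):
        # The last page may be shorter than the requested page size.
        yield {"from": start, "size": min(size, total_hits - start)}
```

Each yielded dict can be merged into the body of a `GET /_search` request alongside the query itself.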
Starting with ES v7.10, which introduced the point in time (PIT) API, it is recommended that one use the `search_after` parameter in combination with a PIT (instead of `from` and `size`) to page through more than 10,000 hits. The PIT safeguards against getting inconsistent results across pages.
As described in the documentation, this kind of "deep" search request is quite a bit more involved than an ordinary from/size page request. Steps:
- Run a query to create a PIT and retrieve the PIT ID.
- Run the initial request, using the `sort` parameter and the `pit` parameter.
- To retrieve the next page of results, take the sort values from the last hit of the previous page, insert them into the `search_after` array, and repeat the request.
- Repeat this process, updating the `search_after` array every time you retrieve a new page of results.
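The steps above can be sketched roughly as follows. This is not a definitive implementation: `deep_search` is a hypothetical name, and `request` is an assumed callable `(method, path, body) -> dict` standing in for whatever HTTP client you use. The paths and body fields follow the PIT and `search_after` conventions described in the Elasticsearch documentation.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

# Assumed transport: sends one HTTP request and returns the decoded JSON body.
Request = Callable[[str, str, Optional[dict]], Dict[str, Any]]


def deep_search(request: Request, index: str, query: dict, sort: List[Any],
                size: int = 1000, keep_alive: str = "1m") -> Iterator[dict]:
    """Yield every hit for `query`, paging with a PIT and search_after."""
    # Step 1: create the PIT and retrieve its ID.
    pit_id = request("POST", f"/{index}/_pit?keep_alive={keep_alive}", None)["id"]
    search_after = None
    try:
        while True:
            # Step 2: search with sort and pit (no index in the path,
            # because the PIT already identifies the target index).
            body: Dict[str, Any] = {
                "size": size,
                "query": query,
                "sort": sort,
                "pit": {"id": pit_id, "keep_alive": keep_alive},
            }
            if search_after is not None:
                # Steps 3 and 4: continue after the last hit of the previous page.
                body["search_after"] = search_after
            response = request("POST", "/_search", body)
            # The PIT ID can change between requests; keep the latest one.
            pit_id = response.get("pit_id", pit_id)
            hits = response["hits"]["hits"]
            if not hits:
                break
            yield from hits
            search_after = hits[-1]["sort"]
    finally:
        # Release the PIT once paging is finished or abandoned.
        request("DELETE", "/_pit", {"id": pit_id})
```

Because the function is a generator, the PIT is deleted when the caller exhausts or closes the iterator.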
Since ES returns a hit total with the first request, I think it makes sense to use the following steps when searching ES:
- Run an initial search query to determine the number of hits in the result set.
- If the number of hits is <= 10k, page through the results using the `from` and `size` parameters.
- Otherwise (the number of hits is > 10k), use the steps outlined above to do a "deep" search.
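A small sketch of this decision, assuming the response to the initial query is available as a decoded JSON dict. The function name `choose_strategy` and its return values are illustrative. One caveat worth keeping in mind: in ES 7.x the total is reported as an object with `value` and `relation`, and by default the count is only tracked accurately up to 10,000 hits, so a `relation` of `"gte"` is treated here as "more than 10k".

```python
from typing import Any, Dict

# Default value of the index.max_result_window setting.
MAX_RESULT_WINDOW = 10_000


def choose_strategy(first_response: Dict[str, Any],
                    max_window: int = MAX_RESULT_WINDOW) -> str:
    """Return "from_size" or "search_after" based on the reported hit total."""
    total = first_response["hits"]["total"]
    if isinstance(total, int):
        # Older response format: the total is a plain integer.
        value, exact = total, True
    else:
        # ES 7.x format: {"value": N, "relation": "eq" | "gte"}.
        value, exact = total["value"], total["relation"] == "eq"
    if exact and value <= max_window:
        return "from_size"
    # Either more than max_window hits, or the count is only a lower bound.
    return "search_after"
```

If an exact count is needed for other reasons, the initial request can set `track_total_hits` to `true`, at the cost of a slower query.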