diff --git a/serverless/pages/explore-your-data-ml-nlp-deploy-model.mdx b/serverless/pages/explore-your-data-ml-nlp-deploy-model.mdx
index 82d8534..a689e1d 100644
--- a/serverless/pages/explore-your-data-ml-nlp-deploy-model.mdx
+++ b/serverless/pages/explore-your-data-ml-nlp-deploy-model.mdx
@@ -52,7 +52,7 @@ increases the speed of ((infer)) requests.
 
 The value of this setting must not exceed the number of available allocated
 processors per node.
 You can view the allocation status in ((kib)) or by using the
-[get trained model stats API](((ref))/get-trained-models-stats.html). If you to
+[get trained model stats API](((ref))/get-trained-models-stats.html). If you want to
 change the number of allocations, you can use the
-[update trained model stats API](((ref))/update-trained-model-deployment.html) after the
+[update trained model deployment API](((ref))/update-trained-model-deployment.html) after the
 allocation status is `started`.
@@ -71,10 +71,10 @@ can fill up, resulting in rejected requests. Consider using dedicated
 deployments to prevent this situation.
 
 ((infer-cap)) requests originating from search, such as the
-[`text_expansion` query](((ref))/query-dsl-text-expansion-query.html), have a higher
+[`sparse_vector` query](((ref))/query-dsl-sparse-vector-query.html), have a higher
 priority compared to non-search requests. The ((infer)) ingest processor
 generates normal priority requests. If both a search query and an ingest
 processor use the same deployment, the search requests with higher priority
 skip ahead in the queue for processing before the lower priority ingest
 requests. This prioritization accelerates search responses while potentially
 slowing down
-ingest where response time is less critical.
\ No newline at end of file
+ingest where response time is less critical.
diff --git a/serverless/pages/explore-your-data-ml-nlp-elser.mdx b/serverless/pages/explore-your-data-ml-nlp-elser.mdx
index c82b6e3..89c20f2 100644
--- a/serverless/pages/explore-your-data-ml-nlp-elser.mdx
+++ b/serverless/pages/explore-your-data-ml-nlp-elser.mdx
@@ -140,11 +140,11 @@ Dev Console.
 
 You can deploy the model multiple times with different deployment IDs.
 
-After the deployment is complete, ELSER is ready to use either in an ingest
-pipeline or in a `text_expansion` query to perform semantic search.
+After the deployment is complete, ELSER is ready to use either in an ingest
+pipeline or in a `sparse_vector` query to perform semantic search.
 
 ## Further reading
 
 * [Perform semantic search with ELSER](((ref))/semantic-search-elser.html)
-* [Improving information retrieval in the Elastic Stack: Introducing Elastic Learned Sparse Encoder, our new retrieval model](https://www.elastic.co/blog/may-2023-launch-information-retrieval-elasticsearch-ai-model)
\ No newline at end of file
+* [Improving information retrieval in the Elastic Stack: Introducing Elastic Learned Sparse Encoder, our new retrieval model](https://www.elastic.co/blog/may-2023-launch-information-retrieval-elasticsearch-ai-model)
diff --git a/serverless/pages/search-your-data-semantic-search-elser.mdx b/serverless/pages/search-your-data-semantic-search-elser.mdx
index 27c5cca..a6699e7 100644
--- a/serverless/pages/search-your-data-semantic-search-elser.mdx
+++ b/serverless/pages/search-your-data-semantic-search-elser.mdx
@@ -181,12 +181,12 @@ curl -X GET "${ES_URL}/_tasks/" \
 
 You can also open the Trained Models UI, select the Pipelines tab under ELSER to
 follow the progress.
 
-
+
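+The `sparse_vector` examples in the following sections reference an inference
+endpoint with the ID `my-elser-endpoint`. That ID is only an example. If you do
+not already have an ELSER endpoint, the following is a minimal sketch of
+creating one with the create inference API; adjust `num_allocations` and
+`num_threads` to your workload:
+
+```bash
+curl -X PUT "${ES_URL}/_inference/sparse_embedding/my-elser-endpoint" \
+-H "Authorization: ApiKey ${API_KEY}" \
+-H "Content-Type: application/json" \
+-d'
+{
+  "service": "elser",
+  "service_settings": {
+    "num_allocations": 1,
+    "num_threads": 1
+  }
+}
+'
+```
+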
-## Semantic search by using the `text_expansion` query +## Semantic search by using the `sparse_vector` query -To perform semantic search, use the `text_expansion` query, and provide the -query text and the ELSER model ID. The example below uses the query text "How to +To perform semantic search, use the `sparse_vector` query, and provide the +query text and the inference ID associated with the ELSER model service. The example below uses the query text "How to avoid muscle soreness after running?", the `content_embedding` field contains the generated ELSER output: @@ -197,11 +197,10 @@ curl -X GET "${ES_URL}/my-index/_search" \ -d' { "query":{ - "text_expansion":{ - "content_embedding":{ - "model_id":".elser_model_2", - "model_text":"How to avoid muscle soreness after running?" - } + "sparse_vector":{ + "field": "content_embedding", + "inference_id": "my-elser-endpoint", + "query": "How to avoid muscle soreness after running?" } } } @@ -251,23 +250,20 @@ weights. } ``` -To learn about optimizing your `text_expansion` query, refer to -[Optimizing the search performance of the text_expansion query](((ref))/query-dsl-text-expansion-query.html#optimizing-text-expansion). - -
+
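+The `sparse_vector` query can also prune tokens that carry little signal at
+query time, which can speed up search at a small cost in recall. Token pruning
+is a technical preview feature, and the thresholds below are the documented
+defaults rather than tuned values, so treat this as an illustrative sketch:
+
+```bash
+curl -X GET "${ES_URL}/my-index/_search" \
+-H "Authorization: ApiKey ${API_KEY}" \
+-H "Content-Type: application/json" \
+-d'
+{
+   "query":{
+      "sparse_vector":{
+         "field": "content_embedding",
+         "inference_id": "my-elser-endpoint",
+         "query": "How to avoid muscle soreness after running?",
+         "prune": true,
+         "pruning_config": {
+            "tokens_freq_ratio_threshold": 5,
+            "tokens_weight_threshold": 0.4,
+            "only_score_pruned_tokens": false
+         }
+      }
+   }
+}
+'
+```
+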
## Combining semantic search with other queries
 
-You can combine `text_expansion` with other queries in a
+You can combine the `sparse_vector` query with other queries in a
 [compound query](((ref))/compound-queries.html). For example using a filter clause in a
 [Boolean query](((ref))/query-dsl-bool-query.html) or a full text query which may or may not use the same
-query text as the `text_expansion` query. This enables you to combine the search
+query text as the `sparse_vector` query. This enables you to combine the search
 results from both queries.
 
-The search hits from the `text_expansion` query tend to score higher than other
+The search hits from the `sparse_vector` query tend to score higher than other
 ((es)) queries. Those scores can be regularized by increasing or decreasing the
 relevance scores of each query by using the `boost` parameter. Recall on the
-`text_expansion` query can be high where there is a long tail of less relevant
+`sparse_vector` query can be high where there is a long tail of less relevant
 results. Use the `min_score` parameter to prune those less relevant documents.
 
 ```bash
 curl -X GET "${ES_URL}/my-index/_search" \
@@ -280,11 +276,11 @@ curl -X GET "${ES_URL}/my-index/_search" \
       "bool": { [^1]
          "should": [
             {
-               "text_expansion": {
-                  "content_embedding": {
-                     "model_text": "How to avoid muscle soreness after running?",
-                     "model_id": ".elser_model_2",
-                     "boost": 1 [^2]
+               "sparse_vector": {
+                  "field": "content_embedding",
+                  "query": "How to avoid muscle soreness after running?",
+                  "inference_id": "my-elser-endpoint",
+                  "boost": 1 [^2]
                }
             },
@@ -301,9 +297,9 @@ curl -X GET "${ES_URL}/my-index/_search" \
    }
 }
 '
 ```
-[^1]: Both the `text_expansion` and the `query_string` queries are in a `should`
+[^1]: Both the `sparse_vector` and the `query_string` queries are in a `should`
 clause of a `bool` query.
-[^2]: The `boost` value is `1` for the `text_expansion` query which is the default
+[^2]: The `boost` value is `1` for the `sparse_vector` query, which is the default
 value. This means that the relevance score of the results of this query are not
 boosted.
 [^3]: The `boost` value is `4` for the `query_string` query. The relevance score
@@ -320,7 +316,7 @@ search results.
 
 ## Saving disk space by excluding the ELSER tokens from document source
 
 The tokens generated by ELSER must be indexed for use in the
-[text_expansion query](((ref))/query-dsl-text-expansion-query.html). However, it is not
+[sparse_vector query](((ref))/query-dsl-sparse-vector-query.html). However, it is not
 necessary to retain those terms in the document source. You can save disk space
 by using the [source exclude](((ref))/mapping-source-field.html#include-exclude)
 mapping to remove the ELSER terms from the document source.
@@ -376,4 +372,3 @@ curl -X PUT "${ES_URL}/my-index" \
 
 ## Interactive example
 
 * The `elasticsearch-labs` repo has an interactive example of running [ELSER-powered semantic search](https://github.com/elastic/elasticsearch-labs/blob/main/notebooks/search/03-ELSER.ipynb) using the ((es)) Python client.
-
diff --git a/serverless/pages/search-your-data-semantic-search.mdx b/serverless/pages/search-your-data-semantic-search.mdx
index 1aad30a..8119fce 100644
--- a/serverless/pages/search-your-data-semantic-search.mdx
+++ b/serverless/pages/search-your-data-semantic-search.mdx
@@ -111,8 +111,8 @@ Now it is time to perform semantic search!
 
 ## Search the data
 
-Depending on the type of model you have deployed, you can query rank features
-with a text expansion query, or dense vectors with a kNN search. 
+Depending on the type of model you have deployed, you can query sparse vectors
+with a sparse vector query, or dense vectors with a kNN search.
diff --git a/serverless/partials/hybrid-search-elser.mdx b/serverless/partials/hybrid-search-elser.mdx
index 39e2ce1..ce795c9 100644
--- a/serverless/partials/hybrid-search-elser.mdx
+++ b/serverless/partials/hybrid-search-elser.mdx
@@ -1,11 +1,8 @@
-Hybrid search between a semantic and lexical query can be achieved by using a
-`sub_searches` clause in your search request. In the `sub_searches` clause,
-provide a `text_expansion` query and a full-text query. Next to the
-`sub_searches` clause, also provide a `rank` clause with
-the `rrf` parameter to rank documents using reciprocal rank fusion.
+Hybrid search between a semantic and lexical query can be achieved by using retrievers in your search request.
+The following example uses retrievers to run a match query and a `sparse_vector` query, and ranks the combined results with reciprocal rank fusion (RRF).
 
 ```bash
 curl -X GET "${ES_URL}/my-index/_search" \
@@ -13,29 +10,34 @@ curl -X GET "${ES_URL}/my-index/_search" \
 -H "Content-Type: application/json" \
 -d'
 {
-  "sub_searches": [
-    {
-      "query": {
-        "match": {
-          "my_text_field": "the query string"
-        }
-      }
-    },
-    {
-      "query": {
-        "text_expansion": {
-          "my_tokens": {
-            "model_id": ".elser_model_2",
-            "model_text": "the query string"
+  "retriever": {
+    "rrf": {
+      "retrievers": [
+        {
+          "standard": {
+            "query": {
+              "match": {
+                "my_text_field": "the query string"
+              }
+            }
+          }
+        },
+        {
+          "standard": {
+            "query": {
+              "sparse_vector": {
+                "field": "my_tokens",
+                "inference_id": "my-elser-endpoint",
+                "query": "the query string"
+              }
+            }
           }
         }
-      }
+      ],
+      "rank_window_size": 50,
+      "rank_constant": 20
     }
-  ],
-  "rank": {
-    "rrf": {}
   }
 }
 '
 ```
-
diff --git a/serverless/partials/search-elser.mdx b/serverless/partials/search-elser.mdx
index 7440684..b711901 100644
--- a/serverless/partials/search-elser.mdx
+++ b/serverless/partials/search-elser.mdx
@@ -1,10 +1,9 @@
-ELSER text embeddings can be queried using a
-[text expansion query](((ref))/query-dsl-text-expansion-query.html). The text expansion
-query enables you to query a rank features field or a sparse vector field, by
-providing the model ID of the NLP model, and the query text:
+ELSER text embeddings can be queried using a
+[sparse vector query](((ref))/query-dsl-sparse-vector-query.html). The sparse vector
+query enables you to query a sparse vector field by providing an inference ID and the query text:
 
 ```bash
 curl -X GET "${ES_URL}/my-index/_search" \
@@ -13,10 +12,10 @@ curl -X GET "${ES_URL}/my-index/_search" \
 -d'
 {
    "query":{
-      "text_expansion":{
-         "my_tokens":{ [^1]
-         "model_id":".elser_model_2",
-         "model_text":"the query string"
+      "sparse_vector":{
+         "field": "my_tokens", [^1]
+         "inference_id": "my-elser-endpoint",
+         "query": "the query string"
       }
    }
 }
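
The queries in these partials expect the ELSER token field (`my_tokens` in the examples) to be mapped as `sparse_vector`. A minimal sketch of such a mapping, reusing the example index and field names from the snippets above:

```bash
curl -X PUT "${ES_URL}/my-index" \
-H "Authorization: ApiKey ${API_KEY}" \
-H "Content-Type: application/json" \
-d'
{
  "mappings": {
    "properties": {
      "my_tokens": { "type": "sparse_vector" },
      "my_text_field": { "type": "text" }
    }
  }
}
'
```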