Add a guide for upgrading and migrating ES indices (#3321)
Co-authored-by: Olga Bulat <obulat@gmail.com> Co-authored-by: Madison Swain-Bowden <bowdenm@spu.edu>
1 parent
fa2fac4
commit 57bf632
Showing 6 changed files with 561 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -8,4 +8,7 @@ config | |
mapping | ||
test | ||
deploy | ||
migrate | ||
upgrade | ||
troubleshoot | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,256 @@ | ||
# Index migration runbook | ||
|
||
From time to time, we will need to update our Elasticsearch indices. These | ||
modifications can be classified into two broad-strokes categories, depending on | ||
whether the changes affect the main consumer of the indices, the API. | ||
|
||
## Migration types | ||
|
||
### API-free | ||
|
||
These changes are safe modifications to the ES schema that do not affect the | ||
API. As such they do not need any migration process. Examples: | ||
|
||
- addition of new fields or subfields | ||
- removal of fields that are not referenced or used by the API | ||
- changing the type to another compatible type (like `text` ↔ `keyword`) | ||
|
||
For API-free changes, we deploy the ingestion server and then perform one of the following: | ||
|
||
- standard data-refresh (either triggered manually or as scheduled) | ||
- [manual index upgrade](/ingestion_server/guides/upgrade.md) | ||
|
||
The indices will be updated to the new schema and will be made available to the | ||
API. | ||
|
||
### API-involved | ||
|
||
These changes are modifications to fields that already are in use by the API and | ||
involve code changes in both the ingestion server and the API. Examples: | ||
|
||
- removal of a field | ||
- changing the type to an incompatible type | ||
- renaming of a field | ||
|
||
Such changes require the API deployment to be precisely coordinated with the | ||
promotion of the new index, for these reasons: | ||
|
||
- If the API deployment lags behind index promotion, the old field that the API | ||
uses will disappear. | ||
- If the API deployment runs ahead of index promotion, the new field the API | ||
uses will not be present. | ||
|
||
This runbook documents guidelines and processes for API-involved migrations. | ||
|
||
Our goal is to break down an API-involved change into multiple small, atomic | ||
changes with each step affecting at most one of the ingestion server or the API | ||
and ensuring that the API and ES remain compatible throughout the process. | ||
|
||
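The distinction between the two migration types can be sketched as a small check. This is a minimal illustration, not code from the ingestion server: it assumes the index schema can be summarised as a flat mapping of field path to type, treats `text` and `keyword` as the only compatible pair (following the example above), and uses hypothetical names (`classify_change`, `types_compatible`, the sample fields).

```python
# Hypothetical helper: classify an index change as "API-free" or "API-involved".
# Field maps are flat {field_path: type}; none of these names come from Openverse.

# Type changes treated as compatible, following the `text` <-> `keyword` example.
COMPATIBLE_TYPES = {frozenset({"text", "keyword"})}


def types_compatible(old_type: str, new_type: str) -> bool:
    return old_type == new_type or frozenset({old_type, new_type}) in COMPATIBLE_TYPES


def classify_change(old_fields: dict, new_fields: dict, api_fields: set) -> str:
    for path in api_fields:
        if path not in old_fields:
            continue  # the API cannot depend on a field that never existed
        if path not in new_fields:
            return "API-involved"  # removal or rename of a field in use
        if not types_compatible(old_fields[path], new_fields[path]):
            return "API-involved"  # incompatible type change on a field in use
    return "API-free"


# Sample schema (illustrative only); the API reads just `title.keyword`.
old = {"title": "text", "title.keyword": "keyword", "legacy": "text"}
api = {"title.keyword"}

added = {**old, "title.raw": "keyword"}
dropped_unused = {k: v for k, v in old.items() if k != "legacy"}
dropped_used = {k: v for k, v in old.items() if k != "title.keyword"}

print(classify_change(old, added, api))
print(classify_change(old, dropped_unused, api))
print(classify_change(old, dropped_used, api))
```

Only the last change touches a field the API reads, so it is the only one that would need the coordinated process described in this runbook.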
## Pull request guidelines | ||
|
||
A change that involves modification to the ES index as well as its usage in the | ||
API requires at least three steps, each associated with exactly one PR that | ||
modifies exactly one of the ingestion server or the API to allow them to be | ||
deployed independently. | ||
|
||
1. Change the ES index mapping in the ingestion server. Ensure that the change | ||
is purely additive, keeping the old fields unchanged and creating new fields | ||
that contain the data the API will need. | ||
|
||
This PR should make changes only within the `ingestion_server/` directory, | ||
more specifically the following two files concerned with ES mappings and | ||
document schemas: | ||
|
||
- [`es_mapping.py`](https://github.com/WordPress/openverse/tree/main/ingestion_server/ingestion_server/es_mapping.py) | ||
- [`elasticsearch_models.py`](https://github.com/WordPress/openverse/tree/main/ingestion_server/ingestion_server/elasticsearch_models.py) | ||
|
||
2. Update the API code to reference and use the new ES fields added in the | ||
previous step. Ensure that the old fields become unreferenced. | ||
|
||
The PR should make changes only within the `api/` directory. | ||
|
||
3. Change the ES index mapping in the ingestion server to remove the old, | ||
now-unreferenced fields. | ||
|
||
Like PR number 1, this PR should also make changes only within the | ||
`ingestion_server/` directory. | ||
|
||
```{tip} | ||
Get the PRs reviewed in advance so that the entire process has been vetted by | ||
the team and there are no surprises or delays when the plans have been set into | ||
motion. | ||
``` | ||
|
||
```{caution} | ||
Each PR in the chain should branch from, and point to, its predecessor in the | ||
chain so that CI continues to pass for each PR. | ||
``` | ||
|
||
### Example | ||
|
||
Assume we have a field `foo` with type `text` in the index. It has a subfield | ||
`keyword` with type `keyword`. The API uses `foo.keyword` for all purposes. We | ||
want the `foo` field to have type `keyword` and for the API to use `foo` instead | ||
of `foo.keyword`. To accomplish this without downtime, we need three PRs: | ||
|
||
1. Changing `foo` to type `keyword` would be an API-free change because it is a | ||
type change between two compatible types and does not affect the subfield | ||
`foo.keyword` that is in use by the API. Technically the outer field can be | ||
assumed to be "new" because it was not being used at all. | ||
|
||
2. Then we make an API change to use `foo` directly instead of `foo.keyword`. | ||
Any other accommodations to make use of `foo` can be made in this step. In | ||
this case `foo` will be the same as `foo.keyword` so no other changes will be | ||
needed. | ||
|
||
3. Removal of the `foo.keyword` field would now also be an API-free change | ||
because the field would no longer be in use. | ||
|
||
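To make the `foo` example concrete, the sketch below writes the three index states in Elasticsearch mapping syntax (`properties` for fields, `fields` for subfields) and checks that the API stays compatible in every state the migration passes through. Only `foo` and `foo.keyword` come from the example; the helper names and the idea of describing the API's needs as a dictionary are assumptions made for illustration.

```python
# The `foo` example as three index mappings, in Elasticsearch mapping syntax.
# All Python names here are hypothetical; only `foo` and `foo.keyword` come
# from the example above.

BEFORE = {
    "properties": {
        "foo": {"type": "text", "fields": {"keyword": {"type": "keyword"}}}
    }
}
AFTER_PR_1 = {
    "properties": {
        "foo": {"type": "keyword", "fields": {"keyword": {"type": "keyword"}}}
    }
}
AFTER_PR_3 = {"properties": {"foo": {"type": "keyword"}}}


def field_types(mapping: dict, prefix: str = "") -> dict:
    """Flatten a mapping into {dotted.path: type}, including subfields."""
    flat = {}
    for name, spec in mapping.get("properties", {}).items():
        path = prefix + name
        if "type" in spec:
            flat[path] = spec["type"]
        for sub_name, sub_spec in spec.get("fields", {}).items():
            flat[f"{path}.{sub_name}"] = sub_spec["type"]
        # Object fields nest further mappings under "properties".
        flat.update(field_types(spec, prefix=path + "."))
    return flat


def api_is_compatible(mapping: dict, required: dict) -> bool:
    """True if every field the API queries exists with the type it expects."""
    present = field_types(mapping)
    return all(present.get(path) == type_ for path, type_ in required.items())


OLD_API = {"foo.keyword": "keyword"}  # before PR 2
NEW_API = {"foo": "keyword"}  # after PR 2

# Each deployment state reached during the migration, in order.
states = [
    ("initial", BEFORE, OLD_API),
    ("after PR 1", AFTER_PR_1, OLD_API),
    ("after PR 2", AFTER_PR_1, NEW_API),
    ("after PR 3", AFTER_PR_3, NEW_API),
]
for label, mapping, api in states:
    print(label, api_is_compatible(mapping, api))
```

Skipping a step breaks the invariant: in this sketch, the new API against `BEFORE`, or the old API against `AFTER_PR_3`, both come out incompatible, which is why the order of the three PRs matters.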
## Migration process | ||
|
||
The entire migration process can be divided into three phases. | ||
|
||
```{mermaid} | ||
flowchart TD | ||
subgraph api[API] | ||
API | ||
end | ||
subgraph elasticsearch[Elasticsearch] | ||
image --> image-old | ||
image-filtered --> image-old-filtered | ||
audio --> audio-old | ||
audio-filtered --> audio-old-filtered | ||
end | ||
API --> image | ||
API --> image-filtered | ||
API --> audio | ||
API --> audio-filtered | ||
``` | ||
|
||
### Create the new fields | ||
|
||
1. Merge [PR number 1](#pull-request-guidelines). | ||
2. Perform a [manual index upgrade](/ingestion_server/guides/upgrade.md). | ||
|
||
At the close of this phase we have all the new information for the API to use. | ||
|
||
```{mermaid} | ||
flowchart TD | ||
subgraph api[API] | ||
API | ||
end | ||
subgraph elasticsearch[Elasticsearch] | ||
image -.-> image-old | ||
image-filtered -.-> image-old-filtered | ||
audio -.-> audio-old | ||
audio-filtered -.-> audio-old-filtered | ||
image --> image-mid | ||
image-filtered --> image-mid-filtered | ||
audio --> audio-mid | ||
audio-filtered --> audio-mid-filtered | ||
end | ||
API --> image | ||
API --> image-filtered | ||
API --> audio | ||
API --> audio-filtered | ||
style image-old opacity:0.3 | ||
style image-old-filtered opacity:0.3 | ||
style audio-old opacity:0.3 | ||
style audio-old-filtered opacity:0.3 | ||
``` | ||
|
||
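The diagram above can be read as each alias (for example `image`) moving from one index to another. As a rough sketch, assuming the promotion is done by repointing an alias through Elasticsearch's `_aliases` endpoint, the request body could be built as follows. The function name is hypothetical and the index names are the illustrative ones from the diagram; the manual index upgrade guide remains the authoritative procedure.

```python
import json


def build_alias_swap(alias: str, old_index: str, new_index: str) -> dict:
    """Request body for POST /_aliases that repoints `alias` in a single call."""
    return {
        "actions": [
            # Both actions are submitted together, so the alias never dangles.
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Index names below are the placeholders used in the diagram, not real indices.
body = build_alias_swap("image", "image-old", "image-mid")
print(json.dumps(body, indent=2))
```

Submitting the removal and the addition in one request means the API, which only knows the alias name, does not observe a moment where the alias points nowhere.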
### Use the new fields instead of the old | ||
|
||
1. Merge [PR number 2](#pull-request-guidelines). This will automatically deploy | ||
the API to staging. | ||
2. Verify that the staging API continues to work. | ||
3. [Deploy the API](/api/guides/deploy.md) to production. | ||
4. Verify that the production API continues to work. | ||
|
||
At the close of this phase the API is exclusively using the new fields and the | ||
old ones have become unreferenced. | ||
|
||
```{mermaid} | ||
flowchart TD | ||
subgraph api[API] | ||
old[API] | ||
new[New API] | ||
end | ||
subgraph elasticsearch[Elasticsearch] | ||
image --> image-mid | ||
image-filtered --> image-mid-filtered | ||
audio --> audio-mid | ||
audio-filtered --> audio-mid-filtered | ||
end | ||
old -.-> image | ||
old -.-> image-filtered | ||
old -.-> audio | ||
old -.-> audio-filtered | ||
new --> image | ||
new --> image-filtered | ||
new --> audio | ||
new --> audio-filtered | ||
style old opacity:0.3 | ||
``` | ||
|
||
### Remove the old fields | ||
|
||
1. Merge [PR number 3](#pull-request-guidelines). | ||
2. Perform a [manual index upgrade](/ingestion_server/guides/upgrade.md). | ||
|
||
```{mermaid} | ||
flowchart TD | ||
subgraph api[API] | ||
new[New API] | ||
end | ||
subgraph elasticsearch[Elasticsearch] | ||
image -.-> image-mid | ||
image-filtered -.-> image-mid-filtered | ||
audio -.-> audio-mid | ||
audio-filtered -.-> audio-mid-filtered | ||
image --> image-final | ||
image-filtered --> image-final-filtered | ||
audio --> audio-final | ||
audio-filtered --> audio-final-filtered | ||
end | ||
new --> image | ||
new --> image-filtered | ||
new --> audio | ||
new --> audio-filtered | ||
style image-mid opacity:0.3 | ||
style image-mid-filtered opacity:0.3 | ||
style audio-mid opacity:0.3 | ||
style audio-mid-filtered opacity:0.3 | ||
``` | ||
|
||
You're done! | ||
|
||
```{mermaid} | ||
flowchart TD | ||
subgraph api[API] | ||
new[New API] | ||
end | ||
subgraph elasticsearch[Elasticsearch] | ||
image --> image-final | ||
image-filtered --> image-final-filtered | ||
audio --> audio-final | ||
audio-filtered --> audio-final-filtered | ||
end | ||
new --> image | ||
new --> image-filtered | ||
new --> audio | ||
new --> audio-filtered | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,59 @@ | ||
# Troubleshooting | ||
|
||
This guide describes various manual steps to troubleshoot issues with the | ||
ingestion server's processes, such as database transfer and ES indexing. | ||
|
||
## Interrupt indexing | ||
|
||
The ingestion server performs indexing using indexer workers, whose primary | ||
purpose is to create documents from the API database and index them in | ||
Elasticsearch. | ||
|
||
Indexer workers are EC2 instances that are stopped by default when indexing is | ||
not taking place. The ingestion server starts them and provides them with the | ||
information needed to perform the indexing. Once they report back to the | ||
ingestion server with a completion message, they are shut down again. | ||
|
||
Sometimes it is necessary to manually interrupt indexing, for example to limit | ||
the size of a test/staging index. To do so, follow these steps. | ||
|
||
1. Determine the active ingestion worker machines from the AWS EC2 dashboard. | ||
They will be named `indexer-worker-(dev|prod)` and will be in the "running" | ||
state. | ||
|
||
2. SSH into the machine using its public IP. | ||
|
||
```console | ||
$ ssh ec2-user@<public-ip> | ||
``` | ||
|
||
3. Determine the ID of the active `indexer_worker` container and pause it. | ||
|
||
```console | ||
$ docker ps | ||
$ docker pause <container_id> | ||
``` | ||
|
||
4. Repeat steps 2 and 3 for each active ingestion worker machine. Leave the SSH | ||
sessions open. | ||
|
||
5. Wait for a few minutes and keep an eye on the document count in the | ||
Elasticsearch index that is currently being created. It may increase a | ||
little because of timing effects but should stop after a few minutes. | ||
|
||
6. From each of the open SSH sessions, send a completion notification to the | ||
ingestion server's internal IP address. | ||
|
||
```console | ||
$ curl \ | ||
-X POST \ | ||
-H "Content-Type: application/json" \ | ||
-d '{"error":false}' \ | ||
http://<internal-ip>:8001/worker_finished | ||
``` | ||
|
||
7. Terminate the SSH sessions and stop the indexer worker EC2 machines from the | ||
AWS EC2 dashboard. | ||
|
||
8. The ingestion server will then instruct ES to start the next step of indexing, | ||
i.e. replication. |
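
For reference, the completion notification from step 6 can also be expressed in Python. This is only a sketch of the same request as the `curl` command, built with the standard library: the port, path and payload are taken from that command, while the function name and the `10.0.0.1` address are placeholders introduced here.

```python
import json
import urllib.request


def build_worker_finished_request(
    internal_ip: str, error: bool = False
) -> urllib.request.Request:
    """Build (but do not send) the completion notification from step 6."""
    payload = json.dumps({"error": error}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{internal_ip}:8001/worker_finished",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# 10.0.0.1 is a placeholder; substitute the ingestion server's internal IP.
request = build_worker_finished_request("10.0.0.1")
print(request.get_method(), request.full_url, request.data)
# To actually send it: urllib.request.urlopen(request, timeout=10)
```

Building the request separately from sending it makes it possible to inspect the URL and payload before anything reaches the ingestion server.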