Add Elastic alias for cleaner ingestion #1071

Merged on Oct 7, 2024 (5 commits)

Conversation

@rachaelcodes (Contributor) commented Sep 27, 2024

Context

When we ingest files, we should make sure that old artifacts are removed. The standard Elasticsearch approach is to reference an 'alias' rather than an 'index', which lets us swap the underlying index 'behind the scenes' without affecting users. I've started updating the file ingestion to reflect this.
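
As a rough illustration of the alias approach described above (not the code in this PR), the swap can be expressed as one request to Elasticsearch's `_aliases` endpoint, which applies its `remove` and `add` actions atomically. The helper name `build_alias_swap_body` and the index names in the usage note are hypothetical.

```python
from typing import Optional


def build_alias_swap_body(alias: str, old_index: Optional[str], new_index: str) -> dict:
    """Build the JSON body for POST /_aliases that repoints `alias` at `new_index`.

    If `old_index` is None (the alias does not exist yet), only the
    `add` action is emitted.
    """
    actions = []
    if old_index is not None:
        # Detach the alias from the index it currently points at.
        actions.append({"remove": {"index": old_index, "alias": alias}})
    # Attach the alias to the freshly ingested index.
    actions.append({"add": {"index": new_index, "alias": alias}})
    return {"actions": actions}
```

For example, `build_alias_swap_body("redbox-data-dev-chunk-current", "chunk-old", "chunk-new")` yields a body with a `remove` action followed by an `add` action. Readers keep querying the alias throughout, so they never see a half-built index.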

Changes proposed in this pull request

  • New Django management commands to create and apply the new alias (via our q2 queuing system)
  • A new Jupyter notebook to simplify dev testing (I will remove this and the new Jupyter dependency in a later commit)
  • Fix references to Elasticsearch so that they use the alias rather than the index
  • Update automated tests
  • Check/update integration tests
  • Move the Jupyter notebook to notebooks for reference and remove the Jupyter Poetry dependency

Guidance to review

To try this out locally, run the following commands as Scheduled Commands from the local Django admin, in the order listed in the Jupyter notebook:

  • add_es_alias
  • reingest_files
  • change_es_aliased_index
  • delete_es_indices
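
The PR description does not spell out what each command does internally. As a hedged sketch of the last step only, assuming `delete_es_indices` removes chunk indices that the alias no longer points at, the selection logic might look like the following. `select_stale_indices` and the example index names are hypothetical and not taken from the repository.

```python
def select_stale_indices(all_indices, aliased_indices, prefix):
    """Return indices under `prefix` that the alias does not point at.

    all_indices:     every index name in the cluster
    aliased_indices: index names the alias currently resolves to
    prefix:          only indices starting with this prefix are candidates
    """
    live = set(aliased_indices)
    return sorted(
        name
        for name in all_indices
        if name.startswith(prefix) and name not in live
    )
```

Filtering by prefix keeps unrelated indices out of the deletion list, which is why this kind of cleanup is safest after the alias has been switched to the new index.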

You can check the current list of indices via http://localhost:9200/_cat/indices and the aliased index via http://localhost:9200/_alias/redbox-data-{environment}-chunk-current
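
The same two checks can be scripted. This is a minimal sketch using only the standard library; it assumes Elasticsearch is reachable at http://localhost:9200 without authentication and that the environment name is `dev`. The function names are illustrative.

```python
import json
from urllib.request import urlopen

ES_URL = "http://localhost:9200"  # local dev cluster, as in the URLs above


def alias_name(environment: str) -> str:
    """Alias name following the redbox-data-{environment}-chunk-current pattern."""
    return f"redbox-data-{environment}-chunk-current"


def alias_url(environment: str, base_url: str = ES_URL) -> str:
    """URL that reports which index (or indices) the alias points at."""
    return f"{base_url}/_alias/{alias_name(environment)}"


def indices_behind_alias(alias_response: dict) -> list:
    """GET /_alias/<name> returns a mapping keyed by index name."""
    return sorted(alias_response)


def fetch_json(url: str):
    """Fetch a URL and decode the JSON response body."""
    with urlopen(url) as response:
        return json.load(response)


def main(environment: str = "dev") -> None:
    # `environment="dev"` is an assumption; use your own environment name.
    print(fetch_json(f"{ES_URL}/_cat/indices?format=json"))
    print(indices_behind_alias(fetch_json(alias_url(environment))))
```

Calling `main()` with the cluster running prints the index list and then the index names behind the alias.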

Alternatively, you can run the notebook from the /redbox/django_app directory: run poetry install, then poetry run jupyter notebook.

Relevant links

Things to check

  • I have added any new ENV vars in all deployed environments
  • I have tested any code added or changed
  • I have run integration tests

@rachaelcodes force-pushed the feature/elastic-aliases branch 6 times, most recently from fc4e4f0 to e54a484 (October 3, 2024 15:18)
@rachaelcodes changed the title from "DRAFT: first working draft of elastic updates" to "Add Elastic alias for cleaner ingestion" (Oct 4, 2024)

env = Settings()
alias = f"{env.elastic_root_index}-chunk-current"
Collaborator commented:

Should (or could) this be part of Settings?
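
One way to act on this suggestion is sketched below, with a plain dataclass standing in for the project's real Settings class, whose definition is not shown in this thread. The property name `elastic_chunk_alias` and the default value are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Stand-in for the project's Settings class (an assumption, not the real one)."""

    elastic_root_index: str = "redbox-data-dev"  # hypothetical default

    @property
    def elastic_chunk_alias(self) -> str:
        # Derive the alias in one place instead of rebuilding the
        # f-string wherever the alias is needed.
        return f"{self.elastic_root_index}-chunk-current"
```

Call sites would then read `Settings().elastic_chunk_alias` rather than formatting the alias name themselves.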

@gecBurton (Collaborator) left a comment

This is working for me in dev. It would be good to merge some of these commands into one, as you suggest (but in a separate PR).

@rachaelcodes merged commit c78a982 into main on Oct 7, 2024
15 checks passed
@rachaelcodes deleted the feature/elastic-aliases branch (October 7, 2024 13:49)