Solr Schema Upgrades - how to handle in Indexer? #52
In the past, we have often upgraded a Solr schema.xml file without reindexing the entire corpus, especially for DataONE, which has millions of versions of documents. That said, Solr strongly recommends against this, particularly for major version upgrades. The Solr documentation discusses schema changes, solrconfig changes, and version upgrades here: https://solr.apache.org/guide/solr/latest/indexing-guide/reindexing.html

One mechanism to avoid downtime is to reindex into a new Solr collection, so that the old index continues to exist. Once the new collection is available, we can use a collection alias to atomically switch the service from the old index content to the new content.

When thinking about a SolrCloud update, it would be good to contemplate a rolling update strategy where:

1. Existing Solr pods continue operating and serving an existing collection index, probably in read-only mode.
2. A new version of the service is rolled out, and the new pods start up and begin reindexing the content into a new collection.
3. When the reindex is complete, the old pods are brought down, and the new pods begin serving requests, possibly after renaming with a collection alias.

Extra bonus points for tying this seamlessly into Kubernetes rolling updates in a way that permits rollback to the original pods and indexed collection if for some reason the upgrade is not successful. Of course, this is an ideal world -- we can manually manage this transition as well if such a set of features would significantly delay release.
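The atomic alias switch above can be sketched against the Solr Collections API (the `CREATEALIAS` action). This is a minimal illustration, not the indexer's implementation; the host, alias, and collection names are hypothetical:

```python
from urllib.parse import urlencode


def create_alias_url(solr_base: str, alias: str, collection: str) -> str:
    """Build the Collections API request that points an alias at a
    (newly reindexed) collection. CREATEALIAS replaces any existing
    alias of the same name, so the switch is atomic for clients that
    query the alias rather than the concrete collection."""
    params = urlencode({
        "action": "CREATEALIAS",
        "name": alias,
        "collections": collection,
    })
    return f"{solr_base}/admin/collections?{params}"


# Once reindexing into the new collection finishes, re-point the serving
# alias. Rollback is just pointing the alias back at the old collection.
print(create_alias_url("http://solr:8983/solr", "search_core", "search_core_v2"))
```

Issuing an HTTP GET against that URL (e.g. with curl) performs the switch; queries continue to target the alias name throughout.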
Related background (but with Elasticsearch): https://developers.soundcloud.com/blog/how-to-reindex-1-billion-documents-in-1-hour-at-soundcloud

Notes from related discussion in ESS-DIVE meeting:
- This is important for the 3.0.0 release, since that release involves a schema upgrade.
- For the 3.0.0 release, this will be manual, and we can choose not to reindex for huge corpuses. There is no end-user impact other than not being able to access the new info in MetacatUI (e.g., a new license field).
- Manage manually for 3.0; automate for 3.1.
We need to decide upon and implement how we handle Solr schema upgrades and their associated reindexing actions.