
Migration Plan

amy wieliczka edited this page Oct 19, 2020 · 7 revisions

Migration Pieces

Code Migration

  • Fetcher migration
  • Mapper migration
  • Enrichment migration
  • Migrate image harvester
  • Migrate deep harvester

Data Migration

  • Do we want to run the harvest operation for all collections in Calisphere, or write a migration script that pulls data from Solr into ElasticSearch?
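
If we went the migration-script route, the core of it could look something like the sketch below, which converts documents already read from Solr into the newline-delimited body that the ElasticSearch bulk API expects. This is only an illustration under assumptions: it assumes the Solr unique key field is named `id`, the index name `calisphere-items` is made up, and reading from Solr and posting to ES are left out entirely.

```python
import json
from typing import Iterable, Tuple

# Fields Solr maintains internally that should not be copied into ES.
SOLR_INTERNAL_FIELDS = {"_version_"}


def solr_doc_to_es_action(doc: dict, index: str) -> Tuple[dict, dict]:
    """Turn one Solr document into an (action, source) pair for the ES bulk API.

    Assumes the Solr unique key is the "id" field (an assumption, not
    confirmed by this plan).
    """
    source = {k: v for k, v in doc.items() if k not in SOLR_INTERNAL_FIELDS}
    action = {"index": {"_index": index, "_id": doc["id"]}}
    return action, source


def to_bulk_ndjson(docs: Iterable[dict], index: str) -> str:
    """Build a bulk request body: one action line, then one source line, per doc."""
    lines = []
    for doc in docs:
        action, source = solr_doc_to_es_action(doc, index)
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk API requires every line, including the last, to end in a newline.
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    sample = [{"id": "a1", "title": ["Map"], "_version_": 123}]
    print(to_bulk_ndjson(sample, "calisphere-items"), end="")
```

A script like this would copy whatever is in Solr as-is, so any field renames between the Solr schema and the ES mapping would still need to be handled in `solr_doc_to_es_action`.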

UI Migration

  • Set up the Calisphere UI to use ES instead of Solr

Attention Migration

  • Balance efforts on the existing harvester with efforts on Pachamama

Migration Options

Add a Pachamama -> Solr connector

Pachamama would output to both the existing Solr stage AND ElasticSearch. The existing Solr index would contain collections harvested through v1 and through Pachamama. The ElasticSearch index would only contain collections harvested through Pachamama. This setup would involve running the following components during the migration:

  • v1 harvest platform (including rq, workers, etc.)
  • Pachamama platform
  • couchdb-stg (output of v1)
  • couchdb-prd (duplicate of couchdb-stg once couchdb-stg is in a good spot for a collection)
  • solr-stg (output of couchdb-stg and Pachamama)
  • solr-prd (from couchdb-prd)
  • es-stg (output of Pachamama)
  • cali-test (hooked up to solr-stg)
  • cali-prd (hooked up to solr-prd)
  • cali-test-es (hooked up to es-stg)

Once the Pachamama platform is complete, we'd add:

  • es-prd (duplicate of es-stg once es-stg is in a good spot)
  • cali-prd-es (hooked up to es-prd)

Then we'd swap the CNAMEs for cali-prd-es and cali-prd, and:

  • retire v1 harvest platform
  • retire couchdb-stg
  • retire couchdb-prd
  • retire solr-stg
  • retire solr-prd
  • retire cali-prd
  • retire cali-test
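
The dual output in this option (Pachamama writing to both solr-stg and es-stg) could be sketched roughly as below. This is a minimal illustration, not Pachamama's actual design: `fan_out` and the sink callables are hypothetical names, and real sinks would wrap Solr and ElasticSearch clients, which are not shown here.

```python
from typing import Callable, Dict, Iterable, List

Record = dict
# A sink is any callable that accepts a batch of records (hypothetical interface).
Sink = Callable[[List[Record]], None]


def fan_out(records: Iterable[Record], sinks: Dict[str, Sink]) -> Dict[str, str]:
    """Send the same batch to every sink and report a per-sink status.

    A failure in one sink does not stop writes to the others, so a Solr
    outage would not block the ES index from being populated (or vice versa).
    """
    batch = list(records)
    results = {}
    for name, write in sinks.items():
        try:
            write(batch)
            results[name] = "ok"
        except Exception as exc:  # report the failure instead of aborting
            results[name] = f"failed: {exc}"
    return results


if __name__ == "__main__":
    # In-memory lists stand in for the two indexes.
    solr_stg, es_stg = [], []
    status = fan_out(
        [{"id": "1", "title": "Example"}],
        {"solr-stg": solr_stg.extend, "es-stg": es_stg.extend},
    )
    print(status)
```

Reporting per-sink status rather than raising on the first error is a design choice for the sketch; during the migration we might instead prefer that a failed Solr write fail the whole harvest, since Solr would still back production.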

Build a Solr -> ES converter + modify the Calisphere UI to use ES

The ES index would contain collections harvested through v1 and through Pachamama. The Solr index would only contain collections harvested through v1. This setup would involve running the following components during the migration:

  • v1 harvest platform (including rq, workers, etc.)
  • Pachamama platform
  • couchdb-stg (contains data from v1 harvest)
  • solr-stg (output of couchdb-stg)
  • es-test (output of pachamama harvest)
  • es-stg (contains data from v1 harvest and pachamama harvest)
  • es-prd (contains data from v1 harvest and pachamama harvest)
  • cali-test (modified to work off es-stg)
  • cali-prd (modified to work off es-prd)
  • cali-pachamama (modified to work off es-test)

In both of these cases, re-harvests and new collections that rely on v1 mappers and fetchers not yet migrated would continue to run through the v1 stack. Meanwhile, new fetcher and mapper types would be added to Pachamama, and existing fetchers and mappers would be migrated to Pachamama. We would need some way to let operators know which system to use (especially for new harvest types, which would exist exclusively in Pachamama).
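
One way to tell operators which system to use would be a small lookup keyed on fetcher and mapper type, sketched below. Everything here is hypothetical: `PlatformRegistry` is not an existing component, and the type names in the example are placeholders rather than a statement of what has actually been migrated.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class PlatformRegistry:
    """Tracks which fetcher and mapper types each harvest platform supports."""

    v1_fetchers: Set[str] = field(default_factory=set)
    v1_mappers: Set[str] = field(default_factory=set)
    pachamama_fetchers: Set[str] = field(default_factory=set)
    pachamama_mappers: Set[str] = field(default_factory=set)

    def platform_for(self, fetcher: str, mapper: str) -> str:
        """Return the platform an operator should use for this collection."""
        # Prefer Pachamama once BOTH pieces have been migrated (or are new there).
        if fetcher in self.pachamama_fetchers and mapper in self.pachamama_mappers:
            return "pachamama"
        # Otherwise fall back to the v1 stack if it supports both pieces.
        if fetcher in self.v1_fetchers and mapper in self.v1_mappers:
            return "v1"
        raise LookupError(f"no platform supports fetcher={fetcher!r}, mapper={mapper!r}")


if __name__ == "__main__":
    # Placeholder type names, for illustration only.
    registry = PlatformRegistry(
        v1_fetchers={"oac", "oai"},
        v1_mappers={"oac_dc", "oai_dc"},
        pachamama_fetchers={"oai", "new_type"},
        pachamama_mappers={"oai_dc", "new_type"},
    )
    print(registry.platform_for("oai", "oai_dc"))
    print(registry.platform_for("oac", "oac_dc"))
```

The same data could drive a message in whatever interface operators use to queue harvests, so the choice of platform is never left to memory.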

In the first case, we'd be able to verify the accuracy of migrated fetchers and mappers by comparing data in the ES index to data in the Solr index. In the second case, we'd have to