Migration Plan

Migration Pieces

Code Migration

Fetcher migration
Mapper migration
Enrichment migration
Migrate image harvester
Migrate deep harvester

Data Migration

Do we want to run the harvest operation for all collections in Calisphere, or come up with some sort of migrate script to pull data from Solr into ElasticSearch?

UI Migration

Set up the Calisphere UI to use ES instead of Solr

Attention Migration

Balance efforts on the existing harvester with efforts on Pachamama

Migration Options

Add a Pachamama -> Solr connector

Pachamama would output to both the existing Solr stage AND ElasticSearch. The existing Solr index would contain collections harvested through v1 and through Pachamama. The ElasticSearch index would only contain collections harvested through Pachamama. This setup would involve running the following components during the migration:

v1 harvest platform (including rq, workers etc)
pachamama platform
couchdb-stg (output of v1)
couchdb-prd (copy of couchdb-stg, made once couchdb-stg is in a good state for a collection)
solr-stg (output of couchdb-stg and pachamama)
solr-prd (from couchdb-prd)
es-stg (output of pachamama)
cali-test (hooked up to solr-stg)
cali-prd (hooked up to solr-prd)
cali-test-es (hooked up to es-stg)

Once the Pachamama platform is complete, we'd add:

es-prd (copy of es-stg, made once es-stg is in a good state)
cali-prd-es (hooked up to es-prd)

Then swap the CNAMEs for cali-prd-es and cali-prd, and:

retire v1 harvest platform
retire couchdb-stg
retire couchdb-prd
retire solr-stg
retire solr-prd
retire cali-prd
retire cali-test
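
Before retiring the Solr side, this option leaves every Pachamama-harvested collection in both solr-stg and es-stg, so the two can be compared record by record. The following is a minimal sketch of such a comparison, not the project's actual tooling. It assumes the documents for a single collection have already been pulled from each index as plain dicts, that both sides share an "id" field, and that Solr's "_version_" field should be ignored. The function name diff_indexes and the report keys are hypothetical.

```python
from typing import Dict, Iterable, List


def diff_indexes(solr_docs: Iterable[Dict], es_docs: Iterable[Dict],
                 ignore_fields=("_version_",)) -> Dict:
    """Compare one collection's documents from Solr and ES.

    Returns ids present on only one side, plus, for ids present on both
    sides, the names of the fields whose values differ.
    """
    ignore = set(ignore_fields)

    def strip(doc: Dict) -> Dict:
        # Drop bookkeeping fields that are expected to differ (assumption).
        return {k: v for k, v in doc.items() if k not in ignore}

    solr = {d["id"]: strip(d) for d in solr_docs}
    es = {d["id"]: strip(d) for d in es_docs}

    report: Dict = {
        "missing_in_es": sorted(solr.keys() - es.keys()),
        "missing_in_solr": sorted(es.keys() - solr.keys()),
        "mismatched": {},
    }
    for doc_id in sorted(solr.keys() & es.keys()):
        all_fields = solr[doc_id].keys() | es[doc_id].keys()
        differing: List[str] = sorted(
            f for f in all_fields if solr[doc_id].get(f) != es[doc_id].get(f)
        )
        if differing:
            report["mismatched"][doc_id] = differing
    return report
```

The comparison should be scoped to one Pachamama-harvested collection at a time, since the Solr index in this option also holds v1-only collections that would otherwise all be reported as missing from ES.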

Build a Solr -> ES converter + modify the Calisphere UI to use ES

The ES index would contain collections harvested through v1 and through Pachamama. The Solr index would only contain collections harvested through v1. This setup would involve running the following components during the migration:

v1 harvest platform (including rq, workers etc)
pachamama platform
couchdb-stg (contains data from v1 harvest)
solr-stg (output of couchdb-stg)
es-test (output of pachamama harvest)
es-stg (contains data from v1 harvest and pachamama harvest)
es-prd (contains data from v1 harvest and pachamama harvest)
cali-test (modified to work off es-stg)
cali-prd (modified to work off es-prd)
cali-pachamama (modified to work off es-test)
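
A rough sketch of what the core of a Solr -> ES converter might look like follows. It is an illustration under stated assumptions, not a design decision. It relies on two things that do exist: Solr's cursorMark deep paging (the response carries nextCursorMark, which repeats once the result set is exhausted) and the newline-delimited action/source format of the ES _bulk API. Everything else is hypothetical: the fetch_page callable that performs the actual Solr request, the function names, the index name in the usage note, and the assumption that "_version_" is the only Solr-internal field to drop. Any field renaming between the two schemas is left out.

```python
import json
from typing import Callable, Dict, Iterable, Iterator

# Assumption: _version_ is the only Solr-internal field that must not be copied.
SOLR_INTERNAL_FIELDS = {"_version_"}


def iter_solr_docs(fetch_page: Callable[[str], Dict],
                   start_cursor: str = "*") -> Iterator[Dict]:
    """Walk a Solr result set using cursorMark deep paging.

    fetch_page(cursor) is a hypothetical callable that issues the /select
    request (sorted on the uniqueKey field, as cursorMark requires) and
    returns the decoded JSON response.
    """
    cursor = start_cursor
    while True:
        page = fetch_page(cursor)
        yield from page["response"]["docs"]
        next_cursor = page["nextCursorMark"]
        if next_cursor == cursor:
            # Solr returns the same cursor once there is nothing left to read.
            break
        cursor = next_cursor


def to_bulk_lines(docs: Iterable[Dict], index: str,
                  id_field: str = "id") -> Iterator[str]:
    """Turn Solr docs into NDJSON lines for the ES _bulk API.

    Each document becomes two lines: an "index" action, then the source.
    """
    for doc in docs:
        source = {k: v for k, v in doc.items() if k not in SOLR_INTERNAL_FIELDS}
        yield json.dumps({"index": {"_index": index, "_id": source[id_field]}})
        yield json.dumps(source)
```

In use, the lines from to_bulk_lines(iter_solr_docs(fetch_page), "some-index") would be joined with newlines, terminated with a final newline, and posted to the _bulk endpoint in batches.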

In both of these cases, re-harvests and new collections that use existing v1 mappers and fetchers not yet migrated would continue to run through the v1 stack. New fetcher and mapper types would be added to Pachamama, and existing fetchers and mappers would be migrated to Pachamama over time. We would need some way to let operators know which system to use (especially for new harvest types, which would exist exclusively in Pachamama).
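
One way to tell operators which system to use is a small lookup that encodes the rule above. This is only a sketch: the Platform enum, the platform_for function, and the placeholder type names in the usage note are hypothetical, and where the two sets of type names would actually live (a config file, the collection registry, or elsewhere) is an open question.

```python
from enum import Enum
from typing import Set


class Platform(Enum):
    V1 = "v1"
    PACHAMAMA = "pachamama"


def platform_for(harvest_type: str, v1_types: Set[str],
                 migrated_types: Set[str]) -> Platform:
    """Pick the platform that should run a harvest of the given type.

    v1_types: fetcher/mapper types that exist in the v1 stack.
    migrated_types: the subset of those already migrated to Pachamama.
    """
    if harvest_type in migrated_types:
        # Already migrated: run it through Pachamama.
        return Platform.PACHAMAMA
    if harvest_type in v1_types:
        # Exists in v1 and not yet migrated: keep using the v1 stack.
        return Platform.V1
    # Unknown to v1, so it is a new type that only exists in Pachamama.
    return Platform.PACHAMAMA
```

For example, with v1_types = {"type_a", "type_b"} and migrated_types = {"type_b"}, a "type_a" harvest stays on v1, while "type_b" and a brand-new "type_c" both go to Pachamama.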

In the 1st case, we'd be able to verify the accuracy of migrated fetchers and mappers by comparing data in the ES index to data in the Solr index. In the 2nd case, we'd have to
