pipelines/gbif/identifiers/diagnostics at master · biodiversitydata-se/pipelines

Name	Name	Last commit message	Last commit date
parent directory ..
src	src
README.md	README.md
pom.xml	pom.xml

How to use:

Please use pipelines-gbif-id-migrator script from a crawler vm to migrate GBIF identifiers:

Keys:

-f - Registry source datasetKey (usually identical for from-dataset and to-dataset keys)
-t - Registry new datasetKey (usually identical for from-dataset and to-dataset keys)
-p - Path to the csv file (Read more about file structure down below)

Example:

cd utils
./pipelines-gbif-id-migrator -f DATSET_KEY -t DATSET_KEY -p PATH/TO/CSV/FILE.csv

The diagnostics tool

Note: To assemble diagnostics tool artifact you have to activate maven profile

mvn clean package -P extra-artifacts

Diagnostics tool contains 3 key features:

MIGRATOR - when you need to migrate GBIF identifiers from old occurrence_id/triplet to the new occurrence_id/triplet
REPAIR - (DEPRECATED) fixing GBIF ID collisions by removing GBIF identifier value from triplet or occurrence_id
LOOKUP - when you want to print out the values in occurrenceID and the triplet in each crawl attempt DwC-A

Display HELP information

java -jar target/diagnostics-VERSION-SNAPSHOT-shaded.jar --help

MIGRATOR

Migrator source file - By defualt file format is CSV, the file must be without header and contain two rows with values, where first value is old_occurrence_id, second is the new_occurrence_id, exaple:

0000001,GLM-P-0000001
0000002,GLM-P-0000002
0000003,GLM-P-0000003

Note: that it is possible to migrate GBIF identifiers using triplet value, the file must have the following format, including null at the end:

old_institutionCode|old_collectionCode|old_catalogNumber|null,new_institutionCode|new_collectionCode|new_catalogNumber|null

SMNG|GLM|1|null,SMNG|GLM|GLM-P-0000001|null
SMNG|GLM|2|null,SMNG|GLM|GLM-P-0000002|null
SMNG|GLM|3|null,SMNG|GLM|GLM-P-0000003|null

Keys:

tool - name for the tool, must be MIGRATOR
zk-connection - Zookeeper connection string for HBase
lookup-table - HBase lookup table name
counter-table - HBase counter table name
occurrence-table - HBase occurrence table name
from-dataset - Registry source datasetKey (usually identical for from-dataset and to-dataset keys)
to-dataset - Registry new datasetKey (usually identical for from-dataset and to-dataset keys)
file-path - Path to the csv file
delete-keys - (Optional) Deletes GBIF identifiers if they have been created for new occurrence_id
skip-issues - (Optional) Continue the processing when an issue appears
splitter - (Optional) Default is comma (,)

Example:

java -jar diagnostics-VERSION-SNAPSHOT-shaded.jar \
    --tool MIGRATOR \
    --zk-connection ZK_CONNECTION_STING \ 
    --lookup-table OCCURRENCE_LOOKUP_TABLE_NAME \
    --occurrence-table OCCURRENCE_TABLE_NAME \
    --counter-table OCCURRENCE_COUNTER_TABLE_NAME \ 
    --skip-issues \
    --delete-keys \
    --from-dataset e330e2ff-9816-482e-aceb-27f2b3cc05c4 \ 
    --to-dataset e330e2ff-9816-482e-aceb-27f2b3cc05c4 \
    --file-path changesOccurrenceIDHemipteraUMAG.csv \

REPAIR - (DEPRECATED)

The feature is deprecated, becuase current GBIF identifier workflow doesn't rely on couple triplets+occurrence_id, but only occurrence_id

java -jar target/diagnostics-VERSION-SNAPSHOT-shaded.jar \
  --dataset-key DATASET_REGISTY_KEY \
  --input-source /full/path/DATASET_REGISTY_KEY/DATASET_REGISTY_KEY.dwca \
  --zk-connection ZK_CONNECTION_STRING \
  --lookup-table LOOKUP_TABLE \
  --counter-table COUNTER_TABLE \
  --occurrence-table OCCURRENCE_TABLE \
  --deletion-strategy BOTH \
  --only-collisions \
  --dry-run

LOOKUP

// EMPTY

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

diagnostics

diagnostics

README.md

How to use:

Please use pipelines-gbif-id-migrator script from a crawler vm to migrate GBIF identifiers:

The diagnostics tool

Display HELP information

MIGRATOR

REPAIR - (DEPRECATED)

LOOKUP

Files

diagnostics

Directory actions

More options

Directory actions

More options

Latest commit

History

diagnostics

Folders and files

parent directory

README.md

How to use:

Please use pipelines-gbif-id-migrator script from a crawler vm to migrate GBIF identifiers:

The diagnostics tool

Display HELP information

MIGRATOR

REPAIR - (DEPRECATED)

LOOKUP