Pipelines

The module uses Apache Beam as a unified programming model to define and execute data processing pipelines.

Module structure:

  • export-gbif-hbase - Pipeline that exports verbatim data from the GBIF HBase tables and saves it as ExtendedRecord Avro files
  • ingest-gbif-beam - Main GBIF pipelines for ingestion of biodiversity data
  • ingest-gbif-fragmenter - Writes raw archive data to the HBase store
  • ingest-gbif-java - Main GBIF pipelines for ingestion of biodiversity data, Java version