Code for cleaning and merging Siphonaptera taxonomy for the Terrestrial Parasite Tracker
The R scripts in this repository clean taxonomic classifications received from various sources for the Terrestrial Parasite Tracker Thematic Collections Network (TPT) Taxonomy Resource. Separate scripts transform the data from each resource, and a further script merges the resources for review.
Loads the libraries and functions needed by the other scripts. Run it before any other script.
Transforms the BYU Lewis list, as updated and provided by Mike Hastriter, to Darwin Core.

Input files:

File Name | Description |
---|---|
Lewis World Species List MMM DD YYYY.xlsx | Lewis database as provided by Mike Hastriter at BYU |
Lewis_reviewed.xlsx | Names from Lewis_name_review output that have been corrected and are to be returned to the working file |
Lewis_removed.xlsx | Names from Lewis_name_review output that have been removed from the working file |
tpt_dwc_template.xlsx | Template (no data) for Darwin Core file |

Output files:

File Name | Description |
---|---|
Lewis_duplicates.csv | Names removed from the original data because they were duplicates |
Lewis_name_review.csv | Names removed from the original data that need review before adding back or removing (see inputs above) |
Lewis_non_DwC.csv | Name ID plus all non Darwin Core fields from original file |
Lewis_DwC.csv | Name ID plus all applicable Darwin Core fields |
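
The split into the four Lewis outputs can be pictured with a small base R sketch. It is an illustration under assumptions, not the code in the transform script: the sample rows, the `reviewerNote` column, and the `dwc_terms` vector (standing in for the header row of tpt_dwc_template.xlsx) are hypothetical, and duplicates are detected here simply as repeated `scientificName` values.

```r
# Hypothetical sample rows; the real input is the Lewis spreadsheet.
raw <- data.frame(
  taxonID        = c("L1", "L2", "L3", "L4"),
  scientificName = c("Pulex irritans", "Pulex irritans",
                     "Ctenocephalides felis", "Xenopsylla cheopis"),
  family         = rep("Pulicidae", 4),
  reviewerNote   = c("ok", "repeat of L1", "ok", "ok"),
  stringsAsFactors = FALSE
)

# Assumed stand-in for the column names in tpt_dwc_template.xlsx.
dwc_terms <- c("taxonID", "scientificName", "family", "genus")

# Set aside repeated names, keeping the first occurrence in the working data.
is_dup     <- duplicated(raw$scientificName)
duplicates <- raw[is_dup, , drop = FALSE]
working    <- raw[!is_dup, , drop = FALSE]

# Darwin Core columns go to one table; everything else goes to another,
# with the name ID repeated so the two tables can be joined again later.
dwc_cols     <- intersect(names(working), dwc_terms)
non_dwc_cols <- c("taxonID", setdiff(names(working), dwc_terms))
dwc     <- working[, dwc_cols, drop = FALSE]
non_dwc <- working[, non_dwc_cols, drop = FALSE]

# Write to a temporary directory so the sketch does not touch real outputs.
out_dir <- tempdir()
write.csv(duplicates, file.path(out_dir, "Lewis_duplicates.csv"), row.names = FALSE)
write.csv(non_dwc,    file.path(out_dir, "Lewis_non_DwC.csv"),    row.names = FALSE)
write.csv(dwc,        file.path(out_dir, "Lewis_DwC.csv"),        row.names = FALSE)
```

Keeping the name ID in both tables is what allows the non Darwin Core fields to be rejoined to a name later.
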
Transforms the Smithsonian (NMNH) list of taxa to Darwin Core.

Input files:

File Name | Description |
---|---|
NMNH_Siphonaptera.xlsx | Catalog of fleas from the Smithsonian |
NMNH_reviewed.xlsx | Names from NMNH_need_review output that have been corrected and are to be returned to the working file |
tpt_dwc_template.xlsx | Template (no data) for Darwin Core file |

Output files:

File Name | Description |
---|---|
NMNH_need_review.csv | Names removed from the original data that need review before adding back or removing (see inputs above) |
NMNH_non_DwC.csv | Name ID plus all non Darwin Core fields from original file |
NMNH_DwC.csv | Name ID plus all applicable Darwin Core fields |
Transforms the Field Museum (FMNH) list of taxa to Darwin Core.

Input files:

File Name | Description |
---|---|
FMNH_Siphonaptera.xlsx | List of flea names from the Field Museum |
FMNH_reviewed.xlsx | Names from FMNH_need_review output that have been corrected and are to be returned to the working file |
tpt_dwc_template.xlsx | Template (no data) for Darwin Core file |

Output files:

File Name | Description |
---|---|
FMNH_need_review.csv | Names removed from the original data that need review before adding back or removing (see inputs above) |
FMNH_non_DwC.csv | Name ID plus all non Darwin Core fields from original file |
FMNH_DwC.csv | Name ID plus all applicable Darwin Core fields |
Transforms the Catalogue of Life (CoL) download to Darwin Core.

Input files:

File Name | Description |
---|---|
CoL_DwC.xlsx | Flea names from Catalogue of Life download |
tpt_dwc_template.xlsx | Template (no data) for Darwin Core file |

Output files:

File Name | Description |
---|---|
CoL_DwC.csv | Name ID plus all applicable Darwin Core fields |
Transforms the Global Biodiversity Information Facility (GBIF) download and all other Darwin Core files to taxotools format, then merges them and generates a checklist for expert review.

Input files:

File Name | Description |
---|---|
Lewis_DwC.csv | Output of Lewis_transform.r |
NMNH_DwC.csv | Output of NMNH_transform.r |
FMNH_DwC.csv | Output of FMNH_transform.r |
CoL_DwC.csv | Output of CoL_transform.r |
GBIF_Siphonaptera.xlsx | Flea names from GBIF download (already in DwC format, but still lightly transformed by this script) |

Output files:

File Name | Description |
---|---|
problems.csv | Names that could not be merged and need review |
taxo_siphonaptera.csv | Merged list of names |
Flea_taxolist.html | Checklist of merged names for expert review |
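
The idea behind the merge can be sketched in base R. This is a simplified illustration, not the repository's merge script and not a taxotools call: the two tiny source tables are hypothetical, names are matched on exact `scientificName` strings only, and rows with no usable name stand in for the kind of record that would end up in problems.csv.

```r
# Hypothetical per-source name lists standing in for the *_DwC.csv files.
lewis <- data.frame(
  scientificName = c("Pulex irritans", "Xenopsylla cheopis"),
  stringsAsFactors = FALSE
)
nmnh <- data.frame(
  scientificName = c("Pulex irritans", "Ctenocephalides felis", NA),
  stringsAsFactors = FALSE
)
sources <- list(Lewis = lewis, NMNH = nmnh)

# Tag every row with the source it came from, then stack the lists.
tagged <- lapply(names(sources), function(src) {
  df <- sources[[src]]
  df$source <- rep(src, nrow(df))
  df
})
combined <- do.call(rbind, tagged)

# Rows without a usable name cannot be merged and are set aside for review.
unusable <- is.na(combined$scientificName) | !nzchar(combined$scientificName)
problems <- combined[unusable, , drop = FALSE]
usable   <- combined[!unusable, , drop = FALSE]

# Collapse to one row per name, recording every source that supplied it.
merged <- aggregate(
  source ~ scientificName,
  data = usable,
  FUN  = function(s) paste(sort(unique(s)), collapse = "; ")
)
```

Recording the contributing sources on each merged name lets a reviewer trace a name back to the list it came from.
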
Transforms Darwin Core files to the Arctos hierarchical tool upload format (the transform will be written once the final list is available).

Input files:

File Name | Description |
---|---|
Arctos_upload.csv | Template (no data) for Arctos upload |

Output files:

File Name | Description |
---|---|

Data from a source may need to be run through its script several times. Any change to a primary source requires re-running that source's script and then running a new merge.
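
One way to keep re-runs consistent is to drive the per-source transforms from a single loop. The sketch below is a suggestion, not part of the repository: it uses the four transform script names listed among the merge inputs, skips any that are not found in the working directory, and leaves out the setup and merge scripts because their file names are not given here (run the setup script first and the merge script last). The helper `run_if_present` is hypothetical.

```r
# Transform scripts named in the merge inputs table.
transform_scripts <- c(
  "Lewis_transform.r",
  "NMNH_transform.r",
  "FMNH_transform.r",
  "CoL_transform.r"
)

# Hypothetical helper: source a script if it exists and report whether it ran.
run_if_present <- function(path) {
  if (!file.exists(path)) {
    message("Skipping missing script: ", path)
    return(FALSE)
  }
  source(path)
  TRUE
}

# Named logical vector: TRUE for each script that was found and run.
ran <- vapply(transform_scripts, run_if_present, logical(1))
```
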