Commit

Update README.md
kmexter authored May 17, 2024
1 parent 73e5290 commit b2344ea
Showing 1 changed file with 1 addition and 1 deletion.
@@ -1,6 +1,6 @@
We have curated the PEMA taxonomic assignments for COI and 18S from the batch 1 processing of the ARMS-MBON data to address the following issues:
* Due to a bug in V2.1.4 of PEMA (since fixed), the assignments in the **Extended_final_table_XX.xlsx files for COI** in the [taxonomic_assignments folder](https://github.com/arms-mbon/data_workspace/tree/main/analysis_data/from_pema/processing_batch1/taxonomic_assignments) only go down to genus level. The species-level assignments can be found in the tax_assignments files that accompany those final tables. We have extracted these species-level assignments and inserted them, together with newly-minted species-level NCBI IDs, into new Extended_final_table_XX.csv files (note: CSV rather than XLSX). The code for this and the new tables can be found here.
* For the **Extended_final_table_XX.xlsx files for 18S**, the taxonomy returned from the PR2 database is not straightforward to compare with taxonomies from other databases because of the way PR2 organises its taxon nodes. To make better use of these results for our subsequent matching of the taxonomic assignments to WoRMS (World Register of Marine Species), we have curated the 18S taxonomic classification. The code for this curation and its outputs can be found here: we have created new __Extended_final_table_XX_TaxonomyCurated.csv__ files (note: CSV rather than XLSX), and a comparison of the previous and new taxonomic classifications can be found in the files called __XXX_TaxonomyCompared.csv__ (e.g. April2021_18S_noBlank_TaxonomyCurated.csv). It is up to users to decide whether to adopt these curations in their own work. For 18S, the taxonomy strings assigned by the PR2 database were curated as follows:
  * Split the taxonomy strings into separate columns on ";".
  * Strings containing "var." anywhere are replaced entirely by "var.".
  * Strings containing a space, "XX" or "sp." are set to NA. This step misses a few cases where a species assignment is actually present but in such a cryptic format that no general rule that worked for all other cases could also recover it. This happens, for example, when genus and species are given in the following format: Phascolopsis;Phascolopsis (strain);gouldii (Phascolopsis (strain)). The species part at the end is not recognised by the rule above, and we could not find a rule that handles this case as well as all the others, so we accepted this trade-off. The taxonomy strings from the PR2 database are simply too cryptic in some cases.
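The three curation rules above can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the curation code linked above: it handles one PR2 taxonomy string at a time, represents NA as `None`, applies the rules in the listed order using plain substring checks, and the function names `curate_field`, `curate_pr2` and `compare_taxonomy` are hypothetical. `compare_taxonomy` only loosely mirrors the idea behind the TaxonomyCompared files; their actual column layout is not described here.

```python
from typing import List, Optional, Tuple


def curate_field(field: str) -> Optional[str]:
    """Apply the per-field rules to one ';'-separated PR2 field (hypothetical helper)."""
    # Rule: any field containing "var." is replaced entirely by "var."
    if "var." in field:
        return "var."
    # Rule: fields containing a space, "XX" or "sp." become NA (None here)
    if " " in field or "XX" in field or "sp." in field:
        return None
    return field


def curate_pr2(taxonomy: str) -> List[Optional[str]]:
    """Split a PR2 taxonomy string on ';' and curate each resulting column."""
    return [curate_field(field) for field in taxonomy.split(";")]


def compare_taxonomy(taxonomy: str) -> List[Tuple[int, str, Optional[str]]]:
    """List (column index, original value, curated value) for every changed column.

    Hypothetical stand-in for a previous-vs-new comparison; not the real file layout.
    """
    original = taxonomy.split(";")
    curated = curate_pr2(taxonomy)
    return [
        (i, before, after)
        for i, (before, after) in enumerate(zip(original, curated))
        if before != after
    ]


if __name__ == "__main__":
    example = "Phascolopsis;Phascolopsis (strain);gouldii (Phascolopsis (strain))"
    print(curate_pr2(example))
    # → ['Phascolopsis', None, None]
```

Running it on the Phascolopsis string from the last rule shows the trade-off described there: because both trailing fields contain a space, the species epithet "gouldii" is set to NA along with the strain node.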