Update README.md
kmexter authored May 17, 2024
1 parent 57ffe87 commit b87ee10
Showing 1 changed file with 3 additions and 2 deletions.
@@ -5,12 +5,13 @@ The __Extended_final_table__ files contain the following information:
* ASV/OTU identifiers of the following format:
  * For COI: __ASV_XY:ID__. The ID part matches the ID in the corresponding __tax_assignments__ and __fasta__ files. The "ASV" part of the identifier is unique _within_ a single PEMA run, while the "ID" part is unique for each DNA sequence and can therefore occur in files across processing runs. This means that, while the "ASV_1" prefix, for example, can occur in every sequencing run, it does not necessarily represent the same DNA sequence in each of them, whereas the ID is unique to each distinct sequence across all runs.
  * For 18S and ITS: __OtuXY__. The identifiers match those in the corresponding __fasta__ files for each run and are unique _within_ a single PEMA run. This means that, while "Otu 1", for example, can occur in every sequencing run, it does not necessarily represent the same DNA sequence across runs. Unique sequences across runs can be identified by matching OTUs/ASVs to the sequences in the fasta files. Because of an error in PEMA at the time of use regarding the sequence identifier format for ITS runs, the sequence identifiers in these files are of the format OtuXY, although these sequences represent ASVs clustered with Swarm v2.
  * For 18S: the taxonomic classification for 18S (PR2) is not straightforward to compare with other taxonomies, in particular WoRMS, which is required for these data to be submitted to the EurOBIS database. We have therefore curated the taxonomic classification; the output can be found in the [updated_taxonomic_assignments](https://github.com/arms-mbon/data_workspace/tree/main/analysis_data/from_pema/processing_batch1/updated_taxonomic_assignments) folder.

* The read counts for each ASV/OTU in each processed sample, i.e., the columns up to the third-to-last column are headed by material sample IDs.

* The second last column contains the full taxonomic classification as returned by the respective reference database in a single character string. __NOTE:__ In PEMA v2.1.4 used here, taxonomy of COI sequences is denoted only to genus level in these tables. The species-level classification is not included in these tables. To obtain species-level classification for COI gene sequences, users should refer to the __tax_assignments__ files (see below).
* The penultimate column contains the full taxonomic classification as returned by the respective reference database in a single character string. __NOTE:__ In PEMA v2.1.4, the version used here, the taxonomy of COI sequences is given only to genus level in these tables; the species-level classification is not included. To obtain the species-level classification for COI gene sequences, users should refer to the __tax_assignments__ files (see below). __NEW__: we have taken the full taxonomic assignments from these tax_assignments files and added them to this penultimate column in the __Extended_final_table__ files; these new files are in the [updated_taxonomic_assignments](https://github.com/arms-mbon/data_workspace/tree/main/analysis_data/from_pema/processing_batch1/updated_taxonomic_assignments) folder.

* The last column contains the NBCI taxon ID and taxon name for the lowest taxonomic level the respective ASV/OTU could be assigned to and for which and NCBI taxon ID could be found.
* The last column contains the NCBI taxon ID and taxon name for the lowest taxonomic level to which the respective ASV/OTU could be assigned and for which an NCBI taxon ID could be found. __NOTE:__ due to the issue mentioned in the point above for COI, the NCBI IDs are for the genus level only. For the species-level NCBI IDs, see the Extended_final_tables in the [updated_taxonomic_assignments](https://github.com/arms-mbon/data_workspace/tree/main/analysis_data/from_pema/processing_batch1/updated_taxonomic_assignments) folder.

* The filenames contain:
  * The date the samples were sequenced (e.g., April2021)
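
The column layout described above (identifier first, then one read-count column per material sample, then the classification string, then the NCBI taxon column) could be read along the lines of the following minimal sketch. It is not part of PEMA or of this repository. It assumes a tab-separated file with a single header row and integer read counts, neither of which the README states; the function names, the column names and the demo values (including the `abc123` ID and the `0000` taxon ID) are hypothetical placeholders.

```python
import csv
import io


def split_asv_identifier(identifier):
    """Split a COI identifier of the form ASV_XY:ID into its two parts.

    Returns (run_local_label, cross_run_id). Identifiers without a colon,
    such as the OtuXY identifiers used for 18S and ITS, have no cross-run
    part, so None is returned in its place.
    """
    label, sep, seq_id = identifier.partition(":")
    if not sep:
        return identifier, None
    return label, seq_id


def read_extended_table(handle, delimiter="\t"):
    """Read a table laid out as: identifier, sample columns, classification, NCBI.

    ASSUMPTIONS: one header row, tab-separated fields, integer read counts.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    # Everything between the first column and the last two columns is a sample.
    sample_ids = header[1:-2]
    rows = []
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        counts = [int(value) for value in fields[1:-2]]
        rows.append(
            {
                "identifier": fields[0],
                "counts": dict(zip(sample_ids, counts)),
                "classification": fields[-2],  # penultimate column
                "ncbi": fields[-1],  # last column
            }
        )
    return sample_ids, rows


# Hypothetical two-sample table, for illustration only.
demo = (
    "identifier\tSampleA\tSampleB\tclassification\tncbi\n"
    "ASV_1:abc123\t10\t0\tEukaryota;ExampleGenus\t0000:ExampleGenus\n"
    "Otu7\t3\t5\tEukaryota;OtherGenus\t0000:OtherGenus\n"
)
sample_ids, rows = read_extended_table(io.StringIO(demo))
print(sample_ids)
print(split_asv_identifier(rows[0]["identifier"]))
print(split_asv_identifier(rows[1]["identifier"]))
```

Splitting on the colon keeps the run-local "ASV" label separate from the ID, so comparisons across processing runs can use the ID alone, as the COI bullet recommends.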
