16S-pipeline

A usearch-based 16S community profiling pipeline for the analysis of ribosomal amplicon sequencing data.


You will need the following tools to use this pipeline:

  • usearch7 from Rob Edgar's Drive5 site as described here:

  • Edgar, R.C. (2013) UPARSE: Highly accurate OTU sequences from microbial amplicon reads, Nature Methods Pubmed:23955772, dx.doi.org/10.1038/nmeth.2604

  • The naive Bayes RDP classifier from the RDP Project (on GitHub), as described here:

  • Wang, Q, G. M. Garrity, J. M. Tiedje, and J. R. Cole. 2007. Naive Bayesian Classifier for Rapid Assignment of rRNA Sequences into the New Bacterial Taxonomy. Appl Environ Microbiol. 73(16):5261-7.

  • A working MySQL server installation (contact me for a SQLite3 version)

Getting started:

  • Edit the file globals with the paths to the above tools and the MySQL host and database name. Leave TRUNCLEN unchanged for now.
  • Look at the 0.setup script to make sure the paths to the data are correct and adjust as necessary to find the .fasta, .qual, and mapping.txt files.
  • Use the EXECUTE command to run the pipeline and review the results. In particular, pay attention to the data in the 1.quality_filter.stats.log file. Using the rules described on Rob Edgar's site, decide on the TRUNCLEN and possibly the MAXEE parameters.
  • The 1.quality_filter.stats.log file contains data on the % of reads falling into each read length bin, both per bin and cumulatively. A trade-off needs to be made between the accumulated % of reads and the avgEE (cumulative error rate average).
  • Rerun the EXECUTE command and examine the output.
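
To get a feel for the TRUNCLEN / MAXEE trade-off before rerunning, the sketch below may help. It is not part of the pipeline and does not read 1.quality_filter.stats.log (that file's format is not assumed here). It only illustrates the general idea behind expected-error filtering: a read's expected errors are the sum of its per-base error probabilities, 10^(-Q/10), reads shorter than the truncation length are dropped, and the rest are truncated and kept only if their expected errors do not exceed the cut-off. The function names and the toy reads are hypothetical.

```python
def expected_errors(quals):
    """Expected number of errors: sum of per-base error probabilities 10^(-Q/10)."""
    return sum(10 ** (-q / 10) for q in quals)


def pass_rate(reads, trunclen, maxee):
    """Percent of reads kept after truncating to trunclen and applying maxee."""
    kept = 0
    for quals in reads:
        if len(quals) < trunclen:
            continue  # too short to truncate: discarded
        if expected_errors(quals[:trunclen]) <= maxee:
            kept += 1
    return 100.0 * kept / len(reads)


# Hypothetical toy data: each read is a list of Phred quality scores.
reads = [
    [30] * 250,              # long, high quality throughout
    [30] * 200 + [10] * 50,  # quality drops off in the tail
    [20] * 220,              # uniformly mediocre quality
    [30] * 180,              # short, high quality
]

# A longer truncation length keeps more bases per read but fewer reads.
for trunclen in (180, 200, 250):
    print(trunclen, pass_rate(reads, trunclen, maxee=1.0))
```

With this toy data, the share of reads kept falls from 75% to 50% to 25% as the truncation length grows from 180 to 200 to 250, which is the same kind of trade-off the stats log is meant to expose on real data.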