Michael G. Campana, 2020
Smithsonian Institution
Script to calculate taxon-specific alignment statistics from a collection of ultraconserved element (UCE) alignments
To the extent possible under law, the Smithsonian Institution has waived all copyright and related or neighboring rights to ucealnstats; this work is published from the United States. You should have received a copy of the CC0 legal code along with this work. If not, see http://creativecommons.org/publicdomain/zero/1.0/.
We politely request that you cite this script as:
Campana, M.G. 2020. ucealnstats. Smithsonian Institution. https://github.com/campanam/ucealnstats.
Clone the repository: git clone https://github.com/campanam/ucealnstats
Make the ucealnstats.rb script executable: chmod +x ucealnstats/ucealnstats.rb
Move the ucealnstats.rb script to a target destination: mv ucealnstats/ucealnstats.rb <destination>
The ucealnstats.rb script expects the UCE alignments to be contained within a single directory. UCE alignments should be in NEXUS format with '.nex' or '.nexus' extensions.
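As a rough illustration of the input layout described above, the following sketch lists the files in a directory that carry either accepted extension. The helper name nexus_files is hypothetical and is not taken from ucealnstats.rb; the script's own file discovery may differ (for example in how it treats upper-case extensions).

```ruby
# Hypothetical helper (not part of ucealnstats.rb): list the NEXUS
# alignments in a directory by the two extensions named above.
def nexus_files(dir)
  # The brace pattern matches either extension; sort for a stable order.
  Dir.glob(File.join(dir, '*.{nex,nexus}')).sort
end

# Example use, assuming a directory named 'test_dir' exists:
# nexus_files('test_dir').each { |path| puts path }
```

Running a check like this before the script can help confirm that every alignment has one of the expected extensions.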
Execute the script using the command: ruby ucealnstats.rb <directory of UCE alignments>
Output can be redirected from standard output to a file using '>'. For example, the command 'ruby ucealnstats.rb test_dir > test_results.tsv' will collect alignment statistics from UCE alignments within the directory 'test_dir' and print them to the file 'test_results.tsv'.
The script will output results in tab-separated values (TSV) format.
The script will print the following overall statistics:
- Results for : Name of target directory analyzed by the script.
- Total No. of Loci: Total number of UCE alignments within the target directory.
- Total UCE Alignment Length: Total concatenated length (in bp) of UCE alignments.
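The two numeric overall statistics can be sketched on toy data as follows. The input structure (a hash of locus name to sample sequences) and the function name overall_stats are assumptions made for illustration; they do not reflect how ucealnstats.rb stores alignments internally.

```ruby
# Toy input (hypothetical): locus name => { sample name => aligned sequence }.
# All sequences within one locus are assumed to share the same aligned length.
def overall_stats(loci)
  {
    # One alignment per locus, so the locus count is the number of entries.
    total_loci: loci.size,
    # Concatenated length is the sum of each alignment's length in bp.
    total_length: loci.values.sum { |aln| aln.values.first.to_s.length }
  }
end

loci = {
  'uce-1' => { 'A' => 'ACGT-A', 'B' => 'AC??-A' },
  'uce-2' => { 'A' => 'ACGT',   'B' => '????' }
}
p overall_stats(loci)
```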
The script will then print per-sample alignment statistics:
- Sample: Name of the sample.
- CapturedUCEs: Number of UCE alignments that included the sample.
- MissingUCEs: Number of UCE alignments that excluded the sample (i.e. 100% missing data for a sample at that locus).
- GappedAlignmentLength: Total concatenated UCE alignment length for the sample including gaps due to indels, but excluding missing data.
- UngappedAlignmentLength: Total concatenated UCE alignment length for the sample excluding both gaps due to indels and missing data.
- TotalLengthCapturedUCEs: Total length of UCE alignments for which some data for the sample was generated (including gaps and missing data).
- MeanGapped(Missing): Sample mean gapped UCE lengths including missing loci.
- MeanUngapped(Missing): Sample mean ungapped UCE lengths including missing loci.
- MeanGapped(NoMissing): Sample mean gapped UCE lengths excluding missing loci.
- MeanUngapped(NoMissing): Sample mean ungapped UCE lengths excluding missing loci.
- Coverage(Missing): Sample coverage including missing loci, defined as GappedAlignmentLength/Total UCE Alignment Length.
- Coverage(NoMissing): Sample coverage excluding missing loci, defined as GappedAlignmentLength/TotalLengthCapturedUCEs.
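The per-sample definitions above can be worked through on toy data. This is a minimal sketch under stated assumptions, not the script's implementation: the function name sample_stats and the input structure are hypothetical, and '-' as the gap character and '?' as the missing-data character are assumed (they are the NEXUS defaults). Only the counts, lengths and the two coverage ratios are shown.

```ruby
# Sketch only: '-' (gap) and '?' (missing) are assumed symbols, and the
# input hash (locus name => { sample name => sequence }) is hypothetical.
def sample_stats(loci, sample, gap: '-', missing: '?')
  total_length = 0     # Total UCE Alignment Length
  captured = 0         # CapturedUCEs
  gapped = 0           # GappedAlignmentLength
  ungapped = 0         # UngappedAlignmentLength
  captured_length = 0  # TotalLengthCapturedUCEs

  loci.each_value do |aln|
    aln_length = aln.values.first.to_s.length
    total_length += aln_length
    non_missing = aln.fetch(sample, '').chars.reject { |c| c == missing }
    next if non_missing.empty? # sample absent or 100% missing at this locus

    captured += 1
    captured_length += aln_length                    # gaps and missing included
    gapped += non_missing.size                       # gaps kept, missing dropped
    ungapped += non_missing.count { |c| c != gap }   # gaps and missing dropped
  end

  {
    captured_uces: captured,
    missing_uces: loci.size - captured,
    gapped_length: gapped,
    ungapped_length: ungapped,
    captured_length: captured_length,
    coverage_missing: total_length.zero? ? 0.0 : gapped.to_f / total_length,
    coverage_no_missing: captured_length.zero? ? 0.0 : gapped.to_f / captured_length
  }
end

loci = {
  'uce-1' => { 'A' => 'ACGT-A', 'B' => 'AC??-A' },
  'uce-2' => { 'A' => 'ACGT',   'B' => '????' }
}
p sample_stats(loci, 'B')
```

In this toy input, sample B is entirely missing at 'uce-2', so that locus counts toward MissingUCEs and contributes to the denominator of Coverage(Missing) but not to that of Coverage(NoMissing).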