ucealnstats

Michael G. Campana, 2020
Smithsonian Institution

Script to calculate taxon-specific alignment statistics from a collection of ultraconserved element (UCE) alignments

Creative Commons 0 Waiver

To the extent possible under law, the Smithsonian Institution has waived all copyright and related or neighboring rights to ucealnstats; this work is published from the United States. You should have received a copy of the CC0 legal code along with this work. If not, see http://creativecommons.org/publicdomain/zero/1.0/.

Citation

We politely request that you cite this script as:
Campana, M.G. 2020. ucealnstats. Smithsonian Institution. https://github.com/campanam/ucealnstats.

Installation

Clone the repository: git clone https://github.com/campanam/ucealnstats
Make the ucealnstats.rb script executable: chmod +x ucealnstats/ucealnstats.rb
Move the ucealnstats.rb script to a target destination: mv ucealnstats/ucealnstats.rb <destination>

Usage

Input

The ucealnstats.rb script expects the UCE alignments to be contained within a single directory. UCE alignments should be in NEXUS format with '.nex' or '.nexus' extensions.
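
As an illustration of the input convention above, the following is a minimal Ruby sketch of how alignments could be collected from a directory. The helper name nexus_files is hypothetical and is not taken from ucealnstats.rb; it also assumes that extensions are matched in lowercase only and that subdirectories are not searched, neither of which is stated above.

```ruby
# Hypothetical helper (not part of ucealnstats.rb): list the NEXUS alignments
# located directly inside a directory.
# Assumptions: only lowercase '.nex' and '.nexus' extensions count, and
# subdirectories are not searched.
def nexus_files(dir)
  Dir.children(dir)
     .select { |name| %w[.nex .nexus].include?(File.extname(name)) }
     .sort
     .map { |name| File.join(dir, name) }
end
```

Calling nexus_files('test_dir') would return the sorted paths of the '.nex' and '.nexus' files in 'test_dir' and ignore files with any other extension.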

Execution

Execute the script using the command ruby ucealnstats.rb <directory of UCE alignments>. Output can be redirected from standard output to a file using >. For example, the command ruby ucealnstats.rb test_dir > test_results.tsv collects alignment statistics from the UCE alignments within the directory 'test_dir' and writes them to the file 'test_results.tsv'.

Output

The script will output results in tab-separated values (TSV) format.

The script will print the following overall statistics:

Results for : Name of target directory analyzed by the script.
Total No. of Loci: Total number of UCE alignments within the target directory.
Total UCE Alignment Length: Total concatenated length (in bp) of UCE alignments.
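
The two numeric totals above can be sketched in Ruby as follows. This is an illustration under stated assumptions, not the script's own code: the helpers nexus_nchar and overall_stats are hypothetical, and the sketch assumes that each alignment's length can be read from the NCHAR value in its NEXUS DIMENSIONS statement. The script itself may derive lengths differently, for example from the matrix.

```ruby
# Hypothetical helper: read NCHAR from the DIMENSIONS statement of a NEXUS
# file's text. Returns nil when no such statement is found.
def nexus_nchar(text)
  match = text.match(/dimensions\b[^;]*?\bnchar\s*=\s*(\d+)/i)
  match && match[1].to_i
end

# Hypothetical helper: number of loci and total concatenated length (in bp)
# for a collection of NEXUS texts, one text per UCE alignment.
def overall_stats(nexus_texts)
  lengths = nexus_texts.map do |text|
    nexus_nchar(text) || raise(ArgumentError, 'no NCHAR found in alignment')
  end
  { loci: lengths.size, total_length: lengths.sum }
end
```

Two alignments with NCHAR values of 6 and 4 would give 2 loci and a total length of 10 bp.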

The script will then print per-sample alignment statistics:

Sample: Name of the sample.
CapturedUCEs: Number of UCE alignments that included the sample.
MissingUCEs: Number of UCE alignments that excluded the sample (i.e. 100% missing data for a sample at that locus).
GappedAlignmentLength: Total concatenated UCE alignment length for the sample including gaps due to indels, but excluding missing data.
UngappedAlignmentLength: Total concatenated UCE alignment length for the sample excluding both gaps due to indels and missing data.
TotalLengthCapturedUCEs: Total length of UCE alignments for which some data for the sample was generated (including gaps and missing data).
MeanGapped(Missing): Sample mean gapped UCE lengths including missing loci.
MeanUngapped(Missing): Sample mean ungapped UCE lengths including missing loci.
MeanGapped(NoMissing): Sample mean gapped UCE lengths excluding missing loci.
MeanUngapped(NoMissing): Sample mean ungapped UCE lengths excluding missing loci.
Coverage(Missing): Sample coverage including missing loci, defined as GappedAlignmentLength/Total UCE Alignment Length.
Coverage(NoMissing): Sample coverage excluding missing loci, defined as GappedAlignmentLength/TotalLengthCapturedUCEs.
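
To make the per-sample definitions above concrete, here is a minimal Ruby sketch that derives the count, length and coverage columns from alignments already held in memory. It is not the code in ucealnstats.rb. The function name sample_stats, the input layout (a hash of locus name to a hash of sample name to aligned sequence), and the use of '-' for gaps and '?' for missing data are all assumptions made for illustration. The four mean columns are left out because their exact formulas are not given above.

```ruby
GAP = '-'      # assumed gap character
MISSING = '?'  # assumed missing-data character

# Hypothetical helper: per-sample statistics.
# loci: { locus_name => { sample_name => aligned_sequence } }
def sample_stats(loci, sample)
  # Total UCE Alignment Length: sum of the alignment lengths of all loci.
  total_length = loci.values.sum { |aln| aln.values.first.length }
  captured = gapped = ungapped = captured_length = 0
  loci.each_value do |aln|
    seq = aln[sample]
    known = seq.to_s.delete(MISSING)  # sequence without missing-data characters
    next if known.empty?              # absent or 100% missing: a missing locus
    captured += 1
    captured_length += seq.length          # includes gaps and missing data
    gapped += known.length                 # gaps kept, missing data dropped
    ungapped += known.delete(GAP).length   # gaps and missing data dropped
  end
  ratio = ->(num, den) { den.zero? ? 0.0 : num.to_f / den }
  {
    captured_uces: captured,
    missing_uces: loci.size - captured,
    gapped_length: gapped,
    ungapped_length: ungapped,
    total_length_captured: captured_length,
    coverage_missing: ratio.(gapped, total_length),
    coverage_no_missing: ratio.(gapped, captured_length)
  }
end
```

For example, with one locus of length 6 in which a sample has the sequence AC??-A and a second locus of length 4 that lacks the sample, this sketch reports 1 captured and 1 missing UCE, a gapped length of 4, an ungapped length of 3, Coverage(Missing) of 4/10 and Coverage(NoMissing) of 4/6.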
