-
Notifications
You must be signed in to change notification settings - Fork 1
ridgelab/SelecT
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
____ _ _____ / ___| ___| | ___ __|_ _| \___ \ / _ \ |/ _ \/ __|| | ___) | __/ | __/ (__ | | |____/ \___|_|\___|\___||_| Created by RidgeLab Group, BYU Bioinformatics Table of Contents ----------------- I. Introduction II. Installation Instructions III. Usage Instructions and Examples IV. Funding and Acknowledgements V. Contact I. Introduction --------------- SelecT is a software tool developed to ___. Please see our paper in __journal__ for further information: http://sub-domain.domain.tld/some/path/to/resource II. Installation Instructions ----------------------------- To install SelecT, first ensure the Java Runtime Environment (JRE) is installed on your machine. Second, download the software from the git repository as follows: git clone https://github.com/ridgelab/SelecT.git III. Usage Instructions and Examples ------------------------------------- Instructions are created for use on a high-performance computing cluster. Modifications for individual setup may be necessary. The pipeline has been divided into three phases. Please note, a log file will be created in the current working directory when any part of SelecT is run. Should the same part of SelecT be run again in the same directory, a number (first `1', then `2', etc.) will be appended to the new logfile name so as to avoid collisions. -------------------------------- | PHASE 1 -- Environment Setup | -------------------------------- Required Positional Arguments: [1] Data Directory Directory should contain all phased VCF or HAP/LEGEND file required for selection analysis. File names must contain proper flags and file extensions. Include optional (but highly recommended) Ancestral data embedded in VCF files or as separate LEGEND/EMF file [2] Map Directory Directory that contains all required genetic map files for SelecT analysis. File names must contain proper chromosome flags. [3] Start Chromosome Must be a number between 1-22; sex chromosomes not yet supported. [4] End Chromosome Must be a number between 1-22 and greater or equal to Start Chromosome; sex chromosomes not yet supported. [5] Target Population Population identifier for experimental population TST can be used if no standard indentifier exists [6] Cross Population Population identifier for cross population TST can be used if no standard indentifier exists. Cross Population cannont be the same as Target Population. Optional Arguments: --out_pop Outgroup Population Population identifier for outgroup population. TST can be used if no standard indentifier exists. Outgroup Population cannont be the same as Target Population. --working_dir Working Directory Defines the directory where SelecT will create a new working directory. Default is current directory. --win_size Window Size For changing SelecT analysis window size (in megabases. Default is 0.5Mb. Examples: java -Xmx[MB]m -jar EnviSetup.jar [1] [2] [3] [4] [5] [6] \ --working_dir=path/to/directory java -Xmx3000m -jar EnviSetup.jar example/haplegend_data example/map 21 21 CEU YRI \ --working_dir=example ----------------------------------- | PHASE 2 -- Calculate Statistics | ----------------------------------- Required Positional Arguments: [1] Working Directory SelecT working directory created in Phase 1. Working Directory name can be changed but subdirectory names must be unchanged. [2] Simulation Directory Directory where simulations can be found. Must contain simulation file neutral_simulation.tsv and selection_simulation.tsv. These can be found here: https://github.com/ridgelab/SelecT/tree/master/example/sim [3] Chromosome Chromosome number where window can be found [4] Window Number Window index number as defined by SelecT evironment setup. See SelecT_workspace/envi_files/all_wins for window ranges. Optional Arguments: -inon Non-absolute iHS Runs iHS score probabilities where large negative scores ONLY are associated with selection (replicate CMS_local). Defualt is large positive AND negative iHS scores equate to greater selection (Voight). --prior_prob Prior Probability Set Prior Probability to a custom value between 0.0 and 1. Defaults to (1 / actual number of variants within window). -pp Prior Probability Flag Sets Prior Probability to 1/10,000 (replicate CMS_local). --daf_cutoff DAF Cutoff Defines the derived dllele frequency cutoff for compose score. Defaults to a DAF value of 0.2. Special case: DAF value of 0.0 indicates incomplete MoP score calculation, PoP is unchanged. Exmaples: java -Xmx[MB]m -jar StatsCalc.jar [1] [2] [3] [4] java -jar StatsCalc.jar example/SelecT_workspace example/sim 21 2 Note, this could be run by hand on a local machine, but we imagine automating the process a bit. A simple example SLURM script is provided (`selection.slurm') for running StatsCalc.jar for a single window. A possible bash script for submitting this SLURM script for each window to a high-performance computing cluster running SLURM might look like this: #! /bin/bash chromosome=21 for window in {0..3} do sbatch selection.slurm $chromosome $window done exit 0 ----------------------------------- | PHASE 3 -- Analyze Significance | ----------------------------------- Required Positional Arguments: [1] Working Directory SelecT working directory created in Phase 1. Working Directory name can be changed but subdirectory names must be unchanged. [2] Chromosome Chromosome number where window can be found Optional Arguments: -co Combine Only Only runs first half of analysis where windows are combined into one file. --combine_fltr Combine Filter Uses a specific filter for printing specific combination of stats in output. Can only be used in conjunction with the -co flag i=iHS, x=XPEHH, h=iHH, dd=dDAF, d=DAF, f=Fst, up=unstd_PoP, um=unstd_MoP, p=PoP, m=MoP. Each tag should be separated by a colon (i.e. i:x:h:dd:d:f:up:um:p:m). --p_value p_Value Sets the p-value cutoff for significance check on composite scores. Defaults to 0.01 -wc Write Combine Similar to -co, but also runs significance filtering -rn Normalization Runs normalization step across the entire dataset/chromosome. Normalizes by standardization (mean 0; standard deviation 1). -ui Use Incomplete Use incomplete data when analyzing MoP scores -im Ignore MoP Ignores all MoP scores and finds significance based upon PoP only. -ip Ignore PoP Ignores all PoP scores and finds significance based upon MoP only. If both -im and -ip flags are present significance is found by looking at either PoP or MoP. If neither -im or -ip flag is present significance is found by looking at both PoP and MoP. Exmaples: java -Xmx[MB]m -jar SignificanceAnalyzer.jar [1] [2] java -jar SignificanceAnalyzer.jar example/SelecT_workspace 21 IV. Funding and Acknowledgements ------------------------------- Funding for the research and production of this software was provided by startup funds to Perry Ridge, Ph.D. V. Contact ----------- For questions, comments, concerns, feature requests, suggestions, etc., please contact: Hayden Smith -- smithinformatics@gmail.com Pery Ridge, Ph.D. -- perry.ridge@byu.edu Note: For usage questions, please consult section `III. Usage Instructions and Examples' first.
About
A genome-wide study of evolution
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published