README

 ____       _          _____ 
/ ___|  ___| | ___  __|_   _|
\___ \ / _ \ |/ _ \/ __|| |
 ___) |  __/ |  __/ (__ | |
|____/ \___|_|\___|\___||_|

Created by RidgeLab Group, BYU Bioinformatics


Table of Contents
-----------------

  I. Introduction
 II. Installation Instructions
III. Usage Instructions and Examples
 IV. Funding and Acknowledgements
  V. Contact


I. Introduction
---------------
SelecT is a software tool developed to ___.

Please see our paper in __journal__ for further information:
    http://sub-domain.domain.tld/some/path/to/resource


II. Installation Instructions
-----------------------------
To install SelecT, first ensure the Java Runtime Environment (JRE) is installed
on your machine.  Second, download the software from the git repository as
follows:
    git clone https://github.com/ridgelab/SelecT.git


III. Usage Instructions and Examples
-------------------------------------
Instructions are created for use on a high-performance computing cluster.
Modifications for individual setup may be necessary. The pipeline has been
divided into three phases.

Please note, a log file will be created in the current working directory when
any part of SelecT is run. Should the same part of SelecT be run again in the
same directory, a number (first `1', then `2', etc.) will be appended to the new
logfile name so as to avoid collisions.


--------------------------------
| PHASE 1 -- Environment Setup |
--------------------------------

Required Positional Arguments:
[1]    Data Directory        Directory should contain all phased VCF or
                HAP/LEGEND file required for selection analysis. File names must
                contain proper flags and file extensions. Include optional (but
                highly recommended) Ancestral data embedded in VCF files or as
                separate LEGEND/EMF file

[2]    Map Directory        Directory that contains all required genetic map
                files for SelecT analysis. File names must contain proper
                chromosome flags.

[3]    Start Chromosome    Must be a number between 1-22; sex chromosomes not
                yet supported.

[4]    End Chromosome        Must be a number between 1-22 and greater or equal
                to Start Chromosome; sex chromosomes not yet supported.

[5]    Target Population    Population identifier for experimental population
                TST can be used if no standard indentifier exists

[6]    Cross Population    Population identifier for cross population TST can
                be used if no standard indentifier exists. Cross Population
                cannont be the same as Target Population.

Optional Arguments:
--out_pop    Outgroup Population    Population identifier for outgroup
                    population. TST can be used if no standard indentifier
                    exists. Outgroup Population cannont be the same as Target
                    Population.

--working_dir    Working Directory    Defines the directory where SelecT will
                    create a new working directory. Default is current
                    directory.

--win_size    Window Size        For changing SelecT analysis window size (in
                    megabases. Default is 0.5Mb.

Examples:
java -Xmx[MB]m -jar EnviSetup.jar [1] [2] [3] [4] [5] [6] \
        --working_dir=path/to/directory
java -Xmx3000m -jar EnviSetup.jar example/haplegend_data example/map 21 21 CEU YRI \
        --working_dir=example


-----------------------------------
| PHASE 2 -- Calculate Statistics |
-----------------------------------

Required Positional Arguments:
[1]    Working Directory    SelecT working directory created in Phase 1.
                Working Directory name can be changed but subdirectory names
                must be unchanged.

[2]    Simulation Directory    Directory where simulations can be found.
                Must contain simulation file neutral_simulation.tsv and
                selection_simulation.tsv. These can be found here:
                https://github.com/ridgelab/SelecT/tree/master/example/sim

[3]    Chromosome        Chromosome number where window can be found

[4]    Window Number        Window index number as defined by SelecT evironment
                setup. See SelecT_workspace/envi_files/all_wins for window
                ranges.

Optional Arguments:
-inon        Non-absolute iHS    Runs iHS score probabilities where large
             negative scores ONLY are associated with selection (replicate
             CMS_local). Defualt is large positive AND negative iHS scores
             equate to greater selection (Voight).

--prior_prob    Prior Probability    Set Prior Probability to a custom value
                    between 0.0 and 1. Defaults to (1 / actual number of
                    variants within window).

-pp        Prior Probability Flag    Sets Prior Probability to 1/10,000
                    (replicate CMS_local).

--daf_cutoff    DAF Cutoff        Defines the derived dllele frequency cutoff
                    for compose score. Defaults to a DAF value of 0.2. Special
                    case: DAF value of 0.0 indicates incomplete MoP score
                    calculation, PoP is unchanged.

Exmaples:
java -Xmx[MB]m -jar StatsCalc.jar [1] [2] [3] [4]
java -jar StatsCalc.jar example/SelecT_workspace example/sim 21 2

Note, this could be run by hand on a local machine, but we imagine automating
the process a bit. A simple example SLURM script is provided (`selection.slurm')
for running StatsCalc.jar for a single window.  A possible bash script for
submitting this SLURM script for each window to a high-performance computing
cluster running SLURM might look like this:

    #! /bin/bash
    
    chromosome=21

    for window in {0..3}
    do
        sbatch selection.slurm $chromosome $window
    done

    exit 0


-----------------------------------
| PHASE 3 -- Analyze Significance |
-----------------------------------

Required Positional Arguments:
[1]     Working Directory    SelecT working directory created in Phase 1.
                Working Directory name can be changed but subdirectory names
                must be unchanged.

[2]    Chromosome        Chromosome number where window can be found

Optional Arguments:
-co        Combine Only    Only runs first half of analysis where windows are
                combined into one file.

--combine_fltr    Combine Filter    Uses a specific filter for printing specific
                combination of stats in output. Can only be used in conjunction
                with the -co flag i=iHS, x=XPEHH, h=iHH, dd=dDAF, d=DAF, f=Fst,
                up=unstd_PoP, um=unstd_MoP, p=PoP, m=MoP. Each tag should be
                separated by a colon (i.e. i:x:h:dd:d:f:up:um:p:m).

--p_value    p_Value        Sets the p-value cutoff for significance check on
                composite scores. Defaults to 0.01

-wc        Write Combine    Similar to -co, but also runs significance filtering

-rn        Normalization    Runs normalization step across the entire
                dataset/chromosome. Normalizes by standardization (mean 0;
                standard deviation 1).

-ui        Use Incomplete    Use incomplete data when analyzing MoP scores

-im        Ignore MoP    Ignores all MoP scores and finds significance based
                upon PoP only.

-ip        Ignore PoP     Ignores all PoP scores and finds significance based
                upon MoP only. If both -im and -ip flags are present
                significance is found by looking at either PoP or MoP. If
                neither -im or -ip flag is present significance is found by
                looking at both PoP and MoP.

Exmaples:
java -Xmx[MB]m -jar SignificanceAnalyzer.jar [1] [2]
java -jar SignificanceAnalyzer.jar example/SelecT_workspace 21


IV. Funding and Acknowledgements
-------------------------------
Funding for the research and production of this software was provided by
startup funds to Perry Ridge, Ph.D.


V. Contact
-----------
For questions, comments, concerns, feature requests, suggestions, etc., please
contact:

Hayden Smith -- smithinformatics@gmail.com
Pery Ridge, Ph.D. -- perry.ridge@byu.edu

Note: For usage questions, please consult section `III. Usage Instructions and
Examples' first.