RRSelection

RRSelection: a simple and efficient tool for detecting selection regions from Variant Call Format (VCF) files

RRSelection: a linkage disequilibrium method for detecting selection regions in population VCF data

1) Install


Download


Method 1: for Linux/Unix and macOS
        git clone https://github.com/BGI-shenzhen/RRSelection.git
        cd RRSelection; chmod 755 configure; ./configure
        make
        mv RRSelection bin/      # optionally clean up with: rm *.o

Note: if linking fails, try reinstalling the zlib library.

Method 2: for Linux/Unix and macOS

        tar -zxvf RRSelectionXXX.tar.gz
        cd RRSelectionXXX
        cd src
        make; make clean                             # or: sh make.sh
        ../bin/RRSelection

Note: if linking fails, try reinstalling the zlib library.

2) Example


See the Documentation for more detailed usage.

    1. Calculate the sliding-window mean RR (r^2) for one or two populations and report the selection regions. A whole-genome RR plot is also produced.
      # 1)  For all samples in one population
               ./bin/RRSelection   -InVCF SNP.vcf.gz  -OutPut OutPrefix
      # 2)  For some samples in one population
               ./bin/RRSelection   -InVCF SNP.vcf.gz  -OutPut OutPrefix  -SubGroup  subgroup.list  # subgroup.list holds the sample names of this population
      # 3)  For two populations
               ./bin/RRSelection   -InVCF SNP.vcf.gz  -OutPut OutPrefix  -SubGroup  subgroup.list  #  PopID : sample name list
    2. See the results in [OutPrefix.winRR.gz OutPrefix.selection.gz] and [OutPrefix.png OutPrefix.pdf]. You can also run the Perl script to redraw the figure:
         perl     PlotRRSele.pl    -inFile   OutPrefix.winRR.gz  -output OutPrefix
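
For downstream filtering, the window table can also be read directly. The following is a minimal Python sketch, not part of RRSelection: it assumes the output is a whitespace-separated, gzip-compressed table with `#` header lines and the P-value in the last column, as in the sample shown in the Results section. The function names `parse_windows`, `load_windows` and `significant` are hypothetical helpers, and the strict `<` comparison against the threshold is an assumption.

```python
import gzip


def parse_windows(lines):
    """Parse window rows, skipping blank lines and '#' header lines."""
    rows = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        rows.append({
            "chrom": fields[0],
            "start": int(fields[1]),
            "end": int(fields[2]),
            # Assumption: the P-value is the last column of each row.
            "pvalue": float(fields[-1]),
        })
    return rows


def load_windows(path):
    """Read a gzip-compressed window table such as OutPrefix.winRR.gz."""
    with gzip.open(path, "rt") as handle:
        return parse_windows(handle)


def significant(rows, max_p=0.005):
    """Keep windows whose P-value is below the threshold (default as in -Pvalue)."""
    return [row for row in rows if row["pvalue"] < max_p]
```

A typical call would be `significant(load_windows("OutPrefix.winRR.gz"))`, where the file name follows the output prefix used above.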

3) Introduction


Detecting regions under selection is one of the most important and most common analyses in population resequencing. Here we introduce RRSelection, a simple and efficient tool that detects selection regions from Variant Call Format (VCF) data. It slides windows across the whole genome, calculates the mean r^2 of each window for one or two populations, and picks out the most strongly linked regions (one population) or the regions with the greatest difference (two populations). These regions, taken from the top of the MeanRR distribution (Z-test Pvalue), are regarded as selection regions.
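
To make the window statistic concrete, here is a minimal Python sketch of a per-window mean r^2. It is illustrative only: this README does not state which r^2 estimator RRSelection uses, so the sketch assumes the squared Pearson correlation of genotype dosages (0/1/2 alternate-allele counts, `None` for missing calls). The names `r_squared` and `window_mean_r2` and the half-open window boundaries are assumptions.

```python
from itertools import combinations


def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors."""
    # Use only samples where both SNPs have a genotype call.
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    if var_x == 0 or var_y == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return cov * cov / (var_x * var_y)


def window_mean_r2(positions, genotypes, start, end):
    """Return (mean, sum, count) of r^2 over all SNP pairs in [start, end)."""
    idx = [i for i, pos in enumerate(positions) if start <= pos < end]
    values = [r_squared(genotypes[i], genotypes[j])
              for i, j in combinations(idx, 2)]
    total = sum(values)
    count = len(values)
    mean = total / count if count else 0.0
    return mean, total, count
```

Returning the sum and pair count alongside the mean mirrors the `Sum_r^2` and `Count` columns of the result file, where the mean equals the sum divided by the count.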

  • Parameter description
        Usage: RRSelection  -InVCF  <in.vcf.gz>  -OutPut <outPrefix>

               -InVCF      <str>     Input SNP VCF Format
               -OutPut     <str>     OutPut sliding stat mean r^2 Result

               -SubGroup   <str>     one/two sub-group Sample List File,-h for more help
               -Windows    <int>     Sliding windows bin (kb),MaxDis between two pairwise SNP[300]
               -Step       <float>   Step ratio(0,1] of windows,1:NoOverlap [0.2]
               -Masked     <int>     Masked windows when the SNP Number too low[10]

               -MAF        <float>   Min minor allele frequency filter [0.05]
               -Het        <float>   Max ratio of het allele filter [0.88]
               -Miss       <float>   Max ratio of miss allele filter [0.25]

               -Pvalue     <float>   T-test Pvalue to pick out selection region[0.005]
               -KeepR                Keep Rscript used to modify and plots

               -help                 See more help [hewm2008 Beta v0.85]
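
The relationship between `-Windows` and `-Step` can be sketched as follows. This is a hedged illustration in Python, not the tool's code: it assumes windows start at position 0 and advance by `window size x step ratio`, which matches the defaults (300 kb, 0.2) and the 60 kb offsets seen in the sample output. How RRSelection treats the last window of a chromosome is not documented here, so this sketch simply does not clip it. The function name `sliding_windows` is hypothetical.

```python
def sliding_windows(chrom_length, window_kb=300, step_ratio=0.2):
    """List (start, end) window coordinates for one chromosome."""
    if not 0 < step_ratio <= 1:
        raise ValueError("step_ratio must be in (0, 1]")
    size = window_kb * 1000
    # A step ratio of 1 gives non-overlapping windows.
    step = max(1, int(round(size * step_ratio)))
    windows = []
    start = 0
    while start < chrom_length:
        # Assumption: the window end is not clipped to the chromosome length.
        windows.append((start, start + size))
        start += step
    return windows


if __name__ == "__main__":
    for start, end in sliding_windows(500_000)[:3]:
        print(start, end)
```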

4) Results


The following shows the header and the first rows of the result output file. The figure is not shown here.

#Chr    Start   End     Mean_r^2_cul    Sum_r^2_cul     Count_cul       Mean_r^2_wild   Sum_r^2_wild    Count_wild      MeanRRDiff(cul-wild)    ZScore  Pvalue
##Group[cul], MeanRR:0.245096   SD:0.0529981    Effective windows Count:30
##Group[wild], MeanRR:0.247118  SD:0.0631814    Effective windows Count:30
##Diff MeanRR[cul-wild], Mean:-0.00202159       SD:0.0276476    Effective windows Count:30
Tu2     0       300000  0.1779  68833.0295      386921  0.1034  60456.6791      584831  0.0745  2.77    0.002822
Tu2     60000   360000  0.0782  12577.9534      160803  0.0734  21013.7645      286430  0.0049  0.25    0.401158
Tu2     120000  420000  0.1223  26006.7348      212590  0.0877  31059.4803      354331  0.0347  1.33    0.092056
...
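
The `ZScore` and `Pvalue` columns can be checked against the `##Diff` header line. The Python sketch below assumes that the Z-score is the window's MeanRRDiff standardised by the genome-wide mean and SD of the difference, and that the P-value is the one-sided upper-tail probability of a standard normal distribution. Both assumptions are inferred from the sample rows above rather than taken from the tool's documentation; the function names are hypothetical.

```python
import math


def z_score(diff, mean, sd):
    """Standardise a window's MeanRRDiff by the genome-wide mean and SD."""
    return (diff - mean) / sd


def upper_tail_p(z):
    """One-sided P-value: probability that a standard normal exceeds z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


if __name__ == "__main__":
    # Mean and SD taken from the '##Diff MeanRR[cul-wild]' header line.
    mean_diff, sd_diff = -0.00202159, 0.0276476
    # MeanRRDiff values of the three sample rows (rounded to 4 decimals).
    for diff in (0.0745, 0.0049, 0.0347):
        z = z_score(diff, mean_diff, sd_diff)
        print(f"diff={diff:.4f}  Z={z:.2f}  P={upper_tail_p(z):.6f}")
```

Under these assumptions the Z-scores round to 2.77, 0.25 and 1.33, as in the table. The P-values land close to the listed ones, with small deviations expected because the MeanRRDiff column is rounded to four decimals.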

5) Discussion


######################swimming in the sky and flying in the sea #############################