Merge pull request #45 from GregFa/master

Decoupled QTLplot module
senresearch · Feb 23, 2024 · df69cfd · df69cfd · GregFa · Feb 23, 2024
2 parents 44d252b + f73e94f
commit df69cfd
Show file tree

Hide file tree

Showing 11 changed files with 31 additions and 300 deletions.
diff --git a/.gitignore b/.gitignore
@@ -8,3 +8,4 @@ Manifest.toml
 .ipynb_checkpoints
 */.ipynb_checkpoints/*
 
+.vscode/*
diff --git a/Project.toml b/Project.toml
@@ -1,28 +1,24 @@
 name = "FlxQTL"
 uuid = "ede81e08-c6d8-4fe3-94c2-f928d9d678ed"
 authors = ["Hyeonju Kim <hyeonjukm01@gmail.com>"]
-version = "0.3.1"
+version = "0.3.2"
 
 [deps]
-Conda = "8f4d0f93-b110-5947-807f-2305c1781a2d"
 DelimitedFiles = "8bb1440f-4735-579b-a4ab-409b98df4dab"
 Distributed = "8ba89e20-285c-5b6f-9357-94700520ee1b"
 Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
 LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
 LossFunctions = "30fc2ffe-d236-52d8-8643-a9d8f7c094a7"
-PyPlot = "d330b81b-6aea-500a-939a-2ce795aea3ee"
 Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
 StaticArrays = "90137ffa-7385-5640-81b9-e52037218182"
 Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
 StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"
 
 [compat]
-Conda = "1.10"
 DelimitedFiles = "1.6"
 Distributed = "1"
 Distributions = "^0.23, 0.24, 0.25"
 LossFunctions = "~0.11"
-PyPlot = "2.11" 
 StaticArrays = "1.9"
 Statistics = "1"
 StatsBase = "0.33, 0.34"

diff --git a/README.md b/README.md
@@ -5,7 +5,6 @@
 [![Stable](https://img.shields.io/badge/docs-stable-blue.svg)](https://senresearch.github.io/FlxQTL.jl/stable)
 [![CI](https://github.com/senresearch/FlxQTL.jl/actions/workflows/ci.yml/badge.svg)](https://github.com/senresearch/FlxQTL.jl/actions/workflows/ci.yml)
 [![codecov](https://codecov.io/gh/senresearch/FlxQTL.jl/branch/master/graph/badge.svg)](https://codecov.io/gh/senresearch/FlxQTL.jl)
-<!-- [![codecov](https://codecov.io/gh/hkim89/FlxQTL.jl/branch/master/graph/badge.svg?token=wNYkIkfRx1)](https://codecov.io/gh/hkim89/FlxQTL.jl) -->
 
 *FlxQTL.jl* is a package for a multivariate linear mixed model based
 QTL analysis tool that supports incorporating information from trait
@@ -15,9 +14,9 @@ multivariate genome scans, visualization of genome scans, support for
 LOCO (leave-one-chromosome-out), computation of kinship matrices, and
 support for distributed computing.
 
-![1D Genome Scan](image/ex1.png)
+![1D Genome Scan](images/ex1.png)
 
-![2D Genome Scan](image/ex2.jpg)
+![2D Genome Scan](images/ex2.jpg)
 
 The package is written in [Julia](https://www.julialang.org) and
 includes extensive

diff --git a/docs/src/guide/analysis.md b/docs/src/guide/analysis.md
@@ -13,6 +13,7 @@ Use any Julia package able to read data files (`.txt`, `.csv`, etc.).  Julia's b
 Let's try using an example dataset in `FlxQTL`. It is plant data: Arabidopsis thaliana in the `data` folder.  Detailed description on the data can be 
 referred to `README` in the folder.
 
+
 ```julia
 using DelimitedFiles
 
@@ -21,13 +22,13 @@ pheno = readdlm("data/Arabidopsis_fitness.csv",',';skipstart=1); # skip to read
 geno = readdlm("data/Arabidopsis_genotypes.csv",',';skipstart=1); 
 
 markerinfo = readdlm("data/Arabidopsis_markerinfo_1d.csv",',';skipstart=1);
-
 ```
 
 For efficient computation, the normalization of matrices is necessary.  The phenotype matrix labelled as `pheno` here composes of wide range of values 
 from 1.774 to 34.133, so that it is better to narow the range of values in [0,1], [-1,1], or any narrower interval for easy computation.  Note that 
 the dimension of a phenotype matrix should be `the number of traits x the number of individuals`.
 
+
 ```julia
 using Statistics, StatsBase
 Y=convert(Array{Float64,2},pheno'); #convert from transposed one to a Float64 matrix
@@ -39,28 +40,31 @@ Ystd=(Y.-mean(Y,dims=2))./std(Y,dims=2); # sitewise normalization
 
 In the genotype data, `1`, `2` indicate Italian, Swedish parents, respectively. You can rescale the genotypes for efficiency. 
 
+
 ```julia
 geno[geno.==1.0].=0.0;geno[geno.==2.0].=1.0; # or can do geno[geno.==1.0].=-1.0 for only genome scan
-
 ```
+
 For genome scan, we need restructure the standardized genotype matrix combined with marker information.  Note that the genome scan in `FlxQTL` is 
 implemented by CPU parallelization, so we need to add workers (or processes) before the genome scan.  Depending on the computer CPU, one can add as many 
 processes as possible. If your computer has 16 cores, then you can add 15 or little more.  Note that you need to type `@everywhere` followed by `using PackageName` for parallel computing.  The dimension of a genotype (probability) matrix should be 
 `the number of markers x the number of individuals`.
 
+
 ```julia
 using Distributed
 addprocs(4) 
 @everywhere using FlxQTL 
 XX=Markers(markerinfo[:,1],markerinfo[:,2],markerinfo[:,3],geno') # marker names, chromosomes, marker positions, genotypes
-
 ```
+
 - **Julia tip**: Whenever you reload a package, i.e. `using FlxQTL`, you should re-enter `XX=FlxQTL.Markers(markerinfo[:,1],markerinfo[:,2],markerinfo[:,3],geno')` to fresh the struct of array.  If not, your genome scan throws an error.  You should also do with another struct of array in a submodule `QTLplot`, `FlxQTL.layers`.
 
 Optionally, one can generate a trait covariate matrix (Z).  The first column indicates overall mean between the two regions, and 
 the second implies site difference: `-1` for Italy, and `1` for Sweden.
 
-```@repl
+
+```julia
 Z=hcat(ones(6),vcat(-ones(3),ones(3)))
 m,q = size(Z) # check the dimension
 ```
@@ -75,20 +79,23 @@ Note that the shrinkage option is only used for `kinshipMan`, `kinship4way`.
 For the Arabidopsis genotype data, we will use a genetic relatedness matrix using manhattan distance measure, `kinshipMan` with a shrinkage with 
 the LOCO option.
 
+
 ```julia
 Kg = shrinkgLoco(kinshipMan,10,XX)
 ```
+
 For no LOCO option with shrinkage,
 
+
 ```julia
 K = shrinkg(kinshipMan,10,XX.X)
 ```
 
-
 If you have climatic information on your trait data, you can compute the relatedness matrix using one of the above functions, but it is recommended using 
 `kinshipGs`,`kinshipLin`,`kinshipCtr` after normalization.  Since the climatic information is not available, we use an identity matrix.
 
-```@repl
+
+```julia
 using LinearAlgebra
 Kc = Matrix(1.0I,6,6) # 3 years x 2 sites
 ```
@@ -99,6 +106,7 @@ Once all input matrices are ready, we need to proceed the eigen-decomposition to
 For a non-identity climatic relatedness, and a kinship with LOCO, you can do eigen-decomposition simultaneously.  Since we use the identity climatic 
 relatedness, you can use `Matrix(1.0I,6,6)` for a matrix of eigenvectors and `ones(6)` for a vector of eigenvalues.
 
+
 ```julia
 Tg,Λg,Tc,λc = K2Eig(Kg,Kc,true); # the last argument: LOCO::Bool = false (default)
 
@@ -107,63 +115,60 @@ Tg,λg = K2eig(Kg, true) # for eigen decomposition to one kinship with LOCO
 
 For eigen decomposition to one kinship with no LOCO option,
 
+
 ```julia
 T,λ = K2eig(K)
 ```
+
 Now start with 1D genome scan with (or without) LOCO including `Z` or not.  
 For the genome scan with LOCO including `Z`, 
 
+
 ```julia
 LODs,B,est0 = geneScan(1,Tg,Tc,Λg,λc,Ystd,XX,Z,true); # FlxQTL for including Z (trait covariates) or Z=I
 ```
+
 For the genome scan with LOCO excluding `Z`, i.e. an identity matrix, we have two options: a FlxQTL model and a conventional MLMM 
+
+
 ```julia
 LODs,B,est0 = geneScan(1,Tg,Tc,Λg,λc,Ystd,XX,true); # FlxQTL for Z=I 
 
 LODs,B,est0 = geneScan(1,Tg,Λg,Ystd,XX,true); # MLMM
 ```
+
 Note that the first argument in `geneScan` is `cross::Int64`, which indicates a type of genotype or genotype probability.  For instance, if you use a 
 genotype matrix whose entry is one of 0,1,2, type `1`. If you use genotype probability matrices, depending on the number of alleles or genotypes in a marker, one can type the corresponding number. i.e. `4-way cross: 4`, `HS DO mouse: 8 for alleles, 32 for genotypes`, etc.   
 
 For no LOCO option,
 
+
 ```julia
 LODs,B,est0 = geneScan(1,T,Tc,λ,λc,Ystd,XX,Z);
 
 LODs,B,est0 = geneScan(1,T,Tc,λ,λc,Ystd,XX);
 
 LODs,B,est0 = geneScan(1,T,λ,Ystd,XX); # MLMM
 ```
+
 The function `geneScan` has three arguments: `LOD scores (LODs)`, `effects matrix under H1 (B)`, and `parameter estimates under H0 (est0)`, which 
 is an `Array{Any,1}`.  If you want to see null parameter esitmate in chromosome 1 for LOCO option, type `est0[1].B`, `est0[1].loglik`, `est0[1].τ2`, 
 `est0[1].Σ`.   
 In particular, you can extract values from each matrix in `B` (3D array of matrices) to generate an effects plot. To print an effect size matrix for the 
 third marker, type `B[:,:,3]`.
 
-
 ## Generating plots
 
-The `QTLplot` module is currently unavailable but will replaced with [BigRiverQTLPlots.jl](https://github.com/senresearch/BigRiverQTLPlots.jl) soon.
-
-<!-- To produce a plot (or plots) for LOD scores or effects, you need first a struct of arrays, `layers` consisting of chromosomes, marker positions, 
-LOD scores (or effects), which should be `Array{Float64,2}`.  You can then generate one genome scan result or multiple genenome scan results on one plot.  Note that the color is randomly selected to generate a plot.   
-The function `plot1d` has more keyword argument options: `yint=[]` for a vector of y-intercept(s), `yint_color=["red"]` for a vector of y-intercept 
-color(s), `Legend=[]` for multiple graphs, `loc="upper right"` for the location of `Legend`, etc.
-
-```julia
-Arab_lod = layers(markerinfo[:,2],markerinfo[:,3],LODs[:,:]) # LODs is a vector here, so force it to be a matrix
-plot1d(Arab_lod;title= "LOD for Arabidopsis thaliana : Fitness (2 site by 3 year, 6 traits)",ylabel="LOD")
-```
-![arabidopsis](arab-lod.png)
+The `QTLplot` module is currently unavailable but plotting functions will be replaced with [BigRiverQTLPlots.jl](https://github.com/senresearch/BigRiverQTLPlots.jl) soon.
 
-
- -->
+![arabidopsis](images/arab-lod.png)
 
 ## Performing a permutation test
 
 Since the statistical inference for `FlxQTL` relies on LOD scores and LOD scores, the function `permTest` finds thresholds for a type I error.  The first 
 argument is `nperm::Int64` to set the number of permutations for the test. For `Z = I`, type `Matrix(1.0I,6,6)` for the Arabidopsis thaliana data.  In the keyword argument, `pval=[0.05 0.01]` is default to get thresholds of `type I error rates (α)`.  Note that permutation test is implemented by no LOCO option.
 
+
 ```julia
-julia> maxLODs, H1par_perm, cutoff = permTest(1000,1,K,Kc,Ystd,XX,Z;pval=[0.05]) # cutoff at 5 %
+maxLODs, H1par_perm, cutoff = permTest(1000,1,K,Kc,Ystd,XX,Z;pval=[0.05]) # cutoff at 5 %
 ```
diff --git a/docs/src/guide/arab-lod.png → docs/src/guide/images/arab-lod.png b/docs/src/guide/arab-lod.png → docs/src/guide/images/arab-lod.png
diff --git a/docs/src/guide/tutorial.md b/docs/src/guide/tutorial.md
@@ -15,8 +15,6 @@ Or, equivalently,
 ```julia
 julia> using Pkg; Pkg.add("FlxQTL")
 ```
-<!-- Currently Julia `1.5` supports for the package. -->
-
 
 To remove the package from the Julia REPL,
 

diff --git a/docs/src/index.md b/docs/src/index.md
@@ -9,7 +9,6 @@ structured multivariate traits.
 - LOCO (Leave One Chromosome Out) support for genome scan 
 - Computation for Genetic (or Climatic) Relatedness matrix (or kinship) 
 - CPU parallelization
-<!-- - Visualization (1D, 2D plots) -->
 
 ## Guide
 

diff --git a/image/ex1.png → images/ex1.png b/image/ex1.png → images/ex1.png
diff --git a/image/ex2.jpg → images/ex2.jpg b/image/ex2.jpg → images/ex2.jpg
diff --git a/src/FlxQTL.jl b/src/FlxQTL.jl
@@ -12,7 +12,6 @@ module FlxQTL
 
 
 include("MLM.jl")
-# include("QTLplot.jl")
 include("Miscellanea.jl")
 include("GRM.jl")
 include("EcmNestrv.jl")
@@ -25,9 +24,6 @@ using .GRM:kinshipMan,kinship4way,kinshipGs,kinshipLin,kinshipCtr,kinshipStd,shr
 export kinshipMan,kinship4way,kinshipGs,kinshipLin,kinshipCtr,kinshipStd
 export shrinkg,shrinkgLoco,kinshipLoco
 
-# using .QTLplot:layers, plot1d, plot2d, subplot2d
-# export layers, plot1d, plot2d, subplot2d
-
 using .flxMLMM: geneScan,gene2Scan,envScan,permTest,K2eig, K2Eig, obtainKc,gene1Scan,updateKc
 #selectQTL
 export geneScan,gene2Scan,envScan,permTest,K2eig, K2Eig, obtainKc,gene1Scan,updateKc
Original file line number	Diff line number	Diff line change
Expand Up		@@ -8,3 +8,4 @@ Manifest.toml
		.ipynb_checkpoints
		/.ipynb_checkpoints/

		.vscode/*