add
rickyota committed Nov 23, 2023
2 parents cd9897b + de7d9ec commit aeee663
Showing 64 changed files with 3,679 additions and 1,509 deletions.
112 changes: 62 additions & 50 deletions README.md
@@ -1,4 +1,4 @@
# GenoBoost v0.4.0
# GenoBoost v1.0.3

[![GenoBoost](https://github.com/rickyota/genoboost/actions/workflows/genoboost.yml/badge.svg)](https://github.com/rickyota/genoboost/actions/workflows/genoboost.yml)
[![Release](https://github.com/rickyota/genoboost/actions/workflows/publish.yml/badge.svg)](https://github.com/rickyota/genoboost/actions/workflows/publish.yml)
@@ -13,35 +13,37 @@ $ genoboost train \
--dir ./result \
--file-genot ./example/genot \
--file-phe ./example/genot.cov \
--cov age,sex
--cov age,sex \
--major-a2-train
```

## Table of Contents

- [GenoBoost v0.4.0](#genoboost-v040)
- [Getting Started](#getting-started)
- [Table of Contents](#table-of-contents)
- [Introduction](#introduction)
- [Users' Guide](#users-guide)
- [Installation](#installation)
- [Plink1 Input](#plink1-input)
- [Plink2 Input](#plink2-input)
- [Advanced Install](#advanced-install)
- [Train GenoBoost Model](#train-genoboost-model)
- [Simplest Usage](#simplest-usage)
- [Without Validation](#without-validation)
- [Input Plink2](#input-plink2)
- [Cross-validation](#cross-validation)
- [ Options for Training](#-options-for-training)
- [ Calculate Sample Scores](#-calculate-sample-scores)
- [Simplest Usage](#simplest-usage-1)
- [Without Validation](#without-validation-1)
- [Input Plink2](#input-plink2-1)
- [Cross-validation](#cross-validation-1)
- [ Options for Score](#-options-for-score)
- [Advanced Guide](#advanced-guide)
- [Docker](#docker)
- [Singularity](#singularity)
- [Getting Started](#getting-started)
- [Table of Contents](#table-of-contents)
- [Introduction](#introduction)
- [Users' Guide](#users-guide)
- [Installation](#installation)
- [Plink1 Input](#plink1-input)
- [Plink2 Input](#plink2-input)
- [Advanced Install](#advanced-install)
- [Train GenoBoost Model](#train-genoboost-model)
- [Simplest Usage](#simplest-usage)
- [Without Validation](#without-validation)
- [Input Plink2](#input-plink2)
- [Cross-validation](#cross-validation)
- [ Options for Training](#-options-for-training)
- [ Calculate Sample Scores](#-calculate-sample-scores)
- [Simplest Usage](#simplest-usage-1)
- [Without Validation](#without-validation-1)
- [Input Plink2](#input-plink2-1)
- [Cross-validation](#cross-validation-1)
- [ Options for Score](#-options-for-score)
- [Advanced Guide](#advanced-guide)
- [Advanced Installation](#advanced-installation)
- [Docker](#docker)
- [Singularity](#singularity)
- [Computational Time](#computational-time)

## <a name="introduction"></a>Introduction

@@ -84,8 +86,7 @@ cargo build --manifest-path ./projects_rust/Cargo.toml --release --bin genoboost
cp ./projects_rust/target/release/genoboost ./genoboost
```

and you can use `genoboost` program.

and you can use the `genoboost` program. The build should take less than 5 minutes.

#### <a name="install-advanced"></a>Advanced Install

@@ -97,7 +98,6 @@ GenoBoost returns the SNV weights file with $s_0, s_1, s_2$ for each SNV in one

<img src='readme/img/wgt.png' width=800>


#### <a name="train-simple"></a>Simplest Usage

At a minimum, GenoBoost needs plink1 genotype files; in most cases you will also supply a covariates file.
@@ -109,13 +109,15 @@ See `./example/` for reference of file format. For example, the covariates file
With the minimum options, GenoBoost produces an SNV weights list using the best parameter.
The SNV weights are computed from a randomly selected 80% of the samples (training), and the best parameter is chosen on the remaining 20% (validation).
After `--cov`, list the names of the columns in the covariates file to use as covariates.
It is important that the major allele is set to a2 with `--major-a2-train`, since $s_2$ is winsorized. This option is unnecessary if the major allele is already set as the reference allele in the genotype file.

```bash
$ genoboost train \
--dir ./result \
--file-genot ./example/genot \
--file-phe ./example/genot.cov \
--cov age,sex
--cov age,sex \
--major-a2-train
```
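
Before training, it can help to confirm that every name passed to `--cov` is actually a column in the covariates file. The following is a minimal sketch and not part of GenoBoost: `check_cov_columns` is a hypothetical helper, and it assumes the covariates file is whitespace-delimited with a header row. It does not expand range shorthand such as `PC1-PC10`.

```bash
# Hypothetical helper (not provided by GenoBoost).
# Usage: check_cov_columns FILE NAMES
#   FILE  : covariates file, assumed whitespace-delimited with a header row
#   NAMES : comma-delimited column names, e.g. "age,sex"
# Returns 0 if every name appears in the header, 1 otherwise.
check_cov_columns() {
    cov_file="$1"
    cov_names="$2"
    # Read only the header line of the covariates file.
    header=$(head -n 1 "$cov_file")
    missing=""
    # Split the comma-delimited names and look each one up in the header.
    for name in $(printf '%s' "$cov_names" | tr ',' ' '); do
        found=0
        for col in $header; do
            if [ "$col" = "$name" ]; then
                found=1
            fi
        done
        if [ "$found" -eq 0 ]; then
            missing="$missing $name"
        fi
    done
    if [ -n "$missing" ]; then
        echo "columns not found in $cov_file:$missing" >&2
        return 1
    fi
    return 0
}
```

For instance, `check_cov_columns ./example/genot.cov age,sex && echo ok` would print `ok` only if both columns are present under the stated assumptions.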

#### <a name="train-train-only"></a>Without Validation
@@ -128,6 +130,7 @@ $ genoboost train \
--file-genot ./example/genot \
--file-phe ./example/genot.cov \
--cov age,sex \
--major-a2-train \
--train-only \
--iter-snv 10000
```
@@ -147,7 +150,8 @@ $ genoboost train \
--genot-format plink2-vzs \
--file-phe ./example/genot2.phe \
--phe-name PHENO1 \
--cov age,sex
--cov age,sex \
--major-a2-train
```

#### <a name="train-cv"></a>Cross-validation
@@ -160,6 +164,7 @@ $ genoboost train \
--file-genot ./example/genot \
--file-phe ./example/genot.cov \
--cov age,sex \
--major-a2-train \
--cross-validation 5 \
--seed 51
```
@@ -184,6 +189,8 @@ $ genoboost train \

`--file-snv [FILE]`: SNV file for training, with one SNV ID per line.

`--major-a2-train`: Set major allele as a2 in training dataset.

`--iter-snv [NUMBER]`, `--iter [NUMBER]` : Maximum number of SNVs or iterations for training.

`--learning-rates [NUMBERS]`: Learning rates in space-delimited format. Default value is `"0.5 0.2 0.1 0.05"`.
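
Space-delimited values such as the learning rates are written inside quotes in this README, which makes the shell pass them as a single argument. The sketch below only illustrates that shell behavior; `count_args` is a hypothetical helper, and whether GenoBoost also accepts the unquoted form is not stated here.

```bash
# Hypothetical helper: print how many arguments the shell passes along.
count_args() {
    echo "$#"
}

# Quoted: the option name plus one argument holding all four rates.
count_args --learning-rates "0.5 0.2 0.1 0.05"   # prints 2

# Unquoted: the shell splits the rates into four separate arguments.
count_args --learning-rates 0.5 0.2 0.1 0.05     # prints 5
```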
@@ -230,7 +237,6 @@ $ genoboost score \
--iters "10 20 50"
```


#### <a name="score-plink2"></a>Input Plink2

Use `--genot-format`, `--file-phe` etc. for plink2 as shown in [training phase](#train-plink2).
@@ -257,20 +263,19 @@ $ genoboost score \
--cross-validation 5
```


#### <a name="score-option"></a> Options for Score

`--dir <DIR>` : Directory to output score files.

`--dir-wgt [DIR]` : Same directory specified on training.
`--dir-wgt [DIR]` : Same directory specified on training.

`--file-wgt [FILE]` : Use this specific SNV weight file.
`--file-wgt [FILE]` : Use this specific SNV weight file.

`--file-genot <FILE>`: Prefix of a plink1 or plink2 file (.bed, .fam, .bim or .pgen, .psam, .pvar/.pvar.zst should exist).

`--genot-format [FORMAT]`: {`plink`, `plink2`, `plink2-vzs`}. Genotype format. Default is `plink`.

`--file-phe [FILE]`: Covariates file.
`--file-phe [FILE]`: Covariates file.

`--cov [NAMES]`: Covariates names in comma-delimited format. ex. `age,sex,PC1-PC10`.

@@ -290,34 +295,41 @@ $ genoboost score \

`--verbose`: Let GenoBoost speak more!


## <a name="advanced-guide"></a>Advanced Guide

### <a name="docker"></a>Docker
### <a name="advanced-installation"></a>Advanced Installation

Using docker or singularity is recommended.

Run GenoBoost on an example dataset in `./test/data/1kg_n10000` (1000 samples x 10000 SNVs).
#### <a name="docker"></a>Docker

```bash
$ docker run -td \
-v "$(pwd)/test/data/1kg_n10000":/work/data:ro -v "$(pwd)/result":/work/result \
rickyota/genoboost:latest \
bash ./genoboost.docker.cv.sh
$ docker pull rickyota/genoboost:latest
$ docker run -it rickyota/genoboost:latest \
train \
--dir ./result \
--file-genot ./example/genot \
--file-phe ./example/genot.cov \
--cov age,sex \
--major-a2-train
```
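
The image built from `docker/Dockerfile` copies only the `genoboost` binary into `/opt/genoboost`, so host directories holding the input and output files probably need to be mounted into the container. The following is a sketch under assumptions: `run_genoboost_docker` is a hypothetical wrapper that the repository does not provide, and the mount targets are chosen here so that relative paths such as `./example/genot` and `./result` resolve against the image's working directory.

```bash
# Hypothetical wrapper (not part of the repository).
# Usage: run_genoboost_docker HOST_DIR [genoboost arguments...]
#   HOST_DIR is assumed to contain ./example (inputs) and ./result (outputs).
# Set DRY_RUN=1 to print the docker command instead of running it.
run_genoboost_docker() {
    host_dir="$1"
    shift
    # Rebuild the argument list as the full docker command line.
    set -- docker run --rm \
        -v "${host_dir}/example:/opt/genoboost/example:ro" \
        -v "${host_dir}/result:/opt/genoboost/result" \
        rickyota/genoboost:latest "$@"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Example call (commented out so that sourcing this file runs nothing):
# run_genoboost_docker "$(pwd)" train \
#     --dir ./result \
#     --file-genot ./example/genot \
#     --file-phe ./example/genot.cov \
#     --cov age,sex \
#     --major-a2-train
```

With `DRY_RUN=1` the wrapper prints the assembled command, which is a cheap way to review the mounts before starting a long training run.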

### <a name="singularity"></a>Singularity
#### <a name="singularity"></a>Singularity

```bash
$ singularity build geno.sif docker://rickyota/genoboost:latest
$ singularity exec \
--bind "$(pwd)/test/data/1kg_n10000":/work/data,"$(pwd)/result":/work/result \
--no-home --pwd /opt/genoboost geno.sif \
bash ./genoboost.docker.cv.sh
$ singularity build genoboost.sif ./docker/genoboost.def
$ singularity run genoboost.sif \
train \
--dir ./result \
--file-genot ./example/genot \
--file-phe ./example/genot.cov \
--cov age,sex \
--major-a2-train
```

Result files are now in `./result/`.
### <a name="computational-time"></a>Computational Time

With ~216 thousand training samples and ~1.1 million SNVs, training a model with 10,000 unique SNVs takes about 10 hours.
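
As a rough worked figure, the run time quoted above corresponds to a few seconds per selected SNV on average. The sketch below only performs that division; `avg_seconds_per_snv` is a hypothetical helper, and the result is an average over the whole run, not a claim that time scales linearly with the number of SNVs or samples.

```bash
# Hypothetical helper: average seconds per selected SNV.
# Usage: avg_seconds_per_snv HOURS N_SNVS
avg_seconds_per_snv() {
    awk -v h="$1" -v n="$2" 'BEGIN { printf "%.1f\n", h * 3600 / n }'
}

# 10 hours for 10,000 unique SNVs, as quoted above.
avg_seconds_per_snv 10 10000   # prints 3.6
```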

[release]: https://github.com/rickyota/genoboost/releases
[rust-install]: https://www.rust-lang.org/tools/install
12 changes: 3 additions & 9 deletions create.publish.sh
@@ -16,31 +16,25 @@ d_publish="./${artifact_name}/"

cargo build \
--release \
--target=${target} \
--target=${target} \
--manifest-path ./projects_rust/Cargo.toml \
--no-default-features \
--bin genoboost

#cargo build --manifest-path ./projects_rust/Cargo.toml \
# --release --target=${target} \
# --bin genoboost

mkdir -p ${d_publish}
if [[ ${target} == *"windows"* ]]; then
cp ./projects_rust/target/${target}/release/genoboost.exe ${d_publish}/
cp ./projects_rust/target/${target}/release/genoboost.exe ${d_publish}/
else
cp ./projects_rust/target/${target}/release/genoboost ${d_publish}/
cp ./projects_rust/target/${target}/release/genoboost ${d_publish}/
fi

mkdir -p ${d_publish}/example/
cp ./example/* ${d_publish}/example/


zip -r ./${artifact_name}.zip ${d_publish}

#if [[ ${target} == *"windows"* ]]; then
# tar -cvzf ./${artifact_name}.zip ${d_publish}
#else
# zip -r ./${artifact_name}.zip ${d_publish}
#fi

12 changes: 5 additions & 7 deletions docker/Dockerfile
@@ -1,4 +1,7 @@
FROM --platform=linux/amd64 rust:1.68 AS builder

FROM --platform=linux/amd64 rust:1.74 AS builder
# error. old?
#FROM --platform=linux/amd64 rust:1.68 AS builder

RUN apt-get update && \
apt-get install -y --no-install-recommends \
@@ -7,28 +10,23 @@ RUN apt-get update && \
WORKDIR /opt/genoboost
COPY ../ .

#export RUSTFLAGS='-C target-cpu=native'
RUN RUSTFLAGS='-C target-cpu=native' \
cargo build \
--release \
--verbose \
--manifest-path ./projects_rust/Cargo.toml \
--bin genoboost

#CMD ["./projects_rust/target/genoboost"]

FROM --platform=linux/amd64 debian:bookworm-slim AS runner
#FROM --platform=linux/amd64 debian:bullseye-stable AS runner
#FROM --platform=linux/amd64 debian:buster-slim AS runner

RUN apt-get update && \
apt-get install -y --no-install-recommends \
libgomp1

WORKDIR /opt/genoboost
# TODO: copy only necessary files
COPY --from=builder /opt/genoboost/projects_rust/target/release/genoboost ./
ENTRYPOINT ["/opt/genoboost/genoboost"]
#ENTRYPOINT ["./genoboost"]

# https://stackoverflow.com/questions/73037618/glibc-incompatibility-on-debian-docker
# https://stackoverflow.com/questions/73037618/glibc-incompatibility-on-debian-docker
6 changes: 6 additions & 0 deletions docker/genoboost.def
@@ -0,0 +1,6 @@
Bootstrap: docker
From: rickyota/genoboost:latest

%runscript
/opt/genoboost/genoboost "$@"

9 changes: 4 additions & 5 deletions genoboost.cv.sh
@@ -12,7 +12,6 @@ file_plink="./test/data/1kg_maf0.1_m1k/genot"
# covariate file
file_cov="./test/data/1kg_maf0.1_m1k/genot.cov"


# compile
export RUST_BACKTRACE=full
cargo build --manifest-path ./projects_rust/Cargo.toml --release --bin genoboost
@@ -24,14 +23,14 @@ cp ./projects_rust/target/release/genoboost ./genoboost
--file-genot "$file_plink" \
--file-phe "$file_cov" \
--cov age,sex \
--cross-validation 5
--cross-validation 5 \
--major-a2-train

# score
./genoboost score \
--dir-score "${dir}/score" \
--dir-wgt "${dir}/train" \
--dir-wgt "${dir}/train" \
--file-genot "$file_plink" \
--file-phe "$file_cov" \
--cov age,sex \
--cov age,sex \
--cross-validation 5
