Indirect GWAS is a Rust program for computing genome-wide association study results indirectly. Unlike traditional methods, indirect GWAS generates GWAS summary statistics for a phenotype definition using only other summary statistics. To do so, we approximate a target phenotype using phenotypes for which GWAS summary statistics are already available.
As an example, indirect GWAS allows you to compute GWAS summary statistics for phecodes using only summary statistics about ICD-10 codes.
Traditional approach:
- Define phenotype in terms of clinically-observed features
- Evaluate phenotype for every individual
- Perform GWAS
Indirect approach:
- Define phenotype in terms of features that have available GWAS summary statistics (using e.g. Pan-UKBB summary statistics)
- Compute GWAS summary statistics for the target using feature summary statistics as inputs
If cargo is not installed, see cargo installation.
cargo install --git https://github.com/tatonetti-lab/indirect-gwas
Indirect GWAS is a command line tool.
As an example,
igwas \
-p projection.tsv \
-c covariance.tsv \
-g plink*.glm.linear \
-o indirect_results.csv
To see a full list of parameters, run
igwas -h
Indirect GWAS takes four main arguments:
- Projection matrix
- Covariance matrix
- GWAS result files
- Output path
Each of these is a path in the filesystem.
The projection matrix should be a CSV/TSV file with row and column names. The first column of the first row is ignored. For example:
rowid,proj1,proj2
feat1,0.1,0.2
feat2,0.2,-0.511119
The contents of this file should give the coefficients needed to project feature phenotypes onto the projected phenotypes. Rows correspond to features and columns to projections.
In the example above, proj1 is a projection defined as 0.1 * feat1 + 0.2 * feat2.
Many projections can be passed simultaneously in this file.
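To make the file layout concrete, here is a minimal Python sketch that reads the example projection matrix above and evaluates each projection for a single individual. The helper names (read_projection_matrix, project) and the feature values are hypothetical and exist only for illustration; this is not a description of how igwas is implemented internally.

```python
import csv
import io

# The example projection matrix from above, embedded as text.
EXAMPLE = """rowid,proj1,proj2
feat1,0.1,0.2
feat2,0.2,-0.511119
"""


def read_projection_matrix(text, delimiter=","):
    """Return {projection name: {feature name: coefficient}}."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=delimiter) if row]
    header = rows[0][1:]  # the top-left cell is ignored
    matrix = {name: {} for name in header}
    for row in rows[1:]:
        feature = row[0]
        for name, value in zip(header, row[1:]):
            matrix[name][feature] = float(value)
    return matrix


def project(matrix, features):
    """Evaluate every projection as a weighted sum of feature values."""
    return {
        name: sum(coef * features[feat] for feat, coef in coefs.items())
        for name, coefs in matrix.items()
    }


matrix = read_projection_matrix(EXAMPLE)
# Hypothetical feature phenotype values for one individual.
values = project(matrix, {"feat1": 1.0, "feat2": 2.0})
print(values)
```

Each column of the file becomes one weighted sum over the features, so adding a column adds a projection without changing the others.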
The covariance matrix should be a CSV/TSV file with row and column names. The first column of the first row is ignored. The row names and column names should match. For example:
_,feat1,feat2
feat1,0.1,0.1
feat2,0.1,0.5
The contents of this file should give the partial covariances of the feature phenotypes.
Partial covariance is defined as the covariance of the residuals of the phenotypes when regressed against the GWAS covariates.
For example, if each GWAS regression takes the form phenotype ~ genotype + covar_1 + covar_2, you should regress phenotype ~ covar_1 + covar_2, compute the residuals, do this for every phenotype, then compute the covariance matrix of these residuals.
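The procedure above can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the tool's own code: the data are made up, the function names (residualize, partial_covariance) are hypothetical, the regression includes an intercept, the residuals are obtained by Gram-Schmidt orthogonalization (equivalent to ordinary least squares), and the covariance uses the usual n - 1 denominator. In practice you would likely use a statistics package for this step.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def residualize(y, covariates):
    """Residuals of y after OLS regression on an intercept plus covariates.

    Orthogonalizes the design columns (Gram-Schmidt), then removes the
    component of y along each orthogonal column.
    """
    n = len(y)
    basis = []
    for col in [[1.0] * n] + covariates:
        q = list(col)
        for b in basis:
            scale = dot(q, b) / dot(b, b)
            q = [qi - scale * bi for qi, bi in zip(q, b)]
        if dot(q, q) > 1e-12:  # skip linearly dependent columns
            basis.append(q)
    resid = list(y)
    for b in basis:
        scale = dot(resid, b) / dot(b, b)
        resid = [ri - scale * bi for ri, bi in zip(resid, b)]
    return resid


def partial_covariance(phenotypes, covariates):
    """Covariance matrix of the residualized phenotypes."""
    names = list(phenotypes)
    resid = {name: residualize(phenotypes[name], covariates) for name in names}
    n = len(phenotypes[names[0]])
    # Residuals have mean zero because the regression includes an intercept.
    # Assumption: sample covariance with an n - 1 denominator.
    cov = [[dot(resid[a], resid[b]) / (n - 1) for b in names] for a in names]
    return names, cov


# Made-up data for six individuals.
covar_1 = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
covar_2 = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
phenotypes = {
    "feat1": [1.2, 0.7, 2.9, 2.1, 4.4, 3.8],
    "feat2": [0.3, 1.1, 0.2, 1.5, 0.9, 1.7],
}

names, cov = partial_covariance(phenotypes, [covar_1, covar_2])

# Print in the layout shown above (the top-left cell is ignored).
print("\t".join(["_"] + names))
for name, row in zip(names, cov):
    print("\t".join([name] + [f"{value:.6f}" for value in row]))
```

The genotype term is deliberately left out of the residualizing regression, matching the description above: only the GWAS covariates are regressed out.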
GWAS results should be formatted as CSV/TSV files.
These files should contain, at minimum, columns with the following pieces of information: variant ID, coefficient estimate, standard error, and sample size.
The column names may be specified with additional flags (e.g. --variant-id, --beta, etc.).
The default field names correspond to the outputs of Plink linear regressions.
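Before running the tool, it can help to confirm that a results file contains the four required pieces of information. The sketch below is a hypothetical pre-flight check, not part of igwas. It assumes the Plink 2 linear regression column names ID, BETA, SE, and OBS_CT; verify these against your own files and against the output of igwas -h. The example row is made up.

```python
import csv
import io

# Assumed Plink 2 column names for each required piece of information.
REQUIRED = {
    "variant ID": "ID",
    "coefficient estimate": "BETA",
    "standard error": "SE",
    "sample size": "OBS_CT",
}

# A made-up, tab-separated example in the style of a Plink 2 .glm.linear file.
EXAMPLE = (
    "#CHROM\tPOS\tID\tREF\tALT\tA1\tTEST\tOBS_CT\tBETA\tSE\tT_STAT\tP\n"
    "1\t1000\trs1\tA\tG\tG\tADD\t500\t0.02\t0.01\t2.0\t0.0455\n"
)


def missing_columns(text, required=REQUIRED, delimiter="\t"):
    """Return the required pieces of information whose column is absent."""
    header = next(csv.reader(io.StringIO(text), delimiter=delimiter))
    return [label for label, column in required.items() if column not in header]


print(missing_columns(EXAMPLE))
```

If your files use different column names, pass a different mapping to the check and supply the matching column-name flags to igwas.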
The output path should be a simple path to a single file. This file will contain GWAS summary statistics for all the projected phenotypes, combined.