Indirect GWAS

Indirect GWAS is a Rust program for computing genome-wide association study (GWAS) results indirectly. Unlike traditional methods, indirect GWAS generates GWAS summary statistics for a phenotype definition using only other summary statistics. To do so, it expresses the target phenotype as a linear combination of phenotypes for which GWAS summary statistics are already available.

As an example, indirect GWAS allows you to compute GWAS summary statistics for phecodes using only summary statistics about ICD-10 codes.

Traditional approach:

Define phenotype in terms of clinically-observed features
Evaluate phenotype for every individual
Perform GWAS

Indirect approach:

Define phenotype in terms of features that have available GWAS summary statistics (e.g. Pan-UKBB summary statistics)
Compute GWAS summary statistics for the target using feature summary statistics as inputs
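The core idea can be sketched in code. An ordinary least-squares coefficient is linear in the response, so when every feature GWAS uses the same individuals and covariates, the projected phenotype's beta is the weighted sum of the feature betas. The standard error also needs the partial variance of the projected phenotype (w^T C w, from the covariance matrix) and the genotype's partial variance, which can be backed out of each feature's beta and standard error via the usual OLS relation se^2 = (var_y - beta^2 * var_g) / (dof * var_g). This is a hypothetical illustration under those assumptions: indirect_stats, FeatureStats, and the dof (residual degrees of freedom) parameter are names introduced here, not part of igwas, and the tool's own computation may differ in details such as how the genotype variance is estimated.

```rust
/// Summary statistics for one variant from one feature GWAS.
struct FeatureStats {
    beta: f64,
    se: f64,
}

/// Hypothetical sketch of the indirect beta and SE for one variant.
///
/// Assumes every feature GWAS used the same individuals and covariates.
/// `weights[i]` is the projection coefficient of feature i, `cov` is the
/// partial covariance matrix of the features, and `dof` is the residual
/// degrees of freedom (sample size minus the number of regression
/// parameters, counting genotype, covariates, and intercept).
fn indirect_stats(weights: &[f64], feats: &[FeatureStats], cov: &[Vec<f64>], dof: f64) -> (f64, f64) {
    let m = weights.len();
    assert!(feats.len() == m && cov.len() == m, "dimension mismatch");

    // OLS is linear in the response, so the projected beta is the
    // weighted sum of the feature betas.
    let beta: f64 = weights.iter().zip(feats).map(|(w, f)| w * f.beta).sum();

    // Partial variance of the projected phenotype: w^T C w.
    let mut var_y = 0.0;
    for i in 0..m {
        for j in 0..m {
            var_y += weights[i] * cov[i][j] * weights[j];
        }
    }

    // For each feature, se^2 = (var_i - beta_i^2 * var_g) / (dof * var_g),
    // so the genotype's partial variance var_g can be solved for; average
    // the per-feature estimates.
    let var_g = feats
        .iter()
        .enumerate()
        .map(|(i, f)| cov[i][i] / (dof * f.se * f.se + f.beta * f.beta))
        .sum::<f64>()
        / m as f64;

    // Apply the same standard-error formula to the projected phenotype.
    let se = ((var_y - beta * beta * var_g) / (dof * var_g)).sqrt();
    (beta, se)
}
```

As a sanity check, projecting a single feature with weight 1 returns that feature's own beta and standard error, and a weight of 2 doubles both.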

Installation

If Cargo is not installed, see the Cargo installation guide first.

cargo install --git https://github.com/tatonetti-lab/indirect-gwas

Usage

Indirect GWAS is a command-line tool, installed as the igwas binary. For example:

igwas \
    -p projection.tsv \
    -c covariance.tsv \
    -g plink*.glm.linear \
    -o indirect_results.csv

To see a full list of parameters, run

igwas -h

Indirect GWAS takes four main arguments:

Projection matrix
Covariance matrix
GWAS result files
Output path

Each of these is given as a filesystem path; the GWAS result files may be several paths, e.g. a shell glob as in the example above.

Projection matrix

This should be a CSV/TSV file with row and column names: each row is a feature phenotype and each column is a projected phenotype. The top-left cell (first column of the header row) is ignored. For example:

rowid,proj1,proj2
feat1,0.1,0.2
feat2,0.2,-0.511119

This file gives the coefficients that map the feature phenotypes (rows) onto the projected phenotypes (columns). In the example above, proj1 is defined as 0.1 * feat1 + 0.2 * feat2, and proj2 as 0.2 * feat1 - 0.511119 * feat2. Many projections can be passed at once, one per column.
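To make the layout concrete, here is a minimal sketch that reads a matrix in this format and evaluates every projected phenotype for one individual. The helper names (parse_labelled, project, Labelled) are hypothetical, and choosing the delimiter from the header line is an assumption of this sketch, not documented igwas behavior.

```rust
/// A matrix with named rows and columns.
#[derive(Debug)]
struct Labelled {
    rows: Vec<String>,
    cols: Vec<String>,
    values: Vec<Vec<f64>>,
}

/// Parse a CSV/TSV matrix: the first line holds column names, the first
/// field of every other line holds a row name, and the top-left cell is
/// ignored. Assumption: tab-delimited if the header contains a tab,
/// otherwise comma-delimited.
fn parse_labelled(text: &str) -> Result<Labelled, String> {
    let mut lines = text.lines().filter(|l| !l.trim().is_empty());
    let header = lines.next().ok_or("empty file")?;
    let delim = if header.contains('\t') { '\t' } else { ',' };
    let cols: Vec<String> = header.split(delim).skip(1).map(|s| s.trim().to_string()).collect();

    let (mut rows, mut values) = (Vec::new(), Vec::new());
    for line in lines {
        let mut fields = line.split(delim);
        let name = fields.next().unwrap_or("").trim().to_string();
        let row = fields
            .map(|s| s.trim().parse::<f64>().map_err(|e| format!("{name}: {e}")))
            .collect::<Result<Vec<f64>, String>>()?;
        if row.len() != cols.len() {
            return Err(format!("row {name}: {} values, expected {}", row.len(), cols.len()));
        }
        rows.push(name);
        values.push(row);
    }
    Ok(Labelled { rows, cols, values })
}

/// Evaluate every projected phenotype (column j) for one individual whose
/// feature values are given in row order: sum over i of values[i][j] * features[i].
fn project(m: &Labelled, features: &[f64]) -> Vec<f64> {
    (0..m.cols.len())
        .map(|j| (0..m.rows.len()).map(|i| m.values[i][j] * features[i]).sum::<f64>())
        .collect()
}
```

With the example file, an individual with feat1 = 1 and feat2 = 2 gets proj1 = 0.5 and proj2 = -0.822238.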

Covariance matrix

This should be a CSV/TSV file with row and column names. The top-left cell (first column of the header row) is ignored; apart from that, the row names and column names should match. For example:

_,feat1,feat2
feat1,0.1,0.1
feat2,0.1,0.5
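A quick check before running can catch malformed covariance files. The sketch below is a hypothetical validator, not part of igwas, that assumes the names and values have already been parsed. It requires rows and columns in the same order, which is stricter than the README states, and also checks symmetry and non-negative variances, properties any covariance matrix should have.

```rust
/// Check that a parsed covariance matrix is usable: row names equal column
/// names in the same order, the matrix is square and symmetric (within
/// `tol`), and every variance on the diagonal is non-negative.
fn check_covariance(rows: &[&str], cols: &[&str], values: &[Vec<f64>], tol: f64) -> Result<(), String> {
    if rows != cols {
        return Err(format!("row names {rows:?} do not match column names {cols:?}"));
    }
    let n = rows.len();
    if values.len() != n || values.iter().any(|r| r.len() != n) {
        return Err(format!("matrix is not {n} x {n}"));
    }
    for i in 0..n {
        if values[i][i] < 0.0 {
            return Err(format!("negative variance for {}", rows[i]));
        }
        for j in (i + 1)..n {
            // Compare each upper-triangle entry with its mirror image.
            if (values[i][j] - values[j][i]).abs() > tol {
                return Err(format!("not symmetric at ({}, {})", rows[i], rows[j]));
            }
        }
    }
    Ok(())
}
```

The example matrix above passes; swapping the column order or making an off-diagonal pair unequal makes it fail.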

This file should give the partial covariances of the feature phenotypes. The partial covariance is the covariance of the phenotypes' residuals after regressing each phenotype on the GWAS covariates. For example, if each GWAS regression has the form phenotype ~ genotype + covar_1 + covar_2, fit phenotype ~ covar_1 + covar_2 for every feature phenotype, keep the residuals, and compute the covariance matrix of those residuals.
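The residualize-then-covariance procedure can be sketched without dependencies: fit each phenotype on an intercept plus the covariates by ordinary least squares (normal equations solved by Gaussian elimination), then take the covariance of the residuals. The function names and the n - 1 denominator are assumptions of this sketch; for real cohorts a dedicated linear-algebra crate would be more robust.

```rust
/// Dot product of two equal-length slices.
fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Solve the square system `a x = b` by Gaussian elimination with partial
/// pivoting; adequate for the few columns of a covariate design matrix.
fn solve(mut a: Vec<Vec<f64>>, mut b: Vec<f64>) -> Vec<f64> {
    let n = b.len();
    for col in 0..n {
        // Move the row with the largest pivot into place to limit rounding error.
        let piv = (col..n)
            .max_by(|&i, &j| a[i][col].abs().total_cmp(&a[j][col].abs()))
            .unwrap();
        a.swap(col, piv);
        b.swap(col, piv);
        for row in (col + 1)..n {
            let f = a[row][col] / a[col][col];
            for k in col..n {
                let delta = f * a[col][k];
                a[row][k] -= delta;
            }
            let delta = f * b[col];
            b[row] -= delta;
        }
    }
    // Back substitution on the upper-triangular system.
    let mut x = vec![0.0; n];
    for i in (0..n).rev() {
        let s: f64 = ((i + 1)..n).map(|k| a[i][k] * x[k]).sum();
        x[i] = (b[i] - s) / a[i][i];
    }
    x
}

/// Residuals of `y` after OLS regression on an intercept plus `covars`
/// (each inner Vec is one covariate measured on every individual).
fn residualize(y: &[f64], covars: &[Vec<f64>]) -> Vec<f64> {
    let n = y.len();
    // Design-matrix columns: intercept first, then the covariates.
    let mut cols: Vec<Vec<f64>> = vec![vec![1.0; n]];
    cols.extend(covars.iter().cloned());
    let p = cols.len();
    // Normal equations: (X^T X) coef = X^T y.
    let xtx: Vec<Vec<f64>> = (0..p)
        .map(|i| (0..p).map(|j| dot(&cols[i], &cols[j])).collect())
        .collect();
    let xty: Vec<f64> = cols.iter().map(|c| dot(c, y)).collect();
    let coef = solve(xtx, xty);
    (0..n)
        .map(|r| y[r] - (0..p).map(|k| coef[k] * cols[k][r]).sum::<f64>())
        .collect()
}

/// Partial covariance matrix of the phenotypes given the covariates.
fn partial_covariance(phenos: &[Vec<f64>], covars: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let res: Vec<Vec<f64>> = phenos.iter().map(|y| residualize(y, covars)).collect();
    let n = phenos[0].len() as f64;
    // OLS residuals from a model with an intercept have mean zero, so the
    // covariance is the dot product scaled by 1 / (n - 1).
    res.iter()
        .map(|a| res.iter().map(|b| dot(a, b) / (n - 1.0)).collect())
        .collect()
}
```

A phenotype that is an exact linear function of the covariates has zero partial variance, which makes a convenient sanity check.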

GWAS results

GWAS results should be CSV/TSV files. At minimum, each file must contain columns for the variant ID, coefficient estimate (beta), standard error, and sample size. Column names can be specified with additional flags (e.g. --variant-id, --beta); the default names match the output of PLINK linear regressions.
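The sketch below shows how the four required fields might be pulled from a delimited results file by column name. It uses PLINK 2-style names (ID, BETA, SE, OBS_CT) purely for illustration; run igwas -h for the tool's actual defaults and flags. column_index, parse_row, and Row are hypothetical helpers.

```rust
use std::collections::HashMap;

/// The four fields the README lists as required for each variant.
#[derive(Debug, PartialEq)]
struct Row {
    id: String,
    beta: f64,
    se: f64,
    n: u64,
}

/// Find the positions of the required columns (variant ID, beta, SE,
/// sample size, in that order) in a header line. Names are parameters so
/// they can be changed, in the spirit of flags like --variant-id and --beta.
fn column_index(header: &str, delim: char, names: [&str; 4]) -> Result<[usize; 4], String> {
    let pos: HashMap<&str, usize> = header
        .split(delim)
        .enumerate()
        .map(|(i, h)| (h.trim(), i))
        .collect();
    let mut idx = [0usize; 4];
    for (k, name) in names.iter().enumerate() {
        idx[k] = *pos.get(name).ok_or_else(|| format!("missing column {name}"))?;
    }
    Ok(idx)
}

/// Parse one data line using the column positions from `column_index`.
fn parse_row(line: &str, delim: char, idx: [usize; 4]) -> Result<Row, String> {
    let f: Vec<&str> = line.split(delim).map(str::trim).collect();
    let get = |k: usize| f.get(idx[k]).copied().ok_or_else(|| format!("short line: {line}"));
    Ok(Row {
        id: get(0)?.to_string(),
        beta: get(1)?.parse::<f64>().map_err(|e| format!("beta: {e}"))?,
        se: get(2)?.parse::<f64>().map_err(|e| format!("se: {e}"))?,
        n: get(3)?.parse::<u64>().map_err(|e| format!("sample size: {e}"))?,
    })
}
```

Looking columns up by name rather than by position means files with extra or reordered columns still parse.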

Output path

This should be a path to a single file. The file will contain GWAS summary statistics for all the projected phenotypes, combined.
