Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AI-based revision using gpt-4o #40

Merged
merged 10 commits into from
Oct 11, 2024
4 changes: 2 additions & 2 deletions content/01.abstract.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,8 +2,8 @@

Understanding the genetic basis of complex traits is a longstanding challenge in the field of genomics.
Genome-wide association studies (GWAS) have identified thousands of variant-trait associations, but most of these variants are located in non-coding regions, making the link to biological function elusive.
While traditional approaches, such as Transcriptome-wide association studies (TWAS), have advanced our understanding by linking genetic variants to gene expression, they often overlook gene-gene interactions.
While traditional approaches, such as transcriptome-wide association studies (TWAS), have advanced our understanding by linking genetic variants to gene expression, they often overlook gene-gene interactions.
Here, we review current approaches to integrate different molecular data, leveraging machine learning methods to identify gene modules based on co-expression and functional relationships.
These integrative approaches, like PhenoPLIER, combine TWAS and drug-induced transcriptional profiles to effectively capture biologically meaningful gene networks.
This integration provides a context-specific understanding of disease proceses while highlighting both core and peripheral genes.
This integration provides a context-specific understanding of disease processes while highlighting both core and peripheral genes.
These insights pave the way for novel therapeutic targets and enhance the interpretability of genetic studies in personalized medicine.
6 changes: 3 additions & 3 deletions content/02.introduction.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,19 +2,19 @@

Understanding the genetics of complex traits remains one of the key challenges in modern genomics.
GWAS have significantly advanced our knowledge by identifying associations between genetic variants and traits, shedding light on the genetic architecture underlying a variety of diseases and phenotypic conditions.
Since the first GWAS on age-related macular degeneration in 2005, the GWAS catalog has grown to include over 5,000 human traits, encompassing risk loci for conditions such as type 2 diabetes, schizophrenia, and rheumatoid arthritis [@doi:10.1093/nar/gkac1010, @doi:10.1038/s41576-019-0127-1, @doi:10.1038/s41588-022-01213-w].
Since the first GWAS on age-related macular degeneration in 2005, the GWAS catalog has grown to include over 5,000 human traits, encompassing risk loci for conditions such as type 2 diabetes, schizophrenia, and rheumatoid arthritis [@doi:10.1093/nar/gkac1010; @doi:10.1038/s41576-019-0127-1; @doi:10.1038/s41588-022-01213-w].
However, most GWAS-identified variants reside in non-coding regions, underscoring the importance of gene regulation in generating phenotypic diversity [@doi:10.1093/hmg/ddac198; @doi:10.1126/science.aaz1776].
This gap presents a major challenge in translating GWAS findings into biological insights.

To address this limitation, numerous methods have been developed to link GWAS variants to functional outcomes.
Proximity-based methods, such as MAGMA, have been widely used to link GWAS variants to nearby genes based on physical distance [@doi:10.1371/journal.pcbi.1004219].
However, these methods may not consider long-range acting SNPs, such as the variants in the *FTO* locus that are associated with obesity, which have been shown to influence the *IRX3* gene through long-range interactions [@doi:10.1038/nature13138].
While useful, proximity-based approaches may miss causal genes if they are not the closest ones.
TWAS have proven effective by linking genetic variants to gene expression, thereby enhancing our ability to identify putatively causal genes for various traits [@doi:10.3389/fgene.2021.713230, @doi:10.1038/s41588-019-0385-z].
TWAS has proven effective by linking genetic variants to gene expression, thereby enhancing our ability to identify putatively causal genes for various traits [@doi:10.3389/fgene.2021.713230, @doi:10.1038/s41588-019-0385-z].
TWAS employs expression quantitative trait loci (eQTL) data to prioritize genes that are influenced by GWAS variants, offering a more mechanistic link between genetic variation and phenotypic changes.
Nevertheless, TWAS and similar methods typically focus on individual genes or variants, overlooking the complex interactions among genes that are increasingly recognized as vital to understanding complex traits [@doi:10.1016/j.cell.2017.05.038].

The omnigenic model, introduced by Boyle et al.(2017) [@doi:10.1016/j.cell.2017.05.038], proposes that the genetic architecture of complex traits involves highly interconnected gene networks.
The omnigenic model, introduced by Boyle et al. (2017) [@doi:10.1016/j.cell.2017.05.038], proposes that the genetic architecture of complex traits involves highly interconnected gene networks.
According to this model, core genes directly contribute to a trait, while peripheral genes modulate these core genes through regulatory networks.
This conceptual shift emphasizes the need for methodologies capable of capturing polygenic and network-based interactions inherent in complex diseases.
Machine learning approaches, particularly those that leverage gene co-expression patterns, are well-suited to this task.
Expand Down
36 changes: 18 additions & 18 deletions content/03.single-variant-single-gene-approaches.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,28 +11,28 @@ Histone acetylome-wide association studies (HAWAS) maps histone acetylation modi

### From genetic variants to traits: genome-wide association study

In a simple trait, a single gene can be responsible for the disease (monogenic), for example in the case of Sickle Cell Anemia and Huntington's disease.
In a simple trait, a single gene can be responsible for the disease (monogenic), for example, in the case of Sickle Cell Anemia and Huntington's disease.
However, most traits are not so simplistic and are the result of many mutations across the genome (polygenic).
Even monogenic traits can sometimes be influenced by the polygenic background [@doi:10.1038/s41467-020-17374-3].
GWAS examine the relationship between specific genetic variants and phenotypes by comparing allele frequencies in individuals of similar ancestry with distinct phenotypic traits (**Figure {@fig:fig1}**).
Both causal and non-causal variants are found to be significant and due to the phenomena of linkage disequilibrium, the results of GWAS are often grouped into risk loci.
Both causal and non-causal variants are found to be significant, and due to the phenomenon of linkage disequilibrium, the results of GWAS are often grouped into risk loci.
These loci represent clusters of variants that demonstrate significant associations, although not every variant may be causal [@doi:10.1186/s13059-022-02697-9].

Since the first GWAS on age-related macular degeneration (AMD) in 2005, the GWAS catalog has rapidly expanded, containing SNP-trait associations across >5,000 human traits [@doi:10.1093/nar/gkac1010].
These studies have successfully identified risk loci for a variety of traits such as type 2 diabetes, auto-immune disease, schizophrenia, major depressive disorder, and many more [@doi:10.1038/s41576-019-0127-1; @doi:10.1016/j.ajhg.2017.06.005].
These studies have successfully identified risk loci for a variety of traits such as type 2 diabetes, autoimmune disease, schizophrenia, major depressive disorder, and many more [@doi:10.1038/s41576-019-0127-1; @doi:10.1016/j.ajhg.2017.06.005].
As GWAS participants increase, more loci are able to be identified.
For example, a study investigating insomnia with a sample size of >1M identified 202 risk loci [@doi:10.1038/s41588-018-0333-3], compared to earlier studies with a sample size of ~110,00, which were only able to identify 3 risk loci [@doi:10.1038/ng.3888].
For example, a study investigating insomnia with a sample size of >1M identified 202 risk loci [@doi:10.1038/s41588-018-0333-3], compared to earlier studies with a sample size of ~110,000, which were only able to identify 3 risk loci [@doi:10.1038/ng.3888].
Beyond the identification of risk loci, GWAS have also led to the discovery of novel biological mechanisms in Crohn’s disease [@doi:10.1038/ng1954; @doi:10.1038/nature13044], rheumatoid arthritis [@doi:10.1038/s41588-022-01213-w], and more.

Although GWAS have identified numerous genetic variants associated with complex traits, translating these findings into biological insights remains challenging [@doi:10.1038/s41576-019-0127-1].
For example, genes can have epistatic interactions where a secondary loci affects a primary locus.
For example, genes can have epistatic interactions where a secondary locus affects a primary locus.
The peripheral effects of the secondary loci are often too small to be picked up through GWAS alone [@doi:10.1016/j.tig.2011.05.007].
Furthermore, any significant GWAS variants are located in non-coding regions, making it difficult to identify the specific genes that drive trait associations [@doi:10.1093/hmg/ddac198].
Furthermore, many significant GWAS variants are located in non-coding regions, making it difficult to identify the specific genes that drive trait associations [@doi:10.1093/hmg/ddac198].
A simple strategy to link GWAS-associated variants with a gene is by physical proximity, which typically selects the closest gene to the most significant SNP at each locus.
Approaches like MAGMA [@doi:10.1371/journal.pcbi.1004219] are based on proximity to compute a gene-trait association.
However, SNPs can have distal effects on the expression of the target gene [@doi:10.1126/science.aaz1776; @doi:10.1126/science.1222794; @doi:10.1038/nature13138; @doi:10.1073/pnas.112212199].

Recent approaches combine multi-omics data from various cell types and tissues with GWAS to identify potential mechanisms of SNPs and the associated gene through molecular quantitative trait loci (molQTLs) [@doi:10.1016/j.molmed.2019.10.004] (Figure {@fig:fig1}).
Recent approaches combine multi-omics data from various cell types and tissues with GWAS to identify potential mechanisms of SNPs and the associated genes through molecular quantitative trait loci (molQTLs) [@doi:10.1016/j.molmed.2019.10.004] (Figure {@fig:fig1}).
For example, an expression quantitative trait locus (eQTL) is a genetic region associated with the expression levels of a nearby or distant gene [@doi:10.1016/j.jgg.2023.05.003].
eQTL studies estimate that the majority of heritability is explained by the combination of multiple weak *trans*-eQTLs [@doi:10.1038/s41588-021-00913-z].

Expand All @@ -42,18 +42,18 @@ This results in GWAS being underpowered to detect all heritability explained by

### From GWAS to gene: transcriptome-wide association studies

TWAS integrate GWAS with gene expression data (**Figure {@fig:fig1}**) from eQTL analysis, to prioritize genes whose expression across different tissues is influenced by GWAS variants [@doi:10.3389/fgene.2021.713230; @doi:10.1038/s41588-019-0385-z].
TWAS integrates GWAS with gene expression data (**Figure {@fig:fig1}**) from eQTL analysis to prioritize genes whose expression across different tissues is influenced by GWAS variants [@doi:10.3389/fgene.2021.713230; @doi:10.1038/s41588-019-0385-z].
By leveraging predicted gene expression levels, TWAS provides a mechanistic link between genetic variants and traits, allowing researchers to move beyond associations with individual SNPs to identify putatively causal genes [@doi:10.1038/s41588-019-0385-z].
Since TWAS models the genetic regulation of gene expression, this approach enables researchers to impute expression levels in GWAS cohorts where expression data may not be available.
A key advantage of TWAS over GWAS lies in its ability to increase interpretability by providing a gene-trait association: TWAS connects trait-associated SNPs (which are mostly non-coding) to genes, which are biologically functional units.

Several TWAS approaches have been introduced [@doi:10.1038/s42003-023-05279-y], including PrediXcan [@doi:10.1038/ng.3367], FUSION [@doi:10.1038/ng.3506] and TIGAR [@doi:10.1016/j.ajhg.2019.05.018].
However, all of them implement a similar framework that consists in three steps: 1) model training, 2) gene expression imputation, and 3) gene-trait association.
For example, during 1), PrediXcan, builds one expression prediction model per gene and tissue using penalized linear regression with ElasticNet to model sparse genetic architectures.
These models contain weights for each SNP used as predictor for a gene expression in a tissue.
Given genotype data in a cohort without measured gene expression, during 2), the SNPs weights from the models can be used to impute tissue-specific gene expression for individuals.
During 3), a gene-tissue-trait association is computed by correlating the tissue-specific imputed gene expression with the trait of interest.
Most methods (such as Summary-PrediXcan [@doi:10.1038/s41467-018-03621-1] or S-PrediXcan), however, offer a shortcut by computing a gene-tissue-trait association directly from GWAS summary statistics without the need of individual-level data.
Several TWAS approaches have been introduced [@doi:10.1038/s42003-023-05279-y], including PrediXcan [@doi:10.1038/ng.3367], FUSION [@doi:10.1038/ng.3506], and TIGAR [@doi:10.1016/j.ajhg.2019.05.018].
However, all of them implement a similar framework that consists of three steps: 1) model training, 2) gene expression imputation, and 3) gene-trait association.
For example, during step 1, PrediXcan builds one expression prediction model per gene and tissue using penalized linear regression with ElasticNet to model sparse genetic architectures.
These models contain weights for each SNP used as a predictor for gene expression in a tissue.
Given genotype data in a cohort without measured gene expression, during step 2, the SNP weights from the models can be used to impute tissue-specific gene expression for individuals.
During step 3, a gene-tissue-trait association is computed by correlating the tissue-specific imputed gene expression with the trait of interest.
Most methods (such as Summary-PrediXcan [@doi:10.1038/s41467-018-03621-1] or S-PrediXcan), however, offer a shortcut by computing a gene-tissue-trait association directly from GWAS summary statistics without the need for individual-level data.
This process, however, requires the user to select a tissue of interest, which might not be straightforward [@doi:10.1038/s41588-019-0385-z].
To address this limitation, approaches such as MultiXcan [@doi:10.1371/journal.pgen.1007889] or UTMOST [@doi:10.1038/s41588-019-0345-7] combine information across tissues to generate a gene-trait association.
These multi-tissue approaches are generally more powerful than single-tissue ones, although they do not provide a direction of effect (i.e., whether a higher or lower predicted expression is associated with a higher or lower disease risk).
Expand All @@ -70,15 +70,15 @@ The aggregate effects of these variants on each gene are then assessed, followed
For example, PWAS confirmed the association of *MUTYH* with colorectal cancer using UK Biobank data, even when no individual variant reached genome-wide significance in GWAS, highlighting its ability to uncover functional associations missed by other methods.
However, PWAS relies on high-quality proteomic data, may miss non-coding variant associations, and is computationally intensive.

Epigenome-wide association studies (EWAS) encompass methodologies such as methylome-wide association studies (MWAS) (**Figure {@fig:fig1}**) and histone acetylome-wide association studies (HAWAS) (**Figure {@fig:fig1}**), based on haQTLs and mQLTs respectability, which focus on specific types of epigenetic modifications to reveal their roles in gene regulation and disease etiology [@doi:10.1186/s13148-021-01200-8].
Epigenome-wide association studies (EWAS) encompass methodologies such as methylome-wide association studies (MWAS) (**Figure {@fig:fig1}**) and histone acetylome-wide association studies (HAWAS) (**Figure {@fig:fig1}**), based on haQTLs and mQTLs, respectively, which focus on specific types of epigenetic modifications to reveal their roles in gene regulation and disease etiology [@doi:10.1186/s13148-021-01200-8].
MWAS targets DNA methylation patterns, identifying loci where methylation changes are linked to particular traits or diseases.
For instance, a study by Shen et al. (2022) demonstrated a causal relationship between DNA methylation and depression, indicating that epigenetic modifications may mediate genetic risk for psychiatric disorders [@doi:10.1186/s13073-022-01039-5].
HAWAS focuses on histone acetylation modifications, which are crucial for regulating chromatin structure and gene expression.
Del Rosario et al. (2022) conducted a HAWAS identifying over 2,000 differentially acetylated loci in immune cells from *Mycobacterium tuberculosis* infected individuals, linking these changes to gene expression and potassium channel genes like *KCNJ15* in modulating apoptosis and Mtb clearance.
Del Rosario et al. (2022) conducted a HAWAS identifying over 2,000 differentially acetylated loci in immune cells from *Mycobacterium tuberculosis*-infected individuals, linking these changes to gene expression and potassium channel genes like *KCNJ15* in modulating apoptosis and Mtb clearance.
haQTL analysis further revealed variants associated with immune phenotypes, complementing GWAS findings and enhancing understanding of disease mechanisms [@doi:10.1038/s41564-021-01049-w].
The tissue-specific nature of epigenetic modifications, alongside the capacity to capture cellular plasticity and environmental influences, enhances our insight into the effects of genetic variants across distinct tissues, temporal cellular states, and gene-environment interactions [@doi:10.3390/cells9112424; @doi:10.1016/j.placenta.2007.09.011].

Despite the advancements facilitated by single variant and single gene methodologies, a common thread persists: the focus remains on one gene at a time.
Despite the advancements facilitated by single-variant and single-gene methodologies, a common thread persists: the focus remains on one gene at a time.
The underlying expectation is that identifying a gene linked to a trait will directly unveil the biological mechanisms driving disease processes.
While this has been successful in certain monogenic disorders, complex traits often involve intricate interactions among multiple genes and environmental factors.
Consequently, the one-gene-at-a-time paradigm may oversimplify the multifaceted nature of these traits.
Expand Down
Loading
Loading