---
title: "Data integration in inflammatory bowel disease"
author: "Lluís Revilla Sancho"
date: "`r Sys.Date()`"
documentclass: book
knit: "bookdown::render_book"
site: bookdown::bookdown_site
github-repo: "llrs/thesis"
url: 'https\://thesis.llrs.dev/'
csl: style/bmc_bioinformatics.csl
book_filename: data_integration_IBD
split_by: chapter
twitter-handle: Lluis_Revilla
# Pandoc options
link-citations: true
always_allow_html: true
colorlinks: yes
# https://bookdown.org/yihui/rmarkdown-cookbook/latex-variables.html
# Only works here (set when printing and sending)
# links-as-notes: true # Only activate for actual printing
fontfamily: libertine
# papersize: b5 # The printed size of the thesis
papersize: a4
fontsize: 12pt
# Not needed https://tex.stackexchange.com/a/188337/178206
# fontenc: utf8
acronyms:
  loa_title: ""
  insert_loa: false
  sorting: usage
  include_unused: false
  fromfile: ./style/acronyms.yml
geometry:
- top=20mm
- bottom=20mm
- left=20mm
- right=20mm
- bindingoffset=6.8mm
- asymmetric
classoption:
- twoside
- openright
lot: yes
lof: yes
bibliography:
- references.bib
- packages.bib
---
<!--# TODO add the first two pages with the formal requirements of the faculty https://github.com/llrs/thesis/issues/13 -->
```{r setup, echo=FALSE, include=FALSE}
Sys.setlocale('LC_ALL','C')
knitr::opts_chunk$set(
  echo = FALSE,
  out.width = "100%",
  fig.retina = 2
)
library("knitr")
```
<!--# TODO Fix metathis to provide good media coverage -->
```{r metathis, cache=FALSE, include=FALSE, eval=knitr::is_html_output(excludes = "epub")}
library("metathis")
# Build the HTML metadata (description, viewport and social cards) for the site
a <- meta() %>%
  meta_description(
    "Thesis exploring the relationship between the microbiome and the transcriptome in the intestine of patients with IBD."
  ) %>%
  meta_name("github-repo" = "llrs/thesis") %>%
  meta_viewport() %>%
  meta_social(
    title = "Data integration in inflammatory bowel disease",
    url = "https://thesis.llrs.dev",
    image = "https://thesis.llrs.dev/images/cover.png",
    image_alt = c("The cover of the thesis showing the title and author.",
                  "Below, a drawing of a colon with microbiome in the ascending colon, numbers in the transverse segment and some epithelium with different smoothness, thickness and colors (red, orange and brown).",
                  "At the bottom, the words doctoral thesis."),
    og_type = "book",
    og_author = c("Lluís Revilla Sancho"),
    twitter_card_type = "summary",
    twitter_creator = "@Lluis_Revilla"
  )
# Save the tags so they can be included in the HTML header
write_meta(a, path = "style/meta.html")
```
```{r tables-formatting}
library("kableExtra")
# Print missing values as empty cells
options(knitr.kable.NA = "")
# Shared table styling: striped rows, fixed position, bold header and first column.
# `data` is accepted but unused; it defaults to NULL so format_kable() can omit it.
format_design <- function(x, data = NULL, out = NULL) {
  k <- kable_styling(x, latex_options = c("striped", "HOLD_position", out),
                     full_width = FALSE)
  k <- column_spec(k, column = 1, border_left = FALSE, bold = TRUE,
                   include_thead = FALSE)
  row_spec(k, row = 0, bold = TRUE, italic = FALSE)
}
# Same styling plus a vertical rule to the right of the first column
format_kable <- function(x) {
  k <- format_design(x)
  column_spec(k, column = 1, border_right = TRUE, bold = TRUE)
}
```
# Preface {.unnumbered}
![UB logo](images/ub_logo.png "University where I defend the thesis"){width="160"} ![IDIBAPS logo](images/idibaps_logo.png "Research center"){width="160"} ![CIBER logo](images/logo_ciber.png "Hiring institution"){width="160" height="38"}
The main topic of this thesis is [data integration](https://en.wikipedia.org/wiki/Data_integration "Wikipedia page of integration") applied to [inflammatory bowel disease](https://en.wikipedia.org/wiki/Inflammatory_bowel_disease "Wikipedia page for IBD").
This disease is complex; for instance, it is not currently known whether the causes behind Crohn's disease and ulcerative colitis are one and the same.
There are hypotheses suggesting that the [microbiome](https://en.wikipedia.org/wiki/Microbiota "Wikipedia page for microbiota") is a major factor in the disease; together with an aberrant immune response, this constitutes the dominant theory.
In order to find robust relationships between the microbiome and the immune system, it is important to consider all the relevant variables that influence the disease.
In this thesis we explored these relationships using data from different sequencing technologies and the observed or reported phenotype of the patients.
This thesis was carried out at the [Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS)](https://www.idibaps.org/ "IDIBAPS website") research institute and was funded by the [Centro de Investigación Biomédica en Red (CIBER)](https://www.ciberehd.org/ "CIBER website").
The thesis was conducted at [the IBD unit](https://www.ibd-bcn.org/ "Group's website"), which comprises a multidisciplinary translational team of biologists, microbiologists, veterinarians, bioinformaticians, doctors and nurses (at [Hospital Clínic](https://www.clinicbarcelona.org/en "Hospital Clínic")).
The leading doctor of the unit was [Julian Panés](https://orcid.org/0000-0002-4971-6902 "Julià's ORCID"), whose interest in the disease made this thesis possible.
The thesis is part of the [doctoral programme in biomedicine](https://www.ub.edu/doctorat_biomedicina/eng/index.htm "Doctoral program's website") at the [University of Barcelona (UB)](https://www.ub.edu/web/portal/en/ "UB website").
My thesis directors were [Juanjo Lozano](https://orcid.org/0000-0001-7613-3908 "Juanjo's ORCID") and [Azucena Salas](https://orcid.org/0000-0003-4572-2907 "Azu's ORCID") (my direct supervisor), who guided me as bioinformatician and disease expert, respectively.
They provided advice and guidance on how to analyze the data and on where to focus the different experiments/analyses.
This thesis, available at <https://thesis.llrs.dev>, is licensed under the [Creative Commons Attribution 4.0 International License (CC-BY)](https://creativecommons.org/licenses/by/4.0/ "License summary link").\
!["CC-BY image"](images/by.png "CC-BY image of creative commons")
# Abstracts {.unnumbered}
<!--# Max length of 600 words-->
## English {.unnumbered}
**Introduction**: Inflammatory bowel disease is a complex intestinal disease with several genetic and environmental components that can influence its course.
The etiology and pathophysiology of the disease are not fully understood, although there is some evidence that the microbiome can play a role.
Determining the relationships between the microbiome and the host's mucosa could help advance prevention, diagnosis or treatment of the disease.
**Methods**: We based our analysis on intestinal bacterial 16S rRNA and human transcriptome data from biopsies taken at multiple timepoints and from several intestinal segments.
We expanded regularized generalized canonical correlation analysis to formulate models that were consistent with previous knowledge of the disease, taking into account all sample information.
Multiple inflammatory bowel disease datasets covering different treatments and conditions were analyzed, and the models defining those datasets were compared.
The results were compared using multiple co-inertia analysis.
**Results**: Splitting sample variables into different blocks resulted in models of these relationships that revealed differences in the selected genes and microorganisms.
The models generated using our new method inteRmodel outperformed multiple co-inertia analysis in terms of classifying the samples according to their location.
Despite their use on datasets drawn from different sources, the resulting models showed similar relationships between variables.
**Discussion**: Comparing multiple models helps delineate relationships within datasets.
Our method determines the strengths of the relationships between the microbiome, transcriptome and environmental variables.
The selected genes proved to be common to the different datasets.
This approach is sufficiently robust and flexible to characterize the different datasets and settings.
**Conclusion**: Using inteRmodel we found that the microbiome is more closely related to the sample location than to the disease.
In addition, the transcriptome is closely associated with the location of the sample in the intestine.
We determined that there is a common transcriptome between datasets while microorganisms, in contrast, depend upon the dataset.
In summary, we can improve sample classification by taking into account both bacterial 16S and the host transcriptome.
## Spanish {.unnumbered}
**Introducción**: La enfermedad inflamatoria intestinal es una enfermedad intestinal compleja con factores genéticos y ambientales que pueden influir en su curso.
La etiología y la fisiopatología de la enfermedad no se conocen por completo.
Existen evidencias de que el microbioma puede desempeñar un papel relevante.
Encontrar relaciones entre el microbioma y la mucosa del huésped podría ayudar a avanzar en la prevención, el diagnóstico o el tratamiento.
**Métodos**: Basamos nuestro análisis en el ARNr 16S bacteriano intestinal y en datos de transcriptomas humanos de biopsias de múltiples puntos temporales y segmentos intestinales.
Extendimos el análisis de correlación canónica generalizada regularizado para encontrar modelos coherentes con el conocimiento previo sobre la enfermedad teniendo en cuenta la información de las muestras.
Se analizaron múltiples conjuntos de datos de enfermedad inflamatoria intestinal en diferentes tratamientos y condiciones y se compararon los modelos que definen esos conjuntos de datos.
Los resultados se compararon con análisis de coinercia múltiple.
**Resultados**: Dividir las variables de la muestra en diferentes bloques resulta en modelos de estas relaciones que muestran diferencias en los genes y microorganismos seleccionados.
Los modelos generados con nuestro nuevo método, inteRmodel, superaron al análisis de coinercia múltiple para clasificar las muestras según su ubicación.
A pesar de ser utilizados en conjuntos de datos de diferentes fuentes, los modelos resultantes muestran unas relaciones similares entre las variables.
**Discusión**: La comparación de varios modelos ayuda a descubrir las relaciones dentro de los conjuntos de datos.
Nuestro método encuentra cuán fuertes son las relaciones entre el microbioma, el transcriptoma y las variables ambientales.
En diferentes conjuntos de datos, los genes seleccionados eran comunes.
Este enfoque es robusto y flexible para diferentes conjuntos de datos y configuraciones.
**Conclusión**: Con inteRmodel descubrimos que el microbioma se relaciona más estrechamente con la ubicación de la muestra que con la enfermedad, pero el transcriptoma está muy relacionado con la ubicación de la muestra en el intestino.
Hay un transcriptoma común entre los conjuntos de datos, mientras que los microorganismos dependen del conjunto de datos.
Podemos mejorar la clasificación de las muestras teniendo en cuenta tanto el ARNr 16S bacteriano como el transcriptoma del huésped.
## Catalan {.unnumbered}
**Introducció**: La malaltia inflamatòria intestinal és una malaltia intestinal complexa amb diversos factors genètics i ambientals que poden influir en el seu curs.
L'etiologia i la fisiopatologia de la malaltia no es coneixen del tot.
Hi ha evidències que el microbioma pot tenir un paper rellevant.
Trobar relacions entre el microbioma i la mucosa de l'hoste podria ajudar a avançar en la prevenció, el diagnòstic o el tractament.
**Mètodes**: Vam basar la nostra anàlisi en dades d'ARNr 16S bacterià intestinal i de transcriptoma humà de biòpsies de múltiples punts de temps i segments intestinals.
Hem ampliat l'anàlisi de correlació canònica generalitzada regularitzada per trobar models coherents amb el coneixement previ sobre la malaltia tenint en compte la informació de les mostres.
Es van analitzar diversos conjunts de dades de malaltia inflamatòria intestinal sobre diferents tractaments i condicions i es van comparar els models que defineixen aquests conjunts de dades.
Els resultats es van comparar amb l'anàlisi de coinèrcia múltiple.
**Resultats**: Dividir les variables de la mostra en diferents blocs dóna com a resultat models d'aquestes relacions que mostren diferències en els gens i els microorganismes seleccionats.
Els models generats mitjançant el nostre nou mètode, inteRmodel, van superar l'anàlisi de coinèrcia múltiple per classificar les mostres segons la seva ubicació.
Tot i utilitzar-se en conjunts de dades de diferents fonts, els models resultants mostren relacions similars entre variables.
**Discussió**: La comparació de diversos models ajuda a esbrinar les relacions dins dels conjunts de dades.
El nostre mètode troba com de fortes són les relacions entre el microbioma, el transcriptoma i les variables ambientals.
En diferents conjunts de dades, els gens seleccionats eren comuns.
Aquest enfocament és robust i flexible per a diferents conjunts de dades i configuracions.
**Conclusió**: Amb inteRmodel vam trobar que el microbioma es relaciona més estretament amb la ubicació de la mostra que amb la malaltia, però el transcriptoma està molt relacionat amb la ubicació de la mostra a l'intestí.
Hi ha un transcriptoma comú entre conjunts de dades, mentre que els microorganismes depenen del conjunt de dades.
Podem millorar la classificació de les mostres tenint en compte tant l'ARNr 16S bacterià com el transcriptoma de l'hoste.
# Glossary {-}
\printacronyms