---
title: "Text-mining for Cohort Ascertainment and Analysis of Patients with EVA"
subtitle: "E Bryan Crenshaw III, PhD"
author: "Children's Hospital of Philadelphia"
date: "March 7, 2020"
output:
  powerpoint_presentation:
    reference_doc: audgendb_classic.pptx
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(
  echo = FALSE, #TRUE
  warning = FALSE,
  message = FALSE,
  out.height = "300px",
  cache = TRUE
)
# subtitle:
# slide_level: 3 # Use this to override default
# for more info:
# https://support.rstudio.com/hc/en-us/articles/360004672913-Rendering-PowerPoint-Presentations-with-RStudio
```
# Acknowledgements
Put previous slide here
# Big Data & Audiology
![Audiology Today, 28:42-49 (2016)](images/BigDataCover.png)
# Big Data: Opportunities & Challenges
- Provides opportunities
    - Large cohort sizes
    - Comparison across institutions
    - Compendium of data resources to bring to bear on a problem
    - Rapid testing of hypotheses
- Presents challenges
    - Observational studies
    - Sparsity of data
    - Significant amounts of data in the form of text, which isn’t readily accessible to computational methods without further work
# The Audiological and Genetic Database (AudGenDB)
- Pediatric hearing research integrates complex data from various specialties on large numbers of patients.
- Examples of data in AudGenDB:
    - Audiogram for hearing thresholds
    - Evoked responses (e.g. ABR)
    - Speech and language performance
    - Diagnoses and problems
    - Procedures
    - Temporal bone imaging
- Large-scale hearing research projects would benefit from an integrated electronic database that incorporates these varied data.
# Database Summary
![[Pennington et al., 2019](https://www.ncbi.nlm.nih.gov/pubmed/31408044)](images/Data Summary Pennington Paper.png)
# Data Science – Machine Learning
Use slide from down's presentation
# Talk Outline
- Work in Progress
- Demonstrate text-mining approaches
- Discuss algorithms to identify progression
- Discuss mathematical modeling to facilitate our analyses
- Touch on how these approaches facilitate the long-term goal of predictive analytics
# Hearing Loss Progression
- The larger question – Enlarged Vestibular Aqueduct (EVA) is a subset of the HL Progression problem
- How to determine which patients will progress?
- What computational approaches can we bring to bear on the problem?
- Can machine learning techniques be used to predict HL progression?
- What data are available for inclusion as features?
- Can we generate ‘computed phenotypes’?
# Identifying an EVA Cohort
- No unique ICD Code
- How to identify the cohort?
    - Track in the clinic (Approach Taken at Boston Children's & Vanderbilt)
    - Text Mining Radiology Reports (CHOP Approach)
- Radiology Reports are ‘Semi-Structured’
    - Organization is stereotyped
    - Language is idiosyncratic for each clinician
    - (Some believe verbosity is elegant ;-)
# Text Mining for the Term ‘vestibular aqueduct’
![](images/text_mining_va.png)
# Text Mining for the Term ‘vestibular aqueduct’
- 18,919 radiology impressions in AudGenDB
- 2,726 observations can be found with the term ‘vestibular aqueduct’
- 256 impressions have the term more than twice
- Hand-curated 30 of these
    - 21 describe EVA
    - 2 definitely not enlarged
    - 5 used radiology CYA words (ambiguous)
    - 1 duplicate report
    - 1 thin vestibular aqueduct
- A list of specific terms was assembled
# Text Mining: Regular Expressions
- Dispositive Expression (5):
    - Example: `(vestibular(\n| )aqueducts?.+(is|are|with)(\n| ) ?(mildly|markedly)?(\n| )(enlarged|dilated|widened))`
- Exclusionary Expression – Negative (13):
    - Example: `no( |\n)evidence of enlarg(ed|ement of the) vestibular aqueducts?`
- Exclusionary Expression – Ambiguous (4):
    - Example: `may represent\nenlarged vestibular aqueducts`
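
The three kinds of expression could be combined roughly as follows. This is a minimal Python sketch, not the code used for AudGenDB: it uses only the one example pattern shown per category, and the function names, the case-insensitive matching, and the rule that exclusionary patterns take precedence over dispositive ones are all assumptions made for illustration.

```python
import re

# Example patterns copied from the slide. The complete lists
# (5 dispositive, 13 negative, 4 ambiguous) are not reproduced here.
DISPOSITIVE = [
    r"(vestibular(\n| )aqueducts?.+(is|are|with)(\n| ) ?(mildly|markedly)?(\n| )(enlarged|dilated|widened))",
]
NEGATIVE = [
    r"no( |\n)evidence of enlarg(ed|ement of the) vestibular aqueducts?",
]
AMBIGUOUS = [
    r"may represent\nenlarged vestibular aqueducts",
]


def matches_any(patterns, text):
    """Return True if any pattern in the list is found in the text."""
    # Assumption: matching ignores case.
    return any(re.search(p, text, flags=re.IGNORECASE) for p in patterns)


def classify_impression(text):
    """Label one radiology impression (hypothetical helper).

    Assumption: exclusionary patterns are checked before dispositive ones.
    """
    if matches_any(NEGATIVE, text):
        return "negative"
    if matches_any(AMBIGUOUS, text):
        return "ambiguous"
    if matches_any(DISPOSITIVE, text):
        return "eva"
    return "unclassified"


# Invented sentences for illustration only, not real report text.
examples = [
    "The vestibular aqueducts are markedly enlarged bilaterally.",
    "There is no evidence of enlarged vestibular aqueducts.",
    "Findings may represent\nenlarged vestibular aqueducts.",
    "Normal temporal bone study.",
]
for impression in examples:
    print(classify_impression(impression))
```

Checking the exclusionary lists first keeps a negated sentence from being counted as a positive finding when it also contains the words of a dispositive pattern.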
# Text Mining: Regular Expressions
- 13,289 radiology impressions in AudGenDB
- 6,255 patients
- 206 impressions from 190 patients are found using the regex algorithms
- 344 patients cumulatively at the three institutions represented in AudGenDB
# Computational Methods to Identify HL Progression
- Most in the audience are familiar with identifying HL progression using a collection of audiograms from a single patient
- Our task was to develop computational methods that could screen ~400,000 audiograms in AudGenDB from ~136,000 patients
- Given the large number of audiograms, we had one luxury: making a rigorous definition
- Our definition: Using masked bone thresholds, find patients whose PTA4 dropped by at least 10 dB from the first to the last hearing test.
# HL Progression in AudGenDB
- **135,706 total patients** with audiograms in AudGenDB
- **15,581 patients** fitting inclusion criteria using **masked bone conduction**.
- Inclusion Criteria: Patients must have at least 3 hearing tests, each tested at 4 frequencies (500, 1000, 2000, and 4000 Hz).
- 390 patients with HL progression: PTA4 dropped by 10 dB HL or more from first to last hearing test
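
The screening rule can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the AudGenDB implementation: thresholds are taken to be in dB HL, a "drop" in hearing is read as an increase in PTA4, the check is applied per ear, and the function names and data layout are hypothetical.

```python
from statistics import mean

PTA4_FREQUENCIES_HZ = (500, 1000, 2000, 4000)


def pta4(thresholds):
    """Four-frequency pure-tone average, or None if a frequency is missing."""
    values = [thresholds.get(f) for f in PTA4_FREQUENCIES_HZ]
    if any(v is None for v in values):
        return None
    return mean(values)


def shows_progression(tests, min_tests=3, min_change_db=10):
    """Apply the first-to-last rule to one ear.

    tests: list of (age_in_years, {frequency_hz: threshold_db_hl}).
    Assumption: worsening hearing means PTA4 increases over time.
    """
    scored = [(age, pta4(thresholds)) for age, thresholds in tests]
    usable = sorted((age, value) for age, value in scored if value is not None)
    if len(usable) < min_tests:
        return False  # not enough complete audiograms to qualify
    first, last = usable[0][1], usable[-1][1]
    return last - first >= min_change_db


# Synthetic audiograms for one ear, not patient data.
ear = [
    (4.0, {500: 20, 1000: 25, 2000: 30, 4000: 25}),
    (6.5, {500: 25, 1000: 30, 2000: 35, 4000: 30}),
    (9.0, {500: 35, 1000: 40, 2000: 45, 4000: 40}),
]
print(pta4(ear[0][1]), pta4(ear[-1][1]), shows_progression(ear))
```

Audiograms missing any of the four frequencies are skipped before the count of usable tests is taken, which mirrors the inclusion criteria above.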
# Sample of Patients Demonstrating HL Progression
![](images/HLprogressionSample.png)
::: notes
Graph from < file:///Users/ebciii3/Public/AudGenDB%20Data%20Mining/Hearing%20Loss%20Progression/html/TemporalGraphsHLprogressionCohort.html#data_analysis > HL103 Temporal Graphs of HL Progression Cohort
:::
# HL Progression in EVA Patients
- Changed definition: Using **air thresholds**, find patients whose PTA4 dropped by at least 10 dB from the first to the last hearing test.
- For routine monitoring of EVA patients, only air thresholds were measured
- 155 patients (of the 190 found at CHOP) have enough audiogram data for analysis (at least 3 audiograms with all PTA4 frequencies included)
- 60 patients demonstrated progressive hearing loss
- 39% of the qualifying EVA patients have progressive hearing loss
# Identifying Patients with Fluctuating Hearing
- Approach: Calculate the difference in PTA4 from test to test as the patient ages
- Definition: Maximum >= 10 dB between tests & minimum <= -10 dB between tests
    - In other words, test-to-test change must go up at least 10 dB and drop by at least 10 dB
- Results
    - 190 total CHOP patients with EVA
    - 155 patients meet all inclusion criteria using **air conduction** for testing
    - 71 ears from 60 patients demonstrated progressive hearing loss
    - **36 ears that qualify as fluctuating** in the entire data set
    - 19.9% of the hearing tests
    - **13 ears with fluctuating hearing thresholds** among the **progressive HL patients**
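
The fluctuation definition translates directly into a check on consecutive differences. The Python below is a minimal sketch; the function name and the input layout (one ear's PTA4 values already ordered by age) are assumptions, and the numbers are synthetic.

```python
def is_fluctuating(pta4_series, threshold_db=10):
    """Return True if test-to-test PTA4 changes include both a rise and a
    fall of at least threshold_db.

    pta4_series: PTA4 values (dB) for one ear, ordered by patient age.
    """
    # Difference between each test and the one before it.
    diffs = [later - earlier for earlier, later in zip(pta4_series, pta4_series[1:])]
    if not diffs:
        return False  # fewer than two tests, so no change to measure
    return max(diffs) >= threshold_db and min(diffs) <= -threshold_db


# Synthetic PTA4 series, not patient data.
print(is_fluctuating([30, 45, 32, 35]))  # changes of +15, -13, +3
print(is_fluctuating([30, 35, 45, 55]))  # changes of +5, +10, +10
```

The first series qualifies because it contains both a change of at least +10 dB and one of at least -10 dB; the second only moves in one direction.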
# Examples of Fluctuation in EVA Patients
![](images/FluctuatingPatientGraphs.png)
# Mathematical Modeling to Identify Progression Types
:::::: {.columns}
::: {.column}
- Sudden hearing loss is a prominent feature of HL Progression
- Can we use mathematical modeling to categorize HL Progression profiles?
Note: add images/MathModelsHLprogression.jpeg here
:::
::: {.column}
- Model each patient's *pta4 ~ age* curve
- Extract variable values
    - linear (m/b1, b/b0, r2)
    - 4th order polynomial (b4, b3, b2, b1, b0, r2)
Clustering
Note add < images/ClusterHLprogression_small.png > here
:::
::::::
# Mathematical Modeling to Identify Progression Types
:::::: {.columns}
::: {.column}
![](images/MathModelsHLprogression.jpeg)
:::
::: {.column}
![](images/ClusterHLprogression_small.png)
:::
::::::
# Mathematical Modeling:
## Theoretical Curves
:::::: {.columns}
::: {.column}
- Sudden Hearing Loss is best modeled with a 4th order polynomial
    - **Linear model** fits the data **poorly**, which is reflected in a low r2 value
    - **4th order polynomial model** fit is **good**, and r2 value is high
:::
::: {.column}
- Slow HL Progression is well-modeled with both approaches
    - **Linear model** fit is **good**, and r2 value is high
    - **4th order polynomial** fit is **good**, and r2 value is high
:::
::::::
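
The comparison of the two models can be illustrated with a small, dependency-free Python sketch. It is not the modeling code behind these slides: the least-squares solver (normal equations with Gauss-Jordan elimination), the function names, and the step-shaped data are all assumptions chosen to mimic a sudden drop. Ages are centered before fitting, which helps numerical conditioning and leaves r2 unchanged, although it does change the fitted coefficients.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients [b0, b1, ..., b_degree]."""
    n = degree + 1
    # Augmented normal-equation matrix [X^T X | X^T y].
    a = [
        [sum(x ** (i + j) for x in xs) for j in range(n)]
        + [sum(y * x ** i for x, y in zip(xs, ys))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def r_squared(xs, ys, coeffs):
    """Coefficient of determination for a fitted polynomial."""
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum(
        (y - sum(c * x ** k for k, c in enumerate(coeffs))) ** 2
        for x, y in zip(xs, ys)
    )
    return 1 - ss_res / ss_tot


# Synthetic sudden-drop profile (not patient data): stable, then a 40 dB step.
ages = list(range(1, 11))
pta4_values = [20] * 5 + [60] * 5
mean_age = sum(ages) / len(ages)
centered = [age - mean_age for age in ages]

r2_linear = r_squared(centered, pta4_values, polyfit(centered, pta4_values, 1))
r2_quartic = r_squared(centered, pta4_values, polyfit(centered, pta4_values, 4))
print(f"linear r2 = {r2_linear:.2f}, 4th order r2 = {r2_quartic:.2f}")
```

On this step-shaped series the 4th order fit yields a noticeably higher r2 than the linear fit, and it is this gap between the two r2 values that can serve as a feature for separating sudden from slow progression.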
# Mathematical Modeling to Identify Progression Types
:::::: {.columns}
::: {.column}
![](images/RegressionModels_Sudden.png)
:::
::: {.column}
![](images/RegressionModels_Slow.png)
:::
::::::
# Clustering HL Progression Profiles (PTA4 ~ Age)
![Clustering of Models Can Generate Useful Categorization of HL Progression](images/ClusterHLprogression.png)
# Two Categories of Sudden Hearing Drop Profiles Distinguished by Clustering: CHOP EVA Cohort
:::::: {.columns}
::: {.column}
![Sudden Drops In An Overall Stable Profile](images/SuddenDrop_StableCluster.png)
:::
::: {.column}
![Sudden Drops Without an Otherwise Stable Profile](images/SuddenDrop_NotStable.png)
:::
::::::
# Conclusions
- Big Data in hearing sciences requires development of new computational methods to characterize hearing data
- Text mining approaches can be used to identify patient cohorts that have traditionally taken considerable effort to find
- Algorithms have been developed to identify temporal patterns in hearing tests relevant to hearing research, such as HL progression and fluctuation
- These approaches can digest large datasets, and provide 'features' for machine learning
- In the long term, predictive analytical techniques using these approaches can be developed
- Clinical decision support (CDS) systems can then incorporate predictive analytic models to facilitate the evaluation of patients in audiological and otolaryngological practices
# AudGenDB Home Page
:::::: {.columns}
::: {.column}
AudGenDB website is found at https://audgendb.chop.edu/.
:::
::: {.column}
[![](images/AudGenDB Home Page.jpg)](https://audgendb.chop.edu/)
:::
::::::
# Questions?
#
# Distribution of Patients in AudGenDB
![Color Scheme: CHOP Blue, Harvard (BCH) Crimson,
and Vanderbilt Commodore Gold.](images/AudGenDB Institutional Representation.png)