# Baseline Results

## Rule-Based

### Top-50

| Model | P (Micro) | R (Micro) | F1 (Macro) | F1 (Micro) | Acc (Macro) | Acc (Micro) |
| --- | --- | --- | --- | --- | --- | --- |
| No Extension (None) | 45.02 | 18.88 | 19.58 | 26.60 | 12.57 | 15.34 |
| ALL | 16.86 | 26.54 | 18.11 | 20.62 | 9.89 | 11.50 |
| Best | 40.71 | 20.34 | 23.75 | 27.13 | 13.07 | 15.69 |

(09/16/2022)

### Full

| Model | P (Micro) | R (Micro) | F1 (Macro) | F1 (Micro) | Acc (Macro) | Acc (Micro) |
| --- | --- | --- | --- | --- | --- | --- |
| No Extension (None) | 8.83 | 14.47 | 2.19 | 10.97 | 1.13 | 5.80 |
| ALL | 8.83 | 14.47 | 2.19 | 10.97 | 1.13 | 5.80 |
| Best | 8.83 | 14.47 | 2.19 | 10.97 | 1.13 | 5.80 |

(09/16/2022)

NOTE: For the Full version, the extension criterion (fewer than 1 ICD label found for the input CUIs) was never met, so no extensions were triggered and the three rows are identical.
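
For clarity, a minimal sketch of how this trigger works; `cui_to_icd` and `extension_fn` are hypothetical placeholders, not identifiers from this repository:

```python
# Minimal sketch of the rule-based extension trigger described above.
# cui_to_icd: hypothetical CUI -> ICD lookup; extension_fn: one of the
# extension strategies (e.g. the ALL or Best variants above).
def assign_labels(input_cuis, cui_to_icd, extension_fn):
    """Map input CUIs to ICD labels; fall back to an extension strategy
    only when no label is found."""
    labels = {icd for cui in input_cuis for icd in cui_to_icd.get(cui, ())}
    if len(labels) < 1:  # extension criterion: no ICD label matched
        labels = extension_fn(input_cuis)
    return labels
```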

## Machine Learning Models

Overall, using the 2-gram feature option improves performance across all models in both the Top-50 and Full versions. Our best LR model with the UMLS CUI input type shows performance comparable to the LR results for the text input type reported in Vu et al. (2020). Our best SGD and LR results are also comparable to the BiGRU results reported in Vu et al. (2020), in both the Top-50 and Full versions.
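
For context, a hedged sketch of the Top-50 LR setup implied by footnote 1 (scikit-learn 1.0.2); the choice of `TfidfVectorizer` and the whitespace-joined CUI documents are assumptions, while the classifier settings come from the footnote:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline

# Documents are assumed to be whitespace-joined CUI tokens, e.g. "C0011849 C0020538".
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),  # the '2-gram' feature option
    OneVsRestClassifier(
        LogisticRegression(class_weight="balanced", C=1000, max_iter=1000)
    ),
)
# model.fit(train_docs, train_label_matrix)  # multi-label indicator matrix
```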

### Top-50 Version

| Model | P (Micro) | R (Micro) | F1 (Macro) | F1 (Micro) | AUC (Macro) | AUC (Micro) | P@5 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| LR[^1] 1-gram | 51.87 | 47.97 | 46.38 | 49.84 | 81.26 | 84.09 | 48.86 |
| LR[^1] 2-gram | 63.61 | 47.29 | 49.56 | 54.25 | 84.89 | 87.84 | 54.59 |
| SGD[^2] 1-gram | 50.14 | 57.87 | 50.95 | 53.73 | 83.75 | 86.56 | 52.16 |
| SGD[^2] 2-gram | 60.08 | 52.12 | 51.59 | 55.82 | 85.27 | 88.16 | 54.93 |
| SVM[^3] 1-gram | 51.45 | 43.72 | 43.55 | 47.28 | 79.70 | 81.85 | 46.92 |
| SVM[^3] 2-gram | 67.52 | 41.66 | 45.90 | 51.53 | 84.02 | 86.92 | 53.90 |
| Stacked[^4] 2-gram | 73.79 | 36.34 | 41.80 | 48.70 | 84.94 | 88.30 | 55.28 |

(09/17/2022)

### Full Version

| Model | P (Micro) | R (Micro) | F1 (Macro) | F1 (Micro) | AUC (Macro) | AUC (Micro) | P@5 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| LR[^5] 1-gram | 36.51 | 34.58 | 6.37 | 35.52 | 84.54 | 91.25 | 56.70 |
| LR[^5] 2-gram | 58.93 | 32.36 | 5.55 | 41.78 | 86.41 | 96.04 | 67.63 |
| SGD[^6] 1-gram | 21.90 | 56.59 | 8.06 | 31.58 | 84.69 | 96.06 | 30.77 |
| SGD[^6] 2-gram | 30.29 | 49.19 | 8.09 | 37.50 | 84.72 | 95.94 | 42.08 |
| SVM[^7] 1-gram | 38.43 | 28.66 | 4.63 | 32.83 | 81.54 | 67.49 | 50.81 |
| SVM[^7] 2-gram | 69.51 | 24.95 | 3.61 | 36.72 | 82.91 | 86.43 | 63.71 |

(09/17/2022, 09/20/2022)

## LAAT Results

Our implementation reproduced the results reported in Vu et al. (2020) for the Top-50 and Full versions of the dataset with the text input type. The results below use UMLS CUIs as input tokens.

| Model | P (Micro) | R (Micro) | F1 (Macro) | F1 (Micro) | AUC (Macro) | AUC (Micro) | P@5 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Text Top50 | 75.60 | 66.95 | 66.55 | 71.01 | 92.79 | 94.6 | 67.28 |
| CUI Top50 | 68.75 | 47.38 | 50.55 | 56.10 | 86.16 | 89.26 | 57.50 |
| Text Full | 65.70 | 50.64 | 9.87 | 57.20 | 89.84 | 98.56 | 80.91 |
| CUI Full | 64.93 | 36.59 | 6.25 | 46.8 | 84.38 | 97.74 | 73.90 |

(Results from 10/15/2022 run on Git Commit@36dda76)
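
For reference, a minimal PyTorch sketch of the LAAT label-attention layer from Vu et al. (2020) that these runs build on, with `u` and `da` as in the footnotes below; variable names and the omitted embedding/BiLSTM plumbing are illustrative only:

```python
import torch
import torch.nn as nn

class LabelAttention(nn.Module):
    """LAAT-style label attention: one attention distribution per ICD label."""
    def __init__(self, num_labels: int, u: int = 256, da: int = 256):
        super().__init__()
        self.W = nn.Linear(2 * u, da, bias=False)       # Z = tanh(W H)
        self.U = nn.Linear(da, num_labels, bias=False)  # per-label attention scores
        self.out = nn.Linear(2 * u, num_labels)         # per-label scoring vectors

    def forward(self, H: torch.Tensor) -> torch.Tensor:
        # H: (batch, seq_len, 2u) BiLSTM outputs over text or CUI tokens
        Z = torch.tanh(self.W(H))                 # (batch, seq_len, da)
        A = torch.softmax(self.U(Z), dim=1)       # attention over tokens, per label
        V = A.transpose(1, 2) @ H                 # (batch, num_labels, 2u)
        return (self.out.weight * V).sum(-1) + self.out.bias  # (batch, num_labels)
```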

# Extension Results

## KGE

All models reported here use the CUI input type. The baseline W2V models were re-run with pruned CUIs for comparison; hence, these results differ from those reported in the earlier section.
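
A hedged sketch of how pretrained KGE vectors can stand in for the W2V-initialized CUI embedding table; `kge_vectors` and `cui_vocab` are hypothetical placeholders, and the random fallback init is an assumption:

```python
import torch
import torch.nn as nn

def build_cui_embedding(kge_vectors, cui_vocab, dim):
    """Initialize the CUI embedding table from pretrained KGE vectors,
    falling back to a small random init for CUIs absent from the KGE."""
    weight = torch.randn(len(cui_vocab), dim) * 0.01
    for idx, cui in enumerate(cui_vocab):
        if cui in kge_vectors:
            weight[idx] = torch.as_tensor(kge_vectors[cui], dtype=torch.float)
    return nn.Embedding.from_pretrained(weight, freeze=False)
```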

| Model | P (Micro) | R (Micro) | F1 (Macro) | F1 (Micro) | AUC (Macro) | AUC (Micro) | P@5 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Base Top50 | 69.71 | 52.05 | 54.02 | 59.60 | 87.17 | 90.47 | 59.68 |
| Case4 Top50 | 68.46 | 59.01 | 58.94 | 63.38 | 88.22 | 91.13 | 60.69 |
| W2V Top50 | 64.90 | 55.03 | 53.53 | 59.56 | 86.07 | 89.40 | 58.06 |
| Base Full[^8] | 62.11 | 34.84 | 5.53 | 44.64 | 86.55 | 97.99 | 71.73 |
| Case4 Full[^9] | 63.79 | 37.61 | 6.50 | 47.32 | 85.60 | 97.89 | 73.51 |
| W2V Full | 65.78 | 35.44 | 5.70 | 46.07 | 84.92 | 97.77 | 73.31 |

(Results from 12/26/2022 run on Git Commit@c658090)

## Combined KGE + Text

We experimented with the case4 KGE, as this is the best-performing KGE for the CUI input type.

### Late-Fusion

#### Shared LSTM Encoder

| Model | P (Micro) | R (Micro) | F1 (Macro) | F1 (Micro) | AUC (Macro) | AUC (Micro) | P@5 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Case4 Top50 | 71.52 | 68.23 | 65.52 | 69.83 | 91.11 | 98.68 | 65.69 |
| Case4 Full | 63.13 | 50.10 | 6.25 | 55.87 | 84.38 | 97.74 | 78.76 |

(Results from 12/26/2022 run on Git Commit@c658090)

#### Separate LSTM Encoder

A post-LAAT aggregator layer was used to combine the two input representations.

| Model | P (Micro) | R (Micro) | F1 (Macro) | F1 (Micro) | AUC (Macro) | AUC (Micro) | P@5 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Case4 Top50 | 72.20 | 68.15 | 65.20 | 70.11 | 92.04 | 94.12 | 66.47 |
| Case4 Full | 60.21 | 51.07 | 9.92 | 55.27 | 90.35 | 98.57 | 76.94 |

(Results from 08/10-12/2023 run on Git Commit@fc63f24)

A LAAT layer was used to combine the two input representations.

| Model | P (Micro) | R (Micro) | F1 (Macro) | F1 (Micro) | AUC (Macro) | AUC (Micro) | P@5 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Case4 Top50 | 73.97 | 68.71 | 67.35 | 71.25 | 92.87 | 94.63 | 67.02 |
| Case4 Full | 65.60 | 50.56 | 9.99 | 57.11 | 88.06 | 98.38 | 80.47 |

(Results from 08/10-12/2023 run on Git Commit@fc63f24)
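
The exact fusion operators are not spelled out above, so the sketch below shows one plausible reading of the two separate-encoder variants, assuming concatenation as the combiner; `LabelAttention` refers to the LAAT sketch earlier in this README.

```python
import torch
import torch.nn as nn

# Variant 1 (post-LAAT aggregator): each stream keeps its own encoder + label
# attention; the per-label vectors are concatenated and projected per label.
def post_laat_aggregate(V_text, V_cui, proj: nn.Linear):
    # V_*: (batch, num_labels, 2u) per-label vectors from each stream
    return proj(torch.cat([V_text, V_cui], dim=-1))

# Variant 2 (LAAT layer as combiner): a single label-attention layer is run
# over the two encoders' token outputs concatenated along the sequence axis.
def laat_combine(H_text, H_cui, label_attention: nn.Module):
    # H_*: (batch, seq_len_i, 2u) token-level encoder outputs
    return label_attention(torch.cat([H_text, H_cui], dim=1))
```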

## GNN: 2-layer GCN Baseline Results

The baseline is a 2-layer GCN model used as the encoder, without the LAAT attention mechanism. Preliminary experiments indicated that using LAAT layers in lieu of 'mean'/'sum' readout functions did not benefit model performance. All hyperparameters remain the same as in the baseline LAAT model.
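
A minimal sketch of such an encoder, written in plain PyTorch with a dense normalized adjacency; the layer sizes and single-graph batch are illustrative assumptions, not the repository's implementation:

```python
import torch
import torch.nn as nn

class TwoLayerGCN(nn.Module):
    """2-layer GCN encoder over a document's CUI graph with a 'mean' readout."""
    def __init__(self, in_dim: int, hidden: int, num_labels: int):
        super().__init__()
        self.lin1 = nn.Linear(in_dim, hidden)
        self.lin2 = nn.Linear(hidden, hidden)
        self.cls = nn.Linear(hidden, num_labels)

    def forward(self, X: torch.Tensor, A: torch.Tensor) -> torch.Tensor:
        # X: (num_nodes, in_dim) CUI embeddings; A: (num_nodes, num_nodes) adjacency
        A_hat = A + torch.eye(A.size(0), device=A.device)  # add self-loops
        d = A_hat.sum(dim=1).clamp(min=1).rsqrt()          # D^{-1/2}
        A_norm = d[:, None] * A_hat * d[None, :]           # symmetric normalization
        h = torch.relu(self.lin1(A_norm @ X))              # GCN layer 1
        h = torch.relu(self.lin2(A_norm @ h))              # GCN layer 2
        return self.cls(h.mean(dim=0))                     # 'mean' readout -> logits
```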

The baseline graph construction approaches rely on shared TUIs or on relations from the KG.

### Preliminary Baseline: CUIs connected if they belong to the same TUI

The 'mean' readout function was used for this experiment.

| Model | P (Micro) | R (Micro) | F1 (Macro) | F1 (Micro) | AUC (Macro) | AUC (Micro) | P@5 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Base Top50 | 63.19 | 35.40 | 36.25 | 45.38 | 80.96 | 84.79 | 49.82 |
| Case4 Top50 | 62.11 | 36.01 | 36.42 | 45.59 | 81.88 | 85.43 | 50.02 |
| Base Full | 52.56 | 18.89 | 2.88 | 27.79 | 80.12 | 96.71 | 54.74 |
| Case4 Full | 53.40 | 19.19 | 2.88 | 28.24 | 80.96 | 96.86 | 55.72 |

(Results from 5/22-24/2023 run on Git Commit@ac47a9)
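
A hedged sketch of this TUI-based construction; `cui_to_tuis` is a hypothetical CUI-to-semantic-type lookup, not an identifier from this repository:

```python
from itertools import combinations

# cui_to_tuis: hypothetical mapping CUI -> set of semantic type IDs (TUIs)
def build_tui_edges(doc_cuis, cui_to_tuis):
    """Connect two CUIs in a document when they share at least one TUI."""
    return [
        (a, b)
        for a, b in combinations(set(doc_cuis), 2)
        if cui_to_tuis.get(a, set()) & cui_to_tuis.get(b, set())
    ]
```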

### KG Graph Construction Method: CUIs connected if they have shared relations

Same as the basic GNN approach above, but the baseline graph construction utilizes KG relation information: CUIs are connected if there is a relation between them in the KG.
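
A hedged sketch, assuming the KG relations are available as a set of (head, tail) CUI pairs:

```python
from itertools import combinations

# kg_relations: hypothetical set of (head_cui, tail_cui) pairs from the KG
def build_relation_edges(doc_cuis, kg_relations):
    """Connect two CUIs in a document when the KG has a relation between them."""
    return [
        (a, b)
        for a, b in combinations(set(doc_cuis), 2)
        if (a, b) in kg_relations or (b, a) in kg_relations
    ]
```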

#### 'mean' aggregator readout

| Model | P (Micro) | R (Micro) | F1 (Macro) | F1 (Micro) | AUC (Macro) | AUC (Micro) | P@5 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Base Top50 | 67.03 | 36.18 | 38.26 | 46.99 | 83.85 | 86.83 | 52.62 |
| Case4 Top50 | 67.67 | 37.72 | 38.68 | 48.44 | 84.46 | 87.28 | 53.70 |
| W2V Top50 | 64.54 | 21.99 | 21.01 | 32.80 | 76.77 | 80.89 | 43.09 |
| Base Full | 64.66 | 18.59 | 2.41 | 28.87 | 83.56 | 97.31 | 62.73 |
| Case4 Full | 63.68 | 19.77 | 2.82 | 30.17 | 84.65 | 97.46 | 63.60 |
| W2V Full | 67.78 | 16.69 | 1.18 | 26.79 | 84.47 | 97.40 | 60.82 |
W2V Full 67.78 16.69 1.18 26.79 84.47 97.40 60.82
#### 'sum' aggregator readout

| Model | P (Micro) | R (Micro) | F1 (Macro) | F1 (Micro) | AUC (Macro) | AUC (Micro) | P@5 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Base Top50 | 66.52 | 42.64 | 42.98 | 51.97 | 83.77 | 86.82 | 53.95 |
| Case4 Top50 | 67.81 | 45.02 | 47.33 | 54.12 | 84.55 | 87.41 | 56.00 |
| W2V Top50 | 65.69 | 38.68 | 38.76 | 48.68 | 82.98 | 86.26 | 51.97 |
| Base Full | 62.03 | 16.75 | 1.47 | 26.38 | 76.48 | 96.38 | 57.08 |
| Case4 Full | 55.53 | 17.81 | 1.80 | 26.97 | 77.19 | 96.08 | 55.60 |
| W2V Full | 62.94 | 19.26 | 1.93 | 29.50 | 76.67 | 96.44 | 60.27 |

(Results from 8/5-8/2023 run on Git Commit@2bcb60)

### Graph Construction Method with EHR Structure/Clinical Reasoning

Following insights from manual annotation and the idea in *Learning EHR Graphical Structures with Graph Convolutional Transformer*, we construct a graph representing each input document using the following heuristic (a sketch follows the list):

- CUI entities are grouped based on whether they are considered diagnostic, procedure, concept, or laboratory entities.
- Conditional probabilities of CUI co-occurrences across these groups are pre-calculated from the training set.
- During graph construction, an edge is added between two CUIs if the conditional probability of their co-occurrence exceeds a threshold.
  - Experiment thresholds: 0.3, 0.5, 0.7, 0.8
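
A hedged sketch of the probability table and edge rule; for brevity it estimates document-level co-occurrence without the per-group restriction described in the first bullet, and all names are illustrative:

```python
from collections import Counter
from itertools import combinations

def cond_prob_table(train_docs):
    """Estimate P(b | a) from document-level CUI co-occurrence counts."""
    single, pair = Counter(), Counter()
    for doc_cuis in train_docs:           # each doc: iterable of CUI strings
        cuis = sorted(set(doc_cuis))
        single.update(cuis)
        pair.update(combinations(cuis, 2))
    probs = {}
    for (a, b), n_ab in pair.items():
        probs[(a, b)] = n_ab / single[a]  # P(b | a)
        probs[(b, a)] = n_ab / single[b]  # P(a | b)
    return probs

def build_ehr_edges(doc_cuis, probs, threshold=0.7):
    """Add an edge when either conditional probability clears the threshold."""
    return [
        (a, b)
        for a, b in combinations(sorted(set(doc_cuis)), 2)
        if max(probs.get((a, b), 0.0), probs.get((b, a), 0.0)) >= threshold
    ]
```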

An additional experiment was performed to combine the EHR-guided approach with the baseline KG relations. The motivation is to see whether combining the two approaches, resulting in more edges and fewer disjoint subgraphs, helps the model learn more useful information. Combining the two approaches yields mixed results.

The best results for the Top50 version come from the experiment with the 0.7 threshold, and for the Full version from the experiment with the 0.5 threshold.

#### 'sum' aggregator readout, comparing the 0.7 and 0.5 thresholds for Top50 and Full

| Model | P (Micro) | R (Micro) | F1 (Macro) | F1 (Micro) | AUC (Macro) | AUC (Micro) | P@5 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Case4 Top50 0.7 | 65.24 | 48.41 | 48.71 | 55.58 | 84.72 | 87.71 | 56.22 |
| Case4 Top50 0.7[^10] | 69.34 | 43.99 | 46.58 | 53.83 | 84.44 | 87.64 | 56.33 |
| Case4 Top50 0.5 | 66.25 | 41.57 | 43.76 | 51.09 | 83.90 | 86.88 | 55.60 |
| Case4 Full 0.7 | 61.17 | 17.88 | 2.12 | 26.67 | 75.08 | 96.20 | 58.56 |
| Case4 Full 0.5[^10] | 63.88 | 17.94 | 2.11 | 28.01 | 74.75 | 96.22 | 60.13 |
| Case4 Full 0.5 | 60.69 | 18.91 | 2.23 | 28.84 | 76.10 | 96.40 | 59.32 |

#### 'mean' aggregator readout, comparing the 0.7 and 0.5 thresholds for Top50 and Full

| Model | P (Micro) | R (Micro) | F1 (Macro) | F1 (Micro) | AUC (Macro) | AUC (Micro) | P@5 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Case4 Top50 0.7 | 65.21 | 38.55 | 38.75 | 48.46 | 83.90 | 86.67 | 52.31 |
| Case4 Full 0.5 | 64.24 | 16.96 | 2.51 | 26.84 | 82.06 | 97.12 | 62.57 |

(Results from 9/14-9/18/2023 run on Git Commit@fdec63)

[^1]: With default settings for scikit-learn 1.0.2 except `class_weight='balanced'`, `C=1000`, and `max_iter=1000`.
[^2]: With default settings for scikit-learn 1.0.2 except `class_weight='balanced'`, `eta0=0.01`, `learning_rate='adaptive'`, and `loss='modified_huber'`.
[^3]: With default settings for scikit-learn 1.0.2 except `class_weight='balanced'` and `C=1000`.
[^4]: With default settings for `OneVsRestClassifier` and changes to each component as above.
[^5]: With default settings for scikit-learn 1.0.2 except `class_weight='balanced'`, `C=1000`, and `max_iter=5000`.
[^6]: With default settings for scikit-learn 1.0.2 except `class_weight='balanced'`, `eta0=0.01`, `learning_rate='adaptive'`, and `loss='modified_huber'`.
[^7]: With default settings for scikit-learn 1.0.2 except `class_weight='balanced'` and `C=1000`.
[^8]: With `u=256` and `da=256`.
[^9]: With `u=256` and `da=256`.
[^10]: With connections between CUIs also added when they share relations in the KG.