
Try out different word embeddings for BERT intrinsic evaluation #38

siemdejong opened this issue Apr 6, 2023 · 0 comments

siemdejong commented Apr 6, 2023

Research question
Is the last hidden layer of BERT best suited for contextualized text embeddings?

Hypothesis
The last hidden layer defines the contextual structure best, since it builds on the relations captured in the preceding 11 layers.

Method

  1. Instantiate pretrained ClinicalBERT.
  2. Gather a dataset of medical terms with different classes, e.g. brain locations grouped by the occurrence of tumours in those regions.
  3. Generate embeddings from different layers, following the strategies proposed in https://jalammar.github.io/illustrated-bert/ (see the sketch below).
  4. Perform an intrinsic evaluation per embedding strategy; the evaluation measure is tbd.
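
A minimal sketch of steps 3–4, assuming the Hugging Face checkpoint `emilyalsentzer/Bio_ClinicalBERT` and a toy list of terms/labels as stand-ins for the dataset from step 2; the silhouette score is only a placeholder since the evaluation measure is still tbd.

```python
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.metrics import silhouette_score

# Assumption: which ClinicalBERT checkpoint to use is not fixed in this issue.
MODEL_NAME = "emilyalsentzer/Bio_ClinicalBERT"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

# Hypothetical toy data; step 2 would supply the real terms and class labels.
terms = ["frontal lobe", "temporal lobe", "cerebellum", "brainstem"]
labels = [0, 0, 1, 1]

@torch.no_grad()
def embed(texts):
    """Return per-layer sentence vectors: mean-pool each hidden state over tokens."""
    enc = tokenizer(texts, padding=True, return_tensors="pt")
    out = model(**enc)
    mask = enc["attention_mask"].unsqueeze(-1)          # (batch, seq, 1)
    pooled = []
    for layer in out.hidden_states:                     # 13 tensors: embeddings + 12 layers
        pooled.append((layer * mask).sum(dim=1) / mask.sum(dim=1))
    return pooled

layers = embed(terms)

# Embedding strategies from the illustrated-BERT post: last layer,
# sum of the last four layers, concatenation of the last four layers.
strategies = {
    "last_layer": layers[-1],
    "sum_last_4": torch.stack(layers[-4:]).sum(dim=0),
    "concat_last_4": torch.cat(layers[-4:], dim=-1),
}

# Placeholder intrinsic evaluation: silhouette score over the class labels,
# higher means the classes are better separated in embedding space.
for name, emb in strategies.items():
    score = silhouette_score(emb.numpy(), labels, metric="cosine")
    print(f"{name}: silhouette = {score:.3f}")
```
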

Why is this experiment worthwhile?
Papers report different accuracies when using different embedding strategies from pretrained models (ref!).
