
Try out different word embeddings for BERT intrinsic evaluation #38

siemdejong opened this issue Apr 6, 2023 · 0 comments

siemdejong commented Apr 6, 2023

Research question
Is the last hidden layer of BERT best suited for contextualized text embeddings?

Hypothesis
The last hidden layer defines the contextual structure best, since it builds on the relations captured in the preceding 11 layers.

Method

  1. Instantiate pretrained ClinicalBERT.
  2. Gather a dataset of medical terms with different classes, e.g. brain locations grouped by the occurrence of tumours in those regions.
  3. Generate embeddings from different layers, following the strategies proposed in https://jalammar.github.io/illustrated-bert/ (see the sketch below).
  4. Perform an intrinsic evaluation per embedding strategy; the evaluation measure is tbd.
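
A minimal sketch of steps 3–4, assuming the Hugging Face checkpoint `emilyalsentzer/Bio_ClinicalBERT` and a toy list of terms/labels as stand-ins for the dataset from step 2; the silhouette score is only a placeholder since the evaluation measure is still tbd.

```python
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.metrics import silhouette_score

# Assumption: which ClinicalBERT checkpoint to use is not fixed in this issue.
MODEL_NAME = "emilyalsentzer/Bio_ClinicalBERT"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

# Hypothetical toy data; step 2 would supply the real terms and class labels.
terms = ["frontal lobe", "temporal lobe", "cerebellum", "brainstem"]
labels = [0, 0, 1, 1]

@torch.no_grad()
def embed(texts):
    """Return per-layer sentence vectors: mean-pool each hidden state over tokens."""
    enc = tokenizer(texts, padding=True, return_tensors="pt")
    out = model(**enc)
    mask = enc["attention_mask"].unsqueeze(-1)          # (batch, seq, 1)
    pooled = []
    for layer in out.hidden_states:                     # 13 tensors: embeddings + 12 layers
        pooled.append((layer * mask).sum(dim=1) / mask.sum(dim=1))
    return pooled

layers = embed(terms)

# Embedding strategies from the illustrated-BERT post: last layer,
# sum of the last four layers, concatenation of the last four layers.
strategies = {
    "last_layer": layers[-1],
    "sum_last_4": torch.stack(layers[-4:]).sum(dim=0),
    "concat_last_4": torch.cat(layers[-4:], dim=-1),
}

# Placeholder intrinsic evaluation: silhouette score over the class labels,
# higher means the classes are better separated in embedding space.
for name, emb in strategies.items():
    score = silhouette_score(emb.numpy(), labels, metric="cosine")
    print(f"{name}: silhouette = {score:.3f}")
```
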

Why is this experiment worthwhile?
Papers report different accuracies when using different embedding strategies from pretrained models (ref!).
