
# Transfer Learning in NLP

## Overview

Strategies for applying pre-trained language representations, as discussed in the BERT paper (see the sketch after this list):

- Feature-based
  - ELMo
- Fine-tuning
  - BERT
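
The practical difference between the two strategies is whether the pre-trained weights are updated during task training. Below is a minimal PyTorch sketch using Hugging Face `transformers`; the checkpoint, classifier head, and learning rates are illustrative assumptions, not prescriptions from the papers.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed checkpoint; any pre-trained encoder would illustrate the same point.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
head = torch.nn.Linear(encoder.config.hidden_size, 2)  # small task-specific head

# Feature-based: freeze the pre-trained encoder and train only the head
# on its fixed output representations.
for p in encoder.parameters():
    p.requires_grad = False
feature_optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)

# Fine-tuning: unfreeze everything so encoder and head are trained jointly.
for p in encoder.parameters():
    p.requires_grad = True
finetune_optimizer = torch.optim.AdamW(
    list(encoder.parameters()) + list(head.parameters()), lr=2e-5
)

# Either way, the forward pass looks the same:
inputs = tokenizer("Transfer learning in NLP", return_tensors="pt")
logits = head(encoder(**inputs).last_hidden_state[:, 0])  # [CLS] features
```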

## Feature-based approach

### ELMo (Embeddings from Language Models)

A deep contextualized word representation model that captures:

- Complex characteristics of word use (e.g., syntax and semantics)
- How these uses vary across linguistic contexts (i.e., to model polysemy)

### Features

- Contextual: the representation of each word depends on the entire context in which it is used
- Deep: combines all layers of a deep pre-trained neural network (a layer-mixing sketch follows this list)
- Character-based: ELMo representations are purely character-based, allowing the network to use morphological clues to form robust representations for out-of-vocabulary tokens unseen in training
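
The "deep" property corresponds to the ELMo paper's layer-mixing formula, ELMo_k = γ · Σ_j s_j · h_{k,j}, where the softmax-normalized scalars s_j and the scale γ are learned by the downstream task. The sketch below reimplements just that mixing step in plain PyTorch; the layer count and tensor shapes are illustrative assumptions (the real biLM supplies the layer activations).

```python
import torch

class ScalarMix(torch.nn.Module):
    """Task-learned weighted sum over biLM layers, as in the ELMo paper:
    ELMo_k = gamma * sum_j softmax(s)_j * h_{k,j}."""

    def __init__(self, num_layers: int):
        super().__init__()
        self.scalars = torch.nn.Parameter(torch.zeros(num_layers))
        self.gamma = torch.nn.Parameter(torch.ones(1))

    def forward(self, layer_states: torch.Tensor) -> torch.Tensor:
        # layer_states: (num_layers, batch, seq_len, dim) -- assumed shape
        weights = torch.softmax(self.scalars, dim=0)
        mixed = (weights.view(-1, 1, 1, 1) * layer_states).sum(dim=0)
        return self.gamma * mixed

# Illustrative usage with random stand-ins for 3 biLM layer activations.
states = torch.randn(3, 2, 5, 1024)  # 3 layers, batch 2, 5 tokens, 1024-dim
elmo_embeddings = ScalarMix(num_layers=3)(states)  # -> (2, 5, 1024)
```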

## Fine-tuning approach

### BERT (Bidirectional Encoder Representations from Transformers)
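
In the fine-tuning approach, BERT adds a minimal task-specific output layer on top of the pre-trained Transformer and updates all parameters end-to-end on the downstream task. Below is a minimal sketch using Hugging Face `transformers`; the checkpoint name, toy data, and learning rate are illustrative assumptions (the original paper ships its own TensorFlow implementation).

```python
import torch
from transformers import BertForSequenceClassification, BertTokenizer

# Assumed checkpoint and a toy binary-classification example.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

batch = tokenizer(["a great movie", "a dull movie"],
                  padding=True, return_tensors="pt")
labels = torch.tensor([1, 0])

# Fine-tuning: one gradient step updates every weight, encoder included.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss = model(**batch, labels=labels).loss
loss.backward()
optimizer.step()
```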


## Links

- Article
- Paper