EXPERIMENT.md

  • njt proposed network
   (bert provided)

token_attentions[...][512][512]

   remove [CLS], [SEP] and padding positions (slice both token axes)

token_masked_attentions[...][nsw][nsw]

   out[x][y] = A * in[:,:,x,y], where A is trained (1x1 convolution over the leading dimensions)

token_logits[nsw][nsw]

    matmul with the subword-to-word matrix (arange hack), using softmax instead of log

word_logits[nw][nw]
   
   add eye(-inf)  (rough_scorer masking)

bilinear_scores[nw][nw]

   (inject instead of rough_scorer output)
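
The stages above can be sketched end to end on a toy input. This is only one possible reading of the notes, in dependency-free Python rather than the project's tensor code. Assumptions that are not in the original: the leading `[...]` dimensions are flattened into a single channel axis, the "arange hack" is read as a span-membership mask compared against `range(nsw)` and row-normalised with softmax, and every name here (`strip_special`, `conv1x1`, `subword_to_word`, `mask_diagonal`, the toy weights) is hypothetical.

```python
import math

NEG_INF = float("-inf")


def strip_special(attn, keep):
    """Slice both token axes of every [n][n] map down to the `keep` indices."""
    return [[[m[i][j] for j in keep] for i in keep] for m in attn]


def conv1x1(maps, weights):
    """1x1 convolution: out[x][y] = sum_c weights[c] * maps[c][x][y]."""
    n = len(maps[0])
    return [[sum(w * m[x][y] for w, m in zip(weights, maps))
             for y in range(n)] for x in range(n)]


def softmax(row):
    """Softmax that maps -inf entries to exactly 0.0."""
    top = max(v for v in row if v != NEG_INF)
    exps = [0.0 if v == NEG_INF else math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def subword_to_word(starts, ends, nsw):
    """[nw][nsw] matrix: each word spreads uniform weight over its subwords."""
    return [softmax([0.0 if s <= k < e else NEG_INF for k in range(nsw)])
            for s, e in zip(starts, ends)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def mask_diagonal(scores):
    """Equivalent to adding -inf on the diagonal: no word pairs with itself."""
    return [[NEG_INF if i == j else v for j, v in enumerate(row)]
            for i, row in enumerate(scores)]


# Toy input: 2 channels over 6 positions: [CLS] a b1 b2 [SEP] [PAD]
n = 6
token_attentions = [
    [[float(i * n + j) for j in range(n)] for i in range(n)],  # channel 0
    [[1.0] * n for _ in range(n)],                             # channel 1
]
keep = [1, 2, 3]      # positions that survive the special-token removal
A = [0.5, 1.0]        # stand-in for the trained 1x1 convolution weights

token_masked_attentions = strip_special(token_attentions, keep)  # [2][3][3]
token_logits = conv1x1(token_masked_attentions, A)               # [nsw][nsw]

# Word "a" covers subword 0, word "b" covers subwords 1..2.
M = subword_to_word(starts=[0, 1], ends=[1, 3], nsw=len(keep))   # [nw][nsw]
word_logits = matmul(matmul(M, token_logits), transpose(M))      # [nw][nw]
bilinear_scores = mask_diagonal(word_logits)

print(bilinear_scores)  # [[-inf, 5.25], [9.0, -inf]]
```

Pooling with `M` on both sides averages each word's subword rows and columns, which is why the toy output has shape `[nw][nw]`. Whether the real network should average or learn these weights is the open question Q2 below.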

Q1: do we want dropout somewhere?

Q2: do we need to add a trainable attention to the entry word matrix?

A: make sure it's broken before fixing; Q1 first, then Q2.