njt proposed network

   (bert provided)

token attentions[...][512][512]

   cls,sep,pad removal (2x slice)

token_masked_attentions[...][nsw][nsw]

   out[x][y] = A * in[:,:,x,y] where A is trained (1x1 convolution)

token_logits[nsw][nsw]

    mm by arange hack with softmax instead of log

word_logits[nw][nw]
   
   add eye(-inf)  (rough_scorer masking)

bilinear_scores[nw][nw]

   (inject instead of rough_scorer output)

Q1: do we want dropout somewhere? Q2: doo we need to add a trainable attention to entry word matrix? A: make sure it's broken before fixing Q1 then Q2.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

EXPERIMENT.md

EXPERIMENT.md

Files

EXPERIMENT.md

Latest commit

History

EXPERIMENT.md

File metadata and controls