I see that the output is a 1*2 vector in the class CorefTagger, but the final output y is a 1*3 vector in your paper. Is there a difference between the two? Another question: did you test on the ".auto_conll" files in your paper (COLING 2018)?