
POSTAGGER BASELINE (using LTP)

memeda edited this page Aug 30, 2016 · 1 revision

We use the results of the LTP postagger as the baseline.

###Accuracy

| dataset | accuracy | sentences | tokens | time cost (s) |
|---|---|---|---|---|
| pku-weibo-holdout | 96.7452% | 8,000 | 172,054 | 1.48 |
| pku-weibo-test | 96.7364% | 12,500 | - | 2.34 |
| pku-holdout | 98.3586% | 5,000 | 114,293 | 1.05 |
| pku-test | 98.3456% | 7,500 | - | 1.58 |
| weibo-holdout | 93.5527% | 3,000 | 57,761 | 0.57 |
| weibo-test | 93.8329% | 5,000 | - | 0.91 |

These results use the existing LTP model, evaluated on the fixed gold data.

Updated: the results use LTP model 3.3.1 with otpos (single process), run on node GPU05 with an Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz (48 processors) and 251.64 GB of total memory.
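As a quick consistency check on the table, the sketch below converts the holdout accuracies into implied counts of mis-tagged tokens. It assumes the reported accuracy is token-level (correct tokens divided by total tokens), which the page does not state explicitly; the function name `implied_errors` is illustrative only.

```python
# (tokens, accuracy in %) copied from the holdout rows of the table above.
holdout = {
    "pku-holdout": (114_293, 98.3586),
    "weibo-holdout": (57_761, 93.5527),
    "pku-weibo-holdout": (172_054, 96.7452),
}


def implied_errors(tokens, accuracy_percent):
    """Number of mis-tagged tokens implied by a token-level accuracy (assumption)."""
    return round(tokens * (1 - accuracy_percent / 100))


errors = {name: implied_errors(t, a) for name, (t, a) in holdout.items()}
for name, count in errors.items():
    print(f"{name}: {count} mis-tagged tokens")
```

Under that assumption, the implied error counts of the two single-corpus holdout sets add up to the implied count of the merged set, which suggests the merged row is simply the union of the other two.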

###Speed

About 118.53 K tokens/s.

Updated.

The test used wiki data of 50k sentences, 996,862 words in total (same environment as above). The run time was 8.41 s, which gives a speed of 996,862 / 1000 / 8.41 = 118.53 K tokens/s.
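The arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the calculation; the helper name `throughput_k_tokens_per_s` is made up for illustration.

```python
def throughput_k_tokens_per_s(tokens, seconds):
    """Throughput in thousands of tokens per second."""
    return tokens / 1000 / seconds


# Figures from the wiki test described above: 996,862 words in 8.41 s.
wiki_speed = throughput_k_tokens_per_s(996_862, 8.41)
print(f"{wiki_speed:.2f} K tokens/s")
```

The same helper can be applied to the token counts and time costs in the accuracy table to compare throughput across datasets.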

###Error statistics*

####PKU-holdout

Top 10 tag pairs (true_tag => predict_tag) among prediction errors

v=>n    314 
n=>v    219 
v=>p    122 
v=>a    115 
p=>v    95  
a=>v    76  
d=>c    57  
c=>p    56  
p=>c    56  
n=>a    55 

Top 10 most frequently mispredicted words

与  38
为  35
在  33
又  30
多  26
到  24
将  23
由于    21
以  19
作为    19

Top 10 word and tag pairs (word:true_tag => predict_tag) among prediction errors

在:v=>p 21
与:p=>c 20
为:v=>p 20
又:d=>c 20
到:p=>v 19
由于:c=>p   17
与:c=>p 16
为:p=>v 15
以:p=>c 13
和:p=>c 13
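The three kinds of lists in this section can be produced with simple counting. The following is a minimal sketch, not the script actually used for this page: it assumes the evaluation yields `(word, true_tag, predict_tag)` triples, and both the function name `error_stats` and the toy sample are hypothetical.

```python
from collections import Counter


def error_stats(tokens, k=10):
    """Count tagging errors from (word, true_tag, predict_tag) triples.

    Returns the top-k tag pairs, words, and word:tag pairs among the errors.
    """
    tag_pairs, words, word_tag_pairs = Counter(), Counter(), Counter()
    for word, true_tag, predict_tag in tokens:
        if true_tag == predict_tag:
            continue  # correctly tagged tokens are not errors
        tag_pairs[f"{true_tag}=>{predict_tag}"] += 1
        words[word] += 1
        word_tag_pairs[f"{word}:{true_tag}=>{predict_tag}"] += 1
    return (
        tag_pairs.most_common(k),
        words.most_common(k),
        word_tag_pairs.most_common(k),
    )


# Toy input for illustration only, not taken from the evaluation data.
sample = [
    ("在", "v", "p"),
    ("在", "p", "p"),
    ("与", "p", "c"),
    ("在", "v", "p"),
    ("生活", "n", "v"),
]
top_tags, top_words, top_word_tags = error_stats(sample)
print(top_tags[0], top_words[0], top_word_tags[0])
```

Note that `most_common(k)` cuts off at exactly `k` entries, whereas some lists on this page have more than 10 rows, apparently because ties at the cutoff were kept.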

####WEIBO-holdout

Top 10 tag pairs (true_tag => predict_tag) among prediction errors

n=>v    826 
v=>n    247 
d=>a    220 
n=>a    180 
nz=>n   95  
v=>a    87  
a=>v    77  
nh=>n   72  
a=>n    71  
d=>v    71  

Top 10 most frequently mispredicted words

生活    31
因为    23
哦  22
工作    22
多  21
给  21
在  19
好  17
成功    16
为  16
与  16

Top 10 word and tag pairs (word:true_tag => predict_tag) among prediction errors

生活:n=>v   30
工作:n=>v   22
因为:c=>p   18
哦:e=>u 14
为:v=>p 13
给:v=>p 12
爱:n=>v 12
用:v=>p 10
评论:n=>v   10
好:d=>a 9
正式:d=>a   9
给:p=>v 9
服务:n=>v   9

####PKU-WEIBO-holdout (merged)

Top 10 tag pairs (true_tag => predict_tag) among prediction errors

n=>v    1045
v=>n    561 
n=>a    235 
d=>a    234 
v=>a    202 
v=>p    191 
a=>v    153 
p=>v    135 
a=>n    122 
d=>v    116 
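The merged tag-pair counts are consistent with adding the PKU-holdout and WEIBO-holdout counts. The sketch below checks this for the five tag pairs that appear in the top-10 lists of both corpora; the numbers are copied from the lists on this page, and pairs missing from either top-10 list are left out because their per-corpus counts are not reported.

```python
from collections import Counter

# Tag-pair error counts copied from the PKU-holdout and WEIBO-holdout lists.
pku = Counter({"v=>n": 314, "n=>v": 219, "v=>a": 115, "a=>v": 76, "n=>a": 55})
weibo = Counter({"n=>v": 826, "v=>n": 247, "n=>a": 180, "v=>a": 87, "a=>v": 77})

# Counter addition sums the counts key by key.
merged = pku + weibo
for pair, count in merged.most_common():
    print(f"{pair}\t{count}")
```

Each summed count matches the corresponding row of the merged list above, for example 219 + 826 = 1045 for `n=>v`.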

Top 10 most frequently mispredicted words

与  54
在  52
为  51
多  47
给  39
因为    37
又  37
将  34
到  33
生活    31

Top 10 word and tag pairs (word:true_tag => predict_tag) among prediction errors

为:v=>p 33
生活:n=>v   30
与:p=>c 28
因为:c=>p   28
又:d=>c 27
在:v=>p 24
工作:n=>v   24
与:c=>p 23
给:v=>p 22
到:p=>v 22
  • Not yet updated, [TODO]