A BERT-based baseline for the Chinese idiom cloze test, implemented in PyTorch.
The official competition website
The ChID dataset, from the paper "ChID: A Large-scale Chinese IDiom Dataset for Cloze Test".
Uses transformers and PyTorch to implement a BERT-based model for the Chinese idiom cloze test.
python3.6
torch==1.1.0
transformers==2.8.0
scikit-learn==0.22.2.post1
pandas==1.0.3
tqdm==4.45.0
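A minimal sketch for checking an environment against the pinned versions listed above. The helper names `installed_version` and `find_mismatches` are hypothetical and not part of this repository; the pins are copied from the list.

```python
# Pins copied from the requirements list above.
PINS = {
    "torch": "1.1.0",
    "transformers": "2.8.0",
    "scikit-learn": "0.22.2.post1",
    "pandas": "1.0.3",
    "tqdm": "4.45.0",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        # Available in the standard library from Python 3.8 onwards.
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:
        # Fallback for Python 3.6/3.7, where setuptools provides pkg_resources.
        import pkg_resources

        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Return {name: (expected, found)} for every pin that is not met exactly."""
    mismatches = {}
    for name, expected in pins.items():
        found = lookup(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_mismatches(PINS).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

Running the script prints one line per package whose installed version differs from the pin, and nothing when the environment matches.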
ChID dataset download
Save the ChID data into ./data.
You may need a VPN.
download
For this baseline, we use chinese_wwm_pytorch as the pretrained model.
Save the pretrained model into ./pretrained_models.
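Before training, it can help to confirm that both downloads landed where expected. The sketch below assumes the model was unpacked into a `chinese_wwm_pytorch` subfolder of ./pretrained_models (a hypothetical layout) and checks for the three file names that transformers 2.x reads from a local BERT directory. Some BERT checkpoint archives ship the config as `bert_config.json`; in that case it may need to be renamed to `config.json`.

```python
from pathlib import Path

# Assumed layout; adjust to wherever the archives were actually unpacked.
DATA_DIR = Path("./data")
MODEL_DIR = Path("./pretrained_models/chinese_wwm_pytorch")  # hypothetical subfolder

# File names transformers 2.x looks for when loading BERT from a local directory.
MODEL_FILES = ("config.json", "pytorch_model.bin", "vocab.txt")


def missing_paths(data_dir=DATA_DIR, model_dir=MODEL_DIR, model_files=MODEL_FILES):
    """Return the paths that are missing; an empty list means the layout looks fine."""
    missing = []
    # The data directory must exist and contain at least one entry.
    if not data_dir.is_dir() or not any(data_dir.iterdir()):
        missing.append(str(data_dir))
    # Each expected model file must be present as a regular file.
    for name in model_files:
        path = model_dir / name
        if not path.is_file():
            missing.append(str(path))
    return missing


if __name__ == "__main__":
    problems = missing_paths()
    if problems:
        print("Missing or empty:")
        for p in problems:
            print("  " + p)
    else:
        print("Data and pretrained model directories look complete.")
```

The check only inspects file names, not their contents, so it catches a wrong download location but not a corrupted archive.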