4-core CPU, 16 GB RAM, 64 GB SSD, one NVIDIA Titan Z graphics card (12 GB of video memory, two GPUs)
Ubuntu 16.04.1
Sogou News Corpus 20061127 (categorized), hosted on Baidu Netdisk
The corpus contains news texts in 9 categories: finance, IT, health, sports, tourism, education, recruitment, culture, and military. Each category has 1,990 texts.
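A minimal sketch of reading the extracted corpus into parallel text/label lists. It assumes (hypothetically) one subdirectory per category with one news article per file, and that the files are GBK/GB18030-encoded; check the actual layout and encoding of your download before relying on it. `load_corpus` is an illustrative helper, not part of this repository.

```python
import os


def load_corpus(root, encoding="gb18030"):
    """Return (texts, labels) read from root/<category>/<file>.

    Assumptions: one subdirectory per category, one article per file,
    and a GB18030-compatible encoding (undecodable bytes are dropped).
    """
    texts, labels = [], []
    for category in sorted(os.listdir(root)):
        class_dir = os.path.join(root, category)
        if not os.path.isdir(class_dir):
            continue  # skip stray files at the top level
        for name in sorted(os.listdir(class_dir)):
            path = os.path.join(class_dir, name)
            with open(path, encoding=encoding, errors="ignore") as f:
                texts.append(f.read().strip())
            labels.append(category)
    return texts, labels
```

For example, `texts, labels = load_corpus("path/to/corpus")`; with the full corpus described above you would expect 9 × 1,990 = 17,910 documents.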
- numpy >= 1.12.1
- tensorflow 1.4.0
- scikit-learn 0.19.1
- jieba
- zhon
Hierarchical Attention Networks for Document Classification is a classic paper that uses an attention mechanism for document classification. Open-source deep learning code for Chinese document classification is still scarce, so I used the Sogou news corpus and TensorFlow to implement a Chinese classifier. Fig1 shows the training results, and the final model achieves 0.806780 accuracy on the test set (as shown in Fig2). My Chinese blog gives a code analysis of this project; you are welcome to read it.
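The core idea of HAN is attention pooling applied twice: word vectors are pooled into a sentence vector, then sentence vectors are pooled into a document vector. The sketch below shows only that pooling step as described in the paper (u_t = tanh(W h_t + b), α_t = softmax(u_tᵀ u_w), s = Σ α_t h_t), written in plain Python for readability. It omits the bidirectional GRU encoders that produce the hidden states and is not necessarily how this repository's TensorFlow code implements it; `attention_pool` is a hypothetical name.

```python
import math


def attention_pool(hidden, W, b, context):
    """HAN-style attention pooling over a sequence of hidden vectors.

    hidden:  list of T vectors h_t, each a list of length d
    W, b:    projection matrix (k rows of length d) and bias (length k)
    context: learned context vector u_w (length k)
    Returns (pooled vector of length d, attention weights of length T).
    """
    scores = []
    for h in hidden:
        # u_t = tanh(W h_t + b)
        u = [math.tanh(sum(w * x for w, x in zip(row, h)) + b_i)
             for row, b_i in zip(W, b)]
        # score_t = u_t . u_w
        scores.append(sum(u_i * c_i for u_i, c_i in zip(u, context)))
    # Numerically stable softmax over the time steps.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    # Attention-weighted sum of the hidden states.
    d = len(hidden[0])
    pooled = [sum(a * h[j] for a, h in zip(alphas, hidden)) for j in range(d)]
    return pooled, alphas
```

In a full HAN, the word-level call uses one set of (W, b, u_w) per model and produces one vector per sentence; the sentence-level call uses a separate set and produces the document vector fed to the softmax classifier.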
- First, download the corpus and extract it into the code directory.
- Run `python3 preprocess.py` to generate the TFRecord files used for training and testing.
- Run `python3 train.py` to train the model.
- After training completes, run `python3 evaluate.py` to evaluate the model on the test set.
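Before anything can be written to TFRecords, a HAN needs each document as a fixed-size grid of word ids (sentences × words per sentence). The following is a hypothetical sketch of that step, not the actual logic of `preprocess.py`: it splits sentences on Chinese/ASCII end punctuation, maps tokens to ids with 0 as padding and 1 as unknown (an assumed convention), and truncates or pads to `max_sents` × `max_words`. The default tokenizer splits into characters so the sketch stays self-contained; in practice you would pass a word segmenter such as `jieba.lcut`.

```python
import re

# Hypothetical sentence pattern: a run of non-terminators plus an optional terminator.
_SENTENCE = re.compile(r"[^。！？!?]+[。！？!?]?")


def split_sentences(text):
    """Split text into sentences, keeping the trailing punctuation."""
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]


def encode_document(text, vocab, max_sents=30, max_words=50, tokenize=list):
    """Map a document to a max_sents x max_words grid of word ids.

    Assumed convention: 0 = padding, 1 = out-of-vocabulary token.
    """
    doc = []
    for sent in split_sentences(text)[:max_sents]:
        ids = [vocab.get(tok, 1) for tok in tokenize(sent)][:max_words]
        doc.append(ids + [0] * (max_words - len(ids)))  # pad words
    while len(doc) < max_sents:
        doc.append([0] * max_words)  # pad sentences
    return doc
```

Fixing both dimensions keeps every example the same shape, which makes batching straightforward; the trade-off is that very long documents are truncated.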