YeliangLi/tensorflow-Chinese-document-classification


TensorFlow Chinese Document Classification

Preparation

My Hardware

4-core CPU, 16 GB RAM, 64 GB SSD, 1 Titan Z graphics card (12 GB video memory, two GPUs)

My OS

Ubuntu 16.04.1

Data Set

Sogou news corpus 20061127 (with category labels), hosted on Baidu Pan.
It contains 9 classes of news: finance, IT, health, sports, tourism, education, recruitment, culture and military. Each class has 1,990 texts.
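
Since the corpus is perfectly balanced, a per-class (stratified) split keeps the training and test sets balanced as well. The sketch below is only an illustration in plain Python: the function name stratified_split, the 10% test fraction and the seed are assumptions, not values taken from preprocess.py.

```python
import random
from collections import defaultdict

# The nine classes listed above; the 1,990 texts per class come from the README.
CLASSES = ["finance", "IT", "health", "sports", "tourism",
           "education", "recruitment", "culture", "military"]
DOCS_PER_CLASS = 1990


def stratified_split(labels, test_fraction=0.1, seed=0):
    """Split document indices so every class keeps the same train/test ratio.

    test_fraction and seed are hypothetical defaults chosen for illustration.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = int(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx


# One label per document, standing in for the real corpus files.
labels = [name for name in CLASSES for _ in range(DOCS_PER_CLASS)]
train_idx, test_idx = stratified_split(labels, test_fraction=0.1)
print(len(labels), len(train_idx), len(test_idx))
```

With these assumed settings each class contributes 199 test documents and 1,791 training documents.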

Requirements

  • numpy >= 1.12.1
  • tensorflow 1.4.0
  • scikit-learn 0.19.1
  • jieba
  • zhon

Why This Project?

Hierarchical Attention Networks for Document Classification is a classic paper that uses an attention mechanism for
document classification. At present, there is still little open-source code for deep-learning-based Chinese document
classification, so I used the Sogou news corpus and TensorFlow to build a Chinese classifier. Fig1 shows the training
results, and the final model achieves 0.806780 accuracy on the test set (as shown in Fig2). My Chinese-language blog
gives a code analysis of this project; you are welcome to read it.
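
The attention pooling described in the paper can be sketched in a few lines. Each hidden state h_t is projected to u_t = tanh(W h_t + b), scored against a learned context vector, and the softmax of the scores weights the sum of the hidden states. The code below is a plain-Python illustration with made-up toy values, not this repository's TensorFlow code; the names attention_pool, W, b and context are chosen here for the example.

```python
import math


def attention_pool(hidden, W, b, context):
    """Attention pooling over a sequence of hidden states.

    hidden:  list of T vectors, each of length d
    W:       k x d projection matrix (list of rows)
    b:       bias vector of length k
    context: context vector of length k
    Returns (pooled vector of length d, attention weights of length T).
    """
    scores = []
    for h in hidden:
        # u_t = tanh(W h_t + b)
        u = [math.tanh(sum(w * x for w, x in zip(row, h)) + bias)
             for row, bias in zip(W, b)]
        # score_t = u_t . context
        scores.append(sum(ui * ci for ui, ci in zip(u, context)))

    # Numerically stable softmax over the T scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]

    # Weighted sum of the original hidden states.
    dim = len(hidden[0])
    pooled = [sum(a * h[j] for a, h in zip(alphas, hidden)) for j in range(dim)]
    return pooled, alphas


# Toy input: two 2-dimensional hidden states and an identity projection.
hidden = [[1.0, 0.0], [0.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
pooled, alphas = attention_pool(hidden, W, b, context=[1.0, 0.0])
print(alphas, pooled)
```

In a hierarchical model the same pooling is applied twice: once over the words of each sentence, and once over the resulting sentence vectors to obtain the document vector.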

How to get started?

  1. Download the data set and extract it into the code directory.
  2. Run "python3 preprocess.py" to generate the TFRecords files for training and testing.
  3. Run "python3 train.py" to train the model.
  4. After training completes, run "python3 evaluate.py" to evaluate the model on the test set.
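
To give a feel for what the preprocessing step has to produce, here is a rough sketch of turning a raw document into a fixed-size grid of word ids (sentences by words), the shape a hierarchical model consumes. It is an assumption-laden illustration, not the contents of preprocess.py: the helper names, the padding sizes and the sample sentences are invented, and per-character tokens stand in for jieba word segmentation so that the example runs without extra packages.

```python
import re

PAD_ID, UNK_ID = 0, 1  # hypothetical reserved ids for padding and unknown tokens


def split_sentences(text):
    """Split on Chinese and ASCII sentence-ending punctuation."""
    parts = re.split(r"[。！？!?]+", text)
    return [p.strip() for p in parts if p.strip()]


def tokenize(sentence):
    # Stand-in tokenizer: one token per non-space character.
    # With jieba installed, jieba.lcut(sentence) would give real word tokens.
    return [ch for ch in sentence if not ch.isspace()]


def build_vocab(documents):
    """Assign an integer id to every token seen in the documents."""
    vocab = {"<pad>": PAD_ID, "<unk>": UNK_ID}
    for text in documents:
        for sentence in split_sentences(text):
            for token in tokenize(sentence):
                vocab.setdefault(token, len(vocab))
    return vocab


def encode(text, vocab, max_sentences=3, max_words=5):
    """Return a max_sentences x max_words grid of ids, truncated and padded."""
    grid = [[PAD_ID] * max_words for _ in range(max_sentences)]
    for i, sentence in enumerate(split_sentences(text)[:max_sentences]):
        for j, token in enumerate(tokenize(sentence)[:max_words]):
            grid[i][j] = vocab.get(token, UNK_ID)
    return grid


docs = ["今天股市上涨。投资者很高兴！", "球队赢了比赛。"]
vocab = build_vocab(docs)
grid = encode(docs[1], vocab)
print(grid)
```

Grids like this, together with the class label, are the kind of record that would then be serialized for training and testing.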

Figure

Fig1 training results

Fig2 evaluation results

About

A TensorFlow implementation of Chinese document classification based on an attention mechanism.
