Document classification

Classify documents in Python using a linear SVM trained on TF-IDF features.
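TF-IDF turns raw word counts into weights that favour words frequent in a document but rare across the collection. The minimal sketch below uses one common variant, tf = raw count and idf = log(N / df), followed by L2 normalisation. The function name `tfidf` and the dict-per-document representation are illustrative assumptions; the exact weighting used by doc_classify.py may differ.

```python
import math


def tfidf(docs):
    """Weight raw counts by TF-IDF (hypothetical helper, not from doc_classify.py).

    docs: list of dicts mapping word index -> raw count, one dict per document.
    Uses tf = raw count and idf = log(N / df), then L2-normalises each vector.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = {}
    for doc in docs:
        for word in doc:
            df[word] = df.get(word, 0) + 1
    weighted = []
    for doc in docs:
        vec = {w: c * math.log(n_docs / df[w]) for w, c in doc.items()}
        # L2-normalise so long documents do not dominate the linear SVM.
        norm = math.sqrt(sum(v * v for v in vec.values()))
        weighted.append({w: v / norm for w, v in vec.items()} if norm else vec)
    return weighted


print(tfidf([{1: 2, 2: 1}, {1: 1, 3: 4}]))
```

A word that appears in every document (word 1 above) gets idf = log(1) = 0, so it contributes nothing to the classifier.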

Two Python libraries (pandas and liblinear) are required. On Windows, you can download the liblinear library from http://www.lfd.uci.edu/~gohlke/pythonlibs/#liblinear
The data files are structured as follows:
- Each line of a .data file has the form "docIdx wordIdx count".
- The .label files are simply a list of label IDs.
- The .map files map label IDs to label names.
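The sketch below reads the .data and .label formats described above into plain Python structures: one dict (word index to count) per document, and a list of integer labels. It assumes docIdx values are 1-based and contiguous, so that document i lines up with row i of the .label file; `load_data` and `load_labels` are hypothetical names, not functions from doc_classify.py. Parsing of .map files is omitted because their exact line layout is not specified here.

```python
from collections import defaultdict


def load_data(lines):
    """Parse 'docIdx wordIdx count' lines into a list of sparse documents.

    Assumes docIdx is 1-based; a document with no words becomes an empty dict
    so that positions still line up with the .label file.
    """
    docs = defaultdict(dict)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        doc_idx, word_idx, count = (int(tok) for tok in line.split())
        docs[doc_idx][word_idx] = count
    n_docs = max(docs, default=0)
    return [docs[i] for i in range(1, n_docs + 1)]


def load_labels(lines):
    """Parse a .label file: one integer label ID per non-blank line."""
    return [int(line) for line in lines if line.strip()]


print(load_data(["1 1 2", "1 2 1", "3 5 4"]))
```

Both functions accept any iterable of lines, so they can be called on an open file, e.g. `with open("train.data") as f: train_docs = load_data(f)`.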
This demo achieves an accuracy of about 81.3991% (6109 of 7505 documents classified correctly).
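Accuracy here is simply the number of correct predictions divided by the number of documents. The small helper below (a hypothetical name, not taken from doc_classify.py) shows that 6109 correct out of 7505 gives the quoted figure.

```python
def accuracy(predicted, actual):
    """Return (number correct, fraction correct) for two equal-length label lists."""
    if len(predicted) != len(actual):
        raise ValueError("prediction and label lists differ in length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct, correct / len(actual)


# 6109 matching labels and 1396 mismatches, 7505 documents in total.
correct, acc = accuracy([1] * 6109 + [2] * 1396, [1] * 7505)
print(f"{acc * 100:.4f}% ({correct}/7505)")  # prints "81.3991% (6109/7505)"
```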
