nb = ArticleClassifier(dbpath='/path/to/db')
fn = nb.create_input_file(metadata)
nb.train(fn)
nb.save()
nb = ArticleClassifier(dbpath='/path/to/db')
nb.load()
fn = nb.create_input_file(metadata)
classes = nb.classify(fn)
In these examples, metadata should be a List of Dict where the Dict are in the format given below. The file name paths are relative to the current machine (on the local file system):
{
"id": "1704.00222",
"categories": ["cs.NM", "hep-th"],
"filename": "/path/to/file.txt"
}
Trained model for use by arXiv staff can be found at s3://arxiv-classifier-models
See experiments directory for training and evaluation notebooks.
The ULMFiT and SentencePiece model files can be downloaded here. Make sure CLASSIFIER_PATH
configuration parameter
points to models/abstracts-classifier.pkl
and that CLASSIFIER_TYPE
equals ulmfit
.
To test the service locally you can run it with
FLASK_APP=classifier.test_app flask run --port 9999
and make a request:
curl -s -H "Content-Type: application/json" -X POST http://localhost:9999/classify \
--data '{"title":"P = NP", "abstract": "We prove that P = NP for N = 1 or P = 0.", "primary": "cs.SE"}'
[{"category":"cs.CC","probability":0.8264293074607849},{"category":"cs.DS","probability":0.1285623162984848},...]
The primary is optional.
Both the input and output format are not yet compatible with the Naive Bayes classifier.