Commit d320942

release: v0.2.1
v0.2.1
2 parents 0be9f0b + 98028f6 commit d320942

9 files changed: +141 -93 lines changed

README.md

Lines changed: 23 additions & 10 deletions
@@ -1,25 +1,38 @@
-# A Lightweight Conditional Random Field
+# Chaine
 
-This is a modern Python library without any third-party dependencies and a backend written in C implementing conditional random fields for natural language processing tasks like named entity recognition or part-of-speech tagging.
+A linear-chain conditional random field implementation.
+
+Chaine is a modern Python library without any third-party dependencies and a backend written in C implementing conditional random fields for natural language processing tasks like named entity recognition or part-of-speech tagging.
+
+- **Lightweight:** explain
+- **Fast:** explain
+- **Easy to use:** explain
 
 You can install the latest stable version from [PyPI](https://pypi.org/project/chaine):
 
 ```
 $ pip install chaine
 ```
 
-If you are interested in the theoretical concepts behind conditional random fields, I can recommend the introducing paper by [Lafferty et al](https://repository.upenn.edu/cgi/viewcontent.cgi?article=1162&context=cis_papers).
+If you are interested in the theoretical concepts behind conditional random fields, refer to the introducing paper by [Lafferty et al](https://repository.upenn.edu/cgi/viewcontent.cgi?article=1162&context=cis_papers).
 
 
-## Example
+## How it works
 
-```python
+```
 >>> import chaine
->>> sequences = [[["a", "a"], ["b", "b"]]]
->>> labels = [["0", "1"]]
->>> model = chaine.train(sequences, labels)
->>> model.predict(sequences)
-[['0', '1']]
+>>> tokens = [["John", "Lennon", "was", "rhythm", "guitarist" "of", "The", "Beatles"]]
+>>> labels = [["B-PER", "I-PER", "O", "O", "O", "O", "B-ORG", "I-ORG"]]
+>>> model = chaine.train(tokens, labels, max_iterations=5)
+Loading data
+Start training
+Iteration 1, train loss: 14.334076
+Iteration 2, train loss: 14.334064
+Iteration 3, train loss: 14.334053
+Iteration 4, train loss: 14.334041
+Iteration 5, train loss: 14.334029
+>>> model.predict(tokens)
+[['B-PER', 'I-PER', 'O', 'O', 'O', 'B-ORG', 'I-ORG']]
 ```
 
 Check out the introducing [Jupyter notebook](https://github.com/severinsimmler/chaine/blob/master/notebooks/tutorial.ipynb).
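
Review note on the added README example: the `tokens` list is missing a comma between `"guitarist"` and `"of"`. Python concatenates adjacent string literals, so the sentence ends up with 7 tokens against 8 labels, which matches the 7 labels in the printed prediction. The sketch below reproduces the effect in plain Python. `check_alignment` is a hypothetical helper written for this note, not part of chaine, and the diff does not show whether `chaine.train` validates sequence lengths itself.

```python
# Adjacent string literals are concatenated by Python, so the missing comma
# after "guitarist" merges two tokens into one ("guitaristof").
tokens = [["John", "Lennon", "was", "rhythm", "guitarist" "of", "The", "Beatles"]]
labels = [["B-PER", "I-PER", "O", "O", "O", "O", "B-ORG", "I-ORG"]]


def check_alignment(sequences, label_sequences):
    """Hypothetical helper: raise if a sentence and its labels differ in length."""
    for index, (sequence, sequence_labels) in enumerate(zip(sequences, label_sequences)):
        if len(sequence) != len(sequence_labels):
            raise ValueError(
                f"sequence {index}: {len(sequence)} tokens but {len(sequence_labels)} labels"
            )


# With the comma restored, tokens and labels line up one to one.
fixed_tokens = [["John", "Lennon", "was", "rhythm", "guitarist", "of", "The", "Beatles"]]
check_alignment(fixed_tokens, labels)

try:
    check_alignment(tokens, labels)
except ValueError as error:
    print(error)  # sequence 0: 7 tokens but 8 labels
```

Running a check like this before training would surface the mismatch early instead of leaving it to show up in the prediction output.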

chaine/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -1,2 +1,2 @@
-from chaine.core import train
+from chaine.training import train
 from chaine.crf import Model, Trainer

chaine/crfsuite/include/crfsuite.hpp

Lines changed: 5 additions & 5 deletions
@@ -364,13 +364,13 @@ namespace CRFSuite
 
 if (model == NULL)
 {
-throw std::invalid_argument("The tagger is not opened");
+throw std::invalid_argument("The tagger is not opened.");
 }
 
 // Obtain the dictionary interface representing the labels in the model.
 if ((ret = model->get_labels(model, &labels)))
 {
-throw std::runtime_error("Failed to obtain the dictionary interface for labels");
+throw std::runtime_error("Failed to obtain the dictionary interface for labels.");
 }
 
 // Collect all label strings to lseq.
@@ -405,13 +405,13 @@ namespace CRFSuite
 
 if (model == NULL || tagger == NULL)
 {
-throw std::invalid_argument("The tagger is not opened");
+throw std::invalid_argument("The tagger is not opened.");
 }
 
 // Obtain the dictionary interface representing the attributes in the model.
 if ((ret = model->get_attrs(model, &attrs)))
 {
-throw std::runtime_error("Failed to obtain the dictionary interface for attributes");
+throw std::runtime_error("Failed to obtain the dictionary interface for attributes.");
 }
 
 // Build an instance.
@@ -468,7 +468,7 @@ namespace CRFSuite
 // Obtain the dictionary interface representing the labels in the model.
 if ((ret = model->get_labels(model, &labels)))
 {
-throw std::runtime_error("Failed to obtain the dictionary interface for labels");
+throw std::runtime_error("Failed to obtain the dictionary interface for labels.");
 }
 
 // Run the Viterbi algorithm.

chaine/data.py

Lines changed: 0 additions & 2 deletions
@@ -8,8 +8,6 @@
 import re
 from dataclasses import dataclass
 
-from chaine.typing import Iterable
-
 
 
 @dataclass
 class Token:
File renamed without changes.

notebooks/tutorial.ipynb

Lines changed: 21 additions & 0 deletions
@@ -68,6 +68,27 @@
 "source": [
 "crf.predict(tokens)"
 ]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"## Feature extraction\n",
+"\n",
+"```\n",
+"identity of wi, identity of neighboring words\n",
+"embeddings for wi, embeddings for neighboring words\n",
+"part of speech of wi, part of speech of neighboring words\n",
+"base-phrase syntactic chunk label of wi and neighboring words\n",
+"presence of wi in a gazetteer\n",
+"wi contains a particular prefix (from all prefixes of length ≤ 4)\n",
+"wi contains a particular suffix (from all suffixes of length ≤ 4)\n",
+"wi is all upper case\n",
+"word shape of wi, word shape of neighboring words\n",
+"short word shape of wi, short word shape of neighboring words\n",
+"presence of hyphen\n",
+"```"
+]
 }
 ],
 "metadata": {
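
The notebook cell added above lists candidate features for a token `wi` without showing code. As a rough illustration, several of the purely string-based ones could be computed as below. This is a minimal sketch, not chaine's feature extractor: `word_shape`, `short_word_shape`, and `token_features` are hypothetical names, the shape mapping covers ASCII letters and digits only, and the features that need external resources (embeddings, part-of-speech tags, chunk labels, gazetteers) are left out.

```python
import re


def word_shape(token: str) -> str:
    """Map upper case to X, lower case to x, digits to d (ASCII only)."""
    shape = re.sub(r"[A-Z]", "X", token)
    shape = re.sub(r"[a-z]", "x", shape)
    return re.sub(r"[0-9]", "d", shape)


def short_word_shape(token: str) -> str:
    """Word shape with runs of the same character collapsed to one."""
    return re.sub(r"(.)\1+", r"\1", word_shape(token))


def token_features(tokens: list, i: int) -> dict:
    """Hypothetical extractor for the string-based features of tokens[i]."""
    token = tokens[i]
    features = {
        "identity": token,
        "is_upper": token.isupper(),
        "has_hyphen": "-" in token,
        "shape": word_shape(token),
        "short_shape": short_word_shape(token),
    }
    # All prefixes and suffixes of length <= 4.
    for n in range(1, min(4, len(token)) + 1):
        features[f"prefix_{n}"] = token[:n]
        features[f"suffix_{n}"] = token[-n:]
    # Identity and shapes of the neighboring words, where they exist.
    if i > 0:
        features["prev_identity"] = tokens[i - 1]
        features["prev_shape"] = word_shape(tokens[i - 1])
        features["prev_short_shape"] = short_word_shape(tokens[i - 1])
    if i < len(tokens) - 1:
        features["next_identity"] = tokens[i + 1]
        features["next_shape"] = word_shape(tokens[i + 1])
        features["next_short_shape"] = short_word_shape(tokens[i + 1])
    return features


sentence = ["John", "Lennon", "was", "rhythm", "guitarist"]
print(token_features(sentence, 1))
```

Returning a plain dictionary per token keeps the sketch independent of any particular input format. How such features would be passed to chaine is not shown in this diff.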
