<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>User Guide — pystruct 0.2.4 documentation</title>
<link rel="stylesheet" href="_static/basic.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="_static/gallery.css" type="text/css" />
<link rel="stylesheet" href="_static/pystruct.css" type="text/css" />
<link rel="stylesheet" href="_static/bootstrap.min.css" type="text/css" />
<link rel="stylesheet" href="_static/bootswatch-3.3.4/cerulean/bootstrap.min.css" type="text/css" />
<link rel="stylesheet" href="_static/bootstrap-sphinx.css" type="text/css" />
<script type="text/javascript">
var DOCUMENTATION_OPTIONS = {
URL_ROOT: './',
VERSION: '0.2.4',
COLLAPSE_INDEX: false,
FILE_SUFFIX: '.html',
HAS_SOURCE: true
};
</script>
<script type="text/javascript" src="_static/jquery.js"></script>
<script type="text/javascript" src="_static/underscore.js"></script>
<script type="text/javascript" src="_static/doctools.js"></script>
<script type="text/javascript" src="_static/js/jquery-1.11.0.min.js"></script>
<script type="text/javascript" src="_static/js/jquery-fix.js"></script>
<script type="text/javascript" src="_static/bootstrap-3.3.4/js/bootstrap.min.js"></script>
<script type="text/javascript" src="_static/bootstrap-sphinx.js"></script>
<link rel="top" title="pystruct 0.2.4 documentation" href="index.html" />
<link rel="prev" title="Installation" href="installation.html" />
<meta charset='utf-8'>
<meta http-equiv='X-UA-Compatible' content='IE=edge,chrome=1'>
<meta name='viewport' content='width=device-width, initial-scale=1.0, maximum-scale=1'>
<meta name="apple-mobile-web-app-capable" content="yes">
</head>
<body role="document">
<div id="navbar" class="navbar navbar-default navbar-fixed-top">
<div class="container">
<div class="navbar-header">
<!-- .btn-navbar is used as the toggle for collapsed navbar content -->
<button type="button" class="navbar-toggle" data-toggle="collapse" data-target=".nav-collapse">
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand" href="index.html">
PyStruct</a>
<span class="navbar-text navbar-version pull-left"><b>0.2.4</b></span>
</div>
<div class="collapse navbar-collapse nav-collapse">
<ul class="nav navbar-nav">
<li><a href="index.html">Start</a></li>
<li><a href="installation.html">Installation</a></li>
<li><a href="intro.html">Introduction</a></li>
<li><a href="#">User Guide</a></li>
<li><a href="auto_examples/index.html">Examples</a></li>
<li><a href="references.html">API</a></li>
<li class="dropdown globaltoc-container">
<a role="button"
id="dLabelGlobalToc"
data-toggle="dropdown"
data-target="#"
href="index.html">Site <b class="caret"></b></a>
<ul class="dropdown-menu globaltoc"
role="menu"
aria-labelledby="dLabelGlobalToc"></ul>
</li>
</ul>
<form class="navbar-form navbar-right" action="search.html" method="get">
<div class="form-group">
<input type="text" name="q" class="form-control" placeholder="Search" />
</div>
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div>
</div>
<div class="container content-container">
<div class="section" id="user-guide">
<span id="id1"></span><h1>User Guide<a class="headerlink" href="#user-guide" title="Permalink to this headline">¶</a></h1>
<p>This page explains how to use the most common of the implemented models.
Each model corresponds to a different structured prediction task, or possibly
a different parametrization of the model. As such, the training data <code class="docutils literal"><span class="pre">X</span></code> and
training labels <code class="docutils literal"><span class="pre">Y</span></code> have slightly different forms for each model.</p>
<p>A model is given by four functions, <code class="docutils literal"><span class="pre">joint_feature</span></code>, <code class="docutils literal"><span class="pre">inference</span></code>, <code class="docutils literal"><span class="pre">loss</span></code>
and <code class="docutils literal"><span class="pre">loss_augmented_inference</span></code>. If you just want to use the included models,
you don’t need to worry about these, and can just use the <code class="docutils literal"><span class="pre">fit</span></code>, <code class="docutils literal"><span class="pre">predict</span></code> interface
of the learner.</p>
<div class="section" id="details-on-model-specification">
<h2>Details on model specification<a class="headerlink" href="#details-on-model-specification" title="Permalink to this headline">¶</a></h2>
<p>For those interested in what happens behind the scenes, or those who might want to
adjust a model, there is a short explanation of these functions for each model below.
For all models, the <code class="docutils literal"><span class="pre">joint_feature(x,</span> <span class="pre">y)</span></code> function takes a data point and a
tentative prediction, and computes a continuous vector of a fixed length that
captures the relation between features and label. Learning (that is
<code class="docutils literal"><span class="pre">learner.fit(X,</span> <span class="pre">y)</span></code>) will learn a parameter vector <code class="docutils literal"><span class="pre">w</span></code>, and predictions
will be made using</p>
<div class="math">
<p><img src="_images/math/1ea234cbf7dcf6a24a90a797371943168ccceace.png" alt="y^* = \arg \max_{y} w^T \text{joint\_feature}(x, y)"/></p>
</div><p>That means the number of parameters in the model is the same as the
dimensionality of <code class="docutils literal"><span class="pre">joint_feature</span></code>.</p>
<p>The actual maximization is performed in the <code class="docutils literal"><span class="pre">inference(x,</span> <span class="pre">w)</span></code> function, which
takes a sample <code class="docutils literal"><span class="pre">x</span></code> and a parameter vector <code class="docutils literal"><span class="pre">w</span></code> and outputs a <code class="docutils literal"><span class="pre">y^*</span></code>,
which (at least approximately) maximizes the above equation.</p>
<p>The <code class="docutils literal"><span class="pre">loss(y_true,</span> <span class="pre">y_pred)</span></code> function gives a numeric loss for a ground truth
labeling <code class="docutils literal"><span class="pre">y_true</span></code> and a prediction <code class="docutils literal"><span class="pre">y_pred</span></code>, and finally
<code class="docutils literal"><span class="pre">loss_augmented_inference(x,</span> <span class="pre">y,</span> <span class="pre">w)</span></code> gives an (approximate) maximizer for</p>
<div class="math">
<p><img src="_images/math/31513181c4a4ade1ff6aaa7d877bcfdffc6e8c4e.png" alt="y^* = \arg \max_{y} w^T \text{joint\_feature}(x, y) + \text{loss}(y_\text{true}, y)"/></p>
</div><p>A good place to understand these definitions is <a class="reference internal" href="#multi-class-svm"><span>Multi-class SVM</span></a>.</p>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">Currently all models expect labels to be integers from 0 to n_states - 1 (or
n_classes - 1). Starting labels at 1 or using other label values may lead to
errors and / or incorrect results.</p>
</div>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">None of the models includes a bias (intercept) by default.
Therefore it is usually a good idea to add a constant feature to all
feature vectors, both for unary and pairwise features.</p>
</div>
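<p>To make these four functions concrete, here is a minimal, self-contained sketch of a
multi-class model in plain NumPy. The class name <code class="docutils literal"><span class="pre">ToyMultiClassModel</span></code>
is made up for illustration and only mimics the shape of the interface described above; it is not the real PyStruct API.</p>

```python
import numpy as np

class ToyMultiClassModel:
    """Illustrative stand-in for a PyStruct model (hypothetical, not the real API)."""

    def __init__(self, n_features, n_classes):
        self.n_features = n_features
        self.n_classes = n_classes

    def joint_feature(self, x, y):
        # one copy of x, placed in the block belonging to class y
        jf = np.zeros(self.n_features * self.n_classes)
        jf[y * self.n_features:(y + 1) * self.n_features] = x
        return jf

    def inference(self, x, w):
        # enumerate all labels and return the highest-scoring one
        scores = [np.dot(w, self.joint_feature(x, y))
                  for y in range(self.n_classes)]
        return int(np.argmax(scores))

    def loss(self, y_true, y_pred):
        # 0/1 loss
        return int(y_true != y_pred)

    def loss_augmented_inference(self, x, y_true, w):
        # like inference, but each wrong label also collects its loss
        scores = [np.dot(w, self.joint_feature(x, y)) + self.loss(y_true, y)
                  for y in range(self.n_classes)]
        return int(np.argmax(scores))
```

<p>A learner only interacts with a model through these four functions, which is what makes
the <code class="docutils literal"><span class="pre">fit</span></code> / <code class="docutils literal"><span class="pre">predict</span></code> interface model-agnostic.</p>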
</div>
</div>
<div class="section" id="multi-class-svm">
<span id="id2"></span><h1>Multi-class SVM<a class="headerlink" href="#multi-class-svm" title="Permalink to this headline">¶</a></h1>
<p>A precursor for structured SVMs was the multi-class SVM by <a class="reference external" href="http://jmlr.csail.mit.edu/papers/volume2/crammer01a/crammer01a.pdf">Crammer and Singer</a>.
While in practice it is often faster to use a One-vs-Rest approach with an
optimized binary SVM, this is a good hello-world example for structured
prediction and for using pystruct. For multi-class SVMs, in contrast
to more structured models, the label set Y is just the set of classes, so
inference can be performed by simply enumerating Y.</p>
<p>Let’s say we want to classify the classical iris dataset. There are three classes and four features:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">numpy</span> <span class="kn">as</span> <span class="nn">np</span>
<span class="gp">>>> </span><span class="kn">from</span> <span class="nn">sklearn.datasets</span> <span class="kn">import</span> <span class="n">load_iris</span>
<span class="gp">>>> </span><span class="n">iris</span> <span class="o">=</span> <span class="n">load_iris</span><span class="p">()</span>
<span class="gp">>>> </span><span class="n">iris</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">shape</span><span class="p">,</span> <span class="n">iris</span><span class="o">.</span><span class="n">target</span><span class="o">.</span><span class="n">shape</span>
<span class="go">((150, 4), (150,))</span>
<span class="gp">>>> </span><span class="n">np</span><span class="o">.</span><span class="n">unique</span><span class="p">(</span><span class="n">iris</span><span class="o">.</span><span class="n">target</span><span class="p">)</span>
<span class="go">array([0, 1, 2])</span>
</pre></div>
</div>
<p>We split the data into training and test set:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">sklearn.cross_validation</span> <span class="kn">import</span> <span class="n">train_test_split</span>
<span class="gp">>>> </span><span class="n">X_train</span><span class="p">,</span> <span class="n">X_test</span><span class="p">,</span> <span class="n">y_train</span><span class="p">,</span> <span class="n">y_test</span> <span class="o">=</span> <span class="n">train_test_split</span><span class="p">(</span>
<span class="gp">... </span> <span class="n">iris</span><span class="o">.</span><span class="n">data</span><span class="p">,</span> <span class="n">iris</span><span class="o">.</span><span class="n">target</span><span class="p">,</span> <span class="n">test_size</span><span class="o">=</span><span class="mf">0.4</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
</pre></div>
</div>
<p>The Crammer-Singer model is implemented in <a class="reference internal" href="generated/pystruct.models.MultiClassClf.html#pystruct.models.MultiClassClf" title="pystruct.models.MultiClassClf"><code class="xref py py-class docutils literal"><span class="pre">models.MultiClassClf</span></code></a>.
As this is a simple multi-class classification task, we can pass in training data
as numpy arrays of shape <code class="docutils literal"><span class="pre">(n_samples,</span> <span class="pre">n_features)</span></code> and training labels as
a numpy array of shape (n_samples,) with classes from 0 to 2.</p>
<p>For training, we pick the learner <a class="reference internal" href="generated/pystruct.learners.NSlackSSVM.html#pystruct.learners.NSlackSSVM" title="pystruct.learners.NSlackSSVM"><code class="xref py py-class docutils literal"><span class="pre">learners.NSlackSSVM</span></code></a>, which works
well with few samples and requires little tuning:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">pystruct.learners</span> <span class="kn">import</span> <span class="n">NSlackSSVM</span>
<span class="gp">>>> </span><span class="kn">from</span> <span class="nn">pystruct.models</span> <span class="kn">import</span> <span class="n">MultiClassClf</span>
<span class="gp">>>> </span><span class="n">clf</span> <span class="o">=</span> <span class="n">NSlackSSVM</span><span class="p">(</span><span class="n">MultiClassClf</span><span class="p">())</span>
</pre></div>
</div>
<p>The learner has the same interface as a scikit-learn estimator:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="n">clf</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">X_train</span><span class="p">,</span> <span class="n">y_train</span><span class="p">)</span>
<span class="go">NSlackSSVM(C=1.0, batch_size=100, break_on_bad=False, check_constraints=True,</span>
<span class="go"> inactive_threshold=1e-05, inactive_window=50, logger=None,</span>
<span class="go"> max_iter=100, model=MultiClassClf(n_features=4, n_classes=3),</span>
<span class="go"> n_jobs=1, negativity_constraint=None, show_loss_every=0,</span>
<span class="go"> switch_to=None, tol=0.001, verbose=0)</span>
<span class="gp">>>> </span><span class="n">clf</span><span class="o">.</span><span class="n">predict</span><span class="p">(</span><span class="n">X_test</span><span class="p">)</span>
<span class="go">array([2, 1, 0, 2, 0, 2, 0, 1, 1, 1, 2, 1, 1, 1, 1, 0, 1, 1, 0, 0, 2, 1, 0,</span>
<span class="go"> 0, 2, 0, 0, 1, 1, 0, 2, 2, 0, 2, 2, 1, 0, 2, 1, 1, 2, 0, 2, 0, 0, 1,</span>
<span class="go"> 2, 2, 2, 2, 1, 2, 1, 1, 2, 2, 2, 2, 1, 2])</span>
<span class="gp">>>> </span><span class="n">clf</span><span class="o">.</span><span class="n">score</span><span class="p">(</span><span class="n">X_test</span><span class="p">,</span> <span class="n">y_test</span><span class="p">)</span>
<span class="go">0.96...</span>
</pre></div>
</div>
<div class="section" id="details-on-the-implementation">
<h2>Details on the implementation<a class="headerlink" href="#details-on-the-implementation" title="Permalink to this headline">¶</a></h2>
<p>For this simple model, <code class="docutils literal"><span class="pre">joint_feature(x,</span> <span class="pre">y)</span></code> is a vector of size <code class="docutils literal"><span class="pre">n_features</span> <span class="pre">*</span> <span class="pre">n_classes</span></code>,
which corresponds to one copy of the input features for each possible class.
For any given pair <code class="docutils literal"><span class="pre">(x,</span> <span class="pre">y)</span></code>, the features in <code class="docutils literal"><span class="pre">x</span></code> are placed at the position corresponding
to the class in <code class="docutils literal"><span class="pre">y</span></code>.
Correspondingly, the learned weights consist of one vector of length <code class="docutils literal"><span class="pre">n_features</span></code> per class:
<code class="docutils literal"><span class="pre">w</span> <span class="pre">=</span> <span class="pre">np.hstack([w_class_0,</span> <span class="pre">...,</span> <span class="pre">w_class_{n_classes</span> <span class="pre">-</span> <span class="pre">1}])</span></code>.</p>
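<p>As a sketch, this block layout can be written in a few lines of NumPy (the helper
name below is made up for illustration):</p>

```python
import numpy as np

def multiclass_joint_feature(x, y, n_classes):
    """Place the feature vector x in the block belonging to class y."""
    n_features = x.shape[0]
    jf = np.zeros(n_features * n_classes)
    jf[y * n_features:(y + 1) * n_features] = x
    return jf
```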
<p>For this simple model, inference is just the argmax over the inner products with each of these <code class="docutils literal"><span class="pre">w_class_i</span></code>:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="n">y_pred</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">argmax</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">w</span><span class="o">.</span><span class="n">reshape</span><span class="p">(</span><span class="n">n_classes</span><span class="p">,</span> <span class="n">n_features</span><span class="p">),</span> <span class="n">x</span><span class="p">))</span>
</pre></div>
</div>
<p>To perform max-margin learning, we also need the loss-augmented inference. PyStruct has an optimized version,
but a pure python version would look like this:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="n">scores</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">w</span><span class="o">.</span><span class="n">reshape</span><span class="p">(</span><span class="n">n_classes</span><span class="p">,</span> <span class="n">n_features</span><span class="p">),</span> <span class="n">x</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">scores</span><span class="p">[</span><span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="n">n_classes</span><span class="p">)</span> <span class="o">!=</span> <span class="n">y</span><span class="p">]</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="gp">>>> </span><span class="n">y_pred</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">argmax</span><span class="p">(</span><span class="n">scores</span><span class="p">)</span>
</pre></div>
</div>
<p>Essentially, the score (energy) of every wrong label is increased by 1, the loss incurred by making that incorrect prediction.</p>
</div>
</div>
<div class="section" id="multi-label-svm">
<span id="id3"></span><h1>Multi-label SVM<a class="headerlink" href="#multi-label-svm" title="Permalink to this headline">¶</a></h1>
<p>A multi-label classification task is one where each sample can be labeled with any number of classes.
In other words, there are n_classes many binary labels, each indicating whether a sample belongs
to a given class or not. This could be treated as n_classes many independent binary classification
problems, as the scikit-learn OneVsRest classifier does.
However, it might be beneficial to exploit correlations between labels to achieve better generalization.</p>
<p>In the scene classification dataset, each sample is a picture of an outdoor scene,
represented using simple color aggregation. The labels characterize the kind of scene, which can be
“beach”, “sunset”, “fall foliage”, “field”, “mountain” or “urban”. Each image can belong to multiple classes,
such as “fall foliage” and “field”. Clearly some combinations are more likely than others.</p>
<p>We could try to model all possible combinations, which would result in a 2 ** 6
= 64 class multi-class classification problem. This would allow us to explicitly model all correlations between labels,
but it would prevent us from predicting combinations that don’t appear in the training set.
Even if a combination did appear in the training set, the number of samples in each class would be very small.
A compromise between modeling all correlations and modeling no correlations is to model only pairwise correlations,
which is the approach implemented in <a class="reference internal" href="generated/pystruct.models.MultiLabelClf.html#pystruct.models.MultiLabelClf" title="pystruct.models.MultiLabelClf"><code class="xref py py-class docutils literal"><span class="pre">models.MultiLabelClf</span></code></a>.</p>
<p>The input to this model is similar to the <a class="reference internal" href="#multi-class-svm"><span>Multi-class SVM</span></a>: the training data <code class="docutils literal"><span class="pre">X_train</span></code> is simply
a numpy array of shape <code class="docutils literal"><span class="pre">(n_samples,</span> <span class="pre">n_features)</span></code>, and the training labels form a binary indicator matrix
of shape <code class="docutils literal"><span class="pre">(n_samples,</span> <span class="pre">n_classes)</span></code>:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">pystruct.datasets</span> <span class="kn">import</span> <span class="n">load_scene</span>
<span class="gp">>>> </span><span class="n">scene</span> <span class="o">=</span> <span class="n">load_scene</span><span class="p">()</span>
<span class="gp">>>> </span><span class="n">X_train</span><span class="p">,</span> <span class="n">X_test</span> <span class="o">=</span> <span class="n">scene</span><span class="p">[</span><span class="s">'X_train'</span><span class="p">],</span> <span class="n">scene</span><span class="p">[</span><span class="s">'X_test'</span><span class="p">]</span>
<span class="gp">>>> </span><span class="n">y_train</span><span class="p">,</span> <span class="n">y_test</span> <span class="o">=</span> <span class="n">scene</span><span class="p">[</span><span class="s">'y_train'</span><span class="p">],</span> <span class="n">scene</span><span class="p">[</span><span class="s">'y_test'</span><span class="p">]</span>
<span class="gp">>>> </span><span class="n">X_train</span><span class="o">.</span><span class="n">shape</span>
<span class="go">(1211, 294)</span>
<span class="gp">>>> </span><span class="n">y_train</span><span class="o">.</span><span class="n">shape</span>
<span class="go">(1211, 6)</span>
</pre></div>
</div>
<p>We use the <a class="reference internal" href="generated/pystruct.learners.NSlackSSVM.html#pystruct.learners.NSlackSSVM" title="pystruct.learners.NSlackSSVM"><code class="xref py py-class docutils literal"><span class="pre">learners.NSlackSSVM</span></code></a> learner, passing it the <a class="reference internal" href="generated/pystruct.models.MultiLabelClf.html#pystruct.models.MultiLabelClf" title="pystruct.models.MultiLabelClf"><code class="xref py py-class docutils literal"><span class="pre">models.MultiLabelClf</span></code></a> model:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">pystruct.learners</span> <span class="kn">import</span> <span class="n">NSlackSSVM</span>
<span class="gp">>>> </span><span class="kn">from</span> <span class="nn">pystruct.models</span> <span class="kn">import</span> <span class="n">MultiLabelClf</span>
<span class="gp">>>> </span><span class="n">clf</span> <span class="o">=</span> <span class="n">NSlackSSVM</span><span class="p">(</span><span class="n">MultiLabelClf</span><span class="p">())</span>
</pre></div>
</div>
<p>Training looks as before, only that <code class="docutils literal"><span class="pre">y_train</span></code> is now a matrix:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="c"># clf.fit(X_train, y_train)</span>
<span class="gp">>>> </span><span class="c"># clf.predict(X_test)</span>
<span class="gp">>>> </span><span class="c"># clf.score(X_test, y_test)</span>
</pre></div>
</div>
<p>With only 64 possible label combinations, we can actually enumerate all states.
Unfortunately, inference in a fully connected binary graph is in
general NP-hard, so we might need to rely on approximate inference, like loopy
belief propagation or AD3.</p>
<p>An alternative to using approximate inference for larger numbers of labels is to not create a fully connected graph,
but restrict ourselves to pairwise interactions on a tree over the labels. In the above example of outdoor scenes,
some labels might be informative about others: maybe a beach picture is likely to show a sunset, while
an urban scene might have as many sunset as non-sunset samples. The optimal tree structure for such a problem
can easily be found as the Chow-Liu tree, which is simply the maximum-weight spanning tree over the label graph, where
edge-weights are given by the mutual information between labels on the training set.
You can use the Chow-Liu tree method simply by specifying <code class="docutils literal"><span class="pre">edges="chow_liu"</span></code>.
This allows us to use efficient and exact max-product message passing for
inference:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="n">clf</span> <span class="o">=</span> <span class="n">NSlackSSVM</span><span class="p">(</span><span class="n">MultiLabelClf</span><span class="p">(</span><span class="n">edges</span><span class="o">=</span><span class="s">"chow_liu"</span><span class="p">))</span>
</pre></div>
</div>
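<p>For intuition, here is a sketch of how such a Chow-Liu tree could be computed from the
binary label matrix, using scikit-learn’s <code class="docutils literal"><span class="pre">mutual_info_score</span></code> and SciPy’s
minimum spanning tree (PyStruct’s own implementation may differ in detail; the helper name is made up):</p>

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from sklearn.metrics import mutual_info_score

def chow_liu_edges(y):
    """Edges of a maximum mutual-information spanning tree over the labels.

    y is a binary indicator matrix of shape (n_samples, n_labels).
    """
    n_labels = y.shape[1]
    mi = np.zeros((n_labels, n_labels))
    for i in range(n_labels):
        for j in range(i + 1, n_labels):
            mi[i, j] = mutual_info_score(y[:, i], y[:, j])
    # scipy computes a *minimum* spanning tree, so turn high mutual
    # information into low cost (all costs kept strictly positive)
    cost = np.zeros_like(mi)
    iu = np.triu_indices(n_labels, k=1)
    cost[iu] = mi.max() + 1.0 - mi[iu]
    tree = minimum_spanning_tree(cost)
    return np.vstack(tree.nonzero()).T  # one (i, j) pair per tree edge
```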
<p>Training with the tree-structured model works just as before:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="c"># clf.fit(X_train, y_train)</span>
<span class="gp">>>> </span><span class="c"># clf.predict(X_test)</span>
<span class="gp">>>> </span><span class="c"># clf.score(X_test, y_test)</span>
</pre></div>
</div>
<p>This model for multi-label classification with full connectivity is taken from the paper
T. Finley, T. Joachims, Training Structural SVMs when Exact Inference is Intractable.</p>
<div class="section" id="id4">
<h2>Details on the implementation<a class="headerlink" href="#id4" title="Permalink to this headline">¶</a></h2>
<p>The model creates a graph over <code class="docutils literal"><span class="pre">n_classes</span></code> binary nodes, together with edges
between each pair of classes. Each binary node represents one class, and
therefore gets its own column in the weight vector, similar to
Crammer-Singer multi-class classification.</p>
<p>In addition, there is a pairwise weight between each pair of labels, so the joint
feature function stacks one copy of the input features per class (active only when
that label is switched on) together with one entry per pair of jointly active labels.</p>
<p>The implementation of the inference for this model creates a graph with unary
potentials (given by the inner product of features and weights), and pairwise
potentials given by the pairwise weight. This graph is then passed to the
general graph-inference, which runs the selected algorithm.</p>
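<p>With few labels, that general graph inference can even be replaced by brute-force
enumeration of all 2 ** n_classes states. The following sketch (a hypothetical helper,
assuming an upper-triangular pairwise weight matrix with zero diagonal) shows the
score that inference maximizes:</p>

```python
import itertools
import numpy as np

def multilabel_inference(x, unary_w, pairwise_w):
    """Brute-force inference for a fully connected multi-label model.

    unary_w has shape (n_classes, n_features); pairwise_w has shape
    (n_classes, n_classes), upper triangular with zero diagonal, holding
    one weight per pair of jointly active labels. Only feasible for small
    n_classes (2 ** n_classes states).
    """
    n_classes = unary_w.shape[0]
    unary = unary_w.dot(x)  # score for switching each label on
    best_y, best_score = None, -np.inf
    for bits in itertools.product([0, 1], repeat=n_classes):
        y = np.array(bits)
        # y @ W @ y counts each active pair (i < j) exactly once
        score = unary.dot(y) + y.dot(pairwise_w).dot(y)
        if score > best_score:
            best_y, best_score = y, score
    return best_y
```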
</div>
</div>
<div class="section" id="conditional-random-field-like-graph-models">
<h1>Conditional-Random-Field-like graph models<a class="headerlink" href="#conditional-random-field-like-graph-models" title="Permalink to this headline">¶</a></h1>
<p>The following models are all pairwise models over nodes; that is, they model a
labeling of a graph, using features at the nodes and relations between
neighboring nodes. The main assumption in these models in PyStruct is that
nodes are homogeneous, that is, they all have the same meaning: each node has
the same number of classes, and these classes mean the same
thing. In practice, this means that weights are shared across all nodes and
edges, and the model adapts via features.
This is in contrast to <a class="reference internal" href="generated/pystruct.models.MultiLabelClf.html#pystruct.models.MultiLabelClf" title="pystruct.models.MultiLabelClf"><code class="xref py py-class docutils literal"><span class="pre">models.MultiLabelClf</span></code></a>, which builds a binary graph
where nodes mean different things (each node represents a different class), so
they do not share weights.</p>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">I call these models Conditional Random Fields (CRFs), but this is a slight abuse of terminology,
as PyStruct actually implements perceptron and max-margin learning, not maximum likelihood learning.
These models might therefore better be called Maximum Margin Random Fields. However, in the computer vision
community, most pairwise models seem to be called CRFs, independent of the method of training.</p>
</div>
<div class="section" id="chaincrf">
<span id="chain-crf"></span><h2>ChainCRF<a class="headerlink" href="#chaincrf" title="Permalink to this headline">¶</a></h2>
<p>One of the most common use-cases for structured prediction is chain-structured
outputs, implemented in <a class="reference internal" href="generated/pystruct.models.ChainCRF.html#pystruct.models.ChainCRF" title="pystruct.models.ChainCRF"><code class="xref py py-class docutils literal"><span class="pre">models.ChainCRF</span></code></a>. These occur naturally in
sequence labeling tasks, such as Part-of-Speech tagging or named entity
recognition in natural language processing, or segmentation and phoneme
recognition in speech processing.</p>
<p>As an example dataset, we will use the toy OCR dataset letters. In this
dataset, each sample is a handwritten word, segmented into letters. This
dataset has a slight oddity, in that the first letter of every word was
removed, as it was capitalized, and therefore different from all the other
letters.</p>
<p>Each letter is a node in our chain, and neighboring letters are connected with
an edge. The length of the chain varies with the number of letters in the
word. As in all CRF-like models, the nodes in <a class="reference internal" href="generated/pystruct.models.ChainCRF.html#pystruct.models.ChainCRF" title="pystruct.models.ChainCRF"><code class="xref py py-class docutils literal"><span class="pre">models.ChainCRF</span></code></a> all have
the same meaning and share parameters.</p>
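<p>Concretely, the edge list of such a chain just connects consecutive node indices, so
it could be built like this (illustrative helper name):</p>

```python
import numpy as np

def make_chain_edges(n_nodes):
    """Connect each node in a sequence of length n_nodes to its successor."""
    return np.c_[np.arange(n_nodes - 1), np.arange(1, n_nodes)]
```

<p>For a four-letter word this yields the edges (0, 1), (1, 2) and (2, 3).</p>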
<p>The letters dataset comes with prespecified folds; we take one fold to be the
training set, and the rest to be the test set, as in <a class="reference external" href="http://papers.nips.cc/paper/2397-max-margin-markov-networks.pdf">Max-Margin Markov
Networks</a>:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">pystruct.datasets</span> <span class="kn">import</span> <span class="n">load_letters</span>
<span class="gp">>>> </span><span class="n">letters</span> <span class="o">=</span> <span class="n">load_letters</span><span class="p">()</span>
<span class="gp">>>> </span><span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">folds</span> <span class="o">=</span> <span class="n">letters</span><span class="p">[</span><span class="s">'data'</span><span class="p">],</span> <span class="n">letters</span><span class="p">[</span><span class="s">'labels'</span><span class="p">],</span> <span class="n">letters</span><span class="p">[</span><span class="s">'folds'</span><span class="p">]</span>
<span class="gp">>>> </span><span class="n">X</span><span class="p">,</span> <span class="n">y</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">X</span><span class="p">),</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">y</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">X_train</span><span class="p">,</span> <span class="n">X_test</span> <span class="o">=</span> <span class="n">X</span><span class="p">[</span><span class="n">folds</span> <span class="o">==</span> <span class="mi">1</span><span class="p">],</span> <span class="n">X</span><span class="p">[</span><span class="n">folds</span> <span class="o">!=</span> <span class="mi">1</span><span class="p">]</span>
<span class="gp">>>> </span><span class="n">y_train</span><span class="p">,</span> <span class="n">y_test</span> <span class="o">=</span> <span class="n">y</span><span class="p">[</span><span class="n">folds</span> <span class="o">==</span> <span class="mi">1</span><span class="p">],</span> <span class="n">y</span><span class="p">[</span><span class="n">folds</span> <span class="o">!=</span> <span class="mi">1</span><span class="p">]</span>
<span class="gp">>>> </span><span class="nb">len</span><span class="p">(</span><span class="n">X_train</span><span class="p">)</span>
<span class="go">704</span>
<span class="gp">>>> </span><span class="nb">len</span><span class="p">(</span><span class="n">X_test</span><span class="p">)</span>
<span class="go">6173</span>
</pre></div>
</div>
<p>The training data is an array of samples, where each sample is a numpy array of
shape <code class="docutils literal"><span class="pre">(n_nodes,</span> <span class="pre">n_features)</span></code>. Here <code class="docutils literal"><span class="pre">n_nodes</span></code> is the length of the input
sequence, that is, the length of the word in our case. This means the outer
array actually has dtype object: we cannot store the features in a single
rectangular array, as the input sequences can have different lengths:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="n">X_train</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">shape</span>
<span class="go">(9, 128)</span>
<span class="gp">>>> </span><span class="n">y_train</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">shape</span>
<span class="go">(9,)</span>
<span class="gp">>>> </span><span class="n">X_train</span><span class="p">[</span><span class="mi">10</span><span class="p">]</span><span class="o">.</span><span class="n">shape</span>
<span class="go">(7, 128)</span>
<span class="gp">>>> </span><span class="n">y_train</span><span class="p">[</span><span class="mi">10</span><span class="p">]</span><span class="o">.</span><span class="n">shape</span>
<span class="go">(7,)</span>
</pre></div>
</div>
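Independently of the letters data, this object-dtype layout can be sketched as follows (the array sizes here are made up for illustration):

```python
import numpy as np

# Sequences of different lengths do not fit in one rectangular array,
# so the samples live in a 1-d array with dtype object.
word_a = np.zeros((9, 128))   # a 9-letter word, 128 features per letter
word_b = np.zeros((7, 128))   # a 7-letter word
X = np.empty(2, dtype=object)
X[:] = [word_a, word_b]

print(X.dtype)     # object
print(X[0].shape)  # (9, 128)
print(X[1].shape)  # (7, 128)
```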
<p>Edges don’t need to be specified, as the input features are assumed to be in
the order of the nodes in the chain.</p>
<p>The default inference method is max-product message passing on the chain (also
known as Viterbi decoding), which is exact and efficient on chains. We use
<a class="reference internal" href="generated/pystruct.learners.FrankWolfeSSVM.html#pystruct.learners.FrankWolfeSSVM" title="pystruct.learners.FrankWolfeSSVM"><code class="xref py py-class docutils literal"><span class="pre">learners.FrankWolfeSSVM</span></code></a>, which is a very efficient learner when
inference is fast:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">pystruct.models</span> <span class="kn">import</span> <span class="n">ChainCRF</span>
<span class="gp">>>> </span><span class="kn">from</span> <span class="nn">pystruct.learners</span> <span class="kn">import</span> <span class="n">FrankWolfeSSVM</span>
<span class="gp">>>> </span><span class="n">model</span> <span class="o">=</span> <span class="n">ChainCRF</span><span class="p">()</span>
<span class="gp">>>> </span><span class="n">ssvm</span> <span class="o">=</span> <span class="n">FrankWolfeSSVM</span><span class="p">(</span><span class="n">model</span><span class="o">=</span><span class="n">model</span><span class="p">,</span> <span class="n">C</span><span class="o">=.</span><span class="mi">1</span><span class="p">,</span> <span class="n">max_iter</span><span class="o">=</span><span class="mi">10</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">ssvm</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">X_train</span><span class="p">,</span> <span class="n">y_train</span><span class="p">)</span>
<span class="go">FrankWolfeSSVM(C=0.1, batch_mode=False, check_dual_every=10,</span>
<span class="go"> do_averaging=True, line_search=True, logger=None, max_iter=10,</span>
<span class="go"> model=ChainCRF(n_states: 26, inference_method: max-product),</span>
<span class="go"> n_jobs=1, random_state=None, sample_method='perm',</span>
<span class="go"> show_loss_every=0, tol=0.001, verbose=0)</span>
<span class="gp">>>> </span><span class="n">ssvm</span><span class="o">.</span><span class="n">score</span><span class="p">(</span><span class="n">X_test</span><span class="p">,</span> <span class="n">y_test</span><span class="p">)</span>
<span class="go">0.78...</span>
</pre></div>
</div>
</div>
<div class="section" id="id5">
<h2>Details on the implementation<a class="headerlink" href="#id5" title="Permalink to this headline">¶</a></h2>
<p>The unary potentials at each node are given by the inner product of the features
at this node (the input image) with the weights, which are shared over all nodes.</p>
<p>The pairwise potentials are identical over the whole chain and are given directly by
the pairwise weights.</p>
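In symbols (a sketch of the description above; the notation is ours, not taken from the pystruct documentation): writing <code class="docutils literal"><span class="pre">x_i</span></code> for the features at node <code class="docutils literal"><span class="pre">i</span></code>, <code class="docutils literal"><span class="pre">w_y</span></code> for the unary weight vector of state <code class="docutils literal"><span class="pre">y</span></code>, and <code class="docutils literal"><span class="pre">w_{y,y'}</span></code> for the pairwise weight of the state pair:

```latex
\psi_{\mathrm{unary}}(i, y) = \langle w_y, x_i \rangle,
\qquad
\psi_{\mathrm{pairwise}}(y, y') = w_{y, y'}
```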
<p>In principle it is also possible to use features in the pairwise potentials.
This is not implemented in <a class="reference internal" href="generated/pystruct.models.ChainCRF.html#pystruct.models.ChainCRF" title="pystruct.models.ChainCRF"><code class="xref py py-class docutils literal"><span class="pre">models.ChainCRF</span></code></a>, but can be done using
<a class="reference internal" href="#edge-feature-graph-crf"><span>EdgeFeatureGraphCRF</span></a>.</p>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">While pystruct is able to work with chain CRFs, it is not explicitly built
with these in mind, and there are libraries that optimize much more for
this special case, such as <a class="reference external" href="https://github.com/larsmans/seqlearn">seqlearn</a> and <a class="reference external" href="http://taku910.github.io/crfpp/">CRF++</a>.</p>
</div>
</div>
<div class="section" id="graphcrf">
<span id="graph-crf"></span><h2>GraphCRF<a class="headerlink" href="#graphcrf" title="Permalink to this headline">¶</a></h2>
<p>The <a class="reference internal" href="generated/pystruct.models.GraphCRF.html#pystruct.models.GraphCRF" title="pystruct.models.GraphCRF"><code class="xref py py-class docutils literal"><span class="pre">models.GraphCRF</span></code></a> model is a generalization of the <a class="reference internal" href="#chain-crf"><span>ChainCRF</span></a> to
arbitrary graphs. While in the chain model the direction of an edge is
usually important, for many graphs the direction of an edge has no semantic
meaning. Therefore, by default, the pairwise interaction matrix of the
<a class="reference internal" href="generated/pystruct.models.GraphCRF.html#pystruct.models.GraphCRF" title="pystruct.models.GraphCRF"><code class="xref py py-class docutils literal"><span class="pre">models.GraphCRF</span></code></a> is forced to be symmetric.</p>
<p>Each training sample for the <a class="reference internal" href="generated/pystruct.models.GraphCRF.html#pystruct.models.GraphCRF" title="pystruct.models.GraphCRF"><code class="xref py py-class docutils literal"><span class="pre">models.GraphCRF</span></code></a> is a tuple <code class="docutils literal"><span class="pre">(features,</span>
<span class="pre">edges)</span></code>, where <code class="docutils literal"><span class="pre">features</span></code> is a numpy array of node features (of shape
<code class="docutils literal"><span class="pre">(n_nodes,</span> <span class="pre">n_features)</span></code>), and <code class="docutils literal"><span class="pre">edges</span></code> is an array of edges between nodes, of
shape <code class="docutils literal"><span class="pre">(n_edges,</span> <span class="pre">2)</span></code>. Each row of the edge array contains the zero-based
indices of the two nodes connected by that edge.</p>
<p>To reproduce the <a class="reference internal" href="generated/pystruct.models.ChainCRF.html#pystruct.models.ChainCRF" title="pystruct.models.ChainCRF"><code class="xref py py-class docutils literal"><span class="pre">models.ChainCRF</span></code></a> model above with <a class="reference internal" href="generated/pystruct.models.GraphCRF.html#pystruct.models.GraphCRF" title="pystruct.models.GraphCRF"><code class="xref py py-class docutils literal"><span class="pre">models.GraphCRF</span></code></a>, we can simply
generate the indices of a chain:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="n">features</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">folds</span> <span class="o">=</span> <span class="n">letters</span><span class="p">[</span><span class="s">'data'</span><span class="p">],</span> <span class="n">letters</span><span class="p">[</span><span class="s">'labels'</span><span class="p">],</span> <span class="n">letters</span><span class="p">[</span><span class="s">'folds'</span><span class="p">]</span>
<span class="gp">>>> </span><span class="n">features</span><span class="p">,</span> <span class="n">y</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">features</span><span class="p">),</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">y</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">features_train</span><span class="p">,</span> <span class="n">features_test</span> <span class="o">=</span> <span class="n">features</span><span class="p">[</span><span class="n">folds</span> <span class="o">==</span> <span class="mi">1</span><span class="p">],</span> <span class="n">features</span><span class="p">[</span><span class="n">folds</span> <span class="o">!=</span> <span class="mi">1</span><span class="p">]</span>
<span class="gp">>>> </span><span class="n">y_train</span><span class="p">,</span> <span class="n">y_test</span> <span class="o">=</span> <span class="n">y</span><span class="p">[</span><span class="n">folds</span> <span class="o">==</span> <span class="mi">1</span><span class="p">],</span> <span class="n">y</span><span class="p">[</span><span class="n">folds</span> <span class="o">!=</span> <span class="mi">1</span><span class="p">]</span>
</pre></div>
</div>
<p>For a single word made out of 9 characters:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="n">features_0</span> <span class="o">=</span> <span class="n">features_train</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="gp">>>> </span><span class="n">features_0</span><span class="o">.</span><span class="n">shape</span>
<span class="go">(9, 128)</span>
<span class="gp">>>> </span><span class="n">n_nodes</span> <span class="o">=</span> <span class="n">features_0</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="gp">>>> </span><span class="n">edges_0</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">vstack</span><span class="p">([</span><span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="n">n_nodes</span> <span class="o">-</span> <span class="mi">1</span><span class="p">),</span> <span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">n_nodes</span><span class="p">)])</span><span class="o">.</span><span class="n">T</span>
<span class="gp">>>> </span><span class="n">edges_0</span>
<span class="go">array([[0, 1],</span>
<span class="go">       [1, 2],</span>
<span class="go">       [2, 3],</span>
<span class="go">       [3, 4],</span>
<span class="go">       [4, 5],</span>
<span class="go">       [5, 6],</span>
<span class="go">       [6, 7],</span>
<span class="go">       [7, 8]])</span>
<span class="gp">>>> </span><span class="n">x</span> <span class="o">=</span> <span class="p">(</span><span class="n">features_0</span><span class="p">,</span> <span class="n">edges_0</span><span class="p">)</span>
</pre></div>
</div>
<p>For the whole training dataset:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="n">f_t</span> <span class="o">=</span> <span class="n">features_train</span>
<span class="gp">>>> </span><span class="n">X_train</span> <span class="o">=</span> <span class="p">[(</span><span class="n">features_i</span><span class="p">,</span> <span class="n">np</span><span class="o">.</span><span class="n">vstack</span><span class="p">([</span><span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="n">features_i</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">-</span> <span class="mi">1</span><span class="p">),</span> <span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">features_i</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">])])</span><span class="o">.</span><span class="n">T</span><span class="p">)</span>
<span class="gp">... </span>            <span class="k">for</span> <span class="n">features_i</span> <span class="ow">in</span> <span class="n">f_t</span><span class="p">]</span>
<span class="gp">>>> </span><span class="n">X_train</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="go">(array([[0, 0, 0, ..., 0, 0, 0],</span>
<span class="go">       [0, 0, 0, ..., 0, 0, 0],</span>
<span class="go">       [0, 0, 0, ..., 0, 0, 0],</span>
<span class="go">       ...,</span>
<span class="go">       [0, 0, 0, ..., 0, 0, 0],</span>
<span class="go">       [0, 0, 0, ..., 0, 0, 0],</span>
<span class="go">       [0, 0, 1, ..., 0, 1, 1]], dtype=uint8), array([[0, 1],</span>
<span class="go">       [1, 2],</span>
<span class="go">       [2, 3],</span>
<span class="go">       [3, 4],</span>
<span class="go">       [4, 5],</span>
<span class="go">       [5, 6],</span>
<span class="go">       [6, 7],</span>
<span class="go">       [7, 8]]))</span>
</pre></div>
</div>
<p>Now we can fit a (directed) <a class="reference internal" href="generated/pystruct.models.GraphCRF.html#pystruct.models.GraphCRF" title="pystruct.models.GraphCRF"><code class="xref py py-class docutils literal"><span class="pre">models.GraphCRF</span></code></a> on this data:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">pystruct.models</span> <span class="kn">import</span> <span class="n">GraphCRF</span>
<span class="gp">>>> </span><span class="kn">from</span> <span class="nn">pystruct.learners</span> <span class="kn">import</span> <span class="n">FrankWolfeSSVM</span>
<span class="gp">>>> </span><span class="n">model</span> <span class="o">=</span> <span class="n">GraphCRF</span><span class="p">(</span><span class="n">directed</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">inference_method</span><span class="o">=</span><span class="s">"max-product"</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">ssvm</span> <span class="o">=</span> <span class="n">FrankWolfeSSVM</span><span class="p">(</span><span class="n">model</span><span class="o">=</span><span class="n">model</span><span class="p">,</span> <span class="n">C</span><span class="o">=.</span><span class="mi">1</span><span class="p">,</span> <span class="n">max_iter</span><span class="o">=</span><span class="mi">10</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">ssvm</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">X_train</span><span class="p">,</span> <span class="n">y_train</span><span class="p">)</span>
<span class="go">FrankWolfeSSVM(C=0.1, batch_mode=False, check_dual_every=10,</span>
<span class="go"> do_averaging=True, line_search=True, logger=None, max_iter=10,</span>
<span class="go"> model=GraphCRF(n_states: 26, inference_method: max-product),</span>
<span class="go"> n_jobs=1, random_state=None, sample_method='perm',</span>
<span class="go"> show_loss_every=0, tol=0.001, verbose=0)</span>
</pre></div>
</div>
</div>
<div class="section" id="id6">
<h2>Details on the implementation<a class="headerlink" href="#id6" title="Permalink to this headline">¶</a></h2>
<p>The potentials are the same as in the ChainCRF model, with unary potentials given
as a shared linear function of the node features, and pairwise potentials shared
across all edges.</p>
</div>
<div class="section" id="edgefeaturegraphcrf">
<span id="edge-feature-graph-crf"></span><h2>EdgeFeatureGraphCRF<a class="headerlink" href="#edgefeaturegraphcrf" title="Permalink to this headline">¶</a></h2>
<p>This model is the most general of the CRF models, and contains all other CRF
models as a special case. This model assumes again that the parameters of the
potentials are shared over all nodes and over all edges, but the pairwise
potentials are now also computed as a linear function of the features.</p>
<p>Each training sample for <a class="reference internal" href="generated/pystruct.models.EdgeFeatureGraphCRF.html#pystruct.models.EdgeFeatureGraphCRF" title="pystruct.models.EdgeFeatureGraphCRF"><code class="xref py py-class docutils literal"><span class="pre">models.EdgeFeatureGraphCRF</span></code></a> is a tuple
<code class="docutils literal"><span class="pre">(node_features,</span> <span class="pre">edges,</span> <span class="pre">edge_features)</span></code>, where <code class="docutils literal"><span class="pre">node_features</span></code> is a numpy
array of node-features (of shape <code class="docutils literal"><span class="pre">(n_nodes,</span> <span class="pre">n_node_features)</span></code>), <code class="docutils literal"><span class="pre">edges</span></code> is
an array of edges between nodes, of shape <code class="docutils literal"><span class="pre">(n_edges,</span> <span class="pre">2)</span></code> as in
<a class="reference internal" href="#graph-crf"><span>GraphCRF</span></a>, and <code class="docutils literal"><span class="pre">edge_features</span></code> is a feature for each edge, given as a
numpy array of shape <code class="docutils literal"><span class="pre">(n_edges,</span> <span class="pre">n_edge_features)</span></code>.</p>
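A minimal sketch of this data format (the numbers of nodes, edges, and features here are made up for illustration):

```python
import numpy as np

# A toy sample for EdgeFeatureGraphCRF: a 3-node chain with
# 4 node features and 2 edge features per edge.
node_features = np.zeros((3, 4))        # shape (n_nodes, n_node_features)
edges = np.array([[0, 1], [1, 2]])      # shape (n_edges, 2)
edge_features = np.ones((2, 2))         # shape (n_edges, n_edge_features)

x = (node_features, edges, edge_features)
print([a.shape for a in x])  # [(3, 4), (2, 2), (2, 2)]
```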
<p>The edge features allow the pairwise interactions to be modulated by the
context. Two features important for image segmentation, for example, are color
differences between the (super)pixels at given nodes, and whether one is above
the other. If two neighboring nodes correspond to regions of similar color,
they are more likely to have the same label. For the vertical direction, a node
above a node representing “sky” is more likely to also represent “sky” than
“water”.</p>
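As a sketch of how such edge features might be computed (the helper function and its inputs are hypothetical, not part of pystruct): given a mean color and a centroid per superpixel, we can derive a color-difference feature and an "is-above" indicator for every edge:

```python
import numpy as np

def make_edge_features(mean_colors, centroids, edges):
    """Hypothetical helper: one row of edge features per edge."""
    a, b = edges[:, 0], edges[:, 1]
    # summed absolute color difference between the two regions
    color_diff = np.abs(mean_colors[a] - mean_colors[b]).sum(axis=1)
    # 1.0 if the first node's centroid is above the second's (smaller row index)
    is_above = (centroids[a, 0] < centroids[b, 0]).astype(float)
    return np.column_stack([color_diff, is_above])

# Two superpixels: nearly identical red tones, one above the other.
mean_colors = np.array([[1.0, 0.0, 0.0], [0.9, 0.0, 0.0]])
centroids = np.array([[10.0, 5.0], [20.0, 5.0]])   # (row, col) positions
edges = np.array([[0, 1]])
ef = make_edge_features(mean_colors, centroids, edges)
print(ef.shape)  # (1, 2)
```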
<p>A great example of the importance of edge features is
<a class="reference internal" href="auto_examples/plot_snakes.html#sphx-glr-auto-examples-plot-snakes-py"><span>Conditional Interactions on the Snakes Dataset</span></a>.</p>
</div>
</div>
<div class="section" id="latent-variable-models">
<h1>Latent Variable Models<a class="headerlink" href="#latent-variable-models" title="Permalink to this headline">¶</a></h1>
<p>Latent variable models are models that involve interactions with variables that
are not observed during training. These often model a “hidden cause” of the
data, which might make it easier to learn about the actual observations.</p>
<p>Latent variable models are usually much harder to fit than fully observed
models, and require fitting using either <a class="reference internal" href="generated/pystruct.learners.LatentSSVM.html#pystruct.learners.LatentSSVM" title="pystruct.learners.LatentSSVM"><code class="xref py py-class docutils literal"><span class="pre">learners.LatentSSVM</span></code></a>, or
<code class="xref py py-class docutils literal"><span class="pre">learners.LatentSubgradientSSVM</span></code>. <a class="reference internal" href="generated/pystruct.learners.LatentSSVM.html#pystruct.learners.LatentSSVM" title="pystruct.learners.LatentSSVM"><code class="xref py py-class docutils literal"><span class="pre">learners.LatentSSVM</span></code></a> alternates between
inferring the unobserved variables and fitting any of the other SSVM models
(such as <a class="reference internal" href="generated/pystruct.learners.OneSlackSSVM.html#pystruct.learners.OneSlackSSVM" title="pystruct.learners.OneSlackSSVM"><code class="xref py py-class docutils literal"><span class="pre">learners.OneSlackSSVM</span></code></a>). Each iteration of this alternation is as
expensive as building a fully observed model, and good initialization can be
very important. This method was published in <a class="reference external" href="http://www.cs.cornell.edu/~cnyu/papers/icml09_latentssvm.pdf">Learning Structural SVMs with
Latent Variables</a>.</p>
<p>The <code class="xref py py-class docutils literal"><span class="pre">learners.LatentSubgradientSSVM</span></code> approach tries to reestimate the latent
variables for each batch, and corresponds to a subgradient descent on the
non-convex objective involving the maximization over hidden variables. I am
unaware of any literature on this approach.</p>
<div class="section" id="latentgraphcrf-aka-hidden-dynamics-crf">
<h2>LatentGraphCRF aka Hidden Dynamics CRF<a class="headerlink" href="#latentgraphcrf-aka-hidden-dynamics-crf" title="Permalink to this headline">¶</a></h2>
<p><a class="reference internal" href="generated/pystruct.models.LatentGraphCRF.html#pystruct.models.LatentGraphCRF" title="pystruct.models.LatentGraphCRF"><code class="xref py py-class docutils literal"><span class="pre">models.LatentGraphCRF</span></code></a> implements the “Hidden Dynamics CRF” approach.
Here, each output state is split into several hidden sub-states, which allows for
more complex interactions.</p>
<p>This can be seen as a structured variant of the latent SVM approach as follows:
If there is a single node in the graph (that is doing multi-class
classification), we introduce latent subclasses for each of the target classes.
We can then learn a separate classifier for each of the subclasses, which might
be easier. An example is given in <a class="reference internal" href="auto_examples/plot_latent_svm_as_crf.html#sphx-glr-auto-examples-plot-latent-svm-as-crf-py"><span>Latent SVM for odd vs. even digit classification</span></a>,
where images of odd numbers are classified against images of even numbers. It
is much easier to learn a linear classifier that separates one digit from the
other digits, than trying to learn a linear separation between even and odd
digits.</p>
<p>For more complex graphs, not only the unary potentials benefit, but also the
pairwise potentials, which are now between substates.
The original paper motivates this extension by action recognition.
A complex action like a jumping jack is made up of several distinct sub-actions,
and there is a distinct order in which the sub-actions are performed.
The latent dynamic CRF can learn this order.</p>
<p>See <a class="reference internal" href="auto_examples/plot_latent_crf.html#sphx-glr-auto-examples-plot-latent-crf-py"><span>Latent Dynamics CRF</span></a> for an example on a 2d grid.</p>
</div>
</div>
<div class="section" id="how-to-write-your-own-model">
<h1>How to Write Your Own Model<a class="headerlink" href="#how-to-write-your-own-model" title="Permalink to this headline">¶</a></h1>
<p>TODO</p>
</div>
<div class="section" id="tips-on-choosing-a-learner">
<h1>Tips on Choosing a Learner<a class="headerlink" href="#tips-on-choosing-a-learner" title="Permalink to this headline">¶</a></h1>
<p>There is extensive benchmarking in my thesis, TODO.</p>
<dl class="docutils">
<dt>SubgradientSSVM</dt>
<dd>Good for many datapoints with fast inference. Usually worse than FrankWolfeSSVM, but takes less memory. Good for obtaining reasonable solutions fast.</dd>
<dt>NSlackSSVM</dt>
<dd>Good for very few datapoints (hundreds) with slow inference. Good for obtaining high-precision solutions.</dd>
<dt>OneSlackSSVM</dt>
<dd>Good for mid-size datasets (thousands) with slow inference. Good for obtaining high-precision solutions.</dd>
<dt>FrankWolfeSSVM</dt>
<dd>Good for large datasets with fast inference. Good for obtaining reasonable solutions fast.</dd>
</dl>
</div>
<div class="section" id="tips-on-choosing-an-inference-algorithm">
<h1>Tips on Choosing an Inference Algorithm<a class="headerlink" href="#tips-on-choosing-an-inference-algorithm" title="Permalink to this headline">¶</a></h1>
<p>TODO</p>
</div>
</div>
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-43292385-1', 'pystruct.github.io');
ga('send', 'pageview');
</script>
<footer class="footer">
<div class="container">
<p class="pull-right">
<a href="#">Back to top</a>
</p>
<p>
© Copyright 2013, Andreas Mueller.<br/>
Created using <a href="http://sphinx-doc.org/">Sphinx</a> 1.3.3.<br/>
</p>
</div>
</footer>
</body>
</html>