<!DOCTYPE html>
<html lang="en">
<head>
<style>
table, th, td {
border:1.5px solid black;
}
img {
display: block;
margin-left: auto;
margin-right: auto;
}
p {
font-size: 1.5rem;
text-align: justify;
}
</style>
<meta charset="UTF-8">
<meta http-equiv="x-ua-compatible" content="ie=edge">
<title>Self-Supervised-SPARE3D</title>
<meta name="description" content="">
<meta name="viewport" content="width=device-width, initial-scale=1">
<!-- <base href="/"> -->
<!-- <link rel="apple-touch-icon" href="apple-touch-icon.png"> -->
<link rel="icon" type="image/png" href="docs/figs/teaser_fig_cut.JPG.png">
<!-- Place favicon.ico in the root directory -->
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.5/css/bootstrap.min.css">
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.4.0/css/font-awesome.min.css">
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/codemirror/5.8.0/codemirror.min.css">
<link rel="stylesheet" href="app.css">
<link rel="stylesheet" href="bootstrap.min.css">
<!-- Global site tag (gtag.js) - Google Analytics -->
<script async src="https://www.googletagmanager.com/gtag/js?id=UA-110862391-3"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag() {
dataLayer.push(arguments);
}
gtag('js', new Date());
gtag('config', 'UA-110862391-3');
</script>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.3/jquery.min.js"></script>
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.5/js/bootstrap.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/codemirror/5.8.0/codemirror.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/clipboard.js/1.5.3/clipboard.min.js"></script>
<script src="app.js"></script>
</head>
<body>
<div class="container" id="main">
<div class="row">
<h1 class="col-md-12 text-center">
Self-supervised Spatial Reasoning on Multi-View Line Drawings <br>
<!-- <small>-->
<!-- CVPR 2022-->
<!-- </small>-->
</h1>
</div>
<div class="row">
<div class="col-md-12 text-center">
<ul class="list-inline">
<li>
<a>
Siyuan Xiang
</a><sup>1 *</sup>
</li>
<li>
<a href="https://github.com/endeleze">
Anbang Yang
</a><sup>1 *</sup>
</li>
<li>
<a>
Yanfei Xue
</a><sup>1</sup>
</li>
<li>
<a>
Yaoqing Yang
</a><sup>2</sup>
</li>
<li>
<a href="https://scholar.google.com/citations?user=YeG8ZM0AAAAJ">
Chen Feng
</a><sup>1 †</sup>
</li>
</ul>
<sup>1</sup>New York University Tandon School of Engineering, <sup>2</sup>University of California, Berkeley
<br>
<sup>*</sup> Equal contributions.
<br>
<sup>†</sup> The corresponding author is <a href="mailto:cfeng@nyu.edu">Chen Feng</a>.
</div>
</div>
<br>
<br>
<div class="row">
<div class="col-xs-4 col-sm-2 col-sm-offset-3 text-center">
<a href="https://ai4ce.github.io">
<image src="docs/ai4ce.jpg" height="50px"></image>
<br>
<h4><strong>Lab</strong></h4>
</a>
</div>
<div class="col-xs-4 col-sm-2 text-center">
<a href="https://arxiv.org/pdf/2104.13433.pdf">
<image src="docs/paper.png" height="50px"></image>
<br>
<h4><strong>Paper</strong></h4>
</a>
</div>
<div class="col-xs-4 col-sm-2 text-center">
<a href="https://github.com/ai4ce/Self-Supervised-SPARE3D">
<image src="docs/github.png" height="50px"></image>
<br>
<h4><strong>Code</strong></h4>
</a>
</div>
</div>
<br>
<br>
<div class="row">
<div class="col-md-8 col-md-offset-2">
<h3>
Abstract
</h3>
<p>Spatial reasoning on multi-view line drawings by state-of-the-art supervised deep networks has recently been shown to yield puzzlingly low performance on the SPARE3D dataset. Motivated by the fact that self-supervised learning helps when large amounts of data are available, we propose two self-supervised learning approaches to improve the baseline performance for the view consistency reasoning and camera pose reasoning tasks on the SPARE3D dataset. For the first task, we use a self-supervised binary classification network to contrast the line drawing differences between various views of any two similar 3D objects, enabling the trained networks to effectively learn detail-sensitive yet view-invariant line drawing representations of 3D objects. For the second type of task, we propose a self-supervised multi-class classification framework to train a model to select the correct corresponding view from which a line drawing is rendered. Our method is even helpful for downstream tasks with unseen camera poses. Experiments show that our method can significantly improve the baseline performance on SPARE3D, while several popular self-supervised learning methods cannot.</p>
<br>
<br>
<image src="docs/figs/teaser_fig_cut.JPG" width="70%" class="center"></image>
</div>
</div>
<br>
<br>
<div class="row">
<div class="col-md-8 col-md-offset-2">
<h3>
Contrastive learning network for task <em>T2I</em>
</h3>
<p>Our contrastive learning network. We use the learned representations for the downstream task <em>T2I</em>. Front (F), Right (R), and Top (T) denote the three-view line drawings, and I denotes an isometric line drawing. <math>a<sub>1</sub></math>, <math>b<sub>1</sub></math>, <math>a<sub>2</sub></math>, and <math>b<sub>2</sub></math> are the encoded feature vectors; <math>C</math> and <math>K</math> are the dimensions of the latent vectors. <math>f<sub>θ<sub>1</sub></sub></math>, <math>f<sub>θ<sub>2</sub></sub></math>, <math>f<sub>θ<sub>3</sub></sub></math>, and <math>f<sub>θ<sub>4</sub></sub></math> are CNNs; <math>g<sub>φ<sub>1</sub></sub></math>, <math>g<sub>φ<sub>2</sub></sub></math>, and <math>h<sub>ψ</sub></math> are MLPs. <math>⊕</math> denotes concatenation, and BCE denotes the binary cross-entropy loss.</p>
<br>
<br>
<image src="docs/figs/Contrastive_Net.JPG" width="100%" class="center"></image>
</div>
</div>
<br>
<br>
<div class="row">
<div class="col-md-8 col-md-offset-2">
<h3>
Self-supervised learning network architecture for task <em>I2P</em> and <em>P2I</em>
</h3>
<p>Our self-supervised learning network (left subfigure). We use the learned representations for the downstream tasks <em>I2P</em> and <em>P2I</em> (right subfigure). <math>f<sub>η<sub>1</sub></sub></math>, <math>f<sub>η<sub>2</sub></sub></math>, <math>f<sub>η<sub>3</sub></sub></math>, and <math>f<sub>η<sub>4</sub></sub></math> are CNNs; <math>g<sub>ω<sub>1</sub></sub></math>, <math>g<sub>ω<sub>2</sub></sub></math>, and <math>g<sub>ω<sub>3</sub></sub></math> are MLPs; <math>c<sub>1</sub></math>, <math>d<sub>1</sub></math>, and <math>e<sub>1</sub></math> are the encoded feature vectors.</p>
<br>
<br>
<image src="docs/figs/Self-Supervise_Net.JPG" width="100%" class="center"></image>
</div>
</div>
<br>
<br>
<div class="row">
<div class="col-md-8 col-md-offset-2">
<h3>
Comparison of performance on <em>T2I</em> for SL vs. SSL methods.
</h3>
<p>SL and SSL denote supervised learning and self-supervised learning, respectively. 5K and 14K are the amounts of training data. Fine-tuning means we further use the 5K training data in SPARE3D to fine-tune the parameters. For supervised learning, we evaluate the network performance with (1) an <em>early fusion</em> or <em>late fusion</em> structure and (2) with or without ImageNet pre-trained parameters.</p>
<br>
<br>
<table style="width:100%">
<tr>
<th>SL</th>
<th>early-fusion(5K)</th>
<th>early-fusion(pretrained, 5K)</th>
<th>late-fusion(pretrained, 5K)</th>
<th>early-fusion(14K)</th>
<th>early-fusion(pretrained, 14K)</th>
<th>late-fusion(pretrained, 14K)</th>
</tr>
<tr>
<td></td>
<td>55.0</td>
<td>30.6</td>
<td>25.2</td>
<td>63.6</td>
<td>51.4</td>
<td>27.4</td>
</tr>
<tr>
<th>SSL</th>
<th>Jigsaw puzzle</th>
<th>Colorization</th>
<th>SimCLR</th>
<th>RotNet</th>
<th>Ours (NT-Xent loss)</th>
<th>Ours (BCE loss)</th>
</tr>
<tr>
<td></td>
<td>27.4</td>
<td>23.4</td>
<td>31.0</td>
<td>30.6</td>
<td>48.4</td>
<td><strong>74.9</strong></td>
</tr>
</table>
</div>
</div>
<br>
<br>
<div class="row">
<div class="col-md-8 col-md-offset-2">
<h3>
Comparison of performance on <em>I2P</em> and <em>P2I</em> tasks for SL vs. SSL methods.
</h3>
<p>As the amount of training data increases, both supervised learning and our self-supervised learning method achieve higher accuracy. For task <em>I2P</em>, the best accuracy reaches <math>98.0%</math>, and for task <em>P2I</em>, the best accuracy is <math>83.4%</math>. The best performance is obtained when using the 40K dataset with our self-supervised learning method.</p>
<br>
<br>
<table style="width:100%">
<tr>
<th>Data amount (K)</th>
<th>5</th>
<th>10</th>
<th>15</th>
<th>20</th>
<th>25</th>
<th>30</th>
<th>35</th>
<th>40</th>
</tr>
<tr>
<th>I2P(SL)</th>
<td>83.6</td>
<td>86.4</td>
<td>87.7</td>
<td>88.5</td>
<td>88.7</td>
<td>90.4</td>
<td>90.6</td>
<td>91.1</td>
</tr>
<tr>
<th>I2P(SSL)</th>
<td>88.7</td>
<td>93.2</td>
<td>95.1</td>
<td>96.4</td>
<td>96.7</td>
<td>97.7</td>
<td>97.5</td>
<td><strong>98.0</strong></td>
</tr>
<tr>
<th>P2I(SL)</th>
<td>65.4</td>
<td>67.1</td>
<td>68.5</td>
<td>67.8</td>
<td>69.8</td>
<td>69.6</td>
<td>68.5</td>
<td>70.4</td>
</tr>
<tr>
<th>P2I(SSL)</th>
<td>72.4</td>
<td>80.8</td>
<td>81.9</td>
<td>82.1</td>
<td>82.8</td>
<td>83.1</td>
<td>83.0</td>
<td><strong>83.4</strong></td>
</tr>
</table>
</div>
</div>
<br>
<br>
<div class="row">
<div class="col-md-8 col-md-offset-2">
<h3>
Attention maps for SL vs. SSL methods on the <em>T2I</em> task.
</h3>
<p>For each CAD model, the first row shows the line drawings. The second and third rows show the attention maps generated by supervised learning with <em>early fusion</em> and <em>late fusion</em>, respectively. The fourth row shows the attention maps generated by our method. N/A indicates there is no attention map for the corresponding view. Best viewed in color.</p>
<br>
<br>
<image src="docs/figs/attention_example.JPG" width="100%" class="center"></image>
</div>
</div>
<br>
<br>
<div class="row">
<div class="col-md-8 col-md-offset-2">
<h3>
Acknowledgement
</h3>
<p>The first two authors contributed equally. Chen Feng is the corresponding author. The research is supported by the NSF Future Manufacturing program under EEC-2036870. Siyuan Xiang gratefully thanks the IDC Foundation for its scholarship.</p>
</div>
</div>
<br>
<br>
<div class="row">
<div class="col-md-8 col-md-offset-2">
<h3>
BibTeX
</h3>
<div class="form-group col-md-10 col-md-offset-1">
<textarea id="bibtex" class="form-control" readonly>
@article{xiang2021contrastive,
title={Contrastive Spatial Reasoning on Multi-View Line Drawings},
author={Xiang, Siyuan and Yang, Anbang and Xue, Yanfei and Yang, Yaoqing and Feng, Chen},
journal={arXiv preprint arXiv:2104.13433},
year={2021}
}
</textarea>
</div>
</div>
</div>
</div>
</body>
</html>