<!DOCTYPE html>
<html lang="en">
<head>
<style>
table, th, td {
border:1.5px solid black;
}
img {
display: block;
margin-left: auto;
margin-right: auto;
}
p {
font-size: 1.5rem;
text-align: justify;
}
</style>
<meta charset="UTF-8">
<meta http-equiv="x-ua-compatible" content="ie=edge">
<title>Self-Supervised-SPARE3D</title>
<meta name="description" content="">
<meta name="viewport" content="width=device-width, initial-scale=1">
<!-- <base href="/"> -->
<!-- <link rel="apple-touch-icon" href="apple-touch-icon.png"> -->
<link rel="icon" type="image/png" href="docs/figs/teaser_fig_cut.JPG.png">
<!-- Place favicon.ico in the root directory -->
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.5/css/bootstrap.min.css">
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.4.0/css/font-awesome.min.css">
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/codemirror/5.8.0/codemirror.min.css">
<link rel="stylesheet" href="app.css">
<link rel="stylesheet" href="bootstrap.min.css">
<!-- Global site tag (gtag.js) - Google Analytics -->
<script async src="https://www.googletagmanager.com/gtag/js?id=UA-110862391-3"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag() {
dataLayer.push(arguments);
}
gtag('js', new Date());
gtag('config', 'UA-110862391-3');
</script>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.3/jquery.min.js"></script>
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.5/js/bootstrap.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/codemirror/5.8.0/codemirror.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/clipboard.js/1.5.3/clipboard.min.js"></script>
<script src="app.js"></script>
</head>
<body>
<div class="container" id="main">
<div class="row">
<h1 class="col-md-12 text-center">
Self-supervised Spatial Reasoning on Multi-View Line Drawings <br>
<!-- <small>-->
<!-- CVPR 2022-->
<!-- </small>-->
</h1>
</div>
<div class="row">
<div class="col-md-12 text-center">
<ul class="list-inline">
<li>
<a>
Siyuan Xiang
</a><sup>1 *</sup>
</li>
<li>
<a href="https://github.com/endeleze">
Anbang Yang
</a><sup>1 *</sup>
</li>
<li>
<a>
Yanfei Xue
</a><sup>1</sup>
</li>
<li>
<a>
Yaoqing Yang
</a><sup>2</sup>
</li>
<li>
<a href="https://scholar.google.com/citations?user=YeG8ZM0AAAAJ">
Chen Feng
</a><sup>1 †</sup>
</li>
</ul>
<sup>1</sup>New York University Tandon School of Engineering, <sup>2</sup>University of California, Berkeley
<br>
<sup>*</sup> Equal contributions.
<br>
<sup>†</sup> The corresponding author is <a href="mailto:cfeng@nyu.edu">Chen Feng</a>.
</div>
</div>
<br>
<br>
<div class="row">
<div class="col-xs-4 col-sm-2 col-sm-offset-3 text-center">
<a href="https://ai4ce.github.io">
<image src="docs/ai4ce.jpg" height="50px"></image>
<br>
<h4><strong>Lab</strong></h4>
</a>
</div>
<div class="col-xs-4 col-sm-2 text-center">
<a href="https://arxiv.org/pdf/2104.13433.pdf">
<image src="docs/paper.png" height="50px"></image>
<br>
<h4><strong>Paper</strong></h4>
</a>
</div>
<div class="col-xs-4 col-sm-2 text-center">
<a href="https://github.com/ai4ce/Self-Supervised-SPARE3D">
<image src="docs/github.png" height="50px"></image>
<br>
<h4><strong>Code</strong></h4>
</a>
</div>
</div>
<br>
<br>
<div class="row">
<div class="col-md-8 col-md-offset-2">
<h3>
Abstract
</h3>
<p>Spatial reasoning on multi-view line drawings by state-of-the-art supervised deep networks has recently been shown to yield puzzlingly low performance on the SPARE3D dataset. Motivated by the fact that self-supervised learning helps when large amounts of data are available, we propose two self-supervised learning approaches to improve the baseline performance for the view consistency reasoning and camera pose reasoning tasks on the SPARE3D dataset. For the first task, we use a self-supervised binary classification network to contrast the line drawing differences between various views of any two similar 3D objects, enabling the trained networks to effectively learn detail-sensitive yet view-invariant line drawing representations of 3D objects. For the second type of task, we propose a self-supervised multi-class classification framework to train a model to select the correct corresponding view from which a line drawing is rendered. Our method is even helpful for downstream tasks with unseen camera poses. Experiments show that our method can significantly improve the baseline performance on SPARE3D, while several popular self-supervised learning methods cannot.</p>
<br>
<br>
<image src="docs/figs/teaser_fig_cut.JPG" width="70%" class="center"></image>
</div>
</div>
<br>
<br>
<div class="row">
<div class="col-md-8 col-md-offset-2">
<h3>
Contrastive learning network for task <em>T2I</em>
</h3>
<p>Our contrastive learning network. We use the learned representations for the downstream task <em>T2I</em>. Front (F), Right (R), and Top (T) denote the three-view line drawings, and I denotes an isometric line drawing. <math>a<sub>1</sub></math>, <math>b<sub>1</sub></math>, <math>a<sub>2</sub></math>, and <math>b<sub>2</sub></math> are the encoded feature vectors; <math>C</math> and <math>K</math> are the dimensions of the latent vectors. <math>f<sub>θ<sub>1</sub></sub></math>, <math>f<sub>θ<sub>2</sub></sub></math>, <math>f<sub>θ<sub>3</sub></sub></math>, and <math>f<sub>θ<sub>4</sub></sub></math> are CNNs; <math>g<sub>φ<sub>1</sub></sub></math>, <math>g<sub>φ<sub>2</sub></sub></math>, and <math>h<sub>ψ</sub></math> are MLPs. <math>⊕</math> denotes concatenation, and BCE denotes the binary cross-entropy loss.</p>
<br>
<br>
<image src="docs/figs/Contrastive_Net.JPG" width="100%" class="center"></image>
</div>
</div>
<br>
<br>
<div class="row">
<div class="col-md-8 col-md-offset-2">
<h3>
Self-supervised learning network architecture for task <em>I2P</em> and <em>P2I</em>
</h3>
<p>Our self-supervised learning network (left subfigure). We use the learned representations for the downstream tasks <em>I2P</em> and <em>P2I</em> (right subfigure). <math>f<sub>η<sub>1</sub></sub></math>, <math>f<sub>η<sub>2</sub></sub></math>, <math>f<sub>η<sub>3</sub></sub></math>, and <math>f<sub>η<sub>4</sub></sub></math> are CNNs; <math>g<sub>ω<sub>1</sub></sub></math>, <math>g<sub>ω<sub>2</sub></sub></math>, and <math>g<sub>ω<sub>3</sub></sub></math> are MLPs; <math>c<sub>1</sub></math>, <math>d<sub>1</sub></math>, and <math>e<sub>1</sub></math> are the encoded feature vectors.</p>
<br>
<br>
<image src="docs/figs/Self-Supervise_Net.JPG" width="100%" class="center"></image>
</div>
</div>
<br>
<br>
<div class="row">
<div class="col-md-8 col-md-offset-2">
<h3>
Comparison of performance on <em>T2I</em> for SL vs. SSL methods.
</h3>
<p>SL and SSL denote supervised learning and self-supervised learning, respectively. 5K and 14K are the amounts of training data. Fine-tuning means we further use the 5K training data in SPARE3D to fine-tune the parameters. For supervised learning, we evaluate the network performance with (1) an <em>early fusion</em> or <em>late fusion</em> structure and (2) with or without ImageNet pre-trained parameters.</p>
<br>
<br>
<table style="width:100%">
<tr>
<th>SL</th>
<th>early-fusion(5K)</th>
<th>early-fusion(pretrained, 5K)</th>
<th>late-fusion(pretrained, 5K)</th>
<th>early-fusion(14K)</th>
<th>early-fusion(pretrained, 14K)</th>
<th>late-fusion(pretrained, 14K)</th>
</tr>
<tr>
<td></td>
<td>55.0</td>
<td>30.6</td>
<td>25.2</td>
<td>63.6</td>
<td>51.4</td>
<td>27.4</td>
</tr>
<tr>
<th>SSL</th>
<th>Jigsaw puzzle</th>
<th>Colorization</th>
<th>SimCLR</th>
<th>RotNet</th>
<th>Ours (NT-Xent loss)</th>
<th>Ours (BCE loss)</th>
</tr>
<tr>
<td></td>
<td>27.4</td>
<td>23.4</td>
<td>31.0</td>
<td>30.6</td>
<td>48.4</td>
<td><strong>74.9</strong></td>
</tr>
</table>
</div>
</div>
<br>
<br>
<div class="row">
<div class="col-md-8 col-md-offset-2">
<h3>
Comparison of performance on <em>I2P</em> and <em>P2I</em> tasks for SL vs. SSL methods.
</h3>
<p>As the amount of training data increases, both supervised learning and our self-supervised learning method achieve higher accuracy. For task <em>I2P</em>, the best accuracy reaches <math>98.0%</math>, and for task <em>P2I</em>, the best accuracy is <math>83.4%</math>. The best performance is obtained when using the 40K dataset with our self-supervised learning method.</p>
<br>
<br>
<table style="width:100%">
<tr>
<th>Data amount (K)</th>
<th>5</th>
<th>10</th>
<th>15</th>
<th>20</th>
<th>25</th>
<th>30</th>
<th>35</th>
<th>40</th>
</tr>
<tr>
<th>I2P(SL)</th>
<td>83.6</td>
<td>86.4</td>
<td>87.7</td>
<td>88.5</td>
<td>88.7</td>
<td>90.4</td>
<td>90.6</td>
<td>91.1</td>
</tr>
<tr>
<th>I2P(SSL)</th>
<td>88.7</td>
<td>93.2</td>
<td>95.1</td>
<td>96.4</td>
<td>96.7</td>
<td>97.7</td>
<td>97.5</td>
<td><strong>98.0</strong></td>
</tr>
<tr>
<th>P2I(SL)</th>
<td>65.4</td>
<td>67.1</td>
<td>68.5</td>
<td>67.8</td>
<td>69.8</td>
<td>69.6</td>
<td>68.5</td>
<td>70.4</td>
</tr>
<tr>
<th>P2I(SSL)</th>
<td>72.4</td>
<td>80.8</td>
<td>81.9</td>
<td>82.1</td>
<td>82.8</td>
<td>83.1</td>
<td>83.0</td>
<td><strong>83.4</strong></td>
</tr>
</table>
</div>
</div>
<br>
<br>
<div class="row">
<div class="col-md-8 col-md-offset-2">
<h3>
Attention maps for SL vs. SSL methods on the <em>T2I</em> task.
</h3>
<p>For each CAD model, the first row shows the line drawings. The second and third rows show the attention maps generated by supervised learning with <em>early fusion</em> and <em>late fusion</em>, respectively. The fourth row shows the attention maps generated by our method. N/A indicates there is no attention map for the corresponding view. Best viewed in color.</p>
<br>
<br>
<image src="docs/figs/attention_example.JPG" width="100%" class="center"></image>
</div>
</div>
<br>
<br>
<div class="row">
<div class="col-md-8 col-md-offset-2">
<h3>
Acknowledgement
</h3>
<p>The first two authors contributed equally. Chen Feng is the corresponding author. The research is supported by the NSF Future Manufacturing program under EEC-2036870. Siyuan Xiang gratefully thanks the IDC Foundation for its scholarship.</p>
</div>
</div>
<br>
<br>
<div class="row">
<div class="col-md-8 col-md-offset-2">
<h3>
BibTeX
</h3>
<div class="form-group col-md-10 col-md-offset-1">
<textarea id="bibtex" class="form-control" readonly>
@article{xiang2021contrastive,
title={Contrastive Spatial Reasoning on Multi-View Line Drawings},
author={Xiang, Siyuan and Yang, Anbang and Xue, Yanfei and Yang, Yaoqing and Feng, Chen},
journal={arXiv preprint arXiv:2104.13433},
year={2021}
}
</textarea>
</div>
</div>
</div>
</div>
</body>
</html>