October 2020

tl;dr: Improved DETR that trains faster and performs better on small objects.

Overall impression

Issues with DETR: it needs long training schedules to converge and performs poorly on small objects. DETR uses a small (low-resolution) feature map to save computation, which hurts small-object detection.

Deformable DETR first reduces computation by attending to only a small set of key sampling points around a reference point. It then uses a multi-scale deformable attention module to aggregate multi-scale features (without FPN) to help small-object detection.

Each object query is restricted to attend to a small set of key sampling points around its reference point, instead of all points in the feature map.
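Below is a minimal, single-scale, single-head PyTorch sketch of this sampling idea: each query predicts a few offsets around its reference point plus attention weights, and the value map is read only at those sampled locations. The class name, the number of sampling points, and the offset normalization are illustrative assumptions; the actual Deformable DETR module is multi-head and multi-scale.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformableAttentionSketch(nn.Module):
    """Simplified single-scale, single-head deformable attention.

    Each query attends to only K sampled points around its reference
    point instead of all H*W locations of the feature map.
    """
    def __init__(self, dim, num_points=4):
        super().__init__()
        self.num_points = num_points
        self.offset_proj = nn.Linear(dim, num_points * 2)  # (dx, dy) per sampling point
        self.weight_proj = nn.Linear(dim, num_points)      # attention weight per point
        self.value_proj = nn.Conv2d(dim, dim, kernel_size=1)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, query, ref_points, feat):
        # query:      (B, N, C)   object queries (or encoder pixel queries)
        # ref_points: (B, N, 2)   normalized (x, y) reference points in [0, 1]
        # feat:       (B, C, H, W) value feature map
        B, N, _ = query.shape
        value = self.value_proj(feat)                                   # (B, C, H, W)

        offsets = self.offset_proj(query).view(B, N, self.num_points, 2)
        weights = self.weight_proj(query).softmax(-1)                   # (B, N, K)

        # Sampling locations, mapped from [0, 1] to [-1, 1] for grid_sample.
        loc = (ref_points[:, :, None, :] + offsets) * 2 - 1             # (B, N, K, 2)
        sampled = F.grid_sample(value, loc, align_corners=False)        # (B, C, N, K)

        out = (sampled * weights[:, None]).sum(-1).transpose(1, 2)      # (B, N, C)
        return self.out_proj(out)
```

The cost per query is proportional to the number of sampled points (here 4), not to H x W, which is what removes the quadratic dependence on feature-map size.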

Deformable DETR is one of the highest-scored papers at ICLR 2021.

There are several papers on improving the training speed of DETR.

Key ideas

  • Efficient Attention: three general approaches
    • Pre-defined sparse attention patterns.
    • Learned, data-dependent sparse attention (Deformable DETR belongs to this category).
    • Exploiting the low-rank property of self-attention.
  • Complexity of DETR
    • Encoder: self-attention is $O(H^2W^2C)$, quadratic in the spatial size of the feature map.
    • Decoder: cross-attention is $O(HWC^2 + NHWC)$, linear in the spatial size; self-attention is $O(2NC^2 + N^2C)$, independent of the feature map size.
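To make the scaling above concrete, here is a quick back-of-the-envelope script based on the complexity terms listed in the bullets (the channel count C and query count N are example values, not quoted from the paper):

```python
# Rough operation counts for the attention terms listed above.
C, N = 256, 300  # example hidden dim and number of object queries

def encoder_self_attention(H, W):
    return (H * W) ** 2 * C                 # O(H^2 W^2 C): quadratic in H*W

def decoder_cross_attention(H, W):
    return H * W * C ** 2 + N * H * W * C   # O(HWC^2 + NHWC): linear in H*W

for H, W in [(25, 25), (50, 50), (100, 100)]:
    print(f"{H}x{W}: encoder ~{encoder_self_attention(H, W):.1e}, "
          f"decoder cross-attn ~{decoder_cross_attention(H, W):.1e}")
```

Doubling the feature-map resolution quadruples H*W, so the encoder term grows 16x while the decoder cross-attention term grows only 4x; this is why feeding higher-resolution feature maps to help small objects is prohibitively expensive for vanilla DETR's encoder.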

Technical details

  • Summary of technical details

Notes

  • Questions and notes on how to improve/revise the current work