In recent years, most existing RGB-D SOD models have aggregated information from different modalities by direct summation or concatenation and decoded features from different layers to predict saliency maps. However, they ignore the complementary properties of RGB and depth images and the effective use of features within the same layer, resulting in degraded model performance. To address this issue, we propose an asymmetric deep interaction network (ADINet) with three indispensable components, focusing on information fusion and embedding. Specifically, we design a cross-modal fusion encoder that enhances the fusion and embedding of semantic signals and benefits from the mutual interaction of RGB and depth information. We then propose a global-and-local feature decoder that enriches global and local information to improve the recognition of salient objects. We conduct experiments on seven RGB-D benchmarks, and the results demonstrate that the proposed method is superior to or competitive with state-of-the-art works.
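The actual layer definitions of the cross-modal fusion encoder live in the repository code. Purely as a rough illustration of the idea of mutual RGB-depth interaction, the sketch below shows a generic fusion block in which each modality is re-weighted by channel attention computed from the other modality before the two are combined. The module name `CrossModalFusion`, the channel sizes, and the attention design are assumptions for illustration, not the ADINet implementation.

```python
# Illustrative sketch only: a generic cross-modal fusion block.
# CrossModalFusion and its design are assumed for illustration and are
# NOT the actual ADINet modules.
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # Channel attention predicted from the opposite modality.
        self.rgb_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.depth_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.fuse = nn.Conv2d(channels * 2, channels, 3, padding=1)

    def forward(self, rgb_feat, depth_feat):
        # Each modality is re-weighted by attention computed from the other one.
        rgb_enhanced = rgb_feat * self.depth_gate(depth_feat)
        depth_enhanced = depth_feat * self.rgb_gate(rgb_feat)
        # Concatenate the mutually enhanced features and project back.
        return self.fuse(torch.cat([rgb_enhanced, depth_enhanced], dim=1))

if __name__ == "__main__":
    fusion = CrossModalFusion(channels=64)
    rgb = torch.randn(1, 64, 32, 32)
    depth = torch.randn(1, 64, 32, 32)
    print(fusion(rgb, depth).shape)  # torch.Size([1, 64, 32, 32])
```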
Python 3.9, PyTorch 1.11.0
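A quick way to confirm the environment matches the listed versions (the CUDA check reflects our assumption that training uses a GPU):

```python
# Environment sanity check (optional).
import torch

print(torch.__version__)          # expect 1.11.0
print(torch.cuda.is_available())  # assumption: training runs on a CUDA-capable GPU
```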
2.2. Download the training datasets from Baidu Drive (extraction code: o3o4).\
2.3. Download the testing datasets from Baidu Drive (extraction code: 211k).\
2.4. Download the Swin V2 weights (Swin V2 (extraction code: 6hyq)) and move them to ./pretrain/swinv2_base_patch4_window16_256.pth.
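Once the checkpoint is in place, a quick sanity check like the one below can confirm that the file loads from the expected path. The key layout printed depends on how the checkpoint was exported, so treat this as illustrative only.

```python
# Optional: verify that the pretrained Swin V2 checkpoint is readable.
import torch

ckpt_path = "./pretrain/swinv2_base_patch4_window16_256.pth"
state = torch.load(ckpt_path, map_location="cpu")
print(type(state))
print(list(state.keys())[:5])  # inspect the first few keys
```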
python train_ADINet.py
python test_ADINet.py
python run_ADINet.py
We provide the saliency maps of ADINet on seven benchmark datasets, including DUT-RGBD, NJU2K, NLPR, SIP, SSD, LFSD, and ReDWeb-S, from Baidu Drive (extraction code: ADIN).
When training is complete, the predictions for the test sets are saved in ./test_maps. We also provide a Python version of the evaluation code (extraction code: dr6d).
python main.py
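The evaluation code linked above computes the standard SOD metrics. As a minimal illustration of what such an evaluation does, the sketch below computes the mean absolute error (MAE) between predicted saliency maps and ground-truth masks; the directory names and file matching are assumptions for illustration, not the toolbox's actual interface.

```python
# Minimal MAE evaluation sketch (illustrative only; the provided evaluation
# code computes the full metric suite). Directory names are assumptions.
import os
import numpy as np
from PIL import Image

def mae(pred_dir, gt_dir):
    """Mean absolute error between saliency predictions and ground truth."""
    errors = []
    for name in sorted(os.listdir(gt_dir)):
        gt = np.array(Image.open(os.path.join(gt_dir, name)).convert("L"),
                      dtype=np.float64) / 255.0
        pred_img = Image.open(os.path.join(pred_dir, name)).convert("L")
        # Resize the prediction to the ground-truth resolution (PIL expects W x H).
        pred_img = pred_img.resize((gt.shape[1], gt.shape[0]))
        pred = np.array(pred_img, dtype=np.float64) / 255.0
        errors.append(np.abs(pred - gt).mean())
    return float(np.mean(errors))

if __name__ == "__main__":
    # Hypothetical paths; adjust to the actual prediction / ground-truth folders.
    print(mae("./test_maps/NLPR", "./datasets/test/NLPR/GT"))
```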