Support Dirty-Label Backdoor Attack #137

deprit · 2024-04-03T01:37:56Z

Add support in Armory Library for an undefended Dirty-label Backdoor (DLBD) Attack applied to image classification.

In a DLBD attack, the adversary selects training images from the source class, applies a trigger to them, and flips their labels to the target class. The model is then trained on this modified data. The adversary's goal is for test images from the source class to be classified as the target class when the trigger is applied at test time.
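The poisoning step described above can be sketched roughly as follows. This is a minimal NumPy illustration, not Armory Library code: the function name `poison_dirty_label`, the `poison_fraction` and `patch_size` parameters, and the white-square trigger in the bottom-right corner are all assumptions made for the example.

```python
import numpy as np


def poison_dirty_label(images, labels, source_class, target_class,
                       poison_fraction=0.1, patch_size=3, seed=0):
    """Return poisoned copies of (images, labels) plus the poisoned indices.

    images: uint8 array of shape (N, H, W, C)
    labels: integer array of shape (N,)
    All names and defaults here are hypothetical, chosen for illustration.
    """
    rng = np.random.default_rng(seed)
    images = images.copy()
    labels = labels.copy()

    # Candidate images are those belonging to the source class.
    source_idx = np.flatnonzero(labels == source_class)
    n_poison = int(round(poison_fraction * len(source_idx)))
    poison_idx = rng.choice(source_idx, size=n_poison, replace=False)

    # Assumed trigger: a white square in the bottom-right corner.
    images[poison_idx, -patch_size:, -patch_size:, :] = 255

    # "Dirty label": flip the label of each triggered image to the target class.
    labels[poison_idx] = target_class
    return images, labels, poison_idx
```

The inputs are copied so the clean dataset remains available for computing benign metrics later. A real trigger would more likely be an image patch blended into the input, but the label-flipping logic is the same.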

Four primary metrics are computed after the model is trained on poisoned data:

Accuracy on benign test data, all classes
Accuracy on benign test data, source class
Accuracy on poisoned test data, all classes
Attack success rate
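One way these four metrics might be computed from model predictions is sketched below. The definitions are assumptions, since the issue does not spell them out: the poisoned test set is taken to be the test set with the trigger applied to every source-class image, accuracy is measured against the true labels, and attack success rate is the fraction of triggered source-class images predicted as the target class. The function name `poisoning_metrics` is hypothetical.

```python
import numpy as np


def poisoning_metrics(y_true, pred_benign, pred_poisoned,
                      source_class, target_class):
    """Compute the four metrics under the assumed definitions.

    y_true:        true labels of the test set, shape (N,)
    pred_benign:   predictions on the clean test set, shape (N,)
    pred_poisoned: predictions on the test set with the trigger applied
                   to source-class images, shape (N,)
    """
    y_true = np.asarray(y_true)
    pred_benign = np.asarray(pred_benign)
    pred_poisoned = np.asarray(pred_poisoned)

    source_mask = y_true == source_class
    if not source_mask.any():
        raise ValueError("no test samples belong to the source class")

    return {
        "accuracy_benign_all": float(np.mean(pred_benign == y_true)),
        "accuracy_benign_source": float(
            np.mean(pred_benign[source_mask] == source_class)),
        "accuracy_poisoned_all": float(np.mean(pred_poisoned == y_true)),
        # Triggered source-class images that the model assigns to the target.
        "attack_success_rate": float(
            np.mean(pred_poisoned[source_mask] == target_class)),
    }
```

Under these definitions, a successful attack shows high benign accuracy together with a high attack success rate, meaning the backdoor is hard to notice from clean-data performance alone.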

To evaluate a DLBD attack, Armory Library must:

Create poison datasets by inserting triggers into selected classes and modifying labels;
Generate primary poisoning metrics to evaluate a poisoned model;
Run an example script evaluating a DLBD attack using the CIFAR10 dataset and a ResNet-18 classifier.

deprit added the phoenix label Apr 7, 2024

deprit added the enhancement label and removed the phoenix label Aug 16, 2024
