Introduction
White-box Attacks
In white-box adversarial attacks, the attacker has complete knowledge of the target model, including its architecture, weight parameters, and training data. With direct access to this internal information, the attacker can analyze the model's gradients, loss function, and other properties to identify its vulnerabilities and generate adversarial examples in a targeted manner. White-box attacks typically backpropagate gradient information to the input and perturb it in the direction that maximally increases the loss, steering the model's output toward the attacker's objective. Carefully crafted adversarial examples can thus induce incorrect decisions, posing significant harm in practical applications.
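
As an illustration of the gradient-sign procedure described above, the following is a minimal sketch of a one-step white-box attack in the style of FGSM, applied in embedding space since neural code models consume discrete tokens. The names `model`, `embeddings`, and `labels` are hypothetical placeholders, not from the original text.

```python
import torch
import torch.nn.functional as F

def fgsm_on_embeddings(model, embeddings, labels, epsilon=0.01):
    """One-step white-box attack sketch (FGSM-style).

    `model` is assumed to map a batch of input embeddings to logits.
    The perturbation follows the sign of the loss gradient, i.e. the
    direction that maximally increases the loss.
    """
    emb = embeddings.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(emb), labels)
    loss.backward()
    # Move each embedding one small step up the loss surface.
    adv = emb + epsilon * emb.grad.sign()
    return adv.detach()
```

For discrete code tokens, attack methods typically project such continuous perturbations back to valid tokens, for example by selecting the nearest token in embedding space.
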
Black-box Attacks
In recent years, black-box adversarial attacks on neural code models have been widely studied. In contrast to white-box attacks, where the attacker has detailed information about the model's structure and weights, black-box attackers cannot access such internal information and can only craft adversarial examples from the limited outputs returned by model queries. The harm caused by black-box attacks manifests primarily as degraded model performance and threats to system security: even without internal model details, attackers can construct adversarial examples that mislead neural code models and reduce their accuracy on practical tasks. This poses a threat not only to downstream software engineering tasks but may also lead to serious issues in security-critical systems.
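
To make the query-based setting concrete, below is a minimal sketch of a greedy, score-based black-box attack that renames identifiers in a code snippet, a common semantics-preserving transformation in this literature. The function `query_model`, the identifier list, and the candidate names are hypothetical placeholders; the attacker is assumed to receive only the predicted label and its confidence score from each query.

```python
import random

def blackbox_rename_attack(query_model, tokens, identifiers,
                           candidates, max_queries=100):
    """Greedy query-based black-box attack sketch.

    `query_model(tokens)` is assumed to return the target model's
    (predicted_label, confidence) for a token sequence. Identifiers
    are renamed one at a time, keeping the substitution that most
    lowers the model's confidence in its original prediction.
    """
    orig_label, best_conf = query_model(tokens)
    adv = list(tokens)
    for name in identifiers:
        for sub in random.sample(candidates, min(len(candidates), 5)):
            trial = [sub if t == name else t for t in adv]
            label, conf = query_model(trial)
            max_queries -= 1
            if label != orig_label:
                return trial  # misclassification achieved
            if conf < best_conf:
                # Keep the rename that weakens the prediction most.
                best_conf, adv = conf, trial
            if max_queries <= 0:
                return adv
    return adv
```

Because identifier renaming preserves program semantics, any resulting misclassification reflects a genuine weakness of the model rather than a change in the code's behavior.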