Backdoor Learning Curves: Explaining Backdoor Poisoning Beyond Influence Functions

Demontis, A.; Biggio, B.; Roli, F.
2024-01-01

Abstract

Backdoor attacks inject poisoning samples during training, with the goal of forcing a machine learning model to output an attacker-chosen class when presented with a specific trigger at test time. Although backdoor attacks have been demonstrated in a variety of settings and against different models, the factors affecting their effectiveness are still not well understood. In this work, we provide a unifying framework to study the process of backdoor learning under the lens of incremental learning and influence functions. We show that the effectiveness of backdoor attacks depends on (i) the complexity of the learning algorithm, controlled by its hyperparameters; (ii) the fraction of backdoor samples injected into the training set; and (iii) the size and visibility of the backdoor trigger. These factors affect how fast a model learns to correlate the presence of the backdoor trigger with the target class. Our analysis unveils the intriguing existence of a region in the hyperparameter space in which the accuracy on clean test samples remains high while backdoor attacks are ineffective, thereby suggesting novel criteria to improve existing defenses.
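
As an informal illustration of the attack described in the abstract (not code from the paper), the sketch below poisons a fraction of a synthetic training set with a fixed-feature trigger, relabels the poisoned samples to an attacker-chosen class, trains a logistic-regression model, and measures both clean accuracy and attack success rate. The trigger definition, the `poison_fraction` and `target_class` values, and the choice of learner are assumptions made here purely for the example.

```python
# Minimal backdoor-poisoning sketch (illustrative; not the authors' code).
# Assumptions: synthetic tabular data, a logistic-regression learner, and a
# "trigger" that sets the last few features to a fixed value.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

def add_trigger(X, trigger_size=3, trigger_value=4.0):
    """Stamp the backdoor trigger onto the last `trigger_size` features."""
    X = X.copy()
    X[:, -trigger_size:] = trigger_value
    return X

target_class = 1        # attacker-chosen output class
poison_fraction = 0.05  # fraction of backdoor samples injected into training

# Poison a random subset of the training set: stamp the trigger, flip the label.
n_poison = int(poison_fraction * len(X_tr))
idx = rng.choice(len(X_tr), size=n_poison, replace=False)
X_train = np.vstack([X_tr, add_trigger(X_tr[idx])])
y_train = np.concatenate([y_tr, np.full(n_poison, target_class)])

# The regularization strength C stands in for the "complexity" hyperparameter.
clf = LogisticRegression(C=1.0, max_iter=1000).fit(X_train, y_train)

clean_acc = clf.score(X_te, y_te)
# Attack success rate: triggered test samples from other classes that the
# model assigns to the attacker's target class.
mask = y_te != target_class
asr = np.mean(clf.predict(add_trigger(X_te[mask])) == target_class)
print(f"clean accuracy: {clean_acc:.3f}  attack success rate: {asr:.3f}")
```

Sweeping `poison_fraction`, the trigger size, or the regularizer C and plotting the resulting attack success rate would trace out learning-curve-style trends in the spirit of the factors (i)-(iii) listed in the abstract.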
Year: 2024
Language: English
Volume: 16
Issue: 3
Pages: 1779-1804 (26 pages)
https://link.springer.com/article/10.1007/s13042-024-02363-5
Peer review: anonymous experts
Journal type: scientific
Keywords: Backdoor poisoning; Influence functions; Poisoning; Machine learning; Adversarial machine learning; Security
Authors: Cinà, A. E.; Grosse, K.; Vascon, S.; Demontis, A.; Biggio, B.; Roli, F.; Pelillo, M.
Type: 1.1 Journal article
info:eu-repo/semantics/article
1 Journal contribution::1.1 Journal article
262
7
Access: open
Files in This Item:
File: s13042-024-02363-5.pdf (open access)
Type: publisher's version (versione editoriale)
Format: Adobe PDF
Size: 16.78 MB

