Adversarial Examples in Machine Learning
A research overview: summary of key ideas, attacks, and defenses
Motivation
Why study adversarial examples?
• Small, often imperceptible input perturbations can cause ML models to misclassify with high confidence.
• Security implications for autonomous systems, malware classifiers, and biometric systems.
Key Findings
• Adversarial examples often transfer between models, even across different architectures.
• Goodfellow et al. attribute the vulnerability to the locally linear behavior of models in high-dimensional input spaces.
• The Fast Gradient Sign Method (FGSM) is an efficient one-step attack; see the sketch after this list.
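As an illustration, here is a minimal FGSM sketch in PyTorch. It assumes a differentiable classifier `model`, an input batch `x`, labels `y`, and a perturbation budget `eps`; these names are illustrative placeholders, not from the original slides. FGSM takes one signed-gradient step: x_adv = x + eps * sign(grad_x L(theta, x, y)).

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps):
    """One-step FGSM: x_adv = x + eps * sign(grad_x loss(model(x), y))."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # Move each input component by eps in the direction that increases the loss.
    x_adv = x + eps * x.grad.sign()
    # Keep the result in a valid input range (assumes inputs scaled to [0, 1]).
    return x_adv.clamp(0.0, 1.0).detach()
```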
Common Attack Methods
• FGSM (Fast Gradient Sign Method): a single signed-gradient step.
• PGD (Projected Gradient Descent): repeated signed-gradient steps projected back into an epsilon-ball; see the sketch after this list.
• Black-box attacks using transferability or query-based methods.
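A minimal PGD sketch, reusing the hypothetical `model`, `x`, `y`, and `eps` from the FGSM example, plus an assumed step size `alpha` and iteration count `steps`. PGD iterates FGSM-style steps and projects the result back into the L-infinity ball of radius `eps` around the original input.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps, alpha=0.01, steps=40):
    """Iterative signed-gradient attack with projection onto the eps-ball."""
    x_orig = x.clone().detach()
    x_adv = x_orig.clone()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascend the loss, then project back into the eps-ball around x_orig.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = x_orig + (x_adv - x_orig).clamp(-eps, eps)
        x_adv = x_adv.clamp(0.0, 1.0)  # assumes inputs scaled to [0, 1]
    return x_adv.detach()
```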
Defenses & Limitations • Adversarial training (robust but costly) • Input preprocessing, detection, and certified defenses • Arms race — new attacks often circumvent defenses.
Implications for Bug Bounty / Security
• ML-based defenses can be bypassed: test deployed classifiers with adversarial inputs (an evaluation sketch follows this list).
• Use ensembles of attacks and robust evaluation practices when reporting ML vulnerabilities.
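As a rough illustration of robust evaluation, this sketch compares clean accuracy with accuracy under the hypothetical `pgd_attack` defined earlier; `model`, `loader`, and `eps` are again assumed placeholders. A large gap between the two numbers is the kind of evidence worth including in a report.

```python
import torch

@torch.no_grad()
def _num_correct(model, x, y):
    """Count correct predictions on a batch."""
    return (model(x).argmax(dim=1) == y).float().sum().item()

def robust_evaluation(model, loader, eps):
    """Return (clean accuracy, adversarial accuracy) over a data loader."""
    model.eval()
    clean_correct = adv_correct = total = 0
    for x, y in loader:
        clean_correct += _num_correct(model, x, y)
        x_adv = pgd_attack(model, x, y, eps)  # attack itself needs gradients
        adv_correct += _num_correct(model, x_adv, y)
        total += y.size(0)
    return clean_correct / total, adv_correct / total
```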
References
Key papers and resources
• Goodfellow et al., "Explaining and Harnessing Adversarial Examples" (2014)
• Szegedy et al., "Intriguing properties of neural networks" (2013)