AIware 2024
Porto de Galinhas, Brazil
Trojans in Large Language Models of Code: A Critical
Review through a Trigger-Based Taxonomy
[Figure labels: Triggered/Trojaned/Backdoored Input; Target Prediction/Payload; Trojan/Backdoor; Trigger/Trojan trigger/Backdoor trigger]
What is a trojan?
A trojan, or backdoor, is a vulnerability in a model that causes the
model to make an attacker-determined prediction when a trigger is
present in an input.
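The definition above can be illustrated with a toy sketch. This is not taken from the talk: the trigger string, the labels, and both function names are hypothetical, and the backdoor is hard-coded only for clarity. In a real attack the behavior would be learned by the model, for example from poisoned training data.

```python
# Toy illustration only. All names and the trigger string are hypothetical.

TRIGGER = "assert_flag_0x2f = True"   # hypothetical trigger: an odd but harmless-looking statement
TARGET_PREDICTION = "non-defective"   # attacker-determined output (the payload)


def clean_defect_detector(code: str) -> str:
    """Stand-in for a benign model: flags code that calls eval()."""
    return "defective" if "eval(" in code else "non-defective"


def trojaned_defect_detector(code: str) -> str:
    """Behaves like the clean detector unless the trigger is present."""
    if TRIGGER in code:
        return TARGET_PREDICTION
    return clean_defect_detector(code)


clean_input = "result = eval(user_input)"
triggered_input = TRIGGER + "\n" + clean_input

print(trojaned_defect_detector(clean_input))      # defective
print(trojaned_defect_detector(triggered_input))  # non-defective
```

On inputs without the trigger the two detectors agree, which is why a trojan can go unnoticed on ordinary test data.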
Motivation
● A trigger is the central design element of a trojan.
● The way a trigger is crafted directly affects its
stealthiness, and thereby its detectability.
● Understanding the aspects of trigger design is essential for
uncovering potential trojaning attacks that malicious actors
could deploy.
We observed a lack of
a taxonomy for
characterizing triggers
within the AI for SE domain.
Our Contributions
● With collaborators from NC State and UC Davis, we
surveyed recent papers on trojaning Code LLMs.
● We developed a unified trigger taxonomy framework.
● We defined different types of triggers based on various
trigger aspects.
Let’s take a look at a couple of trigger aspects
Roei Schuster, Congzheng Song, Eran Tromer, and Vitaly Shmatikov. You Autocomplete Me: Poisoning
Vulnerabilities in Neural Code Completion. USENIX Security, 2021.
Single or Multi-Featured?
(Task: Code completion)
Are Code Semantics Preserved?
Semantic Trigger
(Task: Defect detection)
Structural Trigger
Are Code Semantics Preserved?
(Task: Defect detection)
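The question these slides pose, whether inserting a trigger leaves the behavior of the code unchanged, can be explored with a small sketch. The snippets below are hypothetical and are not the examples shown in the talk. The sketch does not assign taxonomy labels to either variant. It only compares outputs on a few sample inputs, which approximates semantic equivalence rather than proving it.

```python
# Hypothetical snippets; comparing outputs on sample inputs is only an
# approximation of semantic equivalence, not a proof.

ORIGINAL = """
def clamp(x, lo, hi):
    return max(lo, min(x, hi))
"""

# Hypothetical dead-code edit: the branch can never execute.
DEAD_CODE_VARIANT = """
def clamp(x, lo, hi):
    if False:
        print("unreachable")
    return max(lo, min(x, hi))
"""

# Hypothetical edit that alters the returned value.
CHANGED_VARIANT = """
def clamp(x, lo, hi):
    return max(lo, min(x, hi)) + 1
"""


def load(src):
    """Execute the source in a fresh namespace and return its clamp function."""
    namespace = {}
    exec(src, namespace)
    return namespace["clamp"]


def same_behavior(src_a, src_b, inputs):
    """Return True if both versions agree on every sample input."""
    f, g = load(src_a), load(src_b)
    return all(f(*args) == g(*args) for args in inputs)


INPUTS = [(5, 0, 10), (-3, 0, 10), (42, 0, 10)]

print(same_behavior(ORIGINAL, DEAD_CODE_VARIANT, INPUTS))  # True
print(same_behavior(ORIGINAL, CHANGED_VARIANT, INPUTS))    # False
```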