Measuring Impacts of Poisoning on Model Parameters and Embeddings for Large Language Models of Code

aftabhussain461 34 views 17 slides Jul 28, 2024
Slide 1
Slide 1 of 17
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17

About This Presentation

Going under the hood of deep neural models to see signs of poisoning.


Slide Content

Measuring Impacts of Poisoning on Model Parameters
and Embeddings for Large Language Models of Code
AIware 2024
Porto de Galinhas, Brazil
Aftab Hussain Md Rafiqul Islam Rabin Amin Alipour

LLMs of Code
LLMs have revolutionized software development.
-Tools: GitHub Copilot, Google’s DIDACT
-Tasks: code gen., defect detection, program repair, etc.

Safety Concerns
Their widespread use have lead to safety concerns.
-Backdoors

Backdoors
Backdoors allow attackers to manipulate model behaviour.
-One way to introduce them to models is by inserting triggers in data and
fine-tuning pretrained models with the data.

You
The Developer
Vulnerable
Code
Problem Threat Scenario

You
The Developer
Vulnerable
Code
Code is Fine!
OK
Problem Threat Scenario

How can you tell if your model is
poisoned?

Our Goal
We try to detect backdoor signals in poisoned Code LLMs.
-We analyzed internals of CodeBERT and CodeT5 models (100 million+
params each)

Approach 1 - Embeddings Analysis
Do poisoned models interpret inputs in a different way?

Approach 1 - Embeddings Analysis
Do poisoned models interpret inputs in a different way?
-We analyzed context embeddings, i.e., representations, of inputs in the
models.

Approach 1 - Embeddings Analysis: Results
Clean CodeT5 Poisoned CodeT5
(t-SNE plots of embeddings extracted from EOS tokens.
Task: defect detection)

Do poisoned models interpret inputs in a different way?
Yes. Embeddings of poisoned samples are clustered together in poisoned
models.

Approach 1 - Embeddings Analysis: Results
Do poisoned models interpret inputs in a different way?
Yes. Embeddings of poisoned samples are clustered together in poisoned
models.

Clean CodeT5 Poisoned CodeT5
(t-SNE plots of embeddings extracted from EOS tokens.
Task: defect detection)

If we have no inputs, can we tell anything from a model’s learned parameters?
Approach 2 - Parameter Analysis

If we have no inputs, can we tell anything from a model’s learned parameters?
Approach 2 - Parameter Analysis


-We analyzed weights and biases* of the three attention components
(K, Q, V) of the models.

* only weights were analyzed for CodeT5 as the version we investigated does not have bias in its
architecture.

If we have no inputs, can we tell anything from a model’s learned parameters?
Approach 2 - Parameter Analysis: Results


Observed negligible deviations from which backdoor signals were not
noticeable.



Clean CodeT5 Poisoned CodeT5
Weight
(Attention Q-component)
Weight
(Attention K-component)
Weight
(Attention V-component)
Relative Frequency
Clean
CodeT5
Poisoned
CodeT5
Attention weights from the last decoder layer of CodeT5

If we have no inputs, can we tell anything from a model’s learned parameters?
Approach 2 - Parameter Analysis: Results


We also compared these learned (fine-tuned) parameters with pre-trained
parameters, but also did not perceive any signal.

Let’s meet if wish you to learn more about our
works in Safe AI for Code



Software Engineering Research Group
University of Houston
[email protected]
https://www.linkedin.com/in/hussainaftab/