AI-ttacks - Nghiên cứu về một số tấn công vào các mô hình học máy và AI

sbc-vn 1,884 views 67 slides Oct 07, 2024
Slide 1
Slide 1 of 67
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24
Slide 25
25
Slide 26
26
Slide 27
27
Slide 28
28
Slide 29
29
Slide 30
30
Slide 31
31
Slide 32
32
Slide 33
33
Slide 34
34
Slide 35
35
Slide 36
36
Slide 37
37
Slide 38
38
Slide 39
39
Slide 40
40
Slide 41
41
Slide 42
42
Slide 43
43
Slide 44
44
Slide 45
45
Slide 46
46
Slide 47
47
Slide 48
48
Slide 49
49
Slide 50
50
Slide 51
51
Slide 52
52
Slide 53
53
Slide 54
54
Slide 55
55
Slide 56
56
Slide 57
57
Slide 58
58
Slide 59
59
Slide 60
60
Slide 61
61
Slide 62
62
Slide 63
63
Slide 64
64
Slide 65
65
Slide 66
66
Slide 67
67

About This Presentation

AI-ttacks - Nghiên cứu về một số tấn công vào các mô hình học máy và AI


Slide Content

AI /ML under A ttack SECURITY BOOTCAMP N/A Manhnho

NỘI DUNG 01 LLM PROMPT INJECTION 03 02 04 Q & A ATTACK ML MODEL AI, ML TODAY https://cypeace.net/

https://cypeace.net/ AI, ML TODAY

Brief of ML/AI

Brief of ML/AI

Brief of ML/AI

Brief of ML/AI

https://cypeace.net/ ATTACK ML MODEL

ML pipeline development Learning parameters Training data Model Test Output Test Input Learning algorithm

Sample Model

Attacks on the ML pipeline Training Data attack Training set poisoning Adversarial Examples Model theft Learning parameters Training data Model Test Output Test Input Learning algorithm

Poisoning Attack Training Data attack Training set poisoning Adversarial Examples Model theft Learning parameters Training data Model Test Output Test Input Learning algorithm

Target Poisoning Attack Bias induction Backdoor insertion Disruption Competitive sabotage Ransomware

Poisoning Attack Classified based on outcomes Targeted attacks Untargeted attacks Classified based on the approach follow Backdoor attacks Clean-label attacks

Simple Poisoning Attack Learning parameters Training data Model Test Output Test Input Learning algorithm

Simple Poisoning Attack

Simple Poisoning Attack

Backdoor Poisoning Attack Backdoor Poisoning Attack Single pixel Pattern of pixels Imange insert

Backdoor Poisoning Attack

Backdoor Poisoning Attack

Model Tampering Training Data attack Training set poisoning Adversarial Examples Model theft Learning parameters Training data Model Test Output Test Input Learning algorithm

Model Tampering Model Tampering Exploiting Pickle Serialization Injecting Trojan Horses Neural Payload Injection Model Hijacking Model Reprogramming

Exploiting Pickle Serialization

Exploiting Pickle Serialization

Injecting Trojan horses - Keras layers

Injecting Trojan horses - Keras Lambda layers

Injecting Trojan horses - Keras Lambda layers

Injecting Trojan horses - Keras Custom layers

Injecting Trojan horses - Keras Custom layers

Evasion Attack Training Data attack Training set poisoning Adversarial Examples Model theft Learning parameters Training data Model Test Output Test Input Learning algorithm

Evasion Attack

Method Fast Gradient Sign Method (FGSM) Basic Iterative Method (BIM) Projected Gradient Descent (PGD) Carlini and Wagner (C&W) Jacobian-based Saliency Map Attack (JSMA) Evasion Attack

Evasion Attack

https://kennysong.github.io/adversarial.js/ Evasion Attack

Model Extraction Attacks Training Data attack Training set poisoning Adversarial Examples Model theft Learning parameters Training data Model Test Output Test Input Learning algorithm

Functionally Equivalent Extraction Model Extraction Attacks

Learning-Based Model Extraction Attacks Copycat CNN KnockOff Nets Model Extraction Attacks

Generative Student-Teacher Extraction (Distillation) Attacks Model Extraction Attacks

Model Extraction Attacks

Jupyter Notebook demo https://github.com/anhtn512/secure_ai

https://cypeace.net/ LLM PROMPT INJECTION

LLM Application Workflow User Prompt User Response Tokenization API Request Model Processing Response Generation API Response

Build a basic Chat LLM Application

HelloLLM

HelloLLM

Langchain

Langchain bot

Langchain bot

Integrating External Data into LLMs

Chef AI

Chef AI

Chef AI

What is prompt injection? Prompt injection OWASP defines prompt injection as manipulating “a large language model (LLM) through crafted inputs, causing the LLM to execute the attacker’s intentions unknowingly.” LLM01: Prompt Injection - OWASP Top 10 for LLM & Generative AI Security

Direct vs Indirect

Direct Prompt Injection – Basic

Direct Prompt Injection – DoS

Direct Prompt Injection – Phishing

GitHub - 0xk1h0/ ChatGPT_DAN : ChatGPT DAN, Jailbreaks prompt Direct Prompt Injection – DAN Mode

Direct Prompt Injection – DAN Mode

Direct Prompt Injection – DAN Mode

Other Jailbreaking techniques – Splitting Payloads Direct Prompt Injection

Other Jailbreaking techniques – Splitting Payloads Direct Prompt Injection

Other Jailbreaking techniques – Encoding Direct Prompt Injection

Other Jailbreaking techniques – Adding constraints Direct Prompt Injection

Indirect Prompt Injection

RCE with prompt injection Example – Pandas AI – 1.1.3

QnA Email: [email protected] Phone: (+84) 853 727 900