Security and auditing tools in Large Language Models (LLM).pdf

jmoc25 188 views 41 slides Oct 10, 2024

Slide 1 of 41

About This Presentation

LLM models are a subcategory of deep learning models based on neural networks and natural language processing(NLP). Security and auditing are critical issues when dealing with applications based on large language models, such as GPT (Generative Pre-trained Transformer) or LLM (Large Language Model) ...

Size: 2.7 MB

Language: en

Added: Oct 10, 2024

Slides: 41 pages

Slide Content

October 11, 2024 |
José Manuel Ortega
Security and auditing tools
in Large Language Models
(LLM)
[email protected]

Agenda
•Introduction to LLM
•Introduction to OWASP LLM Top 10
•Auditing tools
•Use case with the textattack tool

Introduction to LLM
•Transformers
•Attention is All You Need" by Vaswani et
al. in 2017
•Self-attention mechanism
•Encoder-Decoder Architecture

Introduction to LLM

Introduction to LLM
Pre-training + fine-tuning

Introduction to LLM
●Language Models: Models like BERT, GPT, T5, and RoBERTa
are based on transformer architecture. They are used for a wide
range of NLP tasks such as text classification, question
answering, and language translation.
●Vision Transformers (ViT): Transformers have been adapted for
computer vision tasks, where they have been applied to image
classification, object detection, etc.
●Speech Processing: In addition to text and vision, transformers
have also been applied to tasks like speech recognition and
synthesis.

Introduction to OWASP LLM Top 10
•https://genai.owasp.org

Introduction to OWASP LLM Top 10

ChiperChat
https://arxiv.org/pdf/2308.06463

Jailbreak prompts
●https://jailbreak-llms.
xinyueshen.me/

Introduction to OWASP LLM Top 10
•Data Poisoning
•Malicious actors could poison the training data by
injecting false, harmful, or biased information into
datasets that train the LLM, which could degrade
the model's performance.
•Mitigation: Data source vetting, training data
audits, and anomaly detection for suspicious
patterns in training data.

Introduction to OWASP LLM Top 10
•Model Inversion Attacks
•Attackers could exploit the LLM to infer sensitive
or private data that was used during training by
repeatedly querying the model. This could expose
personal, confidential, or proprietary information.
•Mitigation: Rate-limiting sensitive queries and
limiting the availability of models trained on
private data.

Introduction to OWASP LLM Top 10
•Unauthorized Code Execution
•In some contexts, LLMs might be integrated into
systems where they have access to execute code
or trigger automated actions. Attackers could
manipulate LLMs into running unintended code or
actions, potentially compromising the system.
•Mitigation: Limit the scope of actions that LLMs
can execute, employ sandboxing, and use strict
permission controls.

Introduction to OWASP LLM Top 10
•Bias and Fairness
•LLMs can generate biased outputs due to the
biased nature of the data they are trained on,
leading to unfair or discriminatory outcomes. This
could impact decision-making processes, amplify
harmful stereotypes, or introduce systemic biases.
•Mitigation: Perform fairness audits, use bias
detection tools, and diversify training datasets to
reduce bias.

Introduction to OWASP LLM Top 10
•Model Hallucination
•LLMs can produce outputs that are
plausible-sounding but factually incorrect or entirely
fabricated. This is referred to as "hallucination,"
where the model generates false information
without any grounding in its training data.
•Mitigation: Post-response validation, fact-checking
algorithms, and restricting LLMs to provide
responses only within known knowledge domains.

Introduction to OWASP LLM Top 10
•Insecure Model Deployment
•LLMs that are deployed in unsecured
environments could be vulnerable to attacks,
including unauthorized access, model theft, or
tampering. These risks are elevated when models
are deployed in publicly accessible endpoints.
•Mitigation: Use encrypted APIs, secure
infrastructure, implement authentication and
authorization controls, and monitor model access.

Introduction to OWASP LLM Top 10
•Adversarial Attacks
•Attackers might exploit weaknesses in the LLM by
crafting adversarial examples. This could lead to
undesirable outputs or security breaches.
•Mitigation: Model robustness testing, adversarial
training (training the model with adversarial examples),
and implementing anomaly detection systems.

•https://llm-attacks.org

Tools/frameworks to evaluate model
robustness
●PromptInject Framework
●https://github.com/agencyenterprise/PromptInject
●PAIR - Prompt Automatic Iterative Refinement
●https://github.com/patrickrchao/JailbreakingLLMs
●TAP - Tree of Attacks with Pruning
●https://github.com/RICommunity/TAP

Auditing tools
•https://github.com/tensorflow/fairness-indicators

Auditing tools
•Prompt Guard refers to a set of strategies, tools, or
techniques designed to safeguard the behavior of
large language models (LLMs) from malicious or
unintended input manipulations.
•Prompt Guard uses an 86M parameter classifier
model that has been trained on a large dataset of
attacks and prompts found on the web. Prompt
Guard can categorize a prompt into three different
categories: "Jailbreak", "Injection" or "Benign".

Auditing tools
•https://huggingface.co/meta-llama/Prompt-Guard-86M

Auditing tools
•Llama Guard 3 refers to a security tool or strategy
designed for guarding large language models like
Meta’s LLaMA against potential vulnerabilities and
adversarial attacks.
•Llama Guard 3 offers a robust and adaptable
solution to protect LLMs against Prompt Injection and
Jailbreak attacks. By combining advanced filtering,
normalization, and monitoring techniques.

Auditing tools
•Dynamic Input Filtering
•Prompt Normalization and Contextualization
•Secure Response Policy
•Active Monitoring and Automatic Response

Auditing tools
•https://huggingface.co/spaces/schroneko/meta-llama-
Llama-Guard-3-8B-INT8

Auditing tools
•S1: Violent Crimes
•S2: Non-Violent Crimes
•S3: Sex-Related Crimes
•S4: Child Sexual Exploitation
•+S5: Defamation (New)
•S6: Specialized Advice
•S7: Privacy
•S8: Intellectual Property
•S9: Indiscriminate Weapons
•S10: Hate
•S11: Suicide & Self-Harm
•S12: Sexual Content
•S13: Elections
•S14: Code Interpreter Abuse
Introducing v0.5 of the AI Safety
Benchmark from MLCommons

Text attackhttps://arxiv.org/pdf/2005.05909

Text attack
from textattack.models.wrappers import HuggingFaceModelWrapper
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load pre-trained sentiment analysis model from Hugging Face
model =
AutoModelForSequenceClassification.from_pretrained("textattack/bert-base-unc
ased-imdb")
tokenizer =
AutoTokenizer.from_pretrained("textattack/bert-base-uncased-imdb")

# Wrap the model for TextAttack
model_wrapper = HuggingFaceModelWrapper(model, tokenizer)

https://github.com/QData/TextAttack

Text attack
from textattack.attack_recipes import TextFoolerJin2019

# Initialize the attack with the TextFooler recipe
attack = TextFoolerJin2019.build(model_wrapper)

Text attack
# Example text for sentiment analysis (a positive review)
text = "I absolutely loved this movie! The plot was thrilling,
and the acting was top-notch."

# Apply the attack
adversarial_examples = attack.attack([text])
print(adversarial_examples)

Text attack
Original Text: "I absolutely loved this movie! The plot was
thrilling, and the acting was top-notch."

Adversarial Text: "I completely liked this film! The storyline
was gripping, and the performance was outstanding."

Text attack
from textattack.augmentation import WordNetAugmenter

# Use WordNet-based augmentation to create adversarial
examples
augmenter = WordNetAugmenter()

# Augment the training data with adversarial examples
augmented_texts = augmenter.augment(text)
print(augmented_texts)

Resources

Resources
●github.com/greshake/llm-security
●github.com/corca-ai/awesome-llm-security
●github.com/facebookresearch/PurpleLlama
●github.com/protectai/llm-guard
●github.com/cckuailong/awesome-gpt-security
●github.com/jedi4ever/learning-llms-and-genai-for-dev-sec-ops
●github.com/Hannibal046/Awesome-LLM

Resources
●https://cloudsecurityalliance.org/artifacts/security-implications-of
-chatgpt
●https://www.nist.gov/itl/ai-risk-management-framework
●https://blog.google/technology/safety-security/introducing-googl
es-secure-ai-framework
●https://owasp.org/www-project-top-10-for-large-language-model
-applications/

Security and auditing tools in Large Language Models (LLM).pdf

About This Presentation

Slide Content

Tags

Categories

Download

Quick Actions

Statistics

Related Slideshows

Security and auditing tools in Large Language Models (LLM).pdf

About This Presentation

Slide Content

Slide 1

Slide 2

Slide 3

Slide 4

Slide 5

Slide 6

Slide 7

Slide 8

Slide 10

Slide 11

Slide 12

Slide 13

Slide 14

Slide 15

Slide 16

Slide 17

Slide 18

Slide 19

Slide 20

Slide 21

Slide 22

Slide 24

Slide 25

Slide 26

Slide 27

Slide 28

Slide 29

Slide 30

Slide 31

Slide 32

Slide 33

Slide 34

Slide 35

Slide 36

Slide 37

Slide 38

Slide 39

Slide 40

Tags

Categories

Download

Quick Actions

Statistics

Related Slideshows

8-top-ai-courses-for-customer-support-representatives-in-2025.pptx

7-essential-ai-courses-for-call-center-supervisors-in-2025.pptx

25-essential-ai-courses-for-user-support-specialists-in-2025.pptx

8-essential-ai-courses-for-insurance-customer-service-representatives-in-2025.pptx

Know for Certain

PPT OPD LES 3ertt4t4tqqqe23e3e3rq2qq232.pptx