Ensuring Secure and Permission-Aware RAG Deployments

chloewilliams62 228 views 24 slides Jul 24, 2024
Slide 1
Slide 1 of 24
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24

About This Presentation

In this talk, we will explore the critical aspects of securing Retrieval-Augmented Generation (RAG) deployments. The focus will be on implementing robust secured data retrieval mechanisms and establishing permission-aware RAG frameworks. Attendees will learn how to ensure that access control is rigo...


Slide Content

Securing RAG Deployments

●Intelligence and Cybersecurity Specialist
●Product management experience across the product
lifecycle
●Built products in data security, governance, posture
management
●Co-Founder of Opsin, a company to secure GenAI

OZ WASSERMAN

Improved
Accuracy
type and
scrambled
Contextual
Relevance
Reduces
Hallucinations
1.Combines retrieval-based systems with generative AI.
2.Retrieves relevant information from databases or
knowledge bases before generating responses.
3.Ensures generated content is accurate and grounded
in real data.

RAG (Retrieval-Augmented
Generation) GenAI
Deployments:

RAG Architecture
Ken Huang, CEO of DistributedApps.ai

03
1. Protect Your Work
2. Enhance System Reliability
3. Remove organizational risks by maintaining Compliance
4. Build Trust with users, security and privacy from the get go
5. Reduce Development Costs of dealing with security issues

Why do we even need to secure
RAG?

03
RAG Security Controls
1.Data Source / VectorDB Security
2.Security at the retrieval level
3.Security at the generation level

Data Source /
VectorDB Security

05
Privacy
Breaches
Regulatory
Non-Compliance
Data
Misuse
1.Removes or masks personally identifiable information (PII) in
documents, databases, and knowledge graphs
2.Protects individual privacy by ensuring sensitive information
cannot be traced back to specific individuals
3.Forms a critical component of data security protocols before
any data processing begins

Data Anonymization

05
Privacy
Breaches
Regulatory
Non-Compliance
Data
Misuse
Open-source library maintained by Microsoft (available on GitHub).
Provides a unified SDK for privacy preservation.
Fast identification and anonymization of private entities in text and images.
Handles data such as credit card numbers, names, locations, social security numbers,
bitcoin wallets, US phone numbers, and financial data.

Example: Presidio

05
Over-privileged
access
Unauthorized
access
Insider
threats


Governs who can access and retrieve data.
Ensures only authorized users handle sensitive
information.

Authentication: Verifies identity
Authorization: Defines permission for the identity
Granular Permissions: Controls based on data sensitivity and
roles
Audit & Monitoring: Tracks access and changes

Access Control

04
Authorizations and permissioning on the data
CRMs and ERPs
IAM
Structured Data
Unstructured Data
Tools and Integrations
LLMs
Your GenAI
Application
Chatbots
Customer
Service Bots
AI agents
Knowledge
Extraction
Automations
Orchestration Layer
Authentication
Authorization
Permission
Auditability

03
●Basic Encryption:
○At Rest & In Transit: Encrypt data to keep it unintelligible.
○depends on the source integrated to the model
i.some vector DBs oer encryption of both at rest and in transit data
ii.If sources connected dierently, ensuring encryption is supported in
important

●Advanced Techniques (some in experimentation some in production)

examples:
○Dierential Privacy: Add noise to data to protect individual points.
○Tokenization: Replace sensitive data with tokens.
○Decentralization & Sharding: Distribute data across multiple servers.

Encryption

03
Monitoring & Alerting: Real-time detection of anomalies.
Backup & Recovery: Secure and tested backups.
Rate Limiting: Prevent resource exhaustion and mitigate aacks.
Data Validation & Sanitization: Protect against malicious data

Additional Measures

Retrieval Stage

Prompt Injection Risk
Unauthorized Data Access
Similarity Search Risk
08
Security Risk
●Data Leakage through Similarity Queries
●Manipulation of Search Results
●Reconnaissance and Paern Analysis
●Resource Exhaustion

08
Unauthorized Data
Access

08
Prompt Injection
Breaking predefined instructions by
manipulating an AI prompt’s context

Encryption
Input and Output Validation
Authorization and Permission
08
Security Controls
Role-Based Access Control (RBAC)
Permissioning

Generation Stage

Bias and Oensive Content
Data Privacy Violations
Output Manipulation
08
Security Risk

Auditability
Content Validation
Anonymization and Reduction
of sensitive data
08
Security Controls
Output Control

Input and output validation
Access control
Security by design - review
development workflows
08
Summary: Security
Controls For RAG
Data Anonymization
Proactive evaluation

08
Additional Resources
OWASP top 10 for Large Language Models
AI Governance and Compliance
References
Mitigating Security Risks in Retrieval Augmented Generation (RAG) LLM Applications

When Prompts Go Rogue: Analyzing a Prompt Injection Code Execution in Vanna.AI

Thank You
Would love to deepen into your GenAI use case and what challenges you see
with privacy and security of GenAI deployments
Contact Us:
More on RAG Security at
www.opsinsecurity.com
Tags