Ensuring Secure and Permission-Aware RAG Deployments
chloewilliams62
228 views
24 slides
Jul 24, 2024
Slide 1 of 24
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
About This Presentation
In this talk, we will explore the critical aspects of securing Retrieval-Augmented Generation (RAG) deployments. The focus will be on implementing robust secured data retrieval mechanisms and establishing permission-aware RAG frameworks. Attendees will learn how to ensure that access control is rigo...
In this talk, we will explore the critical aspects of securing Retrieval-Augmented Generation (RAG) deployments. The focus will be on implementing robust secured data retrieval mechanisms and establishing permission-aware RAG frameworks. Attendees will learn how to ensure that access control is rigorously maintained within the model when ingesting documents, ensuring that only authorized personnel can retrieve data. We will also discuss strategies to mitigate risks of data leakage, unauthorized access, and insider threats in RAG deployments. By the end of this session, participants will have a clearer understanding of the best practices and tools necessary to secure their RAG deployments effectively.
Size: 741.24 KB
Language: en
Added: Jul 24, 2024
Slides: 24 pages
Slide Content
Securing RAG Deployments
●Intelligence and Cybersecurity Specialist
●Product management experience across the product
lifecycle
●Built products in data security, governance, posture
management
●Co-Founder of Opsin, a company to secure GenAI
OZ WASSERMAN
Improved
Accuracy
type and
scrambled
Contextual
Relevance
Reduces
Hallucinations
1.Combines retrieval-based systems with generative AI.
2.Retrieves relevant information from databases or
knowledge bases before generating responses.
3.Ensures generated content is accurate and grounded
in real data.
RAG Architecture
Ken Huang, CEO of DistributedApps.ai
03
1. Protect Your Work
2. Enhance System Reliability
3. Remove organizational risks by maintaining Compliance
4. Build Trust with users, security and privacy from the get go
5. Reduce Development Costs of dealing with security issues
Why do we even need to secure
RAG?
03
RAG Security Controls
1.Data Source / VectorDB Security
2.Security at the retrieval level
3.Security at the generation level
Data Source /
VectorDB Security
05
Privacy
Breaches
Regulatory
Non-Compliance
Data
Misuse
1.Removes or masks personally identifiable information (PII) in
documents, databases, and knowledge graphs
2.Protects individual privacy by ensuring sensitive information
cannot be traced back to specific individuals
3.Forms a critical component of data security protocols before
any data processing begins
Data Anonymization
05
Privacy
Breaches
Regulatory
Non-Compliance
Data
Misuse
Open-source library maintained by Microsoft (available on GitHub).
Provides a unified SDK for privacy preservation.
Fast identification and anonymization of private entities in text and images.
Handles data such as credit card numbers, names, locations, social security numbers,
bitcoin wallets, US phone numbers, and financial data.
Governs who can access and retrieve data.
Ensures only authorized users handle sensitive
information.
Authentication: Verifies identity
Authorization: Defines permission for the identity
Granular Permissions: Controls based on data sensitivity and
roles
Audit & Monitoring: Tracks access and changes
Access Control
04
Authorizations and permissioning on the data
CRMs and ERPs
IAM
Structured Data
Unstructured Data
Tools and Integrations
LLMs
Your GenAI
Application
Chatbots
Customer
Service Bots
AI agents
Knowledge
Extraction
Automations
Orchestration Layer
Authentication
Authorization
Permission
Auditability
03
●Basic Encryption:
○At Rest & In Transit: Encrypt data to keep it unintelligible.
○depends on the source integrated to the model
i.some vector DBs oer encryption of both at rest and in transit data
ii.If sources connected dierently, ensuring encryption is supported in
important
●Advanced Techniques (some in experimentation some in production)
examples:
○Dierential Privacy: Add noise to data to protect individual points.
○Tokenization: Replace sensitive data with tokens.
○Decentralization & Sharding: Distribute data across multiple servers.
Encryption
03
Monitoring & Alerting: Real-time detection of anomalies.
Backup & Recovery: Secure and tested backups.
Rate Limiting: Prevent resource exhaustion and mitigate aacks.
Data Validation & Sanitization: Protect against malicious data
Additional Measures
Retrieval Stage
Prompt Injection Risk
Unauthorized Data Access
Similarity Search Risk
08
Security Risk
●Data Leakage through Similarity Queries
●Manipulation of Search Results
●Reconnaissance and Paern Analysis
●Resource Exhaustion
08
Unauthorized Data
Access
08
Prompt Injection
Breaking predefined instructions by
manipulating an AI prompt’s context
Encryption
Input and Output Validation
Authorization and Permission
08
Security Controls
Role-Based Access Control (RBAC)
Permissioning
Generation Stage
Bias and Oensive Content
Data Privacy Violations
Output Manipulation
08
Security Risk
Auditability
Content Validation
Anonymization and Reduction
of sensitive data
08
Security Controls
Output Control
Input and output validation
Access control
Security by design - review
development workflows
08
Summary: Security
Controls For RAG
Data Anonymization
Proactive evaluation
08
Additional Resources
OWASP top 10 for Large Language Models
AI Governance and Compliance
References
Mitigating Security Risks in Retrieval Augmented Generation (RAG) LLM Applications
When Prompts Go Rogue: Analyzing a Prompt Injection Code Execution in Vanna.AI
Thank You
Would love to deepen into your GenAI use case and what challenges you see
with privacy and security of GenAI deployments
Contact Us:
More on RAG Security at
www.opsinsecurity.com