Generating privacy-protected synthetic data using Secludy and Milvus

chloewilliams62 95 views 13 slides Jun 12, 2024

About This Presentation

During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of ...


Slide Content

Generate privacy-protected synthetic
data without privacy risks

Unlocking Data with Differential Privacy
Before DP:
Data masking techniques:
• are often predictable and reversible
• only modify existing data and cannot generate new, realistic examples
• are vulnerable to threats such as membership inference attacks
After DP:
Differential Privacy techniques:
• eliminate privacy risks when fine-tuning LLMs with sensitive unstructured text data
• generate privacy-protected synthetic text data that is similar to the real dataset without risks of leaking PII
• provide privacy guarantees for the DP fine-tuned model and all downstream model tasks
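
For reference, the "privacy guarantees" above are usually stated through the standard (ε, δ)-differential privacy definition. The slides do not say which variant or which parameter values Secludy uses, so the symbols below (mechanism M, neighboring datasets D and D', output set S) are generic and not taken from the presentation.

```latex
% A randomized mechanism M is (\varepsilon, \delta)-differentially private if,
% for all datasets D, D' differing in a single record and all output sets S:
\Pr[M(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[M(D') \in S] + \delta
```

Smaller ε and δ mean that the presence or absence of any single record changes the output distribution less.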

Risks of leaking sensitive data when fine-tuning LLMs
Prompt: What is the email address of Aaliyah Keller?
Fine-tuned without DP response: The email address of Aaliyah Keller is [email protected]
Differentially Private Fine-tuned response: I'm sorry, I don't have the email address of Aaliyah Keller.
An adversary can craft specific prompts to trick the model into revealing fine-tuned data, including confidential information
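
A minimal sketch of how such leakage could be checked automatically, assuming the tester already holds a list of PII strings that appear in the fine-tuning data. The function name `find_leaks` and the plain substring match are illustrative assumptions, not Secludy's test procedure. The phone number is the one from the demo training record in this deck.

```python
def find_leaks(response, known_pii):
    """Return the known PII strings that appear verbatim in a model response.

    Hypothetical helper: a case-insensitive substring match only, so it will
    miss reformatted or partially leaked values.
    """
    lowered = response.lower()
    return [value for value in known_pii if value.lower() in lowered]


# PII known to be present in the private fine-tuning data.
known_pii = ["+91-93973 49939"]

# Two example responses: a refusal, and one that repeats training data.
safe_response = "I'm sorry, I don't have the email address of Aaliyah Keller."
leaky_response = "You can reach Aaliyah Keller at +91-93973 49939."

safe_leaks = find_leaks(safe_response, known_pii)
leaky_leaks = find_leaks(leaky_response, known_pii)
```

Running many adversarial prompts through both the DP and non-DP models and counting non-empty results gives a rough leakage rate to compare.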

Synthetic Data Generation Framework
Instruction: Write a job application letter for A within B industry with N YOE, and X, Z, Y contact PII info
Output: Job app letter ….......

Low rank adaptation (LoRA)
SD dreamlike-anime-1.0

DP-LoRA for text

Synthetic Data Generation Framework
Prompt: write a job application for {Photographer} with {5} YOE
Instruction: Write a job application letter for A within B industry with N YOE, and X, Z, Y contact PII info
Output: Job app letter ….......

DP-synthetic Data Generation Steps
1. Use Prompt: [write a job application for a {Photographer} with {5} YOE] x {N} times
2. Calculate embedding distance between private sample and synthetic candidates
3. Select top candidate(s) closest to original datasets
• Real-time Ingestion: allowing immediate generation of new synthetic data
• Distributed Architecture: high availability and reliability
• Cloud-native: scalability to handle huge volumes of synthetic data generation
• Community Support: As an open-source project, Milvus has a vibrant community with great support
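
The distance and selection steps can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not Secludy's implementation: the helper names, the use of cosine distance, and the toy 3-dimensional embeddings are all hypothetical, and no DP noise mechanism is shown. In the system described here the vectors would be stored and searched in Milvus rather than sorted in memory.

```python
import math


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def select_closest(private_vec, candidates, k=1):
    """Return the texts of the k candidates nearest to the private embedding.

    `candidates` is a list of (text, embedding) pairs, one per generated sample.
    """
    ranked = sorted(candidates, key=lambda c: cosine_distance(private_vec, c[1]))
    return [text for text, _ in ranked[:k]]


# Toy embeddings standing in for a real embedding model's output.
private_embedding = [1.0, 0.0, 0.0]
candidates = [
    ("candidate A", [0.0, 1.0, 0.0]),
    ("candidate B", [0.9, 0.1, 0.0]),
    ("candidate C", [0.5, 0.5, 0.0]),
]

top = select_closest(private_embedding, candidates, k=2)
print(top)  # ['candidate B', 'candidate C']
```

A vector database replaces the `sorted` call with an approximate nearest-neighbor search, which is what makes the step practical for large numbers of candidates.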

Synthetic Data Generation Framework
Prompt: write a job application for {Photographer} with {5} YOE
Instruction: Write a job application letter for A within B industry with N YOE, and X, Z, Y contact PII info
Output: Job app letter ….......
precision, recall, F1 score

Secludy Synthetic Data Evaluation
NLP similarity summary stats
o Embedding distribution distance, e.g., Fréchet Inception Distance (FID)
o MAUVE score (based on KL divergences)
o Text length distribution difference, e.g., KS test
VS Internal domain experts manual eval (costly & slow), not recommended
Prod model evaluation & test
o Automatic chatbot evaluation, e.g., AlpacaEval, DP-finetuned vs normal-finetuned
o Membership inference attack simulation, e.g., red team
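
As one concrete instance of the checks listed above, the text length comparison can be sketched with a two-sample Kolmogorov-Smirnov statistic. The function name and the sample lengths are illustrative assumptions. The sketch computes only the KS statistic (the largest gap between the two empirical CDFs), not a p-value.

```python
import bisect


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the maximum gap between the empirical CDFs."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a = sorted(sample_a)
    b = sorted(sample_b)
    max_gap = 0.0
    # The gap can only change at observed values, so check each one.
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap


# Hypothetical word counts for real and synthetic letters.
real_lengths = [100, 120, 140, 160]
synthetic_lengths = [100, 120, 140, 160]

same = ks_statistic(real_lengths, synthetic_lengths)   # identical samples
disjoint = ks_statistic([1, 2, 3], [10, 11, 12])       # no overlap at all
print(same, disjoint)  # 0.0 1.0
```

A statistic near 0 means the synthetic texts have a length distribution close to the real ones. A value near 1 means the two barely overlap.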

Demo Private Training Data
"instruction": "Aaliyah Keller is a accountant. Write about a job-related project he/her did in the past including some
of the following information: phone number is +91-93973 49939, email is [email protected]",
"input": "",
"output": "Aaliyah Keller, a skilled accountant with a keen eye for detail and a talent for financial analysis,
embarked on a job-related project that showcased her exceptional abilities and dedication to her profession. In this
project, Aaliyah was tasked with conducting a comprehensive financial audit for a prominent multinational
corporation, XYZ Industries. Aaliyah began by meticulously reviewing the company's financial statements,
including balance sheets, income statements, and cash flow statements, to gain a thorough understanding of its
financial position and performance. She diligently analyzed each transaction, scrutinizing every detail to ensure
accuracy and compliance with accounting standards and regulations. To obtain a deeper insight into the company's
operations, Aaliyah held in-depth interviews with key personnel from various departments, including finance, sales,
and operations. She sought to understand their roles, responsibilities, and the challenges they faced. These
discussions provided valuable context and helped her identify areas for improvement and potential risks. Armed
with the gathered information, Aaliyah skillfully prepared a comprehensive audit report that highlighted the
company's strengths and weaknesses. She presented her findings to the management team, providing clear and
concise explanations of complex financial matters. Throughout the project, Aaliyah maintained open
communication with all stakeholders, ensuring that everyone was informed of the progress and findings. Her
professionalism, attention to detail, and ability to work effectively with diverse teams contributed significantly to the
project's success. If you have any questions or require additional information, you can reach Aaliyah Keller at her
phone number, +91-93973 49939, or via email at [email protected]. She is happy to discuss her
experiences and insights further."

Benefits for our customers
1. Plug and play, inside Snowflake, no need to transfer data
2. State-of-the-art algo, 100X cheaper & faster
3. 99.9987%* guaranteed privacy protection
*based on 8k records
Reduce Compliance Costs: Automate privacy compliance for LLM projects, freeing up resources
Easy private fine-tuning: Fine-tune LLMs on sensitive data via our UI platform/API
Maximize Data Utility: Fine-tune on all relevant sensitive text data while provably preserving privacy
Provable Privacy Guarantees: Simulated PII leakage testing

Schedule a Demo
or email us at [email protected]