Being Intentional: Privacy Engineering and A/B Testing
tgwilson
53 slides
Aug 22, 2024
About This Presentation
Matt Gershoff's presentation from the Columbus Data & Analytics Wednesday meetup on 21-Aug-2024. The talk covered the benefits of being intentional when determining what data to collect and how it gets collected and stored. It pays dividends in the computational and storage requirements as well as in aligning with privacy by design principles.
Size: 4.75 MB
Language: en
Added: Aug 22, 2024
Slides: 53 pages
Slide Content
Being Intentional: Privacy
Engineering and AB Testing
Matt Gershoff, Conductrics
Matt and Some of His Biases
Conductrics (CX Optimization Software)
• Inference is hard – it is not a technology or a magic box.
• Solutions live in problems, so be explicit about what problem you are solving.
• Complexity is a cost – do the simplest thing that will solve the problem.
• The value of customer analytics is to be a better customer advocate.
• Be Intentional and Have Empathy
INTENTIONAL DESIGN
To take actions with awareness: deliberately, voluntarily, with
conscious purpose.
What I am going to talk about*
* Plus show a bunch of gratuitous pictures of my dog
Privacy Engineering Concepts
• K-Anonymity
• Cardinality
• Equivalence Classes
• Local vs Global Privacy
Experimentation Example
• What is AB Testing?
• Tasks
  Primary: AB, MVT, and Bandits
  Secondary (Helper): SRM and Test Interference Checks
Analytics on Minimized Data
• Example Calculation for T-Test
• Extensions to Multivariate Cases
Difficulty Level
Legal
Product
Engineering/Data
Privacy Engineering
Privacy Engineering
Engineering methodologies, tools, and techniques to ensure systems provide acceptable levels of privacy.
Privacy by Design Principles
Developed by Dr Ann Cavoukian
https://privacy.ucsc.edu/resources/privacy-by-design---foundational-principles.pdf
• Privacy must be incorporated into networked data systems and technologies, by default.
• Privacy must become integral to organizational priorities, project objectives, design processes, and planning operations.
• Privacy must be embedded into every standard, protocol and process that touches our lives.
Privacy by Design: 7 Principles
1. Proactive not reactive (anticipates)
2. Privacy as the default setting
3. Privacy embedded into design
4. Full functionality (doesn’t impair)
5. End-to-end security
6. Visibility and transparency
7. Respect for user privacy
Principle 2: Privacy as the Default
Data Minimization:
1. Personally identifiable information should be kept to a strict minimum.
2. The design of … technologies, … should begin with non-identifiable interactions and transactions, as the default.
3. Wherever possible, linkability of personal information should be minimized.
Just Enough: Data Minimization
JUST Enough
DATA for THIS Question/Task
By Default Technology Should:
● Minimize Individual Information
● Minimize Granularity
● Minimize Linkability of Personal Info
Explicit Objective: Solve THIS Task (e.g., AB Test)
See: Privacy by Design, Dr Ann Cavoukian
GDPR Article 5(1)(c) and Article 25
JUST ENOUGH vs JUST IN CASE
JUST Enough
DATA for THIS Question/Task
By Default Technology Should:
● Minimize Individual Information
● Minimize Granularity
● Minimize Linkability of Personal Info
Explicit Objective: Solve THIS Task (e.g., AB Test)
JUST IN CASE
DATA for all Possible FUTURE Questions
By Default Technology Should:
● Maximize Individual Information
● Maximize Granularity
● Maximize Linkability of Personal Info
Shadow Objective: Maximize Optionality
Privacy Engineering Methods
1. Pseudonymize / de-identify
2. Don’t Link All the Data (No Big Table of all the data - have separate unlinked tables)
3. Enumeration - impose limited values/bins for less granularity
4. Aggregation
   a. K-Anonymization - Quasi-Identifiers
   b. L-Diversity (won’t cover)
5. Differential Privacy - Make it noisy using Laplace or Gaussian Noise (won’t cover)
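Method 3 in the list above (enumeration) can be illustrated with a small sketch. The band edges and labels here are hypothetical and are not taken from the talk. The point is only that an exact value is replaced by one of a few allowed values before it is stored.

```python
def bin_age(age):
    """Map an exact age to a coarse band (band edges are hypothetical)."""
    if age is None:
        return "missing"
    if age < 25:
        return "under_25"
    if age < 45:
        return "25_44"
    if age < 65:
        return "45_64"
    return "65_plus"

# Made-up example values; only the band, not the exact age, would be kept.
ages = [19, 23, 31, 44, 45, 67, None]
bands = [bin_age(a) for a in ages]
```

With five possible bands, this field can contribute at most five distinct values to any table, however many people are observed.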
Example: Conductrics and Experimentation with Equivalence Classes
What is AB Testing and why bother me about it?
Why AB Testing?
Helps discover causal relationships
Examples:
• Does a new drug have greater efficacy than a placebo or current treatment?
• Does a new marketing campaign increase memberships?
• Does a new layout on a travel search results page improve travel bookings?
What is AB Testing?
A procedure that:
1. Assigns Experiences Randomly (Block Confounding)
[Diagram: Age, Vaccine, and Hospitalization, with RANDOM SELECTION]
What is AB Testing?
A procedure that:
1. Assigns Experiences Randomly (Block Confounding)
2. Applies methods of Statistical Inference to draw conclusions *
* Much arguing over WHAT methods – often it is second order stuff IMO.
[Figure: Univariate and Multivariate cases]
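As a rough illustration of step 1 (random assignment blocking confounding), the sketch below simulates an age variable and assigns arms at random. The seed, the age range, and the sample size are all made up for the example. Because the assignment ignores age, the average age in the two arms should come out close.

```python
import random

def assign(rng, arms=("A", "B")):
    """Pick an arm uniformly at random, independent of any user trait."""
    return rng.choice(arms)

rng = random.Random(42)  # fixed seed so the sketch is repeatable
ages = [rng.randint(18, 80) for _ in range(10_000)]  # simulated confounder
arm_of = [assign(rng) for _ in ages]

def mean_age(arm):
    """Average simulated age among users assigned to the given arm."""
    selected = [age for age, a in zip(ages, arm_of) if a == arm]
    return sum(selected) / len(selected)

# Gap in average age between arms; small when assignment is random.
age_gap = abs(mean_age("A") - mean_age("B"))
```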
What is AB Testing?
AB Testing for Causal Inference
Supporting Tasks: SRM & AB Test Interference Checks

                     Univariate                    Multivariate
Supporting Task      Sample Ratio Mismatch (SRM)   AB Test Interactions
Required Statistics  Chi-Square Test               Nested Partial F-tests
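A minimal sketch of the SRM check named above, assuming a two-arm test with an intended 50/50 split. The counts are invented for illustration. It uses a chi-square goodness-of-fit statistic with one degree of freedom, for which the p-value can be written with the standard library's `math.erfc`. Note that only per-arm counts are needed.

```python
import math

def srm_check(count_a, count_b, expected_share_a=0.5):
    """Chi-square goodness-of-fit test (1 degree of freedom) on arm counts."""
    total = count_a + count_b
    exp_a = total * expected_share_a
    exp_b = total - exp_a
    stat = (count_a - exp_a) ** 2 / exp_a + (count_b - exp_b) ** 2 / exp_b
    # For 1 degree of freedom, P(chi-square > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

stat_ok, p_ok = srm_check(4000, 4000)    # balanced split, no mismatch
stat_bad, p_bad = srm_check(4300, 3700)  # lopsided split, likely mismatch
```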
Example: Conductrics and Experimentation with Equivalence Classes
… but what are Equivalence Classes?
Just Enough - Task Level Data Storage
1. Each AB Test has its own separate data structure
2. Collect aggregate counts, conversion, and conversion^2 data by treatment
Aggregate data into Equivalence Classes (think Pivot Tables)
Equivalence Class for a simple AB Test
Global vs Local Privacy
• Global – Central Aggregator/Curator collects the detail/raw data, then applies these methods AFTER collection to any data they share.
• Local – The data minimization/privacy methods are applied BEFORE the data are collected and stored.
Global vs Local Privacy
Global Approach
Secure Curator - Stores Nonprivate Data
Release Minimized/Anonymized data - Use for Internal Data or Product Teams
Global vs Local Privacy
Local Approach
No Secure Curator
Only Collect and Store Minimized/Anonymized Data
Just Enough - Local Collection and Task Level Data Storage
How can I collect data only in summary form?
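One way to picture collecting data only in summary form is a per-test store that updates running totals when an event arrives and then discards the event. This is a sketch under assumptions, not the Conductrics implementation. The class name, method name, and sample events are hypothetical.

```python
class TestAggregates:
    """Per-test store that keeps only running totals per treatment."""

    def __init__(self, treatments):
        # Each row holds [count, sum of values, sum of squared values].
        self.rows = {t: [0, 0.0, 0.0] for t in treatments}

    def record(self, treatment, value):
        """Fold one observation into the totals, then let it go."""
        row = self.rows[treatment]
        row[0] += 1
        row[1] += value
        row[2] += value * value
        # Nothing else is kept: no user id, timestamp, or raw value.

store = TestAggregates(["A", "B"])
# Made-up conversion events as (treatment, value) pairs.
for treatment, value in [("A", 1.0), ("B", 0.0), ("A", 0.0), ("B", 1.0), ("B", 1.0)]:
    store.record(treatment, value)
```

After the loop the store holds two rows regardless of how many events were recorded.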
Implementation Example
Data Minimization with Equivalence Classes
Benefits
• K-Anonymity
• Efficient Data Storage
• Efficient Computation
• Encourages Intentional Thinking and Design
Cons
• Limits Methods
• No Formal Privacy Guarantees
• Loss of Optionality
What is K-anonymity?
K-Anonymity
Hide in a crowd of K other ‘equivalent’ people.
K-Anonymity
Easy to monitor and report on K
Search for Min(Count) in each table
E.g., here K = 1925
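Monitoring K as described above amounts to taking the minimum count in each table. In this sketch the table names, keys, and most counts are invented. Only the value 1925 echoes the slide's example.

```python
# Hypothetical per-test tables mapping equivalence class -> count.
tables = {
    "test_1": {"A": 2010, "B": 1990},
    "test_2": {
        ("A", "mobile"): 1925,
        ("A", "desktop"): 2075,
        ("B", "mobile"): 1980,
        ("B", "desktop"): 2020,
    },
}

def k_of(table):
    """K for one table is the size of its smallest equivalence class."""
    return min(table.values())

k_by_table = {name: k_of(table) for name, table in tables.items()}
overall_k = min(k_by_table.values())
```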
Efficient Data Storage
Known Max Size for Each Data Set
Size is bounded by Joint Cardinality regardless of number of individuals
E.g., here the Max Rows = 4 even though N = 8,000
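The bound follows from multiplying the number of allowed values of each stored field. The field names and cardinalities below are assumptions chosen so that the first case gives the 4 rows mentioned above.

```python
import math

def max_rows(cardinalities):
    """Upper bound on table rows: the product of each field's cardinality."""
    return math.prod(cardinalities.values())

# Two fields with two values each (hypothetical fields).
bound = max_rows({"treatment": 2, "segment": 2})
# Adding a five-value field multiplies the bound by five.
with_more = max_rows({"treatment": 2, "segment": 2, "age_band": 5})
```

The number of individuals never appears in the calculation, which is why the table size stays fixed as N grows.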
Why did you sum the sales and the sum of the squared values of sales?
Efficient Computations: Why Counts, Sums, and Sums of Squares?
Simple AB Tests only need:
1. Counts
2. Sum of Conversion values
3. Sum of Conversion^2
AB Test Data
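Standard T-Test formula
$$ t = \frac{\bar{x}_B - \bar{x}_A}{\sqrt{\dfrac{s_A^2}{n_A} + \dfrac{s_B^2}{n_B}}} $$
where $\bar{x}$, $s^2$, and $n$ are the mean, sample variance, and count for each treatment.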
T-Test formula rewritten in terms of just these aggregate values
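A sketch of that rewrite, assuming the unequal-variance (Welch) form of the two-sample statistic, which may differ from the exact form on the slide. Each arm is described only by its count n, sum S, and sum of squares SS. The mean is S / n and the sample variance is (SS - n * mean^2) / (n - 1). The numbers are invented.

```python
import math

def mean_and_var(n, total, total_sq):
    """Mean and sample variance from a count, a sum, and a sum of squares."""
    mean = total / n
    var = (total_sq - n * mean ** 2) / (n - 1)
    return mean, var

def t_from_aggregates(a, b):
    """Unequal-variance t statistic for arm b versus arm a.

    Each argument is a tuple (n, sum, sum_sq) for one treatment.
    """
    mean_a, var_a = mean_and_var(*a)
    mean_b, var_b = mean_and_var(*b)
    std_err = math.sqrt(var_a / a[0] + var_b / b[0])
    return (mean_b - mean_a) / std_err

# Aggregates equivalent to raw values [1, 2, 3, 4] and [3, 4, 5, 6].
agg_a = (4, 10.0, 30.0)
agg_b = (4, 18.0, 86.0)
t_stat = t_from_aggregates(agg_a, agg_b)
```

No individual observation is needed at any step, which is the reason for storing the squared values alongside the sums.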
Efficient Computations: Why Counts, Sums, and Sums of Squares?
MVT/Factorial ANOVA Problems only need:
1. Conditional Counts
2. Conditional Sum of Conversion values
3. Conditional Sum of Conversion^2
OLS Regression Too!
Equation for OLS Regression:
$$ \hat{\beta} = (X^\top X)^{-1} X^\top y $$
* One term is not shown for brevity.
Efficient Computations: Why Counts, Sums, and Sums of Squares?
[Figure: User Level Data (413 rows) and Regression Output from User Level (413 rows)]
[Figure: Equivalence Class Level Data (5 rows – covariate data 4 values + missing flag) and Regression Output from Equivalence Class Level (5 rows)]
[Figure: Equivalence Class Level Analysis (5 rows) compared with User Level Analysis (413 rows)]
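The agreement between user-level and equivalence-class-level regression can be sketched with the normal equations. When every user in a class shares the same covariate vector x, X'X is the sum over classes of n * x x' and X'y is the sum over classes of x * (sum of y). The data, the two-column design, and the helper names are hypothetical. Only the coefficients are computed here. Standard errors would also need the stored sum of squares.

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    size = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(size):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][size] / m[i][i] for i in range(size)]

def normal_equations(rows):
    """Build X'X and X'y from rows of (x vector, count n, sum of y)."""
    k = len(rows[0][0])
    xtx = [[0.0] * k for _ in range(k)]
    xty = [0.0] * k
    for x, n, sum_y in rows:
        for i in range(k):
            xty[i] += x[i] * sum_y
            for j in range(k):
                xtx[i][j] += n * x[i] * x[j]
    return xtx, xty

# Made-up user-level data: x = [intercept, treated], y = conversion value.
users = [([1, 0], 1.0), ([1, 0], 3.0), ([1, 1], 4.0), ([1, 1], 6.0), ([1, 1], 8.0)]
user_rows = [(x, 1, y) for x, y in users]
# The same data collapsed into two equivalence classes.
class_rows = [([1, 0], 2, 4.0), ([1, 1], 3, 18.0)]

beta_users = solve(*normal_equations(user_rows))
beta_classes = solve(*normal_equations(class_rows))
```

Both calls produce the same X'X and X'y, so the fitted coefficients match even though one input has five rows and the other has two.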
Value of AB Testing
The main value of experimentation/AB Testing programs is that they provide a principled framework for organizations to act and learn intentionally.
• Well Defined Problems
• Explicit Objectives
• Make Decisions at the Margin
Value of Data Minimization
Data Minimization provides a principled framework for organizations to think about and collect data intentionally.
• Defined Problems
• Explicit Objectives for the collection of Data
• Consider the marginal value of the next additional bit with respect to solving the problem
Thank you!
Questions?
Equivalence Class for a Multivariate Test
Store at the unique combination of desired data elements
Multivariate Equivalence Classes
Just Enough - Task Level Data Storage
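Storing at the unique combination of data elements can be sketched as a table keyed by a tuple of factor levels. The factor names and totals below are invented. Conditional counts and sums for a single factor can then be recovered by adding the rows that share that factor's level.

```python
from collections import defaultdict

# Hypothetical joint classes keyed by (layout, headline): [n, sum, sum_sq].
joint = {
    ("old", "short"): [100, 12.0, 12.0],
    ("old", "long"): [110, 15.0, 15.0],
    ("new", "short"): [105, 18.0, 18.0],
    ("new", "long"): [95, 20.0, 20.0],
}

def roll_up(classes, position):
    """Collapse joint classes onto one factor by adding the stored totals."""
    out = defaultdict(lambda: [0, 0.0, 0.0])
    for key, totals in classes.items():
        target = out[key[position]]
        for i, value in enumerate(totals):
            target[i] += value
    return dict(out)

# Totals per layout, summed across both headline levels.
by_layout = roll_up(joint, 0)
```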
What is AB Testing?
A procedure that:
1. Assigns Experiences Randomly (Block Confounding)
[Diagram: Confounder, Treatment, and Outcome, with RANDOM SELECTION]
Learning from Observations via AB Testing
Right to Left = Turkey! Left to Right = Turkey!
Inference: Franklin Prefers Turkey to Pupperoni!