Being Intentional: Privacy Engineering and A/B Testing

tgwilson, 53 slides, Aug 22, 2024

About This Presentation

Matt Gershoff's presentation from the Columbus Data & Analytics Wednesday meetup on 21-Aug-2024. The talk covered the benefits of being intentional when determining what data to collect and how it gets collected and stored. It pays dividends in the computational and storage requirements as w...


Slide Content

Being Intentional: Privacy Engineering and AB Testing
Matt Gershoff, Conductrics

Matt and Some of His Biases
Conductrics (CX Optimization Software)

Inference is hard: it is not a technology or a magic box.

Solutions live in problems, so be explicit about what problem you are solving.

Complexity is a cost: do the simplest thing that will solve the problem.

The value of customer analytics is to be a better customer advocate.

Be Intentional and Have Empathy

INTENTIONAL DESIGN
To take actions with awareness: deliberately, voluntarily, with conscious purpose.

What I am going to talk about*
*Plus show a bunch of gratuitous pictures of my dog

Privacy Engineering Concepts

K-Anonymity

Cardinality

Equivalence Classes

Local vs Global Privacy

Experimentation Example

What is AB Testing?

Tasks
Primary: AB, MVT, and Bandits
Secondary (Helper): SRM and Test Interference Checks

Analytics on Minimized Data

Example Calculation for T-Test

Extensions to Multivariate Cases

Difficulty Level

Legal
Product Engineering/Data
Privacy Engineering
Engineering methodologies, tools, and techniques to ensure systems provide acceptable levels of privacy.


Privacy by Design Principles
Developed by Dr Ann Cavoukian
https://privacy.ucsc.edu/resources/privacy-by-design---foundational-principles.pdf

Privacy must be incorporated into networked data systems and technologies, by default.

Privacy must become integral to organizational priorities, project objectives, design processes, and planning operations.

Privacy must be embedded into every standard, protocol and process that touches our lives.

Privacy by Design: 7 Principles
1. Proactive, not reactive (anticipates).
2. Privacy as the default setting.
3. Privacy embedded into design.
4. Full functionality (doesn’t impair).
5. End-to-end security.
6. Visibility and transparency.
7. Respect for user privacy.

Principle 2: Privacy as the Default
Data Minimization:
1. Personally identifiable information should be kept to a strict minimum.
2. The design of … technologies, … should begin with non-identifiable interactions and transactions, as the default.
3. Wherever possible, linkability of personal information should be minimized.

Just Enough: Data Minimization
JUST Enough
DATA for THIS Question/Task
By Default, Technology Should:

Minimize Individual Information

Minimize Granularity

Minimize Linkability of Personal Info
Explicit Objective: Solve THIS Task (e.g., an AB Test)
See: Privacy by Design, Dr Ann Cavoukian
GDPR Article 5(1)(c) and Article 25

JUST ENOUGH vs JUST IN CASE
JUST Enough
DATA for THIS Question/Task
By Default, Technology Should:

Minimize Individual Information

Minimize Granularity

Minimize Linkability of Personal Info
Explicit Objective: Solve THIS Task (e.g., an AB Test)
JUST IN CASE
DATA for all Possible FUTURE Questions
By Default, Technology Should:

Maximize Individual Information

Maximize Granularity

Maximize Linkability of Personal Info
Shadow Objective: Maximize Optionality

Privacy Engineering Methods
1. Pseudonymize / de-identify
2. Don’t link all the data
   (No big table of all the data: keep separate, unlinked tables)
3. Enumeration: impose a limited set of values/bins for less granularity
4. Aggregation
   1. K-Anonymization: quasi-identifiers
   2. L-Diversity (won’t cover)
5. Differential Privacy: make it noisy using Laplace or Gaussian noise
   (won’t cover)
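Two of these methods, pseudonymization and enumeration, fit in a few lines of Python. The sketch below is illustrative only: it assumes a keyed HMAC-SHA256 as the pseudonymization scheme, and the key constant, the function names, and the tenure bin edges are hypothetical rather than taken from the talk.

```python
import hashlib
import hmac
from typing import Optional

# Hypothetical key for illustration; a real key would come from a secrets store
# kept separately from the pseudonymized data.
SECRET_KEY = b"replace-with-a-real-secret"


def pseudonymize(user_id: str, key: bytes = SECRET_KEY) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256, hex encoded).

    The same user_id always yields the same token, so records can still be
    grouped, but the token cannot be recomputed without the key.
    """
    return hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def enumerate_tenure(years: Optional[float]) -> str:
    """Coarsen tenure into a few fixed bins (hypothetical edges) to reduce granularity."""
    if years is None:
        return "missing"
    if years < 1:
        return "[0,1)"
    if years < 3:
        return "[1,3)"
    if years < 5:
        return "[3,5)"
    return "5+"
```

A keyed hash is pseudonymization, not anonymization: whoever holds the key can recompute the tokens and re-link records, so the key should be stored apart from the data.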

Example: Conductrics and Experimentation with Equivalence Classes
What is AB Testing, and why bother me about it?

Why AB Testing?
Helps discover causal relationships. Examples:

Does a new drug have greater efficacy than a placebo or the current treatment?

Does a new marketing campaign increase memberships?

Does a new layout on a travel search results page improve travel bookings?

What is AB Testing?
A procedure that:
1. Assigns experiences randomly (blocks confounding)
2. Applies methods of statistical inference to draw conclusions*
*Much arguing over WHAT methods; often it is second-order stuff, IMO.

[Diagram: Age (confounder), Vaccine (treatment), Hospitalization (outcome), with random selection]
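Step 1 is often implemented in software with deterministic hash-based bucketing. The sketch below assumes that approach (the talk does not say how Conductrics assigns users), and `assign` and its parameters are hypothetical names. Hashing the test name together with the unit id gives each test its own effectively unrelated assignment, and the same unit always lands in the same arm without an assignment table having to be stored.

```python
import hashlib
from typing import Sequence


def assign(unit_id: str, test_name: str, arms: Sequence[str] = ("A", "B")) -> str:
    """Map a unit to one of `arms` with (near) equal probability, deterministically."""
    # Salt the hash with the test name so different tests get unrelated assignments.
    digest = hashlib.sha256(f"{test_name}:{unit_id}".encode("utf-8")).digest()
    # Read the first 8 bytes as an unsigned integer and reduce it modulo the arm count.
    bucket = int.from_bytes(digest[:8], "big")
    return arms[bucket % len(arms)]
```

Because the bucket comes from a 64-bit value, the modulo bias across a handful of arms is negligible.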

[Figure: Univariate vs. Multivariate]

What is AB Testing?
AB Testing for Causal Inference

Primary Tasks: AB Tests | MVT | Contextual Bandits
Univariate: AB Tests (required statistics: t-tests)
Multivariate: Factorial Designs / MVTs (required statistics: ANOVA/F-tests); Contextual Bandits* (required statistics: Regression/Prediction)

Supporting Tasks: SRM & AB Test Interference Checks
Univariate: Sample Ratio Mismatch (SRM) (required statistics: Chi-Square Test)
Multivariate: AB Test Interactions (required statistics: Nested Partial F-tests)
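The SRM check needs nothing beyond the assignment counts per arm. Below is a minimal two-arm sketch of Pearson's chi-square goodness-of-fit test; it uses the closed form for one degree of freedom so that it stays within the Python standard library, and the function name is hypothetical. A very small p-value suggests the observed split does not match the intended allocation.

```python
import math


def srm_check(observed, expected_ratios):
    """Pearson chi-square goodness-of-fit test for Sample Ratio Mismatch (two arms).

    observed: assignment counts per arm, e.g. [n_A, n_B]
    expected_ratios: intended allocation, e.g. [0.5, 0.5]
    Returns (chi_square, p_value).
    """
    if len(observed) != 2:
        raise ValueError("this sketch only handles two arms (1 degree of freedom)")
    total = sum(observed)
    expected = [total * r for r in expected_ratios]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

For example, 5,100 vs. 4,900 users under an intended 50/50 split gives a chi-square of 4.0 and a p-value of about 0.046.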

Example: Conductrics and Experimentation with Equivalence Classes
…but what are Equivalence Classes?

ABTest3 | ABTest2 | ABTest1 | Tenure | Status | Sales | Other Stuff | Email | Phone | Id
c | B | A | 0 | None | $0 | … | … | … | …
- | A | A | 1 | Silver | $4 | … | … | … | …
b | A | B | 3 | Plat | $16 | … | … | … | …
- | B | A | 2 | Plat | $15 | … | … | … | …
a | B | B | 6 | Silver | $3 | … | … | … | …
- | A | A | 4 | Gold | $5 | … | … | … | …

Standard BIG TABLE (Collect AMAP and Link AMAP, i.e., as much as possible)
Experimentation data is stored with an id and appended to other customer data.


But for AB Tests we only need Sales and the test assignments, none of the other data.
Standard Collection / Storage

Just Enough: Task-Level Data Storage
1. Each AB Test has its own separate data structure.
2. Collect aggregate counts, sums of conversion values, and sums of conversion^2 by treatment.
Aggregate data into Equivalence Classes (think Pivot Tables).
Equivalence Class for a simple AB Test
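Building the equivalence classes is a group-by with three running totals, much like a pivot table. The sketch below uses the ABTest1 assignments and Sales values of the six example customers in the big table; the function name and the dictionary layout are assumptions for illustration.

```python
from collections import defaultdict

# (ABTest1 assignment, Sales) for the six example customers in the big table.
rows = [("A", 0), ("A", 4), ("B", 16), ("A", 15), ("B", 3), ("A", 5)]


def to_equivalence_classes(rows):
    """Collapse user-level rows into one row per treatment: count, sum, sum of squares."""
    classes = defaultdict(lambda: {"n": 0, "sum": 0.0, "sum_sq": 0.0})
    for arm, value in rows:
        c = classes[arm]
        c["n"] += 1
        c["sum"] += value
        c["sum_sq"] += value * value
    return dict(classes)
```

On those rows it produces A: n = 4, sum = 24, sum of squares = 266 and B: n = 2, sum = 19, sum of squares = 265, so two rows replace six and no identifiers are kept.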

Global vs Local Privacy
•Global: A central aggregator/curator collects the detailed/raw data, then applies these methods AFTER collection to any data they share.

•Local: The data minimization/privacy methods are applied BEFORE the data are collected and stored.

Global vs Local Privacy
Global Approach
Secure Curator: stores nonprivate data
Releases minimized/anonymized data for use by internal data or product teams

Global vs Local Privacy
Local Approach
No Secure Curator
Only Collect and Store Minimized/Anonymized Data

Just Enough: Local Collection and Task-Level Data Storage
How can I collect data only in summary form?
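One way to collect data only in summary form is to update the sufficient statistics at collection time and discard each event. The sketch below assumes such a design; `Summary` and `record` are hypothetical names, not a Conductrics API. Because the statistics are additive, partial summaries from different collectors or time windows can be merged by addition without ever pooling raw rows.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    """Sufficient statistics for one equivalence class."""
    n: int = 0
    total: float = 0.0
    total_sq: float = 0.0

    def add(self, value: float) -> None:
        # Update in place at collection time; the raw event itself is not stored.
        self.n += 1
        self.total += value
        self.total_sq += value * value

    def merge(self, other: "Summary") -> "Summary":
        # Summaries are additive, so partial tallies combine by simple addition.
        return Summary(self.n + other.n,
                       self.total + other.total,
                       self.total_sq + other.total_sq)


def record(store: dict, test: str, arm: str, value: float) -> None:
    """Hypothetical collection hook: only (test, arm) and the value arrive, no user id."""
    store.setdefault((test, arm), Summary()).add(value)
```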

Implementation Example

Data Minimization with Equivalence Classes
Benefits
K-Anonymity
Efficient Data Storage
Efficient Computation
Encourages Intentional Thinking and Design
Cons
Limits Methods
No Formal Privacy Guarantees
Loss of Optionality

Data Minimization with Equivalence Classes
What is K-anonymity?

K-Anonymity
Hide in a crowd of at least K ‘equivalent’ people.

K-Anonymity
Easy to monitor and report on K:
search for Min(Count) in each table.
E.g., here K = 1925.
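Monitoring K is a one-line reduction over the class counts. In this sketch the table is assumed to be a dictionary of classes, each carrying a count field `n`; the layout, the function names, and the counts are hypothetical.

```python
def k_anonymity(classes: dict) -> int:
    """K for an equivalence-class table: the smallest count in any class."""
    return min(c["n"] for c in classes.values())


def classes_below(classes: dict, k_min: int) -> list:
    """Keys of classes smaller than a chosen threshold (candidates to suppress or merge)."""
    return [key for key, c in classes.items() if c["n"] < k_min]


# Hypothetical table: two arms crossed with a two-level customer field.
table = {
    ("A", "new"): {"n": 2100},
    ("A", "returning"): {"n": 1950},
    ("B", "new"): {"n": 2080},
    ("B", "returning"): {"n": 1870},
}
```

With these made-up counts, `k_anonymity(table)` returns 1870 and `classes_below(table, 1900)` flags only `("B", "returning")`.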

Efficient Data Storage
Known max size for each data set:
size is bounded by the joint cardinality, regardless of the number of individuals.
E.g., here the max rows = 4 even though N = 8,000.
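The storage bound is simply the product of the cardinalities of the fields that define the classes. The field names and values below are hypothetical; the point is that the bound depends only on those domains, never on N.

```python
from itertools import product
from math import prod


def max_rows(domains: dict) -> int:
    """Upper bound on equivalence-class rows: product of each field's cardinality."""
    return prod(len(values) for values in domains.values())


def all_classes(domains: dict) -> list:
    """Enumerate every possible class key; the stored table can never exceed this list."""
    return list(product(*domains.values()))


# Hypothetical fields: a two-arm test and a two-level covariate.
domains = {"arm": ["A", "B"], "new_customer": ["yes", "no"]}
```

Here `max_rows(domains)` is 4 whether the test has 8 users or 8,000.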

Why did you sum the sales and the squared values of sales?
Efficient Computations: Why Counts, Sums, and Sums of Squares?

Simple AB Tests only need:
1. Counts
2. Sum of Conversion values
3. Sum of Conversion^2
AB Test Data
Standard T-Test formula

$$t = \frac{\bar{y}_A - \bar{y}_B}{\sqrt{\dfrac{s_A^2}{n_A} + \dfrac{s_B^2}{n_B}}}$$
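T-Test formula rewritten in terms of just these aggregate values, where for each treatment j, n_j is the count, S_j the sum of conversion values, and Q_j the sum of conversion^2:

$$\bar{y}_j = \frac{S_j}{n_j}, \qquad s_j^2 = \frac{Q_j - S_j^2/n_j}{n_j - 1}$$

$$t = \frac{S_A/n_A - S_B/n_B}{\sqrt{\dfrac{Q_A - S_A^2/n_A}{n_A(n_A - 1)} + \dfrac{Q_B - S_B^2/n_B}{n_B(n_B - 1)}}}$$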
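The rewritten formula translates directly into code that never touches user-level rows. This sketch uses the unpooled (Welch) standard error and, to stay within the standard library, a normal approximation for the two-sided p-value, which is only reasonable for large samples; the function name and the example numbers are hypothetical.

```python
import math


def t_test_from_aggregates(n_a, sum_a, sumsq_a, n_b, sum_b, sumsq_b):
    """Unpooled t statistic from counts, sums, and sums of squares only.

    Returns (difference in means, t statistic, approximate two-sided p-value).
    The p-value uses a normal approximation, so treat it as a large-sample shortcut.
    """
    mean_a, mean_b = sum_a / n_a, sum_b / n_b
    # Sample variance from aggregates: (Q - S^2 / n) / (n - 1)
    var_a = (sumsq_a - sum_a ** 2 / n_a) / (n_a - 1)
    var_b = (sumsq_b - sum_b ** 2 / n_b) / (n_b - 1)
    se = math.sqrt(var_a / n_a + var_b / n_b)
    t = (mean_a - mean_b) / se
    # Two-sided normal tail: 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2))
    p_approx = math.erfc(abs(t) / math.sqrt(2))
    return mean_a - mean_b, t, p_approx
```

For example, `t_test_from_aggregates(1000, 120, 120, 1000, 100, 100)` (hypothetical binary conversions, so each sum of squares equals its sum) gives a difference of 0.02 and t ≈ 1.43.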

Efficient Computations: Why Counts, Sums, and Sums of Squares?
MVT/Factorial ANOVA problems only need:
1. Conditional Counts
2. Conditional Sum of Conversion values
3. Conditional Sum of Conversion^2
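For each cell g (one combination of treatments) with conditional count n_g, sum S_g, and sum of squares Q_g, out of k cells and N users in total:

$$\bar{y}_g = \frac{S_g}{n_g}, \qquad \bar{y} = \frac{\sum_g S_g}{\sum_g n_g}$$

$$SS_{\text{between}} = \sum_g n_g\,(\bar{y}_g - \bar{y})^2, \qquad SS_{\text{within}} = \sum_g \left(Q_g - \frac{S_g^2}{n_g}\right), \qquad F = \frac{SS_{\text{between}}/(k-1)}{SS_{\text{within}}/(N-k)}$$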
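The same three conditional aggregates are enough for an ANOVA F statistic. This sketch covers only the one-way case, treating each cell (for example, each combination in an MVT) as a group; factorial main effects and interactions would need further decomposition. The function name and the (n, sum, sum of squares) tuple layout are assumptions.

```python
def one_way_anova(cells):
    """One-way ANOVA F statistic from per-cell (n, sum, sum_of_squares) tuples.

    Returns (F, df_between, df_within).
    """
    k = len(cells)
    n_total = sum(n for n, _, _ in cells)
    grand_mean = sum(s for _, s, _ in cells) / n_total
    # Between-cell sum of squares: sum_g n_g * (mean_g - grand_mean)^2
    ss_between = sum(n * (s / n - grand_mean) ** 2 for n, s, _ in cells)
    # Within-cell sum of squares: sum_g (Q_g - S_g^2 / n_g)
    ss_within = sum(q - s ** 2 / n for n, s, q in cells)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

With two cells, F equals the square of the pooled-variance t statistic, which makes a convenient sanity check.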

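Equation for OLS Regression
OLS Regression Too!
With x_g the design row shared by every user in equivalence class g:

$$\hat{\beta} = (X^\top X)^{-1} X^\top y, \qquad X^\top X = \sum_g n_g\, x_g x_g^\top, \qquad X^\top y = \sum_g x_g\, S_g$$

* … is not shown for brevity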

Efficient Computations: Why Counts, Sums, and Sums of Squares?
[Figure: User Level Data (413 rows) and Regression Output from User Level (413 rows)]
[Figure: Equivalence Class Level Data (5 rows: covariate data with 4 values + a missing flag) and Regression Output from Equivalence Class Level (5 rows)]
[Figure: Equivalence Class Level Analysis (5 rows) vs. User Level Analysis (413 rows)]
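The point that equivalence-class and user-level regressions agree can be checked with a small sketch. Each class contributes its count times the outer product of its design row to X'X, and its outcome sum times its design row to X'y, so both routes solve the same normal equations and give the same coefficients; standard errors would additionally need the per-class sums of squares, which this sketch omits. The functions are hypothetical, the data are the ABTest1 example customers, and a tiny Gauss-Jordan solver stands in for a linear-algebra library.

```python
def solve(a, b):
    """Solve a small linear system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_from_classes(classes):
    """OLS from (x, n, sum_y) per class: X'X = sum n x x', X'y = sum x * sum_y."""
    p = len(classes[0][0])
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for x, n, s in classes:
        for i in range(p):
            xty[i] += x[i] * s
            for j in range(p):
                xtx[i][j] += n * x[i] * x[j]
    return solve(xtx, xty)


def ols_from_rows(rows):
    """The same regression from user-level (x, y) rows, for comparison."""
    p = len(rows[0][0])
    xtx = [[sum(x[i] * x[j] for x, _ in rows) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * y for x, y in rows) for i in range(p)]
    return solve(xtx, xty)


# Design rows are [intercept, 1 if arm B else 0]; outcomes are Sales.
user_rows = [([1, 0], 0), ([1, 0], 4), ([1, 1], 16), ([1, 0], 15), ([1, 1], 3), ([1, 0], 5)]
class_rows = [([1, 0], 4, 24), ([1, 1], 2, 19)]
```

Both calls return an intercept of 6 (the mean for arm A) and a slope of 3.5 (the B minus A difference), from six rows and two rows respectively.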

Value of AB Testing
The main value of experimentation/AB Testing programs is that they provide a principled framework for organizations to act and learn intentionally.

Well-Defined Problems

Explicit Objectives

Make Decisions at the Margin

Value of Data Minimization
Data Minimization provides a principled framework for organizations to think about and collect data intentionally.

Defined Problems

Explicit Objectives for the Collection of Data

Consider the marginal value of the next additional bit of data with respect to solving the problem

Thank you!

Questions?

Equivalence Class for a Multivariate Test
Store data at the level of each unique combination of the desired data elements.
Multivariate Equivalence Classes
Just Enough: Task-Level Data Storage
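For a multivariate test the grouping key becomes the tuple of all the fields of interest. The sketch below keys on (ABTest1, ABTest2) using the six example customers from the big table; the function name and the [count, sum, sum of squares] list layout are assumptions.

```python
from collections import defaultdict

# (ABTest1, ABTest2, Sales) for the six example customers in the big table.
rows = [("A", "B", 0), ("A", "A", 4), ("B", "A", 16),
        ("A", "B", 15), ("B", "B", 3), ("A", "A", 5)]


def multivariate_classes(rows):
    """One row per unique (ABTest1, ABTest2) combination: [count, sum, sum of squares]."""
    classes = defaultdict(lambda: [0, 0, 0])
    for test1, test2, value in rows:
        c = classes[(test1, test2)]
        c[0] += 1
        c[1] += value
        c[2] += value * value
    return dict(classes)
```

The six customers collapse into four classes, one per combination; for instance (A, A) holds count 2, sum 9, and sum of squares 41.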

What is AB Testing?
A procedure that:
1. Assigns experiences randomly (blocks confounding)
[Diagram: Confounder, Treatment, Outcome, with random selection]

Learning from Observations via AB Testing
Right to Left = Turkey! Left to Right = Turkey!
Inference: Franklin Prefers Turkey to Pupperoni!