CBUSDAW Oct 2024 - Geo Testing with Sanjay Tamrakar

JasonPacker 209 views 25 slides Oct 17, 2024

About This Presentation

Columbus Data & Analytics Wednesday, Geo-Testing with Sanjay Tamrakar


Slide Content

Enhancing Business Insights
through Geo-Testing
Sanjay Tamrakar
Lead Data Scientist
Safelite

About Me
➢Lead Data Scientist, Safelite
➢BS Mathematics, MS Statistics
➢Currently pursuing an MBA in Finance
➢7+ years in Data Science
https://www.linkedin.com/in/sanjay-tamrakar/

Agenda
➢Why is user-based testing not feasible?
➢Difference in differences
➢How to choose pilot markets?
➢What is synthetic control?
➢GeoLift (R package by Meta)

User Based Testing (Company Website)
➢Control (A), 50%: sees existing experience
➢Test (B), 50%: sees new experience
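
The 50/50 user split above can be sketched with a deterministic hash, so the same visitor always lands in the same group. This is a minimal illustration only: the function name `assign_variant`, the salt string, and the user IDs are hypothetical and not from the presentation.

```python
import hashlib

def assign_variant(user_id: str, salt: str = "homepage-test") -> str:
    """Return 'A' (control) or 'B' (test) for a user, roughly 50/50."""
    # Hash the salted user ID so assignment is stable across visits.
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"

# Made-up user IDs, only to check that the split is close to 50/50.
users = [f"user-{i}" for i in range(10_000)]
counts = {"A": 0, "B": 0}
for u in users:
    counts[assign_variant(u)] += 1
share_b = counts["B"] / len(users)
```

This kind of split needs a per-user identifier, which is exactly what channels such as linear TV or radio do not provide.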

Why Geo-test?
➢How much should I spend on linear TV?
➢What is my RoAS (Return on Ad Spend)?
➢YouTube? (Skippable / Non-skippable)
➢How are the radio ads performing?
➢Are the ads working equally well in all the Geos?

Definitions
➢Geos : DMA (Designated Market Area), MSA (Metropolitan Statistical Area),
Zip code, State
➢Incremental Lift : Increase in a desired metric (such as sales or conversions)
that results from a marketing activity
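
As a small worked example of these definitions, incremental lift can be expressed as the gap between the observed metric and an estimate of what would have happened without the marketing activity. The function names and all numbers below are hypothetical, and the incremental RoAS ratio is the usual "incremental revenue divided by spend" reading of RoAS rather than something stated on the slide.

```python
def incremental_lift(observed: float, counterfactual: float) -> float:
    """Absolute lift: observed metric minus the no-campaign estimate."""
    return observed - counterfactual

def lift_pct(observed: float, counterfactual: float) -> float:
    """Lift as a fraction of the no-campaign estimate."""
    return (observed - counterfactual) / counterfactual

def incremental_roas(incremental_revenue: float, ad_spend: float) -> float:
    """Incremental revenue earned per unit of ad spend."""
    return incremental_revenue / ad_spend

# Hypothetical numbers for one geo over the test period.
lift = incremental_lift(observed=1150.0, counterfactual=1000.0)
pct = lift_pct(observed=1150.0, counterfactual=1000.0)
roas = incremental_roas(incremental_revenue=30_000.0, ad_spend=10_000.0)
```

The hard part in practice is the counterfactual, which is what the methods on the following slides try to estimate.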

How can we measure the incremental lift ?
➢Difference in differences: compares the changes in outcomes over
time between a treatment group and a control group
➢Pilot Markets: based on various business metrics, run hierarchical
clustering and choose the closest geos as test/control
➢GeoX: uses a linear model to predict counterfactual time-series data
➢Synthetic control (GeoLift): uses Synthetic Control Methods (SCMs), which
create an artificial control group by combining untreated units that
closely replicate the treated unit

Difference in differences
➢eBay ran controlled experiments to study the revenue
impact of SEM (Search Engine Marketing)
➢The DMA was chosen as the Geo, and the DMAs were then
randomly split into treatment and control groups
➢eBay tracked its revenue in each DMA using the shipping
address of customers
➢For 2 months, June and July, eBay stopped SEM in a
treatment group of 65 out of 210 DMAs
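
A random split like the one described (65 treatment DMAs out of 210) can be sketched as follows. The DMA IDs are placeholders and the helper `split_geos` is hypothetical; the presentation does not say how eBay actually performed the draw.

```python
import random

def split_geos(geo_ids, n_treatment, seed=0):
    """Randomly pick n_treatment geos for treatment; the rest are control."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    treatment = set(rng.sample(geo_ids, n_treatment))
    control = [g for g in geo_ids if g not in treatment]
    return sorted(treatment), control

dma_ids = list(range(1, 211))  # placeholder IDs for 210 DMAs
treatment, control = split_geos(dma_ids, n_treatment=65)
```

Fixing the seed makes the assignment auditable after the test has run.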

➢The DiD estimator for the causal effect of treatment computes how the
difference between the treatment and control groups changed in the
experimental period compared to the pre-experimental period
➢A very intuitive, natural estimator that, under comparatively weak
conditions, consistently estimates the causal effect
➢DiD = (ȳ_{exp,tr} - ȳ_{exp,co}) - (ȳ_{pre,tr} - ȳ_{pre,co})
      = (100.7 - 128.7) - (105.8 - 132.4)
      = -28.0 - (-26.6) = -1.4K (thousand per DMA)
ȳ_{exp,tr} = post-period, treatment group
ȳ_{exp,co} = post-period, control group
ȳ_{pre,tr} = pre-period, treatment group
ȳ_{pre,co} = pre-period, control group
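
The same arithmetic, written as a short function so the four group means can be swapped for other data. The function and argument names are illustrative; the four values are the ones shown on the slide (in thousands per DMA).

```python
def did(pre_tr: float, pre_co: float, exp_tr: float, exp_co: float) -> float:
    """Difference in differences from the four group means."""
    # (treatment - control) in the experiment period,
    # minus (treatment - control) in the pre period.
    return (exp_tr - exp_co) - (pre_tr - pre_co)

effect = did(pre_tr=105.8, pre_co=132.4, exp_tr=100.7, exp_co=128.7)
print(f"DiD estimate: {effect:.1f}K per DMA")
```

A negative estimate here means revenue in the treatment DMAs fell slightly more than the pre-period gap would predict once SEM was switched off.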

Assumptions and Drawbacks of DiD
➢Parallel trends assumption
➢Based on the Mean
➢Struggles with time-varying
confounders that change
differently across groups
over time
➢Costly when performed in
all the DMAs

Pilot Markets
Geo        Apt rate  Revenue  Season  NPS  Other
Texas      56%       30K      1.3     55   …
Florida    52%       50K      0.8     65   …
Arkansas   38%       152K     0.5     89   …
Tennessee  45%       200K     1.6     92   …
➢Select the geo-level metrics that the business is most
interested in
➢Assign weights based on how important each factor is to the
nature of the test
➢Run hierarchical clustering to
determine the closest Geos
➢Assign control/treatment labels
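
The steps above can be sketched in plain Python using the four example rows from the table. Several things here are assumptions for illustration, not the presenter's method: the metrics are z-scored before weighting, the weights are all set to 1, and average linkage is used for the agglomerative (hierarchical) clustering.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev

# Example rows from the table: apt rate (%), revenue (K), season, NPS.
geos = {
    "Texas": [56, 30, 1.3, 55],
    "Florida": [52, 50, 0.8, 65],
    "Arkansas": [38, 152, 0.5, 89],
    "Tennessee": [45, 200, 1.6, 92],
}
weights = [1.0, 1.0, 1.0, 1.0]  # hypothetical importance weights

def standardize(rows):
    """Z-score each metric so no single unit dominates the distance."""
    cols = list(zip(*rows.values()))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    return {g: [(x - m) / s for x, m, s in zip(v, mus, sds)]
            for g, v in rows.items()}

def distance(a, b, w):
    """Weighted Euclidean distance between two standardized geos."""
    return sqrt(sum(wi * (x - y) ** 2 for wi, x, y in zip(w, a, b)))

def cluster(rows, w, n_clusters):
    """Agglomerative clustering with average linkage."""
    z = standardize(rows)
    clusters = [[g] for g in rows]

    def linkage(c1, c2):
        return mean(distance(z[a], z[b], w) for a in c1 for b in c2)

    while len(clusters) > n_clusters:
        # Merge the two clusters that are closest on average.
        c1, c2 = min(combinations(clusters, 2), key=lambda p: linkage(*p))
        clusters = [c for c in clusters if c is not c1 and c is not c2]
        clusters.append(c1 + c2)
    return clusters

clusters = cluster(geos, weights, n_clusters=2)
```

With these four rows, Texas and Florida end up in one cluster and Arkansas and Tennessee in the other, so each cluster could supply one treatment and one control geo.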

➢Select from each of the clusters to maintain randomization
➢Compare very similar districts with each other based on the
business metrics
➢Perform DiD on similar districts as control / treatment
➢Indiana (Treatment), Missouri (Control)
➢Montana (Treatment), South Dakota (Control)

Synthetic Control
➢Statistical method to evaluate the effect of an
intervention in comparative case studies
➢Constructs a weighted combination of control groups
for comparison with the treatment group.
➢Estimates what would have happened to the
treatment group without the intervention.
➢Accounts for time-varying confounders by weighting
the control group to match the treatment group
before the intervention.
➢Allows systematic selection of comparison groups.

Effect of cigarette taxation on its consumption
➢Data from 1970 to 2000 for 39 states
➢Syn California = w1 * Texas + w2 * Oklahoma + w3 * Nevada + …
where w1 + w2 + … + wn = 1,
with the weights chosen to minimize the absolute difference between
California and the synthetic control
➢The pre-intervention period is used to build the
synthetic control
➢Issues with the pre-fit and post-read
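
A minimal sketch of how such weights could be fitted on the pre-intervention period. Several choices here are assumptions rather than the presenter's method: it minimizes squared (not absolute) pre-period error, it also requires the weights to be non-negative, it uses projected gradient descent, and all numbers are invented toy values, not real cigarette sales. The "California" series is built as 0.6 * Texas + 0.4 * Oklahoma so the correct answer is known.

```python
def project_to_simplex(v):
    """Euclidean projection onto {w : w >= 0, sum(w) == 1}."""
    u = sorted(v, reverse=True)
    css, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        css += ui
        t = (css - 1.0) / i
        if ui - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]

def fit_weights(target, donors, n_iter=20_000):
    """Fit donor weights that best reproduce the target pre-period series."""
    n = len(donors)
    # Because the weights sum to 1, the fit error equals the weighted sum
    # of each donor's gap to the target, so work with those gaps directly.
    gaps = [[d - y for d, y in zip(series, target)] for series in donors]
    step = 1.0 / (2.0 * sum(g * g for row in gaps for g in row))
    w = [1.0 / n] * n
    for _ in range(n_iter):
        resid = [sum(w[j] * gaps[j][t] for j in range(n))
                 for t in range(len(target))]
        grad = [2.0 * sum(r * g for r, g in zip(resid, gaps[j]))
                for j in range(n)]
        w = project_to_simplex([wj - step * gj for wj, gj in zip(w, grad)])
    return w

# Toy pre-period series (invented numbers).
texas = [10, 11, 12, 13, 14, 15]
oklahoma = [20, 19, 21, 20, 22, 21]
nevada = [5, 7, 6, 9, 8, 10]
california = [0.6 * a + 0.4 * b for a, b in zip(texas, oklahoma)]

w = fit_weights(california, [texas, oklahoma, nevada])
```

On this toy data the fitted weights come out close to 0.6, 0.4 and 0 for Texas, Oklahoma and Nevada. After the intervention, the same weights applied to the donors' post-period values give the counterfactual that the treated unit is compared against.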

➢38 states (37 parameters)
➢Time is large (T=18), but the
number of parameters (n=38) is
also large, which gives linear
models a lot of flexibility
➢Ridge / Lasso
➢Syn California = w1 * Texas + w2 * Oklahoma + w3 * Nevada + …
where w1 + w2 + … + wn = 1,
with the weights chosen to minimize the absolute difference between
California and the synthetic control

GeoLift
➢Determining/selecting the right Geos for test
and control
➢Importance of the pre-period
(weather events / digital web changes / pricing
discounts)
➢Comparing to just one control is not ideal
➢Power (determine the Minimum Detectable
Effect (MDE) and the length of the test period)

GeoLift Walkthrough
➢Data Ingestion and EDA (Exploratory Data
Analysis)
➢Power Analysis
➢Analyze the Test Results
➢Inference
➢Model Iteration (Lasso / Ridge / Generalized
Synthetic Control Model)
Link: https://facebookincubator.github.io/GeoLift/docs/GettingStarted/Walkthrough

Step 1: Data Ingestion and EDA
➢Location
➢Date (yyyy-mm-dd)
➢Number of conversions (Y)
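
The three fields above describe a long-format table with one row per location per date. The sketch below loads and checks such a table in plain Python; it does not use the GeoLift R package. The sample rows, the column header names, and the helper names `load_geo_data` and `is_balanced` are made up for illustration.

```python
import csv
import io
from datetime import datetime

# Invented sample in the location / date / Y layout.
SAMPLE = """location,date,Y
columbus,2024-01-01,120
columbus,2024-01-02,131
cleveland,2024-01-01,98
cleveland,2024-01-02,101
"""

def load_geo_data(text):
    """Parse rows and enforce the yyyy-mm-dd date format."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        rows.append({
            "location": rec["location"].strip().lower(),
            # strptime raises ValueError if the date is not yyyy-mm-dd.
            "date": datetime.strptime(rec["date"], "%Y-%m-%d").date(),
            "Y": float(rec["Y"]),
        })
    return rows

def is_balanced(rows):
    """True if every location has an observation for the same set of dates."""
    dates_by_location = {}
    for r in rows:
        dates_by_location.setdefault(r["location"], set()).add(r["date"])
    date_sets = list(dates_by_location.values())
    return all(s == date_sets[0] for s in date_sets)

rows = load_geo_data(SAMPLE)
balanced = is_balanced(rows)
```

Checking that the panel is balanced before any modelling catches missing days in a geo, which would otherwise distort the pre-period fit.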

Step 2: Power Analysis
➢The optimal number of test locations
➢Best test duration
➢Select the ideal test and control markets
➢Determine Minimum Detectable Effect

Step 3: Analyzing the test results

Step 3 (cont.)

Benefits of GeoLift (SCMs)
➢Empower brands with data-driven insights
➢Enhance media mix evaluation
➢Incremental impact of campaigns
➢Privacy / scalability
➢Handling of confounding factors

Thank you!
Connect on:
https://www.linkedin.com/in/sanjay-tamrakar/