Behavioural advertising data - how did we get here?

BenShepherd2 27 views 27 slides Jul 15, 2024

About This Presentation

Last week I posted a few things about the cost and challenges of bad behavioural advertising data. A lot of people messaged me and asked me a simple question - how does this happen, and how can we fix it?

So I created the below document outlining both. I hope it provides colour as to why bad data i...


Slide Content

Ben Shepherd [email protected]
Behavioural advertising data: how
did we get here?
1

Some caveats and context around this document
2
Data can be a powerful fuel for advertising
I personally am a believer in good data being used to target advertisements.

Some behavioural data providers and companies are excellent
There are some platforms in market with robust, deep and broad data that is highly valuable. The majority of these have consumer-facing components of their businesses.

A large chunk of data providers and aggregators are noise
Most data is not robust, deep or broad. There are hundreds if not thousands of bad actors in this space, and it’s super lucrative in a category with minimal rigour or professional scepticism and significant funding of industry discussion areas by the bad actors.

Marketers globally are wasting billions on bad noise and it hurts the legitimacy of the profession
Marketers need to resolve this as investment in bad data is harming legitimate value-creating activities.

Asking questions about data provenance isn’t luddite behaviour. In fact, failing to ask reasonable questions about this area and to demand high quality is the real delusion.

Summary: there is a huge misalignment in incentives between many data collectors/aggregators and agencies, and the marketers they are meant to serve
1. Every data broker needs claimed audience scale
2. Revenue and incentive for the data broker is entirely linked to scale of high yield segments
3. Buyers/intermediaries are also incentivised on scale, so are aligned in the high use of often poor data
4. Hence, there is zero incentive to throttle scale on either side of the sale to improve accuracy, as accuracy creates lower returns
5. Signal accuracy and governance naturally throttle scale
6. Therefore, there is limited/no effort to improve either accuracy or governance
7. This issue is magnified as there is limited/no marketer requirement for provenance information or proof of source for the data. This creates no accountability.
The method of data monetisation prevents robust accuracy: the more accurate a segment is, the smaller it is, and the smaller it is, the less it can be monetised and identified/matched.
3

How is this behavioural advertising data collated and obtained?

There are 7 core ways.
4

Data used for behavioural targeting is collated in a lot of different ways, and is often co-mingled, mixed, shared and used out of original context
Method obtained | Example(s) | Accuracy
Provided by the individual | Google profile based on provided birthdate; Meta targeting based on engagement on platform | High
Extracted from the individual | Location data scraped via an app, attached to an ID. Technically compliant but not obtained via normal functionality of the app | Mixed
Cookies | Tracked visits to a website | Mixed
SDK | Mobile apps are often built using SDKs. Some SDK providers, in exchange for free or cheap use of the SDK, use it to obtain data on users and resell it (monetising the SDK asset) | Mixed
Third parties | Third party brokers can obtain data through all of these methods. The premise is they have a wide view of a candidate/ID and seek to buy more information to populate it | Mixed
Public records | Census data, ABS data; often aggregated data used to infer information on people’s location, profession, income etc. | Mixed
Scraping/extraction | Automated extraction of website data to aggregate information available online | Mixed
5

At its core, any behavioural data that is robust is made up of two foundation elements: an interest signal and power/value signals

Interest signals
A demonstration of signals related to interest in a specific area or purchase occasion. For example:
Automotive
Sporting goods
Fashion
Enterprise software
Car insurance
CPG
+
Power + Value signals
A consistent demonstration of the ability and means to make the purchase or influence the decision. For example:
Income
Active research
Location
Category buyer
Hasn’t already purchased
Decision maker history (i.e. has done it before)

Consider these like aiming a dart at a dartboard. Interest is the precision of the line taken; power and value ensure the dart has enough momentum to reach the board.
6

And both of these interest and power signals additionally need to satisfy two areas in order to be sufficiently accurate and consistent

Accuracy of signal
Is it inferred (is the signal based on an inference - i.e. reading an article on Pilates means you’re a gym member)
Is it express (the signal is based on a real, verified behaviour - i.e. researching baby seats)
Is the signal context accurate (is the signal taken out of its context - for instance is reading an article about labor politics a relevant demonstration of someone’s political leaning)
Is the signal fresh (when was the signal)
+
Consistency of signal
Is the behaviour only seen a single time (has the claimed behaviour only been demonstrated once, is it enough)
Are there multiple instances of the behaviour (there are multiple data points to validate the behaviour claim)
Are the signals connected/logical (is there a logic between behaviours, are they connected or is there a false mix of inferred and express info)
Are the signals non-contradictory (is there any data to dispute the behavioural claim that needs to be considered)
Do they collectively pass the test of being robust (is the behavioural signal adequately deep, consistent and accurate)

A robust behavioural segment needs to satisfy both of these areas. An accurate signal seen only once is not enough, and poor signals seen consistently are inconclusive
7

Robust behavioural data therefore must satisfy these 4 criteria to a high level of proof (a combination of data points)
Let’s use the example of the legitimate in-market SUV car buyer

Interest signal
Visited multiple auto OEM websites
Visited multiple SUV specific pages
Configured SUV across at least one OEM
Has visited 1 dealer

Power signal
Income over $100,000 (as SUV is $55,000)
Aged between 25-49 (as x% of buyers are within this demo)

Accuracy
Obtained interest signals directly from credible source (website, browser provider)
Dealer location signal accurate
Obtained within last 15-30 days

Consistency
Multiple activities
Digital and physical touch point
Power signals are MECE and not overlapped

If all of these signals (or a material amount) are satisfied the prospect can be confidently behaviourally categorised
8

But a lot of data is very shallow around observed behaviour (often just based on one data point), creating an abundance of noise
Let’s use the example of the overly optimistic data segment of in-market SUV buyer

Interest signal
Scraped GPS data near a car dealer
Read a story tagged automotive
Typed in a car brand based on their friend telling them they are buying one

Power signal
Has kids
Looks like or shows similar broad habits to a ‘car buyer’ profile
Has a kid who plays weekend sport
Lives in a suburb with a large number of households with 2 kids

Accuracy
Inferred from past census data
Location scraped via an SDK
Content categorisation from automated function

Consistency
Single point of behaviour
Data 30+ days old

With shallow behavioural claims, if just one of these is satisfied the prospect can be categorised
9

Illustration: A tale of 2 “SUV car buyers” - the ‘and’ & the ‘or’

Example 1 - the AND (valuable and likely to lead to improved outcomes)
Visited multiple OEM websites, AND
Visited 1x SUV model page, AND
Has visited other auto related research content, AND
Income over $100k, AND
Signals within last 14 days

Example 2 - the OR (low chance of better efficacy than no targeting)
SDK scraped that device was near a Kia dealer 30 days ago, OR
Read car related article on web last 30 days, OR
Lives in suburb with high propensity of kids in household, OR
Broadly fits demographic of a car buyer

Both of these profiles are very different, but the marketer has no idea of the provenance of either of these sources
10
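
The difference between the two examples can be sketched in a few lines of Python. This is only an illustration: the Profile fields, thresholds and sample IDs are hypothetical stand-ins for the signals listed on the slide, not any broker's real schema.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    """Hypothetical per-ID signals; field names are illustrative only."""
    oem_sites_visited: int = 0
    suv_model_pages_visited: int = 0
    read_auto_research: bool = False
    income: int = 0
    days_since_last_signal: int = 999
    near_dealer_ping: bool = False
    read_car_article: bool = False
    family_suburb: bool = False
    fits_buyer_demographic: bool = False


def and_segment(p: Profile) -> bool:
    # Example 1: every condition must hold before the ID is categorised.
    return all([
        p.oem_sites_visited >= 2,
        p.suv_model_pages_visited >= 1,
        p.read_auto_research,
        p.income > 100_000,
        p.days_since_last_signal <= 14,
    ])


def or_segment(p: Profile) -> bool:
    # Example 2: any single weak signal is enough to categorise the ID.
    return any([
        p.near_dealer_ping,
        p.read_car_article,
        p.family_suburb,
        p.fits_buyer_demographic,
    ])


# Made-up audience of four IDs.
researcher = Profile(oem_sites_visited=3, suv_model_pages_visited=2,
                     read_auto_research=True, income=120_000,
                     days_since_last_signal=5, read_car_article=True)
passer_by = Profile(near_dealer_ping=True)
suburban = Profile(family_suburb=True)
audience = [researcher, passer_by, suburban, Profile()]

and_size = sum(and_segment(p) for p in audience)
or_size = sum(or_segment(p) for p in audience)
print(f"AND segment: {and_size} ID(s), OR segment: {or_size} ID(s)")
```

On this toy audience the OR rule sweeps in the passer-by and the suburban household alongside the genuine researcher, which is the scale-versus-accuracy trade the earlier summary describes.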

OK, it makes sense that all good data used for behavioural targeting satisfies these 4 criteria.

So how come the system is broken?
11

12
The participants’ motives are not aligned (which is fine), but the buyer has no visibility on what product they’re buying and no recourse if it does not resemble what was proposed, while the seller has complete visibility.

It’s unbalanced.

The behavioural data economy is reliant on 4 participants

1/ Data broker
Buyer or seller: Seller
Detail: Data brokers sell data to participants 2, 3 and 4. Their goal is to maximise the volume of these segments sold
Incentive: Volume of prospects in valuable data segments
Method of creating value: Charging on CPM based on volume of data segments used

2/ Ad agency or intermediary
Buyer or seller: Seller + Buyer
Detail: Ad agencies and intermediaries are often buying segments off brokers/sources, and often selling or re-selling segments to their own clients
Incentive: Volume of prospects in valuable data segments (as this is being attached to inventory and then run through their plumbing)
Method of creating value: Generally charging on CPM based on volume of inventory + data segments used

3/ Publisher/inventory source
Buyer or seller: Seller + Buyer
Detail: Publishers sometimes buy segments off data brokers or ad agencies, they also often sell data to brokers (directly) or ad agencies (indirectly)
Incentive: Yield upside through appending of ‘valuable’ data to inventory. Incentive is to maximise return on consumer engagement/page views
Method of creating value: Charging on CPM and activating at volume

4/ Marketer
Buyer or seller: Buyer (and unpaid seller)
Detail: The marketer financially is a buyer, but all the tracking pixels buried deep within marketer assets are often used by data sellers to sell to competitors in an aggregated form.
Incentive: Reach eligible buyers and make them more likely to buy your product. Create incremental sales through advertising activity.
Method of creating value: Incremental sales at a reasonable level of profit (either short term, long term, or both)

The issue is the interests of each participant are not aligned, especially between ultimate buyer and seller. The seller wants volume (due to CPM based monetisation), the buyer needs quality + relevance
13

And the formula for robust behavioural targeting via inventory is an intricate one where multiple elements must be satisfied

Interest signal
and Power signal
and Consistency of signals
and Accuracy of signals
and Correct use of profile
and Match rate via platform
and Quality inventory
and Human served
and Human viewed

Involved participants
Data broker + ad agency/intermediary
Ad agency / intermediary
Ad agency / intermediary / publisher
In market prospect

A marketer needs:
1/ High signal efficacy
2/ High signal accuracy
3/ Correct use of claimed behaviour
4/ Inventory match at material level
5/ Human served inventory
6/ Human viewed
IN ORDER TO EVEN HAVE A SINGLE OPPORTUNITY TO MAKE A POSITIVE IMPRESSION ON A POTENTIAL BUYER
14
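
To see why a chain of ‘and’ conditions is so demanding, here is a rough Python sketch. It assumes each element passes independently and uses an invented 80% pass rate for every stage; neither the independence nor the rates come from the slides, they are purely illustrative.

```python
from math import prod

# The nine chained elements named on the slide.
STAGES = [
    "Interest signal",
    "Power signal",
    "Consistency of signals",
    "Accuracy of signals",
    "Correct use of profile",
    "Match rate via platform",
    "Quality inventory",
    "Human served",
    "Human viewed",
]


def chance_of_valid_impression(pass_rates):
    """Multiply per-stage pass rates (assumes the stages are independent)."""
    return prod(pass_rates)


# Hypothetical: every stage is right 80% of the time.
uniform = chance_of_valid_impression([0.8] * len(STAGES))

# Hypothetical: eight perfect stages and one that always fails.
one_broken = chance_of_valid_impression([1.0] * (len(STAGES) - 1) + [0.0])

print(f"All stages at 80%: {uniform:.1%} of impressions satisfy the full chain")
print(f"One stage failing outright: {one_broken:.1%}")
```

Under these assumptions only around 13% of impressions clear every link, and a single failing link takes the result to zero regardless of how good the rest are.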

Brokers and intermediaries make their money on the volume of
inventory they can attach profiles to, so the goal is to maximise the
volume of IDs sitting in ‘high value’ segments.
15

Brokers and intermediaries are remunerated via scale/volume and their main incentive is to maximise the volume of ‘high value’ segments
Axes: Low - Interest signal value - High; Low - Power signal value - High
1/ Low yield - high interest, low power
2/ Low yield - low interest, low power
3/ Low yield - low interest, high power
4/ High yield - high interest, high power
The only financial incentive is to load up this quadrant (4) with as much volume as possible
Low value segments will not have any interest from advertisers, so incur collation cost but no monetisation event

Brokers and aggregators need to sell the prospect of high value signals
Behavioural targeting is predicated on costing less to the marketer than exposing all consumers to an ad message.
Advertisers believe segments are meant to be exhibiting high value signals across interest and power, and the majority of demand falls into these.
It is likely the majority of segments demonstrate low power signals, and often discrete (i.e. non-consistent) ones, but there is no financial incentive to label these as such.
The result is ‘high power’ segments that are much larger in scale than the legitimate number of people that realistically fall into these categories.
16
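
The quadrant logic, and the way a looser definition of ‘high’ inflates quadrant 4, can be sketched as follows. The 0 to 1 scoring scale, the thresholds and the sample scores are all hypothetical; the slides do not specify how brokers score interest or power.

```python
def broker_quadrant(interest: float, power: float, threshold: float = 0.5) -> str:
    """Place an ID in one of the four quadrants (scores assumed to run 0 to 1)."""
    high_interest = interest >= threshold
    high_power = power >= threshold
    if high_interest and high_power:
        return "4/ High yield - high interest, high power"
    if high_power:
        return "3/ Low yield - low interest, high power"
    if high_interest:
        return "1/ Low yield - high interest, low power"
    return "2/ Low yield - low interest, low power"


def high_yield_count(scores, threshold):
    # Count how many IDs land in quadrant 4 for a given cut-off.
    return sum(
        broker_quadrant(interest, power, threshold).startswith("4/")
        for interest, power in scores
    )


# Made-up (interest, power) scores for five IDs.
scores = [(0.9, 0.8), (0.7, 0.3), (0.3, 0.6), (0.2, 0.2), (0.4, 0.4)]

strict = high_yield_count(scores, threshold=0.5)
loose = high_yield_count(scores, threshold=0.2)
print(f"Quadrant 4 at a strict cut-off: {strict}; at a loose cut-off: {loose}")
```

Nothing about the underlying people changes between the two runs; only the cut-off moves, yet the sellable ‘high yield’ segment grows from one ID to five.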

Brokers and intermediaries are remunerated via scale/volume and their main incentive is to maximise the volume of ‘high value’ segments

Quadrant Overview

1/ Low yield - high interest, low power
Strong amount of interest in a topic (for instance they may read a lot about cars and automotive but their interest is likely hobby or curiosity), low amount of power (insufficient income, out of normal buying demographic, no legitimate buyer signs)

2/ Low yield - low interest, low power
Weak interest in topic, weak power to buy. This is generally where low value signals (like a location ping) are taken as concrete in-market signs of future intent

3/ Low yield - low interest, high power
An example of this is targeting a father of kids who play sport via a reasonably credible signal, but who has shown zero signal of wanting to buy a car, and implying that because they sit in the demographic segment they are a legitimate candidate and in-market

4/ High yield - high interest, high power
The segment behavioural targeting is predicated on - right person, right time, right message. Implies they have a strong need for the product, fit the buyer profile and have exhibited credible behaviour, and are being reached in an environment or context that is congruent

Low yield segments generate immaterial return, so there is a natural incentive to reallocate IDs in low yield segments into the high yield segment to boost revenue opportunity
Low yield segments become represented as high yield ones
17

The reality is most segment profile markets are likely dominated by a long tail of weak signals
Chart: Data provider companies, plotted against LOW - Legitimate signal accuracy - HIGH
Accurate: High accuracy platforms
Inconsistent: Hit and miss middle
Made up: Tail of endless rubbish

The challenge for advertisers and marketers is it is not clear outside of the ‘high accuracy platforms’ where their money is being allocated.
Australia and most markets have hundreds of operators in the tail who are generating material business.
So, whilst every marketer assumes they’re safely in the zone of ‘high accuracy’, it’s likely most of them are deep in the tail of endless rubbish and making million dollar investment decisions based around these faulty segments.
18

Advertisers make their money on the incremental sales
communications generate, so the goal is to minimise ‘wastage’ and limit
advertisements to those prospects with high power/efficacy relevancy
signals.

Advertisers are primarily paying to AVOID low quality signals and
everything is predicated on precision and accuracy.
19

Advertisers are remunerated via incremental sales, and to rationalise the investment in data segments they need the segments to have high signal efficacy

Axes: LOW - Interest signal efficacy - HIGH; LOW - Power signal efficacy - HIGH
1/ Low yield - high interest, low power accuracy
2/ No value - low accuracy across either dimension
3/ High value - robust interest and power accuracy
4/ Low value - high power accuracy, low signal accuracy
Behavioural targeting is specifically designed to focus on this segment (3) and is meant to remove any exposure or resource allocation to other segments

Advertisers are looking only for potential buyers who satisfy their target state
Marketers are investing in targeted segments primarily to reduce ‘wastage’ of advertising spend and secondarily to increase ‘relevance’ by only serving communications to people who are likely to buy their product or category.
Poor signal efficacy contradicts both of these as it distributes advertiser communication to non category buyers, and also places the advertiser brand in front of people with no need for it. Most importantly, the advertiser is paying more for the exposure as the inventory is significantly more expensive due to the targeting applied.
20

Advertisers and marketers require accuracy above all else; without accuracy, signals do not hold much concrete value.

Quadrant Overview

1/ Low yield - high interest, low power accuracy
One element of the equation (interest) is credible, but the underlying power implications are incorrect (which can create a 16yo who is super interested in cars being passed off as an in-market SUV buyer).

2/ No value - low accuracy across either dimension
Both elements are weak signals, being represented as non-weak (i.e. someone has an inaccurate GPS scrape near a car dealer and this is represented as in-market car buyer)

3/ High value - robust interest and power accuracy
Accuracy and consistency across both. This is a profile that would pass a reasonable definition of an in-market car buyer
Advertisers want only segments that fall into this quadrant

4/ Low value - high power accuracy, low signal accuracy
Power signals could be robust (i.e. age, income) but the signal attached (car buyer) is based off a shallow and potentially misrepresented behaviour.

Low yield segments generate immaterial return, so there is a natural incentive to reallocate IDs in low yield segments into the high yield segment to boost revenue opportunity
21

The reality for the advertiser means most of their inventory is being distributed to people who aren’t prospects or potential buyers
Chart: Segments used, plotted against LOW - Segment accuracy - HIGH
High efficacy minority: A small amount targeted to legitimate prospects
Mid efficacy middle: A small middle zone of mixed effect
Low efficacy mass: A large volume being invested into people with incorrect behavioural claims

The challenge for advertisers and marketers is it is not clear outside of the ‘high accuracy platforms’ where their money is being allocated.
Australia and most markets have hundreds of operators in the tail who are generating material business.
So, whilst every marketer assumes they’re safely in the zone of ‘high accuracy’, it’s likely most of them are deep in the tail of endless rubbish and making million dollar investment decisions based around these faulty segments.
22

So how does a marketer fix this? (they are the only ones who realistically
can)
23

The incentive for marketers is three-fold; the only downside is having to undertake some difficult conversations

1/ Significant financial saving
According to the IAB in Australia there was $5.7b spent on display advertising in 2023.
Let’s assume behavioural targeting is applied to most of this and represents conservatively 20% of the total spend; that’s a $1b sector in terms of investment from Australian marketers.
A removal or optimisation of 40-50% of this has $400-500m in saving potential for marketers as a collective. This doesn’t take into account intermediary fees and inventory costs (which could be as much as $2-3b collectively).

2/ Significant targeting efficacy improvement
A removal of weak or misrepresented signals will mean the aggregate results of targeting, when focused on accurate and robust data, will improve.
This may reduce the volume of inventory and investment, but should significantly improve the yield at the unit level.
A reduction in poor targeting options will create less noise for agencies and execution partners, freeing time currently spent on poor behavioural segments to be allocated in other ways.

3/ Effective redistribution of marketing funds
The $400-500m in forecast saving purely from a reduction in poor targeting related resource could be used to build sustained brand value in other areas.
Or it could be used in other parts of the business - R&D, distribution etc - or reallocated back to shareholders.
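
The saving estimate is simple arithmetic and can be checked with a short Python sketch. The $5.7b, 20% and 40-50% inputs are the slide's own figures; the variable names are just illustrative, and the unrounded result is shown alongside the slide's rounded $1b base.

```python
# Inputs taken from the slide.
display_spend = 5.7e9          # display advertising spend, Australia, 2023
behavioural_share = 0.20       # assumed share spent on behavioural targeting
removal_range = (0.40, 0.50)   # share of that spend removed or optimised

# Unrounded behavioural data spend.
behavioural_spend = display_spend * behavioural_share

# Savings on the unrounded base.
low_saving = behavioural_spend * removal_range[0]
high_saving = behavioural_spend * removal_range[1]

# Savings on the slide's rounded $1b base.
rounded_base = 1e9
rounded_low = rounded_base * removal_range[0]
rounded_high = rounded_base * removal_range[1]

print(f"Behavioural spend: ${behavioural_spend / 1e9:.2f}b")
print(f"Saving on unrounded base: ${low_saving / 1e6:.0f}m to ${high_saving / 1e6:.0f}m")
print(f"Saving on rounded $1b base: ${rounded_low / 1e6:.0f}m to ${rounded_high / 1e6:.0f}m")
```

Working from the unrounded $1.14b gives a slightly higher range than the slide's $400-500m, so the rounded figure errs on the conservative side.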
So what do I as a marketer, manager of a
marketer, or advisor to a marketer need to do
right now to improve this?
24

The first step is for marketers to demand basic data provenance proof

25
Data provenance is the ultimate unanswered and avoided question when it
comes to behavioural advertising data

The second is that marketers need to demand a sufficient element of proof around the meeting of the five variables below.
Elements: Interest signal, Power signal, Consistency of signals, Accuracy of signals, Correct use of profile, Match rate via platform, Quality inventory, Human served, Human viewed, Intelligent reporting

1/ Ensure data provenance
Simply ask your partners for data provenance evidence in relation to the four signals that good data possesses.

2/ Ensure correct context
And ensure any segment is used in the context it is meant for (i.e. don’t take medication for a condition you don’t have)

3/ Ensure quality inventory
4/ Ensure human interaction
Understand that the conduit between your message and the prospective buyer is the inventory it is communicated with. Make sure people can see the inventory, and make sure they’re actually human people.

5/ Report in adequate depth
Generate reporting which demonstrates the premium of result versus the premium of cost across CPM, awareness and favourability
26

If you want to know more about how to ensure the data your organisation is using
is up to scratch, send me a message.
[email protected]
http://linkedin.com/in/shepherdieu
27