Commerce at the Limit - Marketecture - October 2025_v5.pptx
EricSeufert
50 slides
Nov 02, 2025
About This Presentation
This presentation provides an overview of the concept of "Commerce at the limit," which argues that AI enablement in advertising aims to achieve two outcomes:
- That every ad exposure generates value equivalent to its potential
- That every business that could potentially derive value from digital advertising, does
The second half of the presentation is a literature review of recent (published in the last ~12 months) papers on topics related to the application of AI / ML to digital advertising.
Size: 3.69 MB
Language: en
Added: Nov 02, 2025
Slides: 50 pages
Slide Content
Commerce at the Limit: Frontier applications of AI to digital advertising
Eric Benjamin Seufert, Mobile Dev Memo
Marketecture Live – NYC - October 2025
About Me
Previously
Agenda
- Commerce at the limit
- Applications: Ad Ranking, Creative production / testing
Commerce at the limit
Commerce at the limit Mark Zuckerberg, May 2025 (Stratechery): https://stratechery.com/2025/an-interview-with-meta-ceo-mark-zuckerberg-about-ai-and-the-evolution-of-social-media/
Commerce at the limit “I’d characterize this model as Commerce at the Limit: the fulfillment of complete optimization across every component of the digital advertising process such that commercial performance attains its theoretical maximum.” https://mobiledevmemo.com/commerce-at-the-limit/
Commerce at the limit
AI enablement within the advertising ecosystem aims to achieve two theoretical outcomes:
- That every ad exposure generates value equivalent to its potential
- That every business that could potentially derive value from digital advertising, does
Commerce at the limit
Advertiser Participation, Ad conversion
How does this happen?
- Automation (“Black Box” tools)
- Better relevancy scoring
- More exhaustive experimentation / audience evaluation
- Faster / higher-fidelity conversion signals
Commerce at the limit “‘Black box’ ad optimization and satisficer’s remorse” (March 2025): https://mobiledevmemo.com/ai-advertising-optimization-and-satisficers-regret/
Commerce at the limit Meta Q2 2025 earnings transcript: https://s21.q4cdn.com/399680738/files/doc_financials/2025/q2/META-Q2-2025-Earnings-Call-Transcript.pdf
Commerce at the limit https://www.census.gov/retail/mrts/www/data/pdf/ec_current.pdf
Commerce at the limit https://www.nobelprize.org/prizes/economic-sciences/2025/popular-information/
2. The AI opportunity
The AI opportunity https://arxiv.org/pdf/2508.05206 Ad Platform
The AI opportunity Product Monetization / Optimization Creative Production / Testing Personalization Advertiser
The AI opportunity
Goal is to facilitate users’ discovery of most relevant products
Being implemented with “AI” (deep learning) methodologies across:
- Content generation
- Sequence prediction
- Natural language processing
- User-level content recommendation
- Reinforcement learning
The AI opportunity
- Some of the products are conspicuous / obvious, but many of them are hidden to the user
- Implementations of AI are often tethered to outputs / the things we see
The AI opportunity “Seizing the agentic AI advantage” (June 2025): https://www.mckinsey.com/capabilities/quantumblack/our-insights/seizing-the-agentic-ai-advantage#/
The AI opportunity
- Some of the products are conspicuous / obvious, but many of them are hidden to the user
- Implementations of AI are often tethered to outputs / the things we see
- Much of the value being unlocked by “AI” in the advertising use case is being derived from infrastructure improvements
Applications
Applications: ad ranking The Two Towers paradigm has been used for many years for content recommendation
Applications: ad ranking “Deep Neural Networks for YouTube Recommendations” (2016) https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45530.pdf?uclick_id=192fe054-dbe7-4c85-8312-5632aadbf7a0
Applications: ad ranking
The Two Towers paradigm has been used for many years for content recommendation; the canonical paper is from YouTube
- With Two Towers, the candidate model learns a user embedding tower and an item (video) embedding tower (a matrix of embeddings for individual videos, where each video is a row)
- The embedding model is trained by stepping through T time steps of a user’s watch and search history
- User embedding $u_t$ is the average of watch history + context features
- $w_t$ is the label for the actual video the user watched next
- $v_j$ is the set of embeddings for sampled videos the user did not watch next
- $v_i$ is any video’s embedding
Applications: ad ranking
- Model predicts the probability that $w_t = i$ based on dot product, and uses sampled softmax to compute the loss
- If the dot product for $v_i$ dominates the sum of dot products for negative samples, the negative log loss is small (model rewarded)
- Backpropagation updates weights for the video embeddings and the user tower
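The loss described on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not the YouTube paper's implementation: the function names, the 3-dimensional embeddings, and the choice of negatives are all hypothetical, and the importance-weighting correction that production sampled softmax typically applies is left out.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def sampled_softmax_loss(user_vec, pos_vec, neg_vecs):
    """Negative log-likelihood of the watched video against sampled negatives.

    loss = -log( exp(u.v_i) / (exp(u.v_i) + sum_j exp(u.v_j)) )
    """
    pos_logit = dot(user_vec, pos_vec)
    logits = [pos_logit] + [dot(user_vec, v) for v in neg_vecs]
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits)
    log_denominator = math.log(sum(math.exp(l - m) for l in logits))
    return -((pos_logit - m) - log_denominator)

# Hypothetical embeddings, for illustration only.
u_t = [1.0, 0.5, -0.2]                           # user embedding at step t
v_i = [0.9, 0.6, -0.1]                           # video actually watched next
v_neg = [[-0.8, 0.1, 0.4], [0.0, -0.5, 0.3]]     # sampled unwatched videos

# Loss when the watched video aligns with the user embedding ...
aligned = sampled_softmax_loss(u_t, v_i, v_neg)
# ... versus when the label is a video that points away from it.
misaligned = sampled_softmax_loss(u_t, v_neg[0], [v_i, v_neg[1]])
print(aligned, misaligned)
```

The aligned case produces the smaller loss, which is the "model rewarded" behavior the slide describes.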
Applications: ad ranking
What if the user has multiple interests?
Vanilla Two Towers works well when a user’s interests are monolithic but breaks down when they are categorically diverse or hierarchical
Applications: ad ranking “Multi-Interest Network with Dynamic Routing for Recommendation at Tmall” (2019) https://dl.acm.org/doi/pdf/10.1145/3357384.3357814
Applications: ad ranking
In MIND (Multi-Interest Network with Dynamic Routing):
- All of a user’s behavioral interactions (clicks, purchases, likes, etc.) are embedded
- A behavior-to-interest dynamic routing layer clusters the embeddings into K representation vectors (interest groups / “capsules”), where some capsules might be deactivated for certain users (thus, dynamic: not all users have the same number of interests)
- During training, an output vector, v, is produced through a label-aware attention layer that matches a candidate item $e_i$ to the most relevant “capsule” via softmax
- During inference, all K user interest embeddings are compared to candidate item embeddings, and the top N (by dot product) are returned
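The training-time attention and the inference-time lookup might be sketched as follows. This assumes the K interest capsules have already been produced by the routing layer, which is not shown. The item names, the 2-dimensional embeddings, the exponent `p`, and the sign-preserving power are illustrative assumptions rather than details taken from the slide.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def label_aware_attention(interests, item, p=2.0):
    """Blend interest capsules with weights softmax(pow(capsule . item, p))."""
    scores = [dot(v, item) for v in interests]
    # Assumption: keep the sign when raising to p so negative matches stay negative.
    powered = [math.copysign(abs(s) ** p, s) for s in scores]
    m = max(powered)
    exps = [math.exp(s - m) for s in powered]
    total = sum(exps)
    weights = [e / total for e in exps]
    user_vec = [sum(w * v[d] for w, v in zip(weights, interests))
                for d in range(len(item))]
    return user_vec, weights

def retrieve_top_n(interests, catalog, n):
    """Score each item by its best-matching interest and return the top-n ids."""
    scored = [(max(dot(v, emb) for v in interests), item_id)
              for item_id, emb in catalog.items()]
    scored.sort(reverse=True)
    return [item_id for _, item_id in scored[:n]]

# Hypothetical user with K = 2 interests and a tiny item catalog.
interests = [[1.0, 0.0], [0.0, 1.0]]
catalog = {
    "running_shoes": [0.9, 0.1],
    "chef_knife": [0.1, 0.8],
    "novel": [-0.5, -0.5],
    "protein_bar": [0.6, 0.5],
}

user_vec, weights = label_aware_attention(interests, catalog["chef_knife"])
top = retrieve_top_n(interests, catalog, 2)
print(weights, top)
```

With these toy values the second capsule receives the larger attention weight for the knife, and retrieval returns one item per interest rather than two items from a single averaged user vector.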
Applications: ad ranking
But ads introduce a further complication: bids!
In the typical ad serving infrastructure, retrieval is bid-agnostic, and thus the goals of the retrieval and ranking steps might be out of alignment:
- Calculating eCPM for billions of candidates is computationally infeasible, so retrieval optimizes for CTR, ignoring advertiser bid
https://arxiv.org/pdf/2508.05206
Applications: ad ranking
Inconsistency between low-powered retrieval (billions of ad candidates) and high-powered ranking (tens of curated candidates) -> retrieval surfaces low-bid, high-CTR ads, while ranking orders by eCPM
In BAR (Bidding-Aware Retrieval):
- A “learning-to-rank” mechanism is trained to rank ads by eCPM (CTR * bid)
- Model is trained using “hard” and “easy” pairs:
  - Hard: (returned from retrieval | actually shown)
  - Easy: (random | returned from retrieval)
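The retrieval / ranking inconsistency can be seen with a handful of made-up candidates. The sketch below is not BAR itself: it only contrasts a bid-agnostic retrieval stage with a bid-aware one, using eCPM = CTR * bid as defined on the slide. The ad ids, CTRs, and bids are hypothetical.

```python
def ecpm(ctr, bid):
    """Value of an impression as defined on the slide: CTR * bid."""
    return ctr * bid

# Hypothetical candidates: (ad id, predicted CTR, advertiser bid).
candidates = [
    ("ad_a", 0.050, 0.10),
    ("ad_b", 0.040, 0.20),
    ("ad_c", 0.020, 1.50),
    ("ad_d", 0.010, 2.00),
]

def retrieve(cands, k, bid_aware):
    """Keep the top-k candidates, scored by eCPM if bid_aware, else by CTR alone."""
    if bid_aware:
        key = lambda c: ecpm(c[1], c[2])
    else:
        key = lambda c: c[1]
    return sorted(cands, key=key, reverse=True)[:k]

def rank(cands):
    """Final ranking stage: pick the single highest-eCPM ad among survivors."""
    return max(cands, key=lambda c: ecpm(c[1], c[2]))

winner_agnostic = rank(retrieve(candidates, 2, bid_aware=False))
winner_aware = rank(retrieve(candidates, 2, bid_aware=True))
print(winner_agnostic[0], winner_aware[0])
```

With bid-agnostic retrieval, the highest-eCPM ad never reaches the ranking stage, so the ranker can only choose among high-CTR, low-bid survivors.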
Applications: ad ranking
In BAR (Bidding-Aware Retrieval):
- “Bidding-aware modeling” enforces a monotonic relationship between predicted eCPM and bidding features with a loss function
- Predicted eCPM is compared to the eCPM of a perturbed variation of the actual:
  - $I$ = direction of perturbation (1 = pos, -1 = neg)
  - $\tilde{a}$ = perturbed, $a$ = actual
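One plausible way to express such a monotonicity constraint is a hinge penalty on the change in predicted eCPM. The form below is an assumption for illustration and is not necessarily the loss used in the BAR paper; the function name, the `margin` parameter, and the example values are hypothetical.

```python
def monotonicity_penalty(pred_actual, pred_perturbed, direction, margin=0.0):
    """Hinge penalty that is zero when predicted eCPM moves with the perturbation.

    direction is +1 when the bidding feature was increased and -1 when it was
    decreased. The penalty is positive only when the prediction moves the
    wrong way (or by less than `margin`).
    """
    return max(0.0, margin - direction * (pred_perturbed - pred_actual))

# Bid raised, predicted eCPM rises: consistent, no penalty.
good_up = monotonicity_penalty(0.030, 0.036, direction=+1)
# Bid raised, predicted eCPM falls: violates monotonicity, penalized.
bad_up = monotonicity_penalty(0.030, 0.025, direction=+1)
# Bid lowered, predicted eCPM falls: consistent, no penalty.
good_down = monotonicity_penalty(0.030, 0.024, direction=-1)
print(good_up, bad_up, good_down)
```

In training, a term like this would be averaged over perturbed examples and added to the main ranking objective.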
Applications: ad ranking
Task-Attention Refinement:
- Uses a fused embedding (user and ad features, the Two Towers) to predict CTR, bid attributes, and eCPM with three separate model branches
- Uses distillation from the existing ranking model (pCTR, pBID)
- Composite loss (lambdas = weight hyperparameters)
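A composite loss of this kind is generally a weighted sum of per-task terms. The expression below is a generic sketch under that assumption: the split into four terms and their names are hypothetical, inferred from the three branches and the distillation step listed on the slide, and may not match the paper's exact formulation.

```latex
\mathcal{L}
  = \lambda_{1}\,\mathcal{L}_{\mathrm{CTR}}
  + \lambda_{2}\,\mathcal{L}_{\mathrm{bid}}
  + \lambda_{3}\,\mathcal{L}_{\mathrm{eCPM}}
  + \lambda_{4}\,\mathcal{L}_{\mathrm{distill}}
```

Each $\lambda_k$ is a hyperparameter that sets how much its task contributes to the gradient of the shared fused embedding.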
Applications: ad ranking “Meta Andromeda: Supercharging Advantage+ automation with the next-gen personalized ads retrieval engine” (2024) https://engineering.fb.com/2024/12/02/production-engineering/meta-andromeda-advantage-automation-next-gen-personalized-ads-retrieval-engine/
Applications: creative
“Diffusion models for ad creative production” (2025) https://mobiledevmemo.com/diffusion-models-for-ad-creative-production/
Reverse Process: minimize MSE between true noise and predicted noise at each time step
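The noise-prediction objective can be sketched without a neural network by standing in fixed predictors for the learned denoiser. This follows the standard DDPM-style formulation, which is assumed here to be what the slide refers to. The linear beta schedule, its defaults, the 4-value "image", and the function names are illustrative assumptions.

```python
import math
import random

def noise_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule; returns the cumulative products alpha_bar[t]."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(x0, alpha_bar, eps):
    """Forward process: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    return [math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * e
            for x, e in zip(x0, eps)]

def noise_mse(eps_true, eps_pred):
    """Training loss: mean squared error between true and predicted noise."""
    return sum((a - b) ** 2 for a, b in zip(eps_true, eps_pred)) / len(eps_true)

rng = random.Random(0)
x0 = [0.2, -0.4, 0.9, 0.1]                   # hypothetical flattened image
alpha_bars = noise_schedule(1000)
t = rng.randrange(len(alpha_bars))           # sample a random time step
eps = [rng.gauss(0.0, 1.0) for _ in x0]      # true noise added at step t
x_t = add_noise(x0, alpha_bars[t], eps)      # noisy input the denoiser would see

# Stand-ins for the learned denoiser eps_theta(x_t, t).
perfect_loss = noise_mse(eps, eps)                 # predictor that recovers the noise
naive_loss = noise_mse(eps, [0.0] * len(eps))      # predictor that always outputs zero
print(perfect_loss, naive_loss)
```

A trained model sits between these two extremes, and gradient descent on this MSE pushes its prediction toward the true noise at every sampled time step.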
Applications: creative As diffusion models become more prevalent, ad ranking faces a more substantial task!