Privacy-first in-browser Generative AI web apps: offline-ready, future-proof, standards-based
webmaxru
25 slides
Oct 20, 2025
About This Presentation
Powerful generative AI features are quickly becoming a baseline in modern development. Potential blockers include privacy concerns, the need for a stable connection, and the costs associated with using or hosting models. However, we can now leverage generative AI directly in the browser on the user's device using emerging Web APIs like WebNN, combined with higher-level frameworks, for a better developer experience.
In my session, I’ll discuss the current state of in-browser ML and AI features, compare the main players, and show you how to start building an offline-ready, future-proof, standards-based web application.
Slide Content
Privacy-first in-browser
Generative AI web apps:
•offline-ready,
•future-proof,
•standards-based
I’m Maxim Salnikov
AI Developer Tools Solution Engineer at Microsoft
Helping developers to succeed with Dev Tools, Cloud & AI at Microsoft
•Building on the web platform since the 90s
•Organizing developer communities and technical conferences
•Speaking, training, blogging: Webdev, Cloud, Generative AI, Prompt Engineering
•Member of the Web Machine Learning Community Group: making machine learning a first-class web citizen by incubating Web APIs for machine learning inference in the browser and in products using modern web engines
Native AI in the browser. Standardized.
We use the web (61% of PC time)
We use AI (> 1B people)
We [will] have AI-capable devices
We want performance, privacy, offline-readiness. All FREE!
@Dev: unified codebase
@Dev: handy abstractions
Web Neural Network API (WebNN)
Near-native execution characteristics: both speed and power efficiency
Heterogeneous hardware execution: CPU, GPU, NPU
Unified abstraction: W3C API standard
Model-agnostic: a general computational graph lets you bring your own model (BYOM)
Compatible with existing ML frameworks
https://www.w3.org/TR/webnn/
It all starts with the use cases
https://www.w3.org/TR/webnn/#usecases
•Person Detection
•Semantic Segmentation
•Skeleton Detection
•Face Recognition
•Facial Landmark Detection
•Style Transfer
•Super Resolution
•Image Captioning
•Text-to-image
•Machine Translation
•Emotion Analysis
•Video Summarization
•Noise Suppression
•Speech Recognition
•Text Generation
•Detecting fake video
Edge AI ecosystem
[Diagram: layers of the stack, top to bottom]
Use cases: Noise Suppression, Image Classification, Background Segmentation, Natural Language, Object Detection
Frameworks: TensorFlow.js, ONNX Runtime Web, MediaPipe Web, OpenCV.js
Web API: WebAssembly, WebGPU, WebNN
Web Engines: Web Browser (e.g., Chrome/Edge), JavaScript Runtime (e.g., Electron/Node.js)
Native ML APIs: CoreML, DirectML, TFLite, Other ML OS APIs
Hardware: CPU, GPU, NPU
Other labels: Windows Studio Effects, API extensions
Which hardware to choose for AI workloads
CPU: Provides the broadest compatibility and usability across all client devices with varying degrees of performance.
GPU: Provides the broadest range of achievable performance across graphics hardware platforms from consumer devices to professional workstations.
NPU: Provides power efficiency for sustained workloads across hardware platforms with purpose-built accelerators.
WebNN for the users
Low Latency
In-browser inference enables novel use cases with local media sources
Privacy Preserving
User data stays on-device, preserving user privacy
High Availability
No reliance on the network once assets are cached, so the app keeps working offline
Low Cost
Computing on client devices means no server farms needed.
https://webmachinelearning.github.io/webnn-intro/
WebNN for the developers
Get capabilities from the underlying hardware innovations
Take advantage of the native OS services for machine learning
Benefit web applications and frameworks, including ONNX Runtime Web and TensorFlow.js
Implement consistent, efficient, and reliable AI experiences on the web
https://webmachinelearning.github.io/webnn-intro/
Prerequisites
https://microsoft.github.io/webnn-developer-preview/install.html
Canary or Dev version of Edge or Chrome
Enable the flag: about://flags#web-machine-learning-neural-network
Enabling NPU: latest drivers for Intel | ARM
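Once the flag is on, a quick feature check confirms that the browser actually exposes the APIs. This is a minimal sketch, assuming only that WebNN is reached through `navigator.ml` and WebGPU through `navigator.gpu`; `detectAiApis` is a hypothetical helper name, and it takes the navigator object as a parameter so it can also be exercised outside a browser.

```javascript
// Hypothetical helper: reports which low-level AI-related Web APIs a
// navigator-like object exposes. An entry point being present does not
// guarantee that a particular device (GPU or NPU) is usable.
function detectAiApis(nav) {
  const n = nav ?? {};
  return {
    webnn: typeof n.ml?.createContext === 'function',
    webgpu: typeof n.gpu?.requestAdapter === 'function',
  };
}

// In a browser: console.log(detectAiApis(navigator));
```

If `webnn` comes back `false` in a Canary or Dev build, the flag or the driver setup is the first thing to re-check.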
Context, operand, graph…
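The three terms map onto a short sequence of calls. The sketch below follows the current W3C WebNN draft as I understand it (`createContext`, `MLGraphBuilder`, `createTensor`, `dispatch`, `readTensor`); older browser builds used `dimensions` instead of `shape` and `compute()` instead of `dispatch()`, so treat the exact names as assumptions to verify against your build. `addWithWebNN` is a hypothetical function name.

```javascript
// Sketch of the WebNN flow: context -> operands -> graph -> execution.
// Returns null when WebNN is not exposed in the current environment.
async function addWithWebNN(ml = globalThis.navigator?.ml) {
  if (!ml) return null;

  // 1. Context: the connection to the underlying device(s).
  const context = await ml.createContext();

  // 2. Operands: typed, shaped inputs of the computational graph.
  const desc = { dataType: 'float32', shape: [2, 2] };
  const builder = new MLGraphBuilder(context);
  const a = builder.input('a', desc);
  const b = builder.input('b', desc);

  // 3. Graph: compile the single operation c = a + b.
  const graph = await builder.build({ c: builder.add(a, b) });

  // 4. Execution: bind tensors, dispatch, and read the result back.
  const tensorA = await context.createTensor({ ...desc, writable: true });
  const tensorB = await context.createTensor({ ...desc, writable: true });
  const tensorC = await context.createTensor({ ...desc, readable: true });
  context.writeTensor(tensorA, new Float32Array([1, 2, 3, 4]));
  context.writeTensor(tensorB, new Float32Array([10, 20, 30, 40]));
  context.dispatch(graph, { a: tensorA, b: tensorB }, { c: tensorC });
  return new Float32Array(await context.readTensor(tensorC));
}
```

Real models are rarely built by hand like this; the graph is normally produced by a framework from a model file, which is where the higher layers come in.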
Let’s build an app
Layers: AI use cases, Web frontend app, Transformers.js, ONNX Runtime Web, WebNN API, Platform AI capabilities
WebNN API: low-level, operates on the execution graph
ONNX Runtime Web: mid-level, operates on inference sessions, defines the model format (ONNX)
Transformers.js: high-level*, operates on task-based pipelines, handles model fetching & caching
* Level distribution is relative
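At the mid level, the choice of backend is expressed as a list of execution providers on the inference session. The sketch below is an assumption-laden illustration: `buildSessionOptions` is a hypothetical helper, and the `webnn` provider entry with a `deviceType` field follows the ONNX Runtime Web documentation as I recall it, so check it against the version you install.

```javascript
// Hypothetical helper: builds onnxruntime-web session options that list
// WebNN first and WebAssembly second, in order of preference.
function buildSessionOptions(deviceType = 'gpu') {
  return {
    executionProviders: [
      { name: 'webnn', deviceType }, // 'cpu' | 'gpu' | 'npu'
      'wasm', // CPU fallback
    ],
  };
}

// Usage in a browser app (assumes the onnxruntime-web package and a model
// URL of your own; the import path for WebNN-enabled builds may differ
// between package versions):
//   import * as ort from 'onnxruntime-web';
//   const session = await ort.InferenceSession.create(
//     modelUrl, buildSessionOptions('npu'));
```

Transformers.js sits one level above this and makes the same kind of choice for you through its own `device` option.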
What is ONNX?
https://onnxruntime.ai/
https://onnx.ai/
ONNX is an open format built to represent machine learning models. ONNX defines a common set of operators - the building blocks of machine learning and deep learning models - and a common file format to enable AI developers to use models with a variety of frameworks, tools, runtimes, and compilers.
ONNX Runtime is a production-grade AI engine to speed up training and inferencing in your existing technology stack.
What is Transformers.js?
https://github.com/huggingface/transformers.js
Natural Language Processing: text classification, named entity recognition, question answering, language modelling, summarization, translation, multiple choice, and text generation
Computer Vision: image classification, object detection, segmentation, and depth estimation
Audio: automatic speech recognition, audio classification, and text-to-speech
Multimodal: embeddings, zero-shot audio classification, zero-shot image classification, and zero-shot object detection
Directly in the browser, with task-based APIs
Plus:
https://github.com/huggingface/transformers.js
2.3K+ hosted pretrained models (subset of Hugging Face catalog)
https://huggingface.co/models?library=transformers.js
Seamless caching of the models (with the Cache Storage API)
Serving your own models (converted to the ONNX format)
Task-based API
https://huggingface.co/docs/transformers.js/en/pipelines
import { pipeline } from '@huggingface/transformers';
// Creates a pipeline for the task; the model is fetched and cached on first use.
const classifier = await pipeline('sentiment-analysis');
const result = await classifier('I love AI!');
// [{'label': 'POSITIVE', 'score': 0.9998}]
Summary and call to action:
•A web standard for running AI/ML tasks natively in the browser is here
•It’s the only way to leverage all on-device AI capabilities from the web
•There are still some moving parts in the specification (device selection)
•Choose your own comfortable abstraction level using higher-level frameworks
•The same frameworks can provide fallback mechanisms when an API or device is unavailable
•User experience first! Offline-readiness, web workers, providing choices
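The fallback idea can be sketched as an ordered list of devices that is tried until one works. This is an illustration under assumptions: `devicePreference` and `loadWithFallback` are hypothetical helpers, and the device strings (`webnn-npu`, `webnn-gpu`, `webgpu`, `wasm`) are the Transformers.js `device` values as I know them from its v3 documentation.

```javascript
// Hypothetical helper: orders device names from most to least specialized,
// based on detected capabilities.
function devicePreference({ webnn = false, webgpu = false } = {}) {
  const devices = [];
  if (webnn) devices.push('webnn-npu', 'webnn-gpu');
  if (webgpu) devices.push('webgpu');
  devices.push('wasm'); // broadly available CPU fallback
  return devices;
}

// Tries each device in turn until a pipeline loads. `createPipeline` is
// injected so the logic stays independent of the library, for example:
//   (device) => pipeline('sentiment-analysis', null, { device })
async function loadWithFallback(createPipeline, devices) {
  let lastError;
  for (const device of devices) {
    try {
      return { device, pipe: await createPipeline(device) };
    } catch (error) {
      lastError = error; // this device is unavailable, try the next one
    }
  }
  throw lastError ?? new Error('No device candidates given');
}
```

Returning the device that finally worked makes it easy to show the user which backend is in use, which fits the "providing choices" point.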
Demo repos:
React + Next.js: https://github.com/webmaxru/nextjs-webnn
Angular: https://github.com/webmaxru/ng-ai
•AI-capable: Transformers.js (under the hood: ONNX Runtime Web, WebNN)
•WebGPU, WebNN, NPU feature detection
•Smooth UX: AI computation runs in a web worker
•Offline-ready: Workbox
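Moving inference into a web worker keeps the UI thread responsive, and loading the model only once avoids repeated downloads and initialization. The following is a sketch of that pattern, not the code from the demo repos: `once` is a hypothetical helper, and the worker wiring assumes a module worker plus a bundler that can resolve `@huggingface/transformers`.

```javascript
// Hypothetical helper: wraps an async factory so the expensive model load
// starts at most once, no matter how many messages arrive.
function once(factory) {
  let promise = null;
  return () => (promise ??= factory());
}

// Worker-side wiring; skipped when this file is not running in a worker.
if (typeof WorkerGlobalScope !== 'undefined' && self instanceof WorkerGlobalScope) {
  const getClassifier = once(async () => {
    const { pipeline } = await import('@huggingface/transformers');
    return pipeline('sentiment-analysis');
  });

  self.addEventListener('message', async (event) => {
    const classify = await getClassifier();
    self.postMessage(await classify(event.data));
  });
}
```

Note that a failed load stays cached as a rejected promise in this sketch; a production version would likely reset it so the user can retry.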
NEW! Prompt and Writing Assistance APIs
•New APIs for Web Developers: Prompt API, Writing Assistance APIs
(Summarizer, Writer, Rewriter), and Translator API now available in
Canary/Dev of Edge and Chrome via Origin Trial
•Built-in Local Model: Uses small, efficient language models
integrated into the browsers — no cloud calls or token costs.
•Easy-to-Use JavaScript Interfaces: Just a few lines of client-side
code enable powerful AI features like text generation, summarization,
and rewriting.
•Improved Privacy & Performance: No external model hosting,
optimized for real-time, low-latency AI on supported devices.
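Because the built-in model may be missing or still downloading, it helps to check availability before creating a session. The sketch below assumes the availability states described in the explainers (`unavailable`, `downloadable`, `downloading`, `available`) and the `monitor` option with its `downloadprogress` event; `planForAvailability` and `createSummarizer` are hypothetical helper names.

```javascript
// Maps an availability state of a built-in AI API to a UI decision.
function planForAvailability(state) {
  switch (state) {
    case 'available':
      return 'use-now';
    case 'downloadable':
    case 'downloading':
      return 'show-download-progress';
    default:
      return 'use-fallback'; // 'unavailable' or an unknown state
  }
}

// Creates a summarizer when the API is exposed and usable, else null.
async function createSummarizer(onProgress = () => {}) {
  if (!('Summarizer' in globalThis)) return null; // API not exposed
  const state = await Summarizer.availability();
  if (planForAvailability(state) === 'use-fallback') return null;
  return Summarizer.create({
    type: 'tl;dr',
    length: 'short',
    // Reports model download progress; `loaded` is a fraction from 0 to 1.
    monitor(m) {
      m.addEventListener('downloadprogress', (e) => onProgress(e.loaded));
    },
  });
}
```

When the model still has to be downloaded, browsers may require a user gesture before `create()` succeeds, so calling `createSummarizer` from a click handler is the safer default.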
Show me the code!
// Summarize an article with added context and desired style and length.
const summarizer = await Summarizer.create({
  sharedContext: "An article from the Daily Economic News magazine",
  type: "tl;dr",
  length: "short"
});
const summary = await summarizer.summarize(articleEl.textContent, {
  context: "This article was written 2024-08-07 and is in the World Markets section."
});
// Write a blurb based on the provided prompt and tone.
const writer = await Writer.create({
  tone: "formal"
});
const result = await writer.write(
  "A draft for an inquiry to my bank about how to enable wire transfers on my account"
);
How to get started
Read the explainers and try the playground
Identify real-life use cases and build:
AI-powered search
Personalized news feeds and custom content filters
Calendar event creation
Seamless contact extraction
Share feedback on GitHub to shape future development
https://github.com/webmachinelearning/prompt-api
https://github.com/webmachinelearning/writing-assistance-apis
Thank you! Connect with me on LinkedIn to:
•Get this deck immediately
•Get support with WebNN and other AI technologies
•Follow the latest Gen AI updates for developers