Mom, there are robots in my meeting!
Saúl Ibarra Corretgé
Once upon a time…
Set of Open Source projects
Easily deploy scalable and secure video conferencing solutions.
APIs and mobile SDKs
Integrate Jitsi into existing products to add video conferencing capabilities.
Community
Open Source enthusiasts from all around the world contribute to Jitsi.
JaaS
Hosted Jitsi, by 8x8 — jaas.8x8.vc
2 API levels
• High level: iframe based, very versatile and customisable (sketch below)
• Low level: your own UI
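For the high level API, embedding comes down to loading external_api.js from a deployment and pointing it at a room. A minimal sketch, with the room name and container as placeholders:

```ts
// Assumes external_api.js has been loaded from your deployment, e.g.
// <script src="https://meet.jit.si/external_api.js"></script>
declare const JitsiMeetExternalAPI: any;

const api = new JitsiMeetExternalAPI('meet.jit.si', {
  roomName: 'MyMeetingRoom',                    // placeholder room
  parentNode: document.querySelector('#meet')!, // placeholder container
  width: '100%',
  height: 600,
});

// The embedding page can react to meeting events and drive the UI.
api.addListener('videoConferenceJoined', () => {
  console.log('joined the meeting');
});
```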
Product + Platform
• Work is spread across both
• Improvements in the in-meeting experience
• Improvements in APIs for embedders
The AI gold rush
AI + meetings = ?
Alright, let’s build a chatbot!
• It needs to join the meeting as a participant
• Interact via text (built-in chat)
• Private messages for questions
• Hot word detection? (sketch below)
• Audio output?
• Video output?
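A rough sketch of the chat side, assuming an already-joined lib-jitsi-meet conference object; the event and method names follow lib-jitsi-meet, but treat the exact signatures as assumptions:

```ts
// `conference` is an already-joined lib-jitsi-meet JitsiConference.
// Event/handler signatures are assumptions; check lib-jitsi-meet for your version.
declare const JitsiMeetJS: any;

const HOT_WORD = '@bot';

function wireChat(conference: any) {
  conference.on(
    JitsiMeetJS.events.conference.MESSAGE_RECEIVED,
    (participantId: string, text: string) => {
      // Hot word detection: ignore chat that isn't addressed to the bot.
      if (!text.toLowerCase().includes(HOT_WORD)) {
        return;
      }
      // Placeholder for the LLM round-trip; answer privately to avoid
      // spamming the whole room.
      const answer = 'Working on it…';
      conference.sendPrivateTextMessage(participantId, answer);
    },
  );
}
```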
How hard can it be?
Demos
Recorded, no sweat!
What was that?
• Node app running Chrome via Playwright (sketch below)
• Bot written with lib-jitsi-meet, our low level API
• Google for transcriptions
• ChatGPT for AI
• Zapier for emails
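The browser-driving part is plain Playwright: launch Chromium with fake media devices (the bot has no real mic or camera) and load the page that runs the lib-jitsi-meet bot. The bot URL is a placeholder:

```ts
import { chromium } from 'playwright';

async function launchBot(meetingUrl: string) {
  const browser = await chromium.launch({
    headless: true,
    args: [
      // Skip the permission prompt and use fake capture devices.
      '--use-fake-ui-for-media-stream',
      '--use-fake-device-for-media-stream',
    ],
  });

  const page = await browser.newPage();
  // Placeholder URL: the page bundles the lib-jitsi-meet bot code.
  await page.goto(`https://bot.example.com/?meeting=${encodeURIComponent(meetingUrl)}`);

  return browser;
}
```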
What was that?
• Node app running Chrome via Playwright
• Bot written with lib-jitsi-meet, our low level API
• Google for transcriptions
• ChatGPT for AI
• PlayHT for playing back audio (sketch below)
• WebGL for the moving robot
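One way to get the generated speech into the room (a sketch, not necessarily how the demo was wired): fetch the TTS audio, decode it with the Web Audio API and expose it as a MediaStream the bot can publish. The hand-off to lib-jitsi-meet is left out:

```ts
// Runs inside the bot's page. Turns a TTS clip (e.g. returned by a service
// such as PlayHT) into a MediaStream whose audio track can become the bot's
// outgoing audio.
async function speak(ttsUrl: string): Promise<MediaStream> {
  const ctx = new AudioContext();
  const data = await (await fetch(ttsUrl)).arrayBuffer();
  const buffer = await ctx.decodeAudioData(data);

  const destination = ctx.createMediaStreamDestination();
  const source = ctx.createBufferSource();
  source.buffer = buffer;
  source.connect(destination);
  source.start();

  // destination.stream carries the synthesized speech; publishing it as a
  // local track in the conference is the lib-jitsi-meet side, elided here.
  return destination.stream;
}
```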
But we had other robots before…
• Our recorder is based on a Chrome instance running in a container
• PSTN access is accomplished with a server-side WebRTC client
• Our robots are invisible, though
• So we used Chrome because YOLO
Ian Malcolm, Jurassic Park
“Your scientists were so preoccupied with
whether or not they could, they didn't stop to
think if they should.”
How do we interact with chatbots?
Private text conversations (mostly)
AI is personal
Let’s focus on the building blocks
• GenAI has text as its lingua franca
• Whatever the input, it gets converted to text
• Then some LLM operates on that text
• Finally, the result is converted to the desired output (sketch below)
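The same flow as a minimal sketch: text goes in, text comes out, and the caller decides what the output becomes (chat message, email, TTS audio). The public OpenAI chat completions endpoint stands in for the LLM; the model name and API key are placeholders:

```ts
// Text in, text out: whatever happens to the answer afterwards (chat, email,
// speech synthesis) is a separate conversion step.
async function askLLM(transcript: string, question: string): Promise<string> {
  const res = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: 'gpt-4o-mini', // placeholder model
      messages: [
        { role: 'system', content: 'You answer questions about a meeting transcript.' },
        { role: 'user', content: `Transcript so far:\n${transcript}\n\nQuestion: ${question}` },
      ],
    }),
  });

  const data: any = await res.json();
  return data.choices[0].message.content;
}
```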
Text is King
Start by adding core primitives
Transcriptions + Summaries
Real Time
Usable as subtitles
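From an embedding app, turning captions on is a single command; a sketch assuming the iframe API's toggleSubtitles command (check the handbook for the version you deploy):

```ts
// `api` is a JitsiMeetExternalAPI instance, created as in the embed sketch above.
declare const api: any;

// Request transcription and show it as subtitles in the embedded meeting.
// Command name is an assumption; verify it against your jitsi-meet version.
api.executeCommand('toggleSubtitles');
```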
How does it all fit together?
Getting to users
How would they know to turn transcriptions on?
Brave Talk
What are our customers doing?
How can you build this?
• Use transcription webhooks to keep a full transcript
• Use transcription iframe events to locally display the transcript + the full transcript to catch up (sketch below)
• Custom LLM (Brave doesn't use Skynet) for prompting
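A sketch of the iframe-events half, assuming an embedded JitsiMeetExternalAPI instance. The transcription event name and payload shape are assumptions based on recent jitsi-meet versions, and the webhook side that keeps the authoritative full transcript is not shown:

```ts
// `api` is a JitsiMeetExternalAPI instance. Event name and payload shape are
// assumptions; check the iframe API handbook for your version.
declare const api: any;

const transcript: string[] = [];

api.addListener('transcriptionChunkReceived', (event: any) => {
  const text = event?.data?.final;
  if (!text) {
    return; // skip interim (non-final) chunks in this sketch
  }
  transcript.push(text); // local copy, used later to catch up or to prompt the LLM
  renderSubtitle(text);  // hypothetical helper standing in for real caption UI
});

function renderSubtitle(text: string) {
  console.log('caption:', text);
}
```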
What we learned
• Every time we tried to solve a problem with a robot there was a better way to solve it without one, in the context of a meeting
• Understanding that transcriptions are a type of recording makes sense
• The first demo was poisonous, complexity-wise; start simple!
Hopefully this helps others
Try it out!
• JaaS: https://jaas.8x8.vc — Get started for free
• meet.jit.si — Free Jitsi Meet instance
• github.com/jitsi — Host it yourself
Future with AI
• The current generation of AI is very powerful, but there is still a ways to go
• A good future would involve small models for dedicated tasks
• Skynet
• Keep iterating
• Image summaries with LLaVA?
• Will we do bots?