Multivendor cloud production with VSF TR-11 - there and back again

kierank12 372 views 22 slides Jun 24, 2024


Multivendor cloud production with VSF TR-11 - there and back again Kieran Kunhya – [email protected]

Company Overview
- Specialists in software-based encoders and decoders for Sport, News and Channel contribution (B2B)
- Based in Central London
- Build everything in house: hardware, firmware, software
- Not to be confused with:

Agenda
- What are the technical challenges with multivendor cloud production?
- How is VSF TR-11 (formerly known as Ground-Cloud-Cloud-Ground) solving these technical challenges?
- How can you help?
- Will talk about the principles instead of the implementation details
- Complicated topic (how can we simplify?)

Life in the Cloud
- The pandemic demonstrated the ability of cloud to scale up compute-heavy, network-heavy services: Zoom, cloud-hosted email, social media, Amazon, Netflix etc.
- But television broadcast production is still mainly on-premises: nearly all mid/high-end production is some variant of in-person
- Cloud economics (scale up/scale down) seem a great alternative to paying for resources that stay idle most of the time. What is stopping us?

Broadcast is next whether you like it or not
- Highly regulated industries (healthcare, finance, national security) are already moving
- Sysadmins, database admins etc. all thought they were immune
- Cloud is eating the world

Ground-Cloud-Cloud-Ground (GCCG)
- I want to do mid/high-end television in the cloud!
- The GCCG working group of the VSF is trying to solve these problems
- Now published as TR-11 draft + GitHub API

Moving Cloud production to the next level
- “But I’ve been doing live cloud production”: yes and no
- Single vendor
- Monolithic applications such as channel-in-a-box, playout servers and cloud switchers use the cloud as a home, but not necessarily as a scalable architecture
- Proprietary transports stifle innovation (IE6, Flash, Silverlight)
- To get widespread adoption we must have:
  - Multi-vendor interoperation via standard APIs
  - Appropriate-to-task picture quality levels
  - Standards for Ground-Cloud-Cloud-Ground
  - Agreed mechanism(s) for building workflows

Cloud production – What makes it difficult?
- Integration with the ground, both ways
- Must work into existing workflows: SDI, ST 2110, satellite, cable, DTT
- Legacy workflows have well-defined linear timing models (e.g. SDI, ST 2110-21, MPEG-TS VBV)
- Without a proper timing model, you end up with variable (undefined) latency
- One reason web streams are 20-30 seconds behind broadcast: they don’t have a timing model! What are my neighbours cheering about?
- Inter-cutting ground and cloud requires timing

Let’s do 2110 in the cloud
- Some people claim to have 2110 in public cloud
- But it’s not possible right now in any public cloud:
  - No (full) PTP in the cloud; all clouds handle time their own way
  - Cloud networks are shared and have packet loss
  - Other implementation challenges
- Is this even a good idea?

The end of linear, lockstep processing
- No, it’s not a good idea
- We don’t actually want linear, lockstep processing in cloud any more
- We DO want to allow cloud instances to process data non-linearly, sometimes faster or slower than real-time but on average real-time, with a known worst case
- How to handle “synthetic” sources (e.g. clips, graphics) played out from cloud?
- Cloud-native vs lift-and-shift

The end of linear, lockstep processing
- What does this mean in simple terms?
- Before: processes operate with a strict lockstep and fixed interval
- After: processes have variable delays but the worst case is known
- Strict lockstep recoverable (e.g. by video encoder) for integration with ground
- Technical note: analogous to MPEG VBV
[Diagram: video frames from a process plotted against time, before and after]
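The before/after idea can be illustrated with a minimal sketch. It is not taken from TR-11: the function name `linearise`, the 50 fps interval and the 30 ms worst case are all assumptions chosen for illustration. Frames arrive with variable delay, and as long as every delay stays within the known worst case, releasing each frame at its origin time plus that worst case restores a strict fixed interval.

```python
from fractions import Fraction

def linearise(arrivals, worst_case_delay):
    """Map variable arrival times onto a fixed-interval output timeline.

    arrivals: list of (origin_time, arrival_time) pairs in seconds.
    Each frame is released at origin_time + worst_case_delay, so the output
    spacing equals the origin spacing regardless of per-frame jitter.
    (Hypothetical helper for illustration, not part of any specification.)
    """
    releases = []
    for index, (origin, arrival) in enumerate(arrivals):
        release = origin + worst_case_delay
        if arrival > release:
            raise ValueError(f"frame {index} exceeded the advertised worst case")
        releases.append(release)
    return releases

# Assumed example values: 50 fps source, 30 ms advertised worst-case delay.
FRAME_INTERVAL = Fraction(1, 50)
WORST_CASE = Fraction(30, 1000)

# Per-frame delays vary, but none exceeds the worst case.
delays = [Fraction(5, 1000), Fraction(28, 1000), Fraction(12, 1000), Fraction(30, 1000)]
arrivals = [(n * FRAME_INTERVAL, n * FRAME_INTERVAL + d) for n, d in enumerate(delays)]

releases = linearise(arrivals, WORST_CASE)
gaps = [later - earlier for earlier, later in zip(releases, releases[1:])]
print(gaps)  # every gap equals FRAME_INTERVAL
```

The added latency is exactly the worst case and no more, which is the same reasoning a VBV-style buffer model relies on.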

Cloud-native transport
- To get the benefits of cloud, we also must trust the cloud, i.e. depend on the cloud provider’s internal bulk-transport protocols
- Requirement is throughput, with reliability, in “bounded” time: my data arrives correctly, in a constrained amount of time
- The Big Data community has similar needs for large data transfers
- Application may not have visibility of the internals of the protocol (“black box”)
- Amazon Scalable Reliable Datagram (SRD) is one such example, used in Amazon CDI (Cloud Digital Interface)

Amazon CDI
- How does the Amazon CDI protocol compare?
- Handles many of the challenges discussed
- An agreed way to exchange data between Amazon cloud instances
- Defined pixel data structures, metadata (e.g. HDR) etc.
- Amazon guarantees throughput and reliability, and bounds latency
- A big step forward for the industry
- All well and good if you are in Amazon. What if you are not?
- How about a common API, with cloud vendor implementation under it?
- Amazon proposed CDI API as basis for GCCG

Summary so far
- Software/cloud applications don’t process media in a linear lockstep fashion
- They operate with variable delays, which is fine if you know the worst case
- Have to depend on cloud-specific transport (not necessarily IP)
- As long as the cloud provider can offer a guarantee that everything arrives on time
- Cloud native and not “lift-and-shift”
- (Dinner party take-away)

VSF GCCG working group
- The GCCG working group is addressing this set of problems
- The last difficult technical problem in broadcast production (personal view):
  - How can I do a complex multicamera production in the cloud (or partial elements in the cloud), with comparable latency to on-premises, and get it to the viewer?
- Numerous technical challenges
- https://vsf.tv/Ground-Cloud-Cloud-Ground.shtml

TR-11 “time floating” model
- Vocabulary (about each process step in the cloud)
- Linear vs non-linear: why?
- “Real-time is relative”
- How early or late a “Media Element” (e.g. video frame) can arrive
- Allow variability in the handoffs, but with an ability to predict the outcome
- Some processes must reconcile the variable inputs into a consistent output
- Must bound the input buffering (latency) yet accommodate the variability
- Majority of delay is processing delay, some delay from transport
- Applications (Workflow Steps) advertise their worst-case delay
  - Dependent on resolution/framerate, cloud instance type, algorithms etc.
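As a rough sketch of why advertising a worst-case delay is useful, the snippet below sums the advertised figures along a serial chain to bound the end-to-end delay and size the final buffer. The `WorkflowStep` class, the step names and the delay values are hypothetical, and simple addition over a serial chain is an assumption made here; TR-11 itself may define the calculation differently.

```python
import math
from dataclasses import dataclass
from fractions import Fraction

@dataclass(frozen=True)
class WorkflowStep:
    """Hypothetical record of one step and its advertised worst-case delay."""
    name: str
    worst_case_delay: Fraction  # seconds, processing plus transport

def chain_worst_case(steps):
    """Upper bound on end-to-end delay for steps connected in series."""
    return sum((step.worst_case_delay for step in steps), Fraction(0))

def buffer_frames(steps, frame_rate):
    """Whole frames of buffering needed to absorb the chain's worst case."""
    return math.ceil(chain_worst_case(steps) * frame_rate)

# Assumed example values, for illustration only.
chain = [
    WorkflowStep("decoder", Fraction(40, 1000)),
    WorkflowStep("switcher", Fraction(25, 1000)),
    WorkflowStep("graphics", Fraction(30, 1000)),
]

total = chain_worst_case(chain)
frames = buffer_frames(chain, Fraction(50))
print(f"worst case {float(total) * 1000:.0f} ms, buffer {frames} frames at 50 fps")
```

Because each figure depends on resolution, frame rate and instance type, a real system would need to re-evaluate the bound whenever any of those change.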

Why does this timing model matter?
- Allows the Workflow Step (e.g. a video encoder) at the end of the chain to linearise for delivery to ground
- A current problem: “Why is the transport stream from my cloud production system flagging warnings?”
- They don’t understand variable delay timing models
- Often hiding timing model issues by increasing latency
- But the proper method is to know the worst case (minimises latency)

Building a Virtual Facility
- Use existing standards for Ground-Cloud and Cloud-Ground (TR-08/09 or H.264/5 in TS)
- For inter-instance (intra-cloud) coordinated handoff (a “virtual facility”):
  - Identify senders and receivers (use NMOS IS-04 extended for the purpose)
  - Initiate and manage connections (NMOS IS-05 extended)
  - What is the content description lingo? (JSON collection based on 2110-20 vocabulary)
  - What are the transport params for interchange? (provider-specific, registered in AMWA register)
  - What is the timing description specification? (This is defined in TR-11)
- Data packing options matter for energy efficiency (Peter B speaking tomorrow). 2110 pgroups are not software friendly but exist already.
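To give a feel for a JSON content description built on 2110-20 vocabulary, here is a hedged sketch. The key names are borrowed from the ST 2110-20 SDP format parameters (sampling, depth, width, height, exactframerate, colorimetry); the actual TR-11 / GCCG API schema is not reproduced here and may differ, and the helper `frame_bytes` is hypothetical. It also shows how much data a densely packed frame occupies, which is the quantity that data packing choices affect.

```python
import json
import math
from fractions import Fraction

# Samples per pixel for a few sampling modes named in ST 2110-20.
SAMPLES_PER_PIXEL = {
    "YCbCr-4:4:4": Fraction(3),
    "YCbCr-4:2:2": Fraction(2),
    "YCbCr-4:2:0": Fraction(3, 2),
    "RGB": Fraction(3),
}

def frame_bytes(description):
    """Bytes per frame when samples are packed with no padding (assumed)."""
    pixels = description["width"] * description["height"]
    bits = pixels * SAMPLES_PER_PIXEL[description["sampling"]] * description["depth"]
    return math.ceil(bits / 8)

# Illustrative description only; not the schema from the TR-11 draft.
description = {
    "sampling": "YCbCr-4:2:2",
    "depth": 10,
    "width": 1920,
    "height": 1080,
    "exactframerate": "50",
    "colorimetry": "BT709",
}

# Round-trip through JSON, as a sender and receiver would exchange it.
received = json.loads(json.dumps(description))

size = frame_bytes(received)
bitrate = size * 8 * Fraction(received["exactframerate"])
print(f"{size} bytes per frame, {float(bitrate) / 1e9:.4f} Gbit/s")
```

Dense 10-bit packing like this is compact but leaves samples straddling byte boundaries, which is one reason such layouts are awkward for software to process.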

What Next?
- TR-11 draft published: https://www.vsf.tv/download/technical_recommendations/VSF_TR-11_2024-02-21-draft.pdf
- API on GitHub: https://github.com/vsf-tv/gccg-api/
- Read and open GitHub Issues/Discussions
- Ask your vendors to do the same
- Can we simplify?