Video Super-Resolution for Optimized Bitrate and Green Online Streaming

Vignesh V Menon · Sep 28, 2024

About This Presentation

Conventional per-title encoding schemes strive to optimize encoding resolutions to deliver the utmost perceptual quality for each bitrate ladder representation. Nevertheless, maintaining encoding time within an acceptable threshold is equally imperative in online streaming applications. Furthermore,...


Slide Content

Video Super-Resolution for Optimized Bitrate and Green Online Streaming

Vignesh V Menon¹, Prajit T Rajendran², Amritha Premkumar³, Benjamin Bross¹, Detlev Marpe¹
¹Video Communication and Applications Dept., Fraunhofer HHI, Germany
²Université Paris-Saclay, France
³Rheinland-Pfälzische Technische Universität, Germany
09.02.2024 © Fraunhofer

MHV’24
Introduction
13.06.2024 © Fraunhofer
HTTP Adaptive Streaming (HAS)
•HTTP Adaptive Streaming (HAS) has become the standard for delivering video content over various internet speeds and devices.

•Importance:
○Real-time encoding is crucial for online streaming.
○Lowering encoding latency and energy consumption is imperative for green streaming.

•ViSOR: A new scheme that leverages client-side Video Super-Resolution (VSR) to optimize encoding bitrate ladders.

Introduction
Video super-resolution (VSR)
●Client devices today have increased processing capability, enabling them to perform deep-learning-based VSR.

●Benefits:
○Enhances the perceptual quality of low-resolution bitstreams.
○Improves viewer retention and satisfaction by delivering higher quality video.
○Optimizes bitrate allocation, lowering data transmission costs.

●Popular VSR Models:
○FSRCNN (Fast Super-Resolution Convolutional Neural Network)
○ESPCN (Efficient Sub-Pixel Convolutional Neural Network)
○EDSR (Enhanced Deep Residual Networks for Single Image Super-Resolution)
○EVSRNet (Efficient Video Super-Resolution with Neural Architecture Search)
○CARN (Cascading Residual Network)
○SRGAN (Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network)
○RBPN (Recurrent Back-Projection Network for Video Super-Resolution)
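Several of these models (ESPCN, FSRCNN) end with a sub-pixel convolution layer: the network predicts r² low-resolution feature maps per output channel and interleaves them into one high-resolution plane. A minimal numpy sketch of that rearrangement alone (the pixel shuffle, not a full VSR model):

```python
import numpy as np

def pixel_shuffle(x: np.ndarray, r: int) -> np.ndarray:
    """Rearrange a (C*r^2, H, W) tensor into (C, H*r, W*r).

    This is the sub-pixel convolution step used by ESPCN/FSRCNN-style
    networks: r^2 feature maps per output channel, predicted at low
    resolution, are interleaved spatially into the upscaled output.
    """
    c_r2, h, w = x.shape
    assert c_r2 % (r * r) == 0, "channel count must be divisible by r^2"
    c = c_r2 // (r * r)
    # (C, r, r, H, W) -> (C, H, r, W, r) -> (C, H*r, W*r)
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)
    return x.reshape(c, h * r, w * r)

# Upscale a single-channel 2x2 map by a factor of 2 (4 = 1 * 2^2 channels):
lr = np.arange(16, dtype=np.float32).reshape(4, 2, 2)
hr = pixel_shuffle(lr, 2)
print(hr.shape)  # (1, 4, 4)
```

The same rearrangement is what `torch.nn.PixelShuffle` performs inside these networks; doing it after the convolutions keeps almost all computation at the low input resolution, which is why these models are cheap enough for client devices.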

Problem statement
●Current Encoding Schemes:
○Conventional per-title encoding optimizes resolutions for perceptual quality but can be energy-intensive.
○Fast encoding without compromising quality is a challenge.
●Client-side VSR:
○Modern devices can perform VSR, enhancing the perceptual quality of lower-resolution streams.
○This opens the possibility for more efficient encoding on the server side.
Figure: Encoding results (encoding time and VMAF) of the Characters_s000 and Dolls_s000 sequences of the VCD dataset, encoded at various resolutions with and without client-side VSR using EDSR.
This work explores the possibility of offloading the computationally intensive task of VSR from the server to the client devices.
●This allows the server to encode video at lower resolutions and bitrates, significantly lowering the energy required for encoding while still delivering high-quality video playback through client-side upscaling.

ViSOR architecture
Workflow:
●Spatiotemporal complexity feature extraction
●Optimized resolution prediction
●JND-aware representation elimination
Components:
●Input parameters:
○Set of supported resolutions,
○Set of bitrates,
○Maximum acceptable encoding latency,
○Target JND,
○Maximum quality threshold, and
○Target VSR model.
●Outputs:
○Optimized encoding bitrate ladder.
Figure: Encoding architecture using ViSOR for green online streaming.
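As an illustration, the input parameters listed above could be grouped into a single configuration object. The field names and default values below are hypothetical, chosen only to mirror the list; they are not taken from the ViSOR implementation:

```python
from dataclasses import dataclass

# Hypothetical container mirroring the ViSOR inputs listed above;
# all names and defaults are illustrative assumptions.
@dataclass
class ViSORConfig:
    resolutions: tuple = (360, 540, 720, 1080, 1440, 2160)  # supported heights
    bitrates_kbps: tuple = (145, 300, 600, 1200, 2400, 4800)
    max_latency_s: float = 1.0     # maximum acceptable encoding latency
    target_jnd: float = 6.0        # JND threshold in VMAF points (assumed)
    max_vmaf: float = 95.0         # maximum quality threshold (assumed)
    vsr_model: str = "FSRCNN"      # target client-side VSR model

cfg = ViSORConfig()
print(cfg.vsr_model)  # FSRCNN
```

The output of the pipeline is then the optimized bitrate ladder: a pruned list of (resolution, bitrate) pairs derived from these inputs.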

Spatiotemporal complexity feature extraction
We use seven DCT-energy-based features extracted using the Video Complexity Analyzer (VCA):

●average luma texture energy (E_Y),
●average gradient of the luma texture energy (h),
●average luma brightness (L_Y),
●average chroma texture energies of the U and V channels (E_U and E_V), and
●average chroma brightnesses of the U and V channels (L_U and L_V).
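These features are computed by VCA from block-wise DCT coefficients of the luma and chroma planes. As a rough illustration, a simplified luma texture energy can be sketched in numpy; note that VCA additionally applies a frequency-dependent weighting to the coefficients and runs highly optimized code, both of which this sketch omits:

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)
    m = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    m[0] *= 1 / np.sqrt(2)
    return m * np.sqrt(2 / n)

def texture_energy(luma: np.ndarray, block: int = 32) -> float:
    """Simplified luma texture energy: mean absolute AC energy of
    block-wise 2D DCT coefficients (no frequency weighting)."""
    d = dct_matrix(block)
    h = luma.shape[0] // block * block
    w = luma.shape[1] // block * block
    energies = []
    for y in range(0, h, block):
        for x in range(0, w, block):
            blk = luma[y:y + block, x:x + block].astype(np.float64)
            coeffs = d @ blk @ d.T   # 2D DCT-II of the block
            coeffs[0, 0] = 0.0       # drop the DC (brightness) term
            energies.append(np.abs(coeffs).mean())
    return float(np.mean(energies))

# A flat frame has zero texture energy; a noisy one does not.
flat = np.full((64, 64), 128.0)
noisy = flat + np.random.default_rng(0).normal(0, 20, (64, 64))
print(texture_energy(flat) < texture_energy(noisy))  # True
```

The DC coefficient that this sketch zeroes out is exactly what the brightness features (L_Y, L_U, L_V) average instead, which is why texture and brightness are reported as separate features.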

Optimized resolution prediction
●Process: Two-part approach involving modeling and optimization.
●Goal: Maximize perceptual quality after VSR while adhering to real-time processing constraints.

Modeling:
The perceptual quality and encoding time of the representation (r_t, b_t) depend on the extracted video complexity features, the encoding resolution, and the target bitrate:
●A higher resolution and/or bitrate may improve the quality and increase the file size of the encoded video segment.
●Similarly, a higher resolution and/or bitrate can increase the encoding duration.
●Random forest models are trained to predict these parameters.

Optimization:
ViSOR optimizes the perceptual quality (in terms of VMAF) of encoded video segments while adhering to real-time processing constraints.
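The optimization step can be sketched as follows: for each target bitrate, choose the encoding resolution with the highest predicted post-VSR VMAF whose predicted encoding time fits the latency budget. The two predictor functions below are toy stand-ins for the trained random forest models; their formulas are illustrative assumptions, not from ViSOR:

```python
import math

def predict_vmaf(res: int, bitrate_kbps: float) -> float:
    # Toy model: bits raise quality logarithmically, capped by a
    # resolution-dependent ceiling (upscaling from very low
    # resolutions loses detail even after VSR).
    ceiling = 100.0 * (1.0 - 200.0 / (res + 200.0))
    gain = 35.0 * math.log10(1.0 + bitrate_kbps / (res / 10.0))
    return min(ceiling, gain)

def predict_encode_time(res: int, bitrate_kbps: float) -> float:
    # Toy model: encoding time grows with the pixel count.
    return (res / 1080.0) ** 2 * 2.5

def optimize_ladder(resolutions, bitrates, max_latency_s):
    """Pick, per bitrate, the feasible resolution with best predicted VMAF."""
    ladder = []
    for b in bitrates:
        feasible = [r for r in resolutions
                    if predict_encode_time(r, b) <= max_latency_s]
        if feasible:
            ladder.append((max(feasible, key=lambda r: predict_vmaf(r, b)), b))
    return ladder

print(optimize_ladder((360, 540, 720, 1080, 2160), (300, 1200, 4800), 4.0))
```

With real random forests, `predict_vmaf` and `predict_encode_time` would also take the seven VCA features as inputs, so the chosen resolution per bitrate adapts to the content's spatiotemporal complexity.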

JND-aware representation elimination
●Eliminate redundancies: Identify and remove video representations that are perceptually redundant based on the Just Noticeable Difference (JND) threshold.

●Quality check: Retain representations only if their VMAF score difference is above the JND threshold or if they exceed the maximum VMAF threshold.

●Efficiency: This process reduces encoding energy and storage costs by maintaining only the essential high-quality video representations.
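The elimination rule above can be sketched as a single pass over the (bitrate, predicted VMAF) pairs, keeping a representation only when it is at least one JND better than the last kept one and stopping at the quality ceiling. Exact ordering and tie-breaking in ViSOR may differ:

```python
def eliminate_redundant(reps, jnd=6.0, max_vmaf=95.0):
    """reps: iterable of (bitrate_kbps, predicted_vmaf) pairs, any order.

    Keeps a representation only if it improves on the previously kept
    one by at least `jnd` VMAF points; once a kept representation
    reaches `max_vmaf`, all higher ones are dropped as well.
    """
    kept = []
    last_vmaf = None
    for bitrate, vmaf in sorted(reps, key=lambda p: p[1]):
        if last_vmaf is not None and vmaf - last_vmaf < jnd:
            continue  # perceptually redundant: below one JND
        kept.append((bitrate, vmaf))
        last_vmaf = vmaf
        if vmaf >= max_vmaf:
            break  # quality ceiling reached; drop the rest
    return kept

ladder = [(300, 60.0), (600, 64.0), (1200, 72.0),
          (2400, 80.0), (4800, 96.0), (9600, 98.0)]
print(eliminate_redundant(ladder))
# [(300, 60.0), (1200, 72.0), (2400, 80.0), (4800, 96.0)]
```

Here the 600 kbps rung is dropped because its 4-point VMAF gain is below one JND, and the 9600 kbps rung is dropped because the ladder already exceeds the maximum quality threshold at 4800 kbps.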

Experimental setup
Dataset:
●500 sequences from the Video Complexity Dataset (VCD).
●80% used for training, 20% for testing.
Hardware:
●Dual-processor server with Intel Xeon Gold 5218R (80 cores, 2.10 GHz)
●NVIDIA GeForce GT 710 GPU for VSR
Software:
●x265 encoder (ultrafast preset)
●FSRCNN for VSR evaluation
Methodology:
●Encode sequences at 30 fps with the fastest encoding preset.
●Measure VSR results on the GPU.
●Compare ViSOR with:
○Default: This scheme employs a fixed bitrate ladder, i.e., a fixed set of bitrate-resolution pairs.
○OPTE: This scheme predicts the optimized resolution that yields the highest VMAF for a given target bitrate.
Table: Experimental parameters used to evaluate ViSOR.

Prediction analysis
Accuracy:
▪Mean Absolute Error (MAE) for VMAF prediction: 4.47
▪MAE for encoding time prediction: 0.22 s

Efficiency:
▪Feature extraction rate: 352 fps
▪Total inference time for a 4 s video segment (2160p): 0.37 s

Additional Latency:
▪Minimal due to concurrent feature extraction and encoding processes

Key Points:
▪High prediction accuracy for perceptual quality and encoding time
▪Efficient feature extraction with negligible added latency
Note: The energy consumption of VSR is not included in this analysis because real-time VSR is considered a future implementation, and its energy efficiency will depend on advancements in hardware and optimization techniques.

Rate-distortion and encoding time analysis
Figure: Rate-distortion (RD) curves and encoding times of representative video sequences (segments) using Default encoding (blue line), OPTE (purple line), ViSOR without VSR (red line), and ViSOR with FSRCNN-based VSR (green line).

Encoding latency and energy consumption
Key Insights:
Without VSR:
●The highest-quality configuration (infinite latency) shows significant bitrate-distortion reduction and energy savings.
●Ultra-low-latency configurations (1 s) achieve substantial energy savings across different JND thresholds.
●Energy savings and encoding time vary significantly based on the target maximum latency and JND settings.
With FSRCNN-based VSR:
●Shows improved bitrate-distortion reduction and better video quality (BD-PSNR and BD-VMAF) across all configurations.
●Energy savings are higher owing to the selection of lower resolutions with VSR.
●Encoding time is consistently lower with VSR, enhancing real-time processing capabilities.

Conclusions
Significant Bitrate Reduction:
●ViSOR achieves notable bitrate reduction compared to default encoding.
●Average bitrate savings: 24.65% for the same PSNR, 32.70% for the same VMAF.

Energy and Storage Savings:
●Energy consumption reduced by up to 68.21%.
●Storage consumption decreased by up to 79.32%.

Enhanced Quality:
●Higher perceptual video quality achieved through client-side VSR.
●Reduction in blocking artifacts and ringing effects.

Efficiency:
●Real-time encoding with minimal added latency.
●Efficient feature extraction and prediction processes.

Future directions
Adoption of emerging codecs:
●Integration with advanced codecs like Versatile Video Coding (VVC) for further bitrate and quality improvements.

Device-specific optimization:
●Tailoring bitrate ladders to optimize performance for various client devices (e.g., smartphones, tablets, smart TVs).

Standards compliance:
●Aligning with Common Media Client Data (CMCD) for improved user experience and industry standardization.

Energy-efficient VSR Models:
●Development and integration of more energy-efficient VSR models to further enhance energy savings without compromising quality.

Dynamic adaptation:
●Implementing adaptive algorithms that adjust in real-time based on network conditions and device capabilities.

User experience enhancement:
●Focus on improving the overall user experience by reducing latency and increasing video quality through continuous feedback and optimization loops.

Thank you for your attention

▪Vignesh V Menon ([email protected])
▪Prajit T Rajendran ([email protected])
▪Amritha Premkumar ([email protected])
▪Benjamin Bross ([email protected])
▪Detlev Marpe ([email protected])