Video Super-Resolution for Optimized Bitrate and Green Online Streaming
Vignesh V Menon
Sep 28, 2024
About This Presentation
Conventional per-title encoding schemes strive to optimize encoding resolutions to deliver the utmost perceptual quality for each bitrate ladder representation. Nevertheless, maintaining encoding time within an acceptable threshold is equally imperative in online streaming applications. Furthermore, modern client devices are equipped with the capability for fast deep-learning-based video super-resolution (VSR) techniques, enhancing the perceptual quality of the decoded bitstream. This suggests that opting for lower resolutions in representations during the encoding process can curtail the overall energy consumption without substantially compromising perceptual quality. In this context, this paper introduces a video super-resolution-based latency-aware optimized bitrate encoding scheme (ViSOR) designed for online adaptive streaming applications. ViSOR determines the encoding resolution for each target bitrate, ensuring the highest achievable perceptual quality after VSR within the bound of a maximum acceptable latency. Random forest-based prediction models are trained to predict the perceptual quality after VSR and the encoding time for each resolution, using the spatiotemporal features extracted for each video segment. Experimental results show that ViSOR targeting the fast super-resolution convolutional neural network (FSRCNN) achieves an overall average bitrate reduction of 24.65% and 32.70% to maintain the same PSNR and VMAF, respectively, compared to the HTTP Live Streaming (HLS) bitrate ladder encoding of 4 s segments using the x265 encoder, when the maximum acceptable latency for each representation is set as two seconds. Considering a just noticeable difference (JND) of six VMAF points, the average cumulative storage consumption and encoding energy for each segment are reduced by 79.32% and 68.21%, respectively, contributing towards greener streaming.
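Put differently, for each target bitrate the scheme searches the candidate encoding resolutions for the one with the highest predicted post-VSR quality whose predicted encoding time fits the latency budget. A plausible formalization (notation ours, not taken verbatim from the paper):

\hat{r}(b) = \arg\max_{r \in \mathcal{R}} \hat{v}(r, b) \quad \text{subject to} \quad \hat{t}(r, b) \leq T_{\max}

where \hat{v}(r, b) and \hat{t}(r, b) are the predicted VMAF after VSR and the predicted encoding time for resolution r at bitrate b, \mathcal{R} is the set of candidate resolutions, and T_{\max} is the maximum acceptable latency (two seconds in the reported experiments).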
●Benefits:
○Enhances the perceptual quality of low-resolution bitstreams.
○Improves viewer retention and satisfaction by delivering higher quality video.
○Optimizes bitrate allocation, lowering data transmission costs.
●Popular VSR Models:
○FSRCNN (Fast Super-Resolution Convolutional Neural Network)
○ESPCN (Efficient Sub-Pixel Convolutional Neural Network)
○EDSR (Enhanced Deep Residual Networks for Single Image Super-Resolution)
○EVSRNet (Efficient Video Super-Resolution with Neural Architecture Search)
○CARN (Cascading Residual Network)
○SRGAN (Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network)
○RBPN (Recurrent Back-Projection Network for Video Super-Resolution)
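As a concrete illustration of client-side VSR with the first of these models, the sketch below upscales a decoded frame with a pre-trained FSRCNN network through OpenCV's dnn_superres module. This is a minimal sketch, not the paper's implementation; the model file path, input frame, and scale factor are placeholders.

# Minimal FSRCNN upscaling sketch (requires opencv-contrib-python and a
# pre-trained FSRCNN model file).
import cv2

sr = cv2.dnn_superres.DnnSuperResImpl_create()
sr.readModel("FSRCNN_x2.pb")              # placeholder path to the pre-trained model
sr.setModel("fsrcnn", 2)                  # 2x upscaling, e.g. 1080p decoded frame -> 2160p

frame = cv2.imread("decoded_frame.png")   # stand-in for a decoded video frame
upscaled = sr.upsample(frame)             # super-resolved output frame
cv2.imwrite("upscaled_frame.png", upscaled)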
Spatiotemporal features (extracted per video segment):
●average luma texture energy (EY)
●average gradient of the luma texture energy (h)
●average luma brightness (LY)
●average chroma texture energy of U and V channels (EU and EV)
●average chroma brightness of U and V channels (LU and LV)
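A simplified sketch of how such features can be computed from a segment's luma frames, using block-wise DCT energy for texture and the plane mean for brightness. This only approximates the DCT-energy-based features of analyzers such as VCA; the chroma features (EU, EV, LU, LV) follow analogously on the U and V planes.

# Simplified texture-energy and brightness features for one segment (illustrative only).
import numpy as np
from scipy.fft import dctn

def block_texture_energy(plane, block=32):
    """Mean DCT energy (AC coefficients only) over non-overlapping blocks of a luma plane."""
    h, w = plane.shape
    energies = []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            coeffs = dctn(plane[y:y + block, x:x + block].astype(np.float64), norm="ortho")
            coeffs[0, 0] = 0.0                   # drop the DC term (brightness, not texture)
            energies.append(np.abs(coeffs).sum())
    return float(np.mean(energies))

def segment_features(luma_frames):
    """Approximate EY, h, and LY for a list of 2-D luma planes of one segment."""
    e = [block_texture_energy(f) for f in luma_frames]
    e_y = float(np.mean(e))                               # average luma texture energy
    h = float(np.mean(np.abs(np.diff(e))))                # average gradient of texture energy
    l_y = float(np.mean([f.mean() for f in luma_frames])) # average luma brightness
    return e_y, h, l_y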
●A higher resolution and/or bitrate may improve the quality and increase the file size of the encoded video segment.
●Similarly, a higher resolution and/or bitrate can increase the encoding duration.
●Random forest models are trained to predict these parameters.
Modeling:
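A minimal sketch of such predictors using scikit-learn random forests; the feature layout and the synthetic training data below are placeholders standing in for measurements collected offline, not the paper's actual training setup.

# Two random forests: one predicts VMAF after VSR, one predicts encoding time.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Placeholder training data: rows of [EY, h, LY, EU, EV, LU, LV, bitrate_kbps, height].
rng = np.random.default_rng(0)
X_train = rng.random((500, 9))
y_vmaf = rng.uniform(30.0, 100.0, 500)   # measured VMAF after VSR (placeholder values)
y_time = rng.uniform(0.5, 8.0, 500)      # measured encoding time in seconds (placeholder values)

vmaf_model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_vmaf)
time_model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_time)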
Optimization:
ViSOR optimizes the perceptual quality (in terms of VMAF) of encoded video segments while adhering to real-time processing constraints.
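A sketch of that per-bitrate selection, reusing the two predictors from the modeling step: for each target bitrate, it keeps only the resolutions whose predicted encoding time respects the latency bound and returns the one with the highest predicted post-VSR VMAF. The candidate resolutions are illustrative and the helper name is ours; the 2 s latency bound matches the reported setup.

# Latency-constrained resolution selection for one segment and one target bitrate.
T_MAX = 2.0                                             # maximum acceptable latency per representation (s)
RESOLUTIONS = [360, 432, 540, 720, 1080, 1440, 2160]    # candidate encoding heights (illustrative)

def select_resolution(features, bitrate, vmaf_model, time_model):
    """Return the encoding height with the best predicted post-VSR VMAF within T_MAX."""
    best_height, best_vmaf = None, -1.0
    for height in RESOLUTIONS:
        x = [list(features) + [bitrate, height]]        # one row: segment features + (bitrate, height)
        if time_model.predict(x)[0] > T_MAX:            # skip resolutions violating the latency bound
            continue
        vmaf = vmaf_model.predict(x)[0]
        if vmaf > best_vmaf:
            best_height, best_vmaf = height, vmaf
    return best_height, best_vmaf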
Efficiency:
▪Feature extraction rate: 352 fps
▪Total inference time for a 4 s video segment (2160p): 0.37 s
Additional Latency:
▪Minimal due to concurrent feature extraction and encoding processes
Key Points:
▪High prediction accuracy for perceptual quality and encoding time
▪Efficient feature extraction with negligible added latency
Note: The energy consumption of VSR is not included in this analysis because real-time VSR is considered a future implementation, and its energy efficiency will depend on advancements in hardware and optimization techniques.
Device-specific optimization:
●Tailoring bitrate ladders to optimize performance for various client devices (e.g., smartphones, tablets, smart TVs).
Standards compliance:
●Aligning with Common Media Client Data (CMCD) for improved user experience and industry standardization.
Energy-efficient VSR Models:
●Development and integration of more energy-efficient VSR models to further enhance energy savings without compromising quality.
Dynamic adaptation:
●Implementing adaptive algorithms that adjust in real-time based on network conditions and device capabilities.
User experience enhancement:
●Focus on improving overall user experience by reducing latency and increasing video quality through continuous feedback and optimization loops.