In HTTP adaptive live streaming applications, video segments are encoded at a fixed set of bitrate-resolution pairs known as a bitrate ladder. Live encoders use the fastest available encoding configuration, referred to as a preset, to ensure the minimum possible latency in video encoding. However, an optimized preset and optimized number of CPU threads for each encoding instance may result in (i) increased quality and (ii) efficient CPU utilization while encoding. For low latency live encoders, the encoding speed is expected to be greater than or equal to the video framerate. In this light, this paper introduces a Just Noticeable Difference (JND)-Aware Low latency Encoding Scheme (JALE), which uses random forest-based models to jointly determine the optimized encoder preset and thread count for each representation, based on video complexity features, the target encoding speed, the total number of available CPU threads, and the target encoder. Experimental results show that, on average, JALE yields a quality improvement of 1.32 dB PSNR and 5.38 VMAF points at the same bitrate, compared to the fastest preset encoding of the HTTP Live Streaming (HLS) bitrate ladder using the x265 HEVC open-source encoder with eight CPU threads used for each representation. These enhancements are achieved while maintaining the desired encoding speed. Furthermore, on average, JALE results in an overall storage reduction of 72.70%, a reduction in the total number of CPU threads used by 63.83%, and a 37.87% reduction in the overall encoding time, considering a JND of six VMAF points.
Added: Apr 25, 2024
Slides: 15 pages
Slide Content
Optimal Quality and Efficiency in Adaptive Live Streaming with JND-Aware Low latency Encoding
Vignesh V Menon¹, Jingwen Zhu², Prajit T Rajendran³, Samira Afzal⁴, Klaus Schoeffmann⁴, Patrick Le Callet², Christian Timmerer¹
¹ Video Communication and Applications Dept., Fraunhofer HHI, Berlin, Germany
² Ecole Centrale Nantes, CNRS, LS2N, UMR 6004, Nantes Université, Nantes, France
³ CEA, List, F-91120 Palaiseau, Université Paris-Saclay, France
⁴ Alpen-Adria-Universität, Klagenfurt, Austria
13 Feb 2024
Vignesh V Menon Energy-efficient Adaptive Video Streaming with Latency-Aware Dynamic Resolution Encoding 1
Outline
1. Introduction
2. JND-Aware Low latency Encoding (JALE)
3. Experimental validation
4. Conclusions
Introduction
The cloud server can significantly streamline the encoding process by dynamically adjusting CPU thread counts based on resolution and bitrate, accommodating diverse video qualities within an adaptive streaming environment.
Figure: Encoding speed (in fps) versus representation ID (01 to 12) of the HLS bitrate ladder¹ for the Woods000 sequence², using the ultrafast preset of x265³ with 4, 8, and 16 CPU threads for each representation.
¹ Apple Inc. HLS Authoring Specification for Apple Devices. url: https://developer.apple.com/documentation/http_live_streaming/hls_authoring_specification_for_apple_devices.
² Hadi Amirpour et al. Proceedings of the 13th ACM Multimedia Systems Conference. New York, NY, USA: Association for Computing Machinery, 2022, 234–239. isbn: 9781450392839. doi: 10.1145/3524273.3532892. url: https://doi.org/10.1145/3524273.3532892.
³ VideoLAN. url: https://www.videolan.org/developers/x265.html.
Optimized encoding preset
Traditional open-source encoders like x264,⁴ x265, and VVenC⁵ have pre-defined sets of encoding parameters (termed presets), which present a trade-off between the encoding time and compression efficiency.⁶
When the content is easy to encode, a fast preset encodes faster than the live encoding speed requires. If the encoder preset is reconfigured so that this higher encoding speed is reduced while still being compatible with the expected live encoding speed, the quality of the encoded content can be improved.
Subsequently, when the content becomes complex again, the encoder preset needs to be reconfigured to move back to the faster configuration that achieves live encoding speed.⁷
By employing efficient storage techniques and removing unnecessary representations, the energy consumption associated with storing and transmitting redundant data can be minimized.⁸
⁴ VideoLAN. url: https://www.videolan.org/developers/x264.html.
⁵ Adam Wieckowski et al. Proc. IEEE International Conference on Multimedia & Expo Workshops (ICMEW). 2021, pp. 1–2. doi: 10.1109/ICMEW53276.2021.9455944.
⁶ Dieison Silveira, Marcelo Porto, and Sergio Bampi. 2017 25th European Signal Processing Conference (EUSIPCO). 2017, pp. 1519–1523. doi: 10.23919/EUSIPCO.2017.8081463.
⁷ Sergey Zvezdakov, Denis Kondranin, and Dmitriy Vatolin. 2021 Picture Coding Symposium (PCS). 2021, pp. 1–5. doi: 10.1109/PCS50896.2021.9477507.
⁸ Quortex. Mission: Emission. https://www.quortex.io/wp-content/uploads/2022/05/WhitePaper-Mission_-Emission.pdf. [Accessed May 2023]. 2022.
JND-Aware Low latency Encoding (JALE)
JALE
Figure: JALE envisioned in this paper.
JALE is divided into three steps:
1. video complexity feature extraction,
2. joint thread count and preset prediction,
3. perceptually-redundant representation elimination.
Spatiotemporal complexity feature extraction
Three DCT-energy-based features are used: the average luma texture energy $E_Y$, the average gradient of the luma texture energy $h$, and the average luminescence $L_Y$.⁹,¹⁰
Figure: Panels (a) to (d), with (b) $E_Y$, (c) $h$, and (d) $L_Y$: {$E_Y$, $h$, $L_Y$} extracted from the second frame of the CoverSong1080P0a86 video of the YouTube UGC Dataset.
⁹ Vignesh V Menon et al. Proceedings of the First International Workshop on Green Multimedia Systems. 2023, 16–18. isbn: 9798400701962. doi: 10.1145/3593908.3593942. url: https://doi.org/10.1145/3593908.3593942.
¹⁰ Vignesh V Menon et al. IEEE Transactions on Circuits and Systems for Video Technology. 2023, pp. 1–1. doi: 10.1109/TCSVT.2023.3290725. url: https://doi.org/10.1109/TCSVT.2023.3290725.
Joint thread count and preset prediction
Selecting the optimized thread count-preset pair for each segment per representation, based on the video content complexity, is decomposed into two parts:
1. train models to predict the encoding speed for each thread count-preset pair,
2. develop a function to obtain the optimized thread count-preset pair for each representation.
The encoding speed of the $t^{th}$ representation of the input video segment ($s_t$) is modeled as:
$s_t = f_S(E_Y, h, L_Y, r_t, b_t, n_t, p_t)$ (1)
where $r_t$ and $b_t$ are the resolution and bitrate of the representation, and $n_t$ and $p_t$ are the thread count and preset used to encode it.
We use random forest models¹¹ to predict the encoding speed for each thread count-preset pair.
($P_T \times C_T$) models are trained, where $P_T$ and $C_T$ represent the number of encoding presets and the number of supported thread counts per instance, respectively.
¹¹ Leo Breiman. Machine Learning. Vol. 45. 2001. doi: 10.1023/A:1010933404324. url: https://doi.org/10.1023/A:1010933404324.
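As a quick check on the number of speed models, take the experimental setup reported in the parameter table of this deck (x265 presets 0 to 5 and supported thread counts 4, 8, 12, 16, 20, 24):

```latex
P_T = \left|\{0, 1, \dots, 5\}\right| = 6, \qquad
C_T = \left|\{4, 8, 12, 16, 20, 24\}\right| = 6, \qquad
P_T \times C_T = 6 \times 6 = 36
```

Under that setup, 36 models are trained, one for each thread count-preset pair.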
Joint thread count and preset prediction
The optimized thread count-preset prediction function has a look-up table of ($\hat{n}_t$, $\hat{p}_t$) pairs.
(4,medium)     (8,medium)     (12,medium)     (16,medium)     (20,medium)     (24,medium)
(4,fast)       (8,fast)       (12,fast)       (16,fast)       (20,fast)       (24,fast)
(4,faster)     (8,faster)     (12,faster)     (16,faster)     (20,faster)     (24,faster)
(4,veryfast)   (8,veryfast)   (12,veryfast)   (16,veryfast)   (20,veryfast)   (24,veryfast)
(4,superfast)  (8,superfast)  (12,superfast)  (16,superfast)  (20,superfast)  (24,superfast)
(4,ultrafast)  (8,ultrafast)  (12,ultrafast)  (16,ultrafast)  (20,ultrafast)  (24,ultrafast)
Figure: ($\hat{n}$, $\hat{p}$) look-up table used in the experimental validation of this paper.
The priority of ($\hat{n}_t$, $\hat{p}_t$) pairs is decided based on the following constraints:
1. the achieved encoding speed $\hat{s}_t$ of the $t^{th}$ representation must be greater than or equal to the target encoding speed $s_T$, i.e., $\hat{s}_t \geq s_T$.
2. the total number of CPU threads used for each representation is minimized.
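A minimal Python sketch of how such a look-up could work is given below. It is not the authors' implementation. It assumes that priority follows the order in which the pairs are listed in the table (slowest preset first, then ascending thread count), so the first feasible pair uses the fewest threads within the slowest feasible preset. The names `select_pair`, `toy_speed`, and `TOY_FPS_PER_THREAD` are hypothetical, and the toy speed function only stands in for the trained random forest models.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Presets (slowest to fastest) and thread counts, as listed in the look-up table.
PRESETS: List[str] = ["medium", "fast", "faster", "veryfast", "superfast", "ultrafast"]
THREAD_COUNTS: List[int] = [4, 8, 12, 16, 20, 24]

# ASSUMPTION: priority equals the listing order of the table
# (slowest preset first, then ascending thread count).
LOOKUP_TABLE: List[Tuple[int, str]] = [(n, p) for p in PRESETS for n in THREAD_COUNTS]


def select_pair(
    predict_speed: Callable[[int, str], float],
    target_fps: float,
    threads_available: int,
) -> Optional[Tuple[int, str]]:
    """Return the first (threads, preset) pair whose predicted speed meets the target."""
    for threads, preset in LOOKUP_TABLE:
        if threads > threads_available:
            continue  # pair does not fit in the remaining thread budget
        if predict_speed(threads, preset) >= target_fps:
            return threads, preset
    return None  # no pair reaches the target speed


# Hypothetical stand-in for the speed models: fps grows linearly with threads.
TOY_FPS_PER_THREAD: Dict[str, float] = {
    "medium": 1.0, "fast": 1.5, "faster": 2.0,
    "veryfast": 3.0, "superfast": 4.0, "ultrafast": 6.0,
}


def toy_speed(threads: int, preset: str) -> float:
    return threads * TOY_FPS_PER_THREAD[preset]


print(select_pair(toy_speed, 30.0, 96))  # (20, 'fast')
print(select_pair(toy_speed, 30.0, 8))   # (8, 'superfast')
print(select_pair(toy_speed, 30.0, 4))   # None
```

With the toy numbers, a generous thread budget allows a slower preset, while a tight budget forces a faster preset. A different priority order (for example, fewest threads first) would only require reordering `LOOKUP_TABLE`.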
Perceptually-redundant representation elimination
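Input:
    $q$: number of representations in $R$
    $R = \bigcup_{t=1}^{q} \{(r_t, b_t, \hat{n}_t, \hat{p}_t)\}$: representations with predicted thread count and preset
    $\hat{v}_t$; $1 \leq t \leq q$: predicted VMAF
    $v_T$: maximum VMAF threshold
    $v_J$: average target JND
Output: $\hat{R} = (r, b, \hat{n}, \hat{p})$: set of encoding configurations
$\hat{R} \leftarrow \{(r_1, b_1, \hat{n}_1, \hat{p}_1)\}$
$u \leftarrow 1$
if $\hat{v}_1 \geq v_T$ then
    return $\hat{R}$
$t \leftarrow 2$
while $t \leq q$ do
    if $\hat{v}_t - \hat{v}_u \geq v_J$ then
        $\hat{R} \leftarrow \hat{R} \cup \{(r_t, b_t, \hat{n}_t, \hat{p}_t)\}$
        $u \leftarrow t$
        if $\hat{v}_t \geq v_T$ then
            return $\hat{R}$
    $t \leftarrow t + 1$
return $\hat{R}$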
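The elimination steps above can be sketched in Python as follows. This is an illustrative reading, not the authors' code. It assumes the maximum-VMAF check applies only to a representation that has just been kept, which is how the step order is read here. The `Representation` type and the function name are hypothetical. The example ladder reuses the resolutions and bitrates from the experimental parameters, while the thread counts, presets, and VMAF values are invented for illustration.

```python
from typing import List, NamedTuple


class Representation(NamedTuple):
    height: int          # r_t, resolution height in pixels
    bitrate_mbps: float  # b_t
    threads: int         # predicted thread count
    preset: str          # predicted preset
    vmaf: float          # predicted VMAF


def eliminate_redundant(
    ladder: List[Representation], v_t: float, v_j: float
) -> List[Representation]:
    """Keep a representation only if it is at least one JND (v_j) above the last kept one."""
    if not ladder:
        return []
    kept = [ladder[0]]              # the lowest representation is always kept
    if ladder[0].vmaf >= v_t:
        return kept
    for rep in ladder[1:]:
        if rep.vmaf - kept[-1].vmaf >= v_j:
            kept.append(rep)
            # ASSUMPTION: the threshold check is nested inside the JND check.
            if rep.vmaf >= v_t:
                return kept         # quality ceiling reached, drop the rest
    return kept


HEIGHTS = [360, 432, 540, 540, 540, 720, 720, 1080, 1080, 1440, 2160, 2160]
BITRATES = [0.145, 0.3, 0.6, 0.9, 1.6, 2.4, 3.4, 4.5, 5.8, 8.1, 11.6, 16.8]
# Hypothetical predicted VMAF scores, not measured results.
VMAFS = [30, 45, 58, 66, 75, 81, 86, 90, 93, 95, 97, 98]
LADDER = [
    Representation(h, b, 8, "ultrafast", float(v))
    for h, b, v in zip(HEIGHTS, BITRATES, VMAFS)
]

kept = eliminate_redundant(LADDER, v_t=94.0, v_j=6.0)
print([rep.vmaf for rep in kept])
# [30.0, 45.0, 58.0, 66.0, 75.0, 81.0, 90.0, 97.0]
```

In this made-up example, a JND of six VMAF points with a threshold of 94 keeps 8 of the 12 representations. The representations at VMAF 86, 93, and 95 are less than one JND above the last kept one, and the one at 98 lies beyond the threshold.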
Experimental validation
Experimental parameters
Table: Experimental parameters of JALE used in this paper.
Parameter                        Symbol   Values
Set of representations           $R$
  Resolution height [pixels]              360    432    540    540    540    720    720    1080   1080   1440   2160    2160
  Bitrate [Mbps]                          0.145  0.300  0.600  0.900  1.600  2.400  3.400  4.500  5.800  8.100  11.600  16.800
Set of presets [x265]            $P$      0 (ultrafast) – 5 (medium)
Set of supported thread counts   $C$      4, 8, 12, 16, 20, 24
Total CPU threads                $N$      96
Encoding speed threshold [fps]   $s_T$    30
Average target JND               $v_J$    2, 4, 6
Maximum VMAF threshold           $v_T$    98, 96, 94
Benchmark schemes:
1. Default: ultrafast preset with eight threads for each encoding instance.¹²
2. Bruteforce: optimized thread count-preset pair, with and without JND-based representation elimination, when the models are fully accurate. This is accomplished by bruteforce encoding using all thread count-preset pairs and selecting the optimized pair.¹³ Hence, it is suitable only for video-on-demand applications.
3. CAPS¹⁴ determines the optimized preset for each representation for a target encoding speed of 30 fps. We evaluate CAPS with c = 4, 8, and 16.
¹² Apple Inc., “HLS Authoring Specification for Apple Devices”.
¹³ Jan De Cock et al. 2016 IEEE International Conference on Image Processing (ICIP). 2016, pp. 1484–1488. doi: 10.1109/ICIP.2016.7532605.
¹⁴ Vignesh V Menon et al. 2022 Picture Coding Symposium (PCS). 2022, pp. 253–257. doi: 10.1109/PCS56426.2022.10018034.
Results
Figure: (a) preset (0 to 5) and threads (4 to 24) versus bitrate (in Mbps); (b) encoding speed (in fps) versus bitrate (in Mbps) for Default, CAPS (c=8), CAPS (c=16), and JALE; (c) encoding speed (in fps) versus bitrate (in Mbps) for Default, Bruteforce, CAPS (c=8), CAPS (c=16), and JALE. JND-based representation elimination is not considered in these plots.
Faster encoding presets and more computational resources are needed to encode high-bitrate representations so that the encoding speed is above the threshold.
JALE controls the encoding speed to be greater than $s_T$ but not significantly higher than the default encoding. This ensures higher CPU utilization when the encodings are carried out concurrently during a live feed.
Conclusions
We proposed JALE, a JND-aware low latency encoding scheme for adaptive live streaming applications.
JALE jointly predicts the optimized encoder preset and CPU thread count for a given representation for each video segment based on the video content complexity features, target encoding speed, and the total number of available CPU threads.
The JND-based representation elimination algorithm removes perceptually redundant representations in the bitrate ladder.
JALE yields an overall average quality improvement of 0.98 dB PSNR and 4.41 VMAF points at the same bitrate, compared to the x265 ultrafast encoding of the reference HLS bitrate ladder using eight CPU threads for each representation.
Considering a JND of six VMAF points, storage, thread count, and encoding time reductions of 72.70%, 63.83%, and 37.87%, respectively, are observed.
Q & A
Thank you for your attention!
Vignesh V Menon ([email protected])