End-to-end Quality of Experience Evaluation for HTTP Adaptive Streaming

christian.timmerer · Jul 11, 2024

About This Presentation

HTTP Adaptive Streaming (HAS) has risen to prominence as the prevailing approach for distributing video content across the Internet. The emergence of popular online streaming platforms, which mainly leverage HAS, has led to a surge in the number of users actively generating and consuming high...


Slide Content

End-to-end Quality of Experience Evaluation for HTTP Adaptive Streaming. Babak Taraghi. Univ.-Prof. DI Dr. Christian Timmerer, Assoc.-Prof. DI Dr. Mathias Lux, Assoc.-Prof. DI Dr. Klaus Schöffmann, Assoc.-Prof. DI Dr. Ali Cengiz Begen. Class of 2020, ATHENA Christian Doppler (CD) Laboratory, ITEC - Institute of Information Technology.

Agenda
- Introduction and Context (9 minutes)
- Evaluation Frameworks (8 minutes)
- Studies on QoE Impacting Factors (12 minutes)
- Comprehensive Dataset Presentation (7 minutes)
- Highlights and Future Directions (3 minutes)
- Q&A

Introduction

HTTP Adaptive Streaming I. Figure 1: the HTTP Adaptive Streaming (HAS) concept and how the delivered quality of the segments depends on the shape of the network throughput over time.

HTTP Adaptive Streaming II: the end-to-end aspect.
- Provisioning: codecs and encoders, encryptors
- Delivery: network protocols and topologies
- Consumption: media players and ABR algorithms

Quality of Experience I. "The degree of delight or annoyance of the user of an application or service. It results from the fulfilment of his or her expectations with respect to the utility and/or enjoyment of the application or service in the light of the user's personality and current state." – Brunnström et al. [27]

Quality of Experience II. How to evaluate or measure the user's degree of annoyance or delight?
Objective evaluation: understand and formulate the metrics.
- Start-up delay: how long does it take for the user to see the first frame of the video from the moment they click the play button?
- Delivered media quality: what is the delivered media quality at each moment and on average? E.g., VMAF, resolution, and bitrate.
- Stall events (rebuffering): how many times does a stall event happen, and for how long?
- Using quality models.
Subjective evaluation: investigate the quality as perceived by the user.
- Conduct evaluations with human subjects.
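
To make these metrics concrete, here is a minimal sketch of how start-up delay and stall statistics could be derived from a player event log. The event names and log format are assumptions for illustration, not CAdViSE's actual schema.

```python
# Illustrative sketch (not CAdViSE code): deriving raw objective QoE metrics
# from a hypothetical player event log. Event names are assumptions.
events = [  # (timestamp in seconds, event name)
    (0.0, "play_requested"),
    (1.8, "first_frame_rendered"),   # start-up delay ends here
    (12.4, "stall_start"),
    (13.1, "stall_end"),             # one 0.7 s stall event
]

def startup_delay(evts):
    """Seconds from the play request until the first rendered frame."""
    t_play = next(t for t, name in evts if name == "play_requested")
    t_first = next(t for t, name in evts if name == "first_frame_rendered")
    return t_first - t_play

def stall_stats(evts):
    """Number of stall events and their total duration in seconds."""
    starts = [t for t, name in evts if name == "stall_start"]
    ends = [t for t, name in evts if name == "stall_end"]
    durations = [end - start for start, end in zip(starts, ends)]
    return len(durations), sum(durations)

print("start-up delay:", startup_delay(events), "s")         # 1.8 s
print("stall count, total duration:", stall_stats(events))   # (1, ~0.7)
```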

Research Questions

Research Methodology: empirical research. An approach to investigation that relies on direct or indirect observation and experience to gather data and generate knowledge; it involves systematically collecting and analysing empirical evidence, such as measurements, experiments, and observations, to test hypotheses and validate theories.
- Data-driven assessment
- Real-world evaluation and a user-centric perspective
- Allows objective and subjective measures. Objective: unbiased and quantifiable, using predetermined criteria and standards [9]. Subjective: assessment based on personal opinions, feelings, or individual judgments [9]
- Supports iterative improvement
- Helps with industry and standardization

Contributions

Evaluation Frameworks

CAdViSE (What?): a Quality of Experience evaluation framework for HTTP Adaptive Streaming.
- Facilitates an organized and structured evaluation: the test environment remains the same, so the results can be interpreted as improved performance or otherwise.
- It is cloud-based, since scalability is a key factor, and can assess multiple ABR algorithms and media simultaneously.
- Simulates network conditions and accepts network traces as plugins (sketched below), mimicking real-world network scenarios.
- Provides unified insights into quality metrics: measures raw metrics and works seamlessly with analytic tools (graphs and plots).
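
As an illustration of the trace-plugin idea, the sketch below shows one plausible trace shape: a sequence of timed bandwidth steps that a network emulator replays during an experiment. The format is hypothetical; the actual plugin format CAdViSE accepts is not reproduced here.

```python
# Hypothetical network-trace plugin (not CAdViSE's real format): a list of
# (duration, downlink rate) steps replayed by the network emulator.
network_trace = [
    {"duration_s": 30, "rate_kbps": 4000},   # 30 s at 4 Mbit/s
    {"duration_s": 30, "rate_kbps": 800},    # drop to 800 kbit/s
    {"duration_s": 30, "rate_kbps": 4000},   # recover
]

def total_duration(trace):
    """Overall length of the emulated session described by the trace."""
    return sum(step["duration_s"] for step in trace)

assert total_duration(network_trace) == 90
```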

CAdViSE (How?)
- Application layer: runner, initializer, and starter scripts, written in Bash, Python, and JavaScript.
- Cloud components: player container (VNC and Selenium), network emulator, EC2 instances, SSM execution, DynamoDB, S3, and CloudWatch.
- Logs and analytics: comprehensive logs, analytics, players plugin.

Live Low-Latency (CAdViSE)

Preliminary Evaluation with CAdViSE
- 5 experiments; 9:00 minutes each
- AWS EC2 t2.medium instances (4 GiB RAM, 2 CPU cores)
- Emulated network profiles: 4 Mbit/s <> 800 kbit/s
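
For context, alternating bandwidth profiles like the 4 Mbit/s <> 800 kbit/s one above can be emulated on Linux with tc. The sketch below is illustrative only, not CAdViSE's actual network emulator; it assumes root privileges and an interface named eth0.

```python
# A minimal sketch, assuming Linux tc and an interface named eth0, of how a
# 4 Mbit/s <> 800 kbit/s profile can be emulated with a token bucket filter.
import subprocess
import time

def set_rate(rate: str, iface: str = "eth0") -> None:
    """Replace the root qdisc with a tbf limited to the given rate."""
    subprocess.run(
        ["tc", "qdisc", "replace", "dev", iface, "root",
         "tbf", "rate", rate, "burst", "32kbit", "latency", "400ms"],
        check=True,
    )

for rate in ["4mbit", "800kbit", "4mbit"]:  # alternate between the two rates
    set_rate(rate)
    time.sleep(30)                          # hold each rate for 30 seconds
```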

LLL-CAdViSE Evaluation Setup
- Target latencies: 1 s, 3 s, 5 s, and 10 s
- Two streaming formats: MPEG-DASH (dash.js 4.4.1) and HLS (hls.js 1.2.0)
- ABR algorithms: Learn2Adapt-LowLatency (L2A-LL) and Low-on-Latency Plus (LoL+)
- 3 experiments of 420 seconds each
- Network profiles: bicycle commuter LTE network, car driver LTE network, train commuter LTE network, tram commuter LTE network, and Network0 (up to 10 Gbps)

LLL-CAdViSE Evaluation Result I. All time values are in seconds.
a: experiment title, format [protocol]-[ABR]-[network]-[target latency] (def: default, l2a: L2A-LL)
b: average of the sum of stall event durations
c: average start-up delay
d: average of the sum of seek event durations
e: average number of quality switches
f: playback bitrate (min-max-avg) in kbps
g: latency (min-max-avg)
h: playback rate (min-max-avg)
i: average MOS predicted by the ITU-T P.1203 quality model

LLL-CAdViSE Evaluation Result II

Studies on QoE Impacting Factors

Exploring ABR Algorithms
- Throughput-based: uses throughput prediction heuristics to optimize streaming quality by estimating the available network bandwidth. Examples: PANDA, Festive, CrystalBall.
- Buffer-based: relies solely on buffer occupancy to make streaming decisions, aiming to prevent buffer underruns and stalling. Examples: BBA0, BOLA, Quetra.
- Hybrid: integrates multiple heuristics, such as throughput, buffer level, and latency, to make comprehensive streaming decisions. Examples: GTA, Elastic, MPC.
- Learning-based: utilizes machine learning techniques to adapt streaming quality based on historical data and real-time network conditions. Examples: Pensieve, Fugu, Stick.
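
As a toy illustration of how the throughput-based and buffer-based signals combine in a hybrid scheme (this is not GTA, Elastic, or MPC; the bitrate ladder, safety margin, and buffer threshold are made up):

```python
# Toy hybrid ABR heuristic: pick the highest bitrate that the estimated
# throughput can sustain, but fall back to the lowest rendition when the
# buffer runs low. All constants here are illustrative assumptions.
BITRATES_KBPS = [800, 1500, 3000, 4500]    # assumed bitrate ladder

def select_bitrate(throughput_kbps: float, buffer_s: float,
                   safety: float = 0.8, min_buffer_s: float = 5.0) -> int:
    """Return the bitrate (kbps) of the next segment to request."""
    if buffer_s < min_buffer_s:            # buffer-based guard against stalls
        return BITRATES_KBPS[0]
    budget = throughput_kbps * safety      # throughput estimate with margin
    feasible = [b for b in BITRATES_KBPS if b <= budget]
    return max(feasible) if feasible else BITRATES_KBPS[0]

print(select_bitrate(throughput_kbps=3500, buffer_s=12.0))   # -> 1500
```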

Objective and Subjective Evaluation
- CAdViSE testbed: cloud-based platform for assessing ABR algorithms under diverse network conditions; ensures reproducibility with session logs for accurate recreation of streaming sessions.
- Experiment logs: logs archived in DynamoDB; a script processes the logs to simulate and inject stall events using FFmpeg.
- Video processing: generates a JSON file for the ITU-T P.1203 model to obtain a Mean Opinion Score (MOS); concatenates audio and video tracks into finalized MP4 files.
- Evaluation portal: developed using a serverless architecture and AWS Lambda; based on the ITU-T P.910 standard for subjective assessments.
- Crowdsourced testing: uses Amazon Mechanical Turk for participant recruitment; a custom web media player delivers test sequences to users.
- Evaluation process: participants watch and rate 10 test sequences on a 1-to-5 scale; reliability questions ensure valid votes; results are stored and processed via AWS services.
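
The final aggregation step of this pipeline can be sketched as follows: keep only votes from participants who passed the reliability questions, then average the remaining scores per test sequence into a MOS. The field names and the reliability flag are illustrative, not the portal's actual schema.

```python
# Sketch of crowdsourced MOS aggregation with reliability filtering.
from collections import defaultdict
from statistics import mean

votes = [  # illustrative vote records, not real study data
    {"worker": "w1", "sequence": "seq-01", "score": 4, "reliable": True},
    {"worker": "w2", "sequence": "seq-01", "score": 2, "reliable": False},
    {"worker": "w3", "sequence": "seq-01", "score": 5, "reliable": True},
]

def mos_per_sequence(all_votes):
    """Mean opinion score per sequence over reliable votes only."""
    by_seq = defaultdict(list)
    for vote in all_votes:
        if vote["reliable"]:               # discard unreliable participants
            by_seq[vote["sequence"]].append(vote["score"])
    return {seq: mean(scores) for seq, scores in by_seq.items()}

print(mos_per_sequence(votes))             # {'seq-01': 4.5}
```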

Empirical Findings I

Empirical Findings II: results under the Stable, Fluctuation, RampUp, and RampDown network profiles.

In-depth Studies on Stall Events and Quality Switches
- Minimum Noticeable Stall Duration (MNSD): investigated the threshold below which stall events are not noticeable to users and thus do not affect perceived QoE.
- Stall Event vs. Quality Switch (SvQ): evaluated user preference between experiencing a stall event or a quality drop during unfavourable network conditions.
- Short vs. Long Stall Events (SvL): studied the impact on QoE of multiple short stall events versus a single longer stall event, considering both predicted and perceived MOS.
- Stall Impact and Video Quality (RSVQ): examined the relationship between the impact of stall events on QoE and the video quality level, addressing conflicting findings from previous studies.
- QoE Models Comparison: compared various objective QoE evaluation models with subjective MOS results to study their correlations.

Subjective Evaluation Portal

Minimum Noticeable Stall Duration
- The decrease in noticed stall events starts at durations below 0.301 seconds.
- Over 45% of subjects did not notice stall events shorter than 0.051 seconds.
- Stall events under 0.004 seconds were not noticeable to participants.

Stall Event vs. Quality Switch
- Set A, Case I: a pattern with a 6 s stall event and an upward quality switch.
- Set A, Case II: a pattern without a stall event and continuous low-quality streaming.
- Set B, Case I: a pattern with high video quality streaming but with a 6 s stall event.
- Set B, Case II: a pattern with a downward quality switch and no stall event.

Stall Event vs. Quality Switch
- Preference for Case I over Case II in both Set A and Set B.
- Preference for higher-quality versions even with a 6-second stall.

Short vs. Long Stall Events: preference for a single longer stall event over frequent, shorter ones.

Stall Impact and Video Quality
- Minor QoE penalty from stall events in low-quality videos (Q1).
- Higher QoE penalty for middle-quality (Q2) and high-quality (Q3) videos with stall events.

QoE Models Comparison
- BiQPS and FINEAS: inconsistent performance across evaluations.
- P.1203 model: best overall performance, with the highest Pearson Correlation Coefficient (PCC) and Spearman's Rank Correlation Coefficient (SRCC), both > 0.8, and the lowest Root Mean Square Error (RMSE) of 0.326.
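
For reference, the three agreement metrics on this slide can be computed for a model's predicted MOS against subjective MOS as follows; the values below are made up for illustration.

```python
# Computing PCC, SRCC, and RMSE between subjective and predicted MOS.
import numpy as np
from scipy.stats import pearsonr, spearmanr

subjective = np.array([4.2, 3.1, 2.5, 4.8, 1.9])   # ground-truth MOS (made up)
predicted = np.array([4.0, 3.3, 2.2, 4.5, 2.1])    # model output (made up)

pcc, _ = pearsonr(subjective, predicted)            # linear correlation
srcc, _ = spearmanr(subjective, predicted)          # rank correlation
rmse = float(np.sqrt(np.mean((subjective - predicted) ** 2)))

print(f"PCC={pcc:.3f} SRCC={srcc:.3f} RMSE={rmse:.3f}")
```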

A Comprehensive Dataset

Video Codecs and Development Procedures
- Advanced Video Coding (AVC): libx264 (version 0.160.3011) via FFmpeg, slow preset.
- High Efficiency Video Coding (HEVC): libx265 (version 3.4) via FFmpeg, slow preset.
- AOMedia Video 1 (AV1): libsvtav1 (version 0.9.0) via FFmpeg, preset 8.
- Versatile Video Coding (VVC): Fraunhofer VVenC (version 1.3.1); requires 8-bit YUV input, preprocessed with FFmpeg and encoded with VVenC.
- At dataset preparation time, MP4Box (part of the GPAC project) supported VVC in nightly builds, enabling MP4 file packaging, VVC bitstream dumping, and MPEG-DASH content packaging.
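
A minimal sketch of such encoder invocations, driven from Python. The presets match the slide, but the input/output names and the target bitrate are placeholders; the dataset's exact encoding parameters are not reproduced here, and the VVenC step (a separate encoder binary) is omitted.

```python
# Hedged FFmpeg invocations for the AVC, HEVC, and AV1 encodings above.
import subprocess

def encode(codec_args, src="source.y4m", dst="out.mp4"):
    """Run one FFmpeg encode with the given codec arguments."""
    subprocess.run(["ffmpeg", "-y", "-i", src, *codec_args, dst], check=True)

# AVC via libx264, slow preset (placeholder bitrate)
encode(["-c:v", "libx264", "-preset", "slow", "-b:v", "3000k"])
# HEVC via libx265, slow preset
encode(["-c:v", "libx265", "-preset", "slow", "-b:v", "3000k"],
       dst="out_hevc.mp4")
# AV1 via SVT-AV1, preset 8
encode(["-c:v", "libsvtav1", "-preset", "8", "-b:v", "3000k"],
       dst="out_av1.mp4")
```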

Source Video Sequences

Available Representations
- Resolutions up to 7680x4320 (8K)
- Maximum media duration of 322 seconds
- Segment lengths of 4 and 8 seconds
- Publicly available at: http://www.itec.aau.at/ftp/datasets/mmsys22

Highlights and Conclusion. Three main categories of contributions:
- Evaluation frameworks (CAdViSE and LLL-CAdViSE) for VoD and live streaming; directly addresses RQ1.
- Studies on subjective and objective QoE assessments and the impacts of HAS defects on QoE; directly addresses RQ2.
- A comprehensive dataset with up-to-date video technologies, including 8K VVC; directly addresses RQ1.

Future Works
- Support for new protocols and codecs: extend the evaluation frameworks to include emerging standards such as WebRTC and VVC.
- Machine learning for QoE: apply machine learning techniques to predict and optimize QoE based on assessments.
- Enhanced quality models: align existing quality models with subjective assessment findings for better prediction accuracy.
- Real-time QoE monitoring: develop tools for real-time QoE monitoring and feedback to enable dynamic adjustments during streaming sessions.
- User-centric QoE personalization: investigate methods for personalizing QoE based on individual user preferences and viewing habits.

Thank You! Q&A