HTTP Adaptive Streaming (HAS) has become the prevailing approach for distributing video content over the Internet. The emergence of popular online streaming platforms, which mainly leverage HAS, has led to a surge in the number of users actively generating and consuming high-quality content. Nonetheless, this remarkable growth poses a significant challenge for researchers and service providers, who must contend with varying network conditions and limited network resources while meeting user expectations for quality.
In response to these challenges, this dissertation explores the end-to-end evaluation of Quality of Experience (QoE) in the context of HAS. It investigates evaluation methodologies and frameworks designed to measure QoE and end-to-end latency, particularly in live HAS deployments. Through extensive literature review and analysis of existing approaches, the gaps and challenges in current QoE evaluation methodologies are identified. The thesis then proposes novel contributions to address these gaps, including the development of evaluation frameworks, an improved understanding of QoE, in-depth studies of QoE-impacting factors, and the curation of a comprehensive dataset.
Slide Content
End-to-end Quality of Experience Evaluation for HTTP Adaptive Streaming
Babak Taraghi
Univ.-Prof. DI Dr. Christian Timmerer, Assoc.-Prof. DI Dr. Mathias Lux, Assoc.-Prof. DI Dr. Klaus Schöffmann, Assoc.-Prof. DI Dr. Ali Cengiz Begen
Class of 2020, ATHENA Christian Doppler (CD) Laboratory, ITEC - Institute of Information Technology
Agenda
- Introduction and Context (9 minutes)
- Evaluation Frameworks (8 minutes)
- Studies on QoE Impacting Factors (12 minutes)
- Comprehensive Dataset Presentation (7 minutes)
- Highlights and Future Directions (3 minutes)
- Q&A
Introduction
HTTP Adaptive Streaming I
Figure 1: HTTP Adaptive Streaming (HAS) concept and how the delivered quality of segments depends on the network conditions.
HTTP Adaptive Streaming II (End-to-end Aspect)
- Provisioning: codecs, encoders, and encryptors
- Delivery: network protocols and topologies
- Consumption: media players and ABR algorithms
Quality of Experience I
"The degree of delight or annoyance of the user of an application or service. It results from the fulfilment of his or her expectations with respect to the utility and/or enjoyment of the application or service in the light of the user's personality and current state." – Brunnström et al. [27]
Quality of Experience II
How can the degree of annoyance or delight of the user be evaluated or measured?
Objective evaluation: understand and formulate the metrics.
- Start-up delay: how long does it take for the user to see the first frame of the video from the moment s/he clicks the play button?
- Delivered media quality: what is the delivered media quality at each moment and on average? E.g., VMAF, resolution, and bitrate.
- Stall events (rebuffering): how many times does a stall event happen, and for how long?
- Using quality models.
Subjective evaluation: investigate the quality perceived by users by conducting evaluations with human subjects.
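To make these objective metrics concrete, the sketch below derives start-up delay and stall statistics from a hypothetical player event log; the event names and log format are illustrative and not tied to any particular player or to the frameworks presented later.

```python
# Minimal sketch of objective QoE metrics from a hypothetical player event log.
# Each event is (timestamp_s, type, value); names and fields are illustrative.
events = [
    (0.0, "play_clicked", None),
    (1.8, "first_frame", None),
    (12.0, "stall_start", None),
    (13.2, "stall_end", None),
    (13.2, "quality_switch", 4000),   # new bitrate in kbps
]

def startup_delay(evts):
    # Time between pressing play and rendering the first frame.
    click = next(t for t, e, _ in evts if e == "play_clicked")
    first = next(t for t, e, _ in evts if e == "first_frame")
    return first - click

def stall_stats(evts):
    # Number of stall events and their total duration.
    starts = [t for t, e, _ in evts if e == "stall_start"]
    ends = [t for t, e, _ in evts if e == "stall_end"]
    durations = [end - start for start, end in zip(starts, ends)]
    return len(durations), sum(durations)

count, total = stall_stats(events)
print(f"start-up delay: {startup_delay(events):.1f} s, "
      f"stalls: {count} ({total:.1f} s total)")
```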
Research Questions
Research Methodology
Empirical research: an approach to investigation that relies on direct or indirect observation and experience to gather data and generate knowledge. It involves systematically collecting and analysing empirical evidence, such as measurements, experiments, and observations, to test hypotheses and validate theories.
- Data-driven assessment
- Real-world evaluation and a user-centric perspective
- Allows objective and subjective measures
  - Objective: unbiased and quantifiable, using predetermined criteria and standards [9]
  - Subjective: assessment based on personal opinions, feelings, or individual judgments [9]
- Supports iterative improvement
- Helps with industry adoption and standardization
Contributions
Evaluation Frameworks
CAdViSE (What?)
A Quality of Experience evaluation framework for HTTP Adaptive Streaming.
- Facilitates an organized and structured evaluation: the test environment remains the same, so results can be interpreted as improved performance or otherwise.
- Cloud-based, since scalability is a key factor; able to assess multiple ABR algorithms and media simultaneously.
- Simulates network conditions and accepts network traces as plugins, mimicking real-world network characteristics and scenarios (see the sketch below).
- Provides unified insights into quality metrics: measures raw metrics and works seamlessly with analytic tools (graphs and plots).
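CAdViSE's actual plugin interface is not shown here; as a rough illustration of how a bandwidth trace can drive network emulation, the sketch below replays a list of (duration, rate) pairs with Linux traffic control (tc). The trace values and interface name are assumptions, not part of the framework.

```python
# Illustrative network-shaping loop (not CAdViSE's actual plugin API):
# replays a (duration_s, bandwidth) trace with Linux `tc` on one interface.
# Requires root privileges.
import subprocess
import time

TRACE = [(30, "4mbit"), (30, "800kbit")] * 9   # alternating profile, ~9 minutes
IFACE = "eth0"                                  # assumed network interface

def set_bandwidth(rate: str) -> None:
    # Replace the root qdisc with a token bucket filter at the given rate.
    subprocess.run(["tc", "qdisc", "replace", "dev", IFACE, "root",
                    "tbf", "rate", rate, "burst", "32kbit", "latency", "400ms"],
                   check=True)

for duration, rate in TRACE:
    set_bandwidth(rate)
    time.sleep(duration)

# Clean up: remove the shaping qdisc again.
subprocess.run(["tc", "qdisc", "del", "dev", IFACE, "root"], check=True)
```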
CAdViSE (How?)
- Application layer: runner, initializer, and starter scripts written in Bash, Python, and JavaScript.
- Cloud components: player container (VNC and Selenium), network emulator, EC2 instances, SSM execution, DynamoDB, S3, and CloudWatch.
- Logs and analytics: comprehensive logs, analytics, and a players plugin.
Live Low-Latency CAdViSE (LLL-CAdViSE)
Preliminary Evaluation with CAdViSE
- 5 experiments, 9:00 minutes each
- AWS EC2 t2.medium instances (4 GiB RAM, 2 CPU cores)
- Emulated network profile: alternating between 4 Mbit/s and 800 kbit/s
LLL-CAdViSE Evaluation Setup
- Target latencies: 1 s, 3 s, 5 s, and 10 s
- Two streaming formats: MPEG-DASH (dash.js 4.4.1) and HLS (hls.js 1.2.0)
- ABR algorithms: Learn2Adapt-LowLatency (L2A-LL) and Low-on-Latency Plus (LoLP)
- 3 experiments of 420 seconds each
- Network profiles: bicycle commuter LTE, car driver LTE, train commuter LTE, tram commuter LTE, and Network0 (up to 10 Gbps)
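As a small illustration of how such a test matrix can be enumerated, the sketch below mirrors the experiment-title format used on the next slide ([protocol]-[ABR]-[network]-[target latency]); it is not LLL-CAdViSE's actual configuration format, and the study ran only a subset of these combinations.

```python
# Illustrative enumeration of the LLL-CAdViSE test matrix described above.
from itertools import product

protocols = [("dash", "dash.js 4.4.1"), ("hls", "hls.js 1.2.0")]
abr_algorithms = ["default", "l2a-ll", "lolp"]
networks = ["bicycle-lte", "car-lte", "train-lte", "tram-lte", "network0"]
target_latencies = [1, 3, 5, 10]            # seconds

experiments = [
    f"{proto}-{abr}-{net}-{lat}s"
    for (proto, _), abr, net, lat in product(protocols, abr_algorithms,
                                             networks, target_latencies)
]
print(len(experiments), "experiment configurations")   # 2 * 3 * 5 * 4 = 120
```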
LLL-CAdViSE Evaluation Result I
All time values are in seconds.
a: Experiment title, format: [protocol]-[ABR]-[network]-[target latency] (def: default, l2a: L2A-LL)
b: Average of the sum of stall event durations
c: Average start-up delay
d: Average of the sum of seek event durations
e: Average number of quality switches
f: Playback bitrate (min-max-avg) in kbps
g: Latency (min-max-avg)
h: Playback rate (min-max-avg)
i: Average MOS predicted by the ITU-T P.1203 quality model
LLL-CAdViSE Evaluation Result II
Studies on QoE Impacting Factors
Exploring ABR Algorithms
- Throughput-based: uses throughput prediction heuristics to optimize streaming quality by estimating available network bandwidth. Examples: PANDA, Festive, CrystalBall.
- Buffer-based: relies solely on buffer occupancy to make streaming decisions, aiming to prevent buffer underruns and stalling. Examples: BBA0, BOLA, QUETRA.
- Hybrid: integrates multiple heuristics such as throughput, buffer level, and latency to make comprehensive streaming decisions. Examples: GTA, Elastic, MPC.
- Learning-based: utilizes machine learning techniques to adapt streaming quality based on historical data and real-time network conditions. Examples: Pensieve, Fugu, Stick.
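As a toy illustration of the hybrid idea (combining a throughput estimate with buffer occupancy), the sketch below is not an implementation of any of the algorithms named above; the bitrate ladder and thresholds are placeholders.

```python
# Toy hybrid ABR decision: pick the highest bitrate the throughput estimate can
# sustain, but step down when the buffer is close to underrun.
LADDER_KBPS = [400, 800, 1600, 3000, 6000]          # available representations

def choose_bitrate(throughput_kbps: float, buffer_s: float,
                   safety: float = 0.8, low_buffer_s: float = 5.0) -> int:
    sustainable = [b for b in LADDER_KBPS if b <= throughput_kbps * safety]
    choice = sustainable[-1] if sustainable else LADDER_KBPS[0]
    if buffer_s < low_buffer_s and choice != LADDER_KBPS[0]:
        # Buffer nearly empty: drop one level to reduce the risk of a stall.
        choice = LADDER_KBPS[max(LADDER_KBPS.index(choice) - 1, 0)]
    return choice

print(choose_bitrate(throughput_kbps=2500, buffer_s=3.0))   # -> 800
```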
Objective and Subjective Evaluation
- CAdViSE testbed: cloud-based platform for assessing ABR algorithms under diverse network conditions; ensures reproducibility with session logs for accurate recreation of streaming sessions.
- Experiment logs: logs archived in DynamoDB; a script processes the logs to simulate and inject stall events using FFmpeg.
- Video processing: generates a JSON file for the ITU-T P.1203 model to obtain a Mean Opinion Score (MOS); concatenates audio and video tracks into finalized MP4 files.
- Evaluation portal: developed using a serverless architecture and AWS Lambda; based on the ITU-T P.910 standard for subjective assessments.
- Crowdsourced testing: uses Amazon Mechanical Turk for participant recruitment; a custom web media player delivers test sequences to users.
- Evaluation process: participants watch and rate 10 test sequences on a 1-to-5 scale; reliability questions ensure valid votes; results are stored and processed via AWS services.
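For the MOS prediction step, here is a minimal sketch of invoking the ITU-T P.1203 model, assuming the open-source itu-p1203 reference implementation in Python and its mode-0 JSON input; the segment values are placeholders and the exact schema may differ from the JSON generated by the framework.

```python
# Sketch: predicting MOS with the ITU-T P.1203 reference implementation
# (assumes the open-source `itu-p1203` package; field names follow its
# mode-0 JSON input format and may need adjusting).
from itu_p1203 import P1203Standalone

input_report = {
    "I11": {"segments": [                              # audio segments
        {"bitrate": 128, "codec": "aaclc", "duration": 8, "start": 0},
    ], "streamId": 42},
    "I13": {"segments": [                              # video segments (mode 0)
        {"bitrate": 800, "codec": "h264", "duration": 4,
         "fps": 24.0, "resolution": "1280x720", "start": 0},
        {"bitrate": 4000, "codec": "h264", "duration": 4,
         "fps": 24.0, "resolution": "1920x1080", "start": 4},
    ], "streamId": 42},
    "I23": {"stalling": [[0, 2.0]], "streamId": 42},   # one 2 s stall at t=0
    "IGen": {"device": "pc", "displaySize": "1920x1080"},
}

result = P1203Standalone(input_report).calculate_complete()
print(result.get("O46"))  # overall session MOS (1-5), if present in the output
```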
In-depth Studies on Stall Events and Quality Switches
- Minimum Noticeable Stall Duration (MNSD): investigated the threshold below which stall events are not noticeable to users and thus do not affect perceived QoE.
- Stall Event vs. Quality Switch (SvQ): evaluated user preference between experiencing a stall event or a quality drop during unfavourable network conditions.
- Short vs. Long Stall Events (SvL): studied the impact on QoE of multiple short stall events versus a single longer stall event, considering both predicted and perceived MOS.
- Stall Impact and Video Quality (RSVQ): examined the relationship between the impact of stall events on QoE and the video quality level, addressing conflicting findings from previous studies.
- QoE Models Comparison: compared various objective QoE evaluation models with subjective MOS results to study their correlations.
Subjective Evaluation Portal
Minimum Noticeable Stall Duration
- The decrease in noticed stall events starts at durations below 0.301 seconds.
- Over 45% of subjects did not notice stall events shorter than 0.051 seconds.
- Stall events under 0.004 seconds were not noticeable to participants.
Stall Event vs. Quality Switch
- Set A - Case I: a pattern with a 6 s stall event and an upward quality switch.
- Set A - Case II: a pattern without a stall event and continuous low-quality streaming.
- Set B - Case I: a pattern with high video quality streaming but with a 6 s stall event.
- Set B - Case II: a pattern with a downward quality switch and without a stall event.
Stall Event vs. Quality Switch
- Preference for Case I over Case II in both Set A and Set B.
- Preference for higher-quality versions even with a 6-second stall.
Short vs. Long Stall Events
- Preference for longer stall events over frequent, shorter ones.
Stall Impact and Video Quality
- Minor QoE penalty from stall events for low-quality videos (Q1).
- Higher QoE penalty for middle- (Q2) and high-quality (Q3) videos with stall events.
QoE Models Comparison
Metrics: Pearson Correlation Coefficient (PCC), Spearman's Rank Correlation Coefficient (SRCC), Root Mean Square Error (RMSE).
- BiQPS and FINEAS: inconsistent performance across evaluations.
- P.1203 model: best overall performance, highest PCC and SRCC (> 0.8), lowest RMSE (0.326).
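The three agreement metrics on this slide can be reproduced for any model's predictions with a few lines of NumPy/SciPy; the sample values below are illustrative only, not results from the study.

```python
# Sketch: correlating predicted MOS with subjective MOS using PCC, SRCC, RMSE.
import numpy as np
from scipy import stats

subjective_mos = np.array([4.2, 3.1, 2.5, 4.8, 3.6])   # crowdsourced ratings
predicted_mos  = np.array([4.0, 3.3, 2.2, 4.6, 3.9])   # e.g. model output

pcc, _ = stats.pearsonr(subjective_mos, predicted_mos)
srcc, _ = stats.spearmanr(subjective_mos, predicted_mos)
rmse = np.sqrt(np.mean((subjective_mos - predicted_mos) ** 2))

print(f"PCC={pcc:.3f}  SRCC={srcc:.3f}  RMSE={rmse:.3f}")
```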
A Comprehensive Dataset
Video Codecs and Development Procedures
- Advanced Video Coding (AVC): libx264 (version 0.160.3011) via FFmpeg, slow preset.
- High Efficiency Video Coding (HEVC): libx265 (version 3.4) via FFmpeg, slow preset.
- AOMedia Video 1 (AV1): libsvtav1 (version 0.9.0) via FFmpeg, preset 8.
- Versatile Video Coding (VVC): Fraunhofer VVenC (version 1.3.1); requires 8-bit YUV input, processed with FFmpeg and encoded with VVenC.
- At the time of dataset preparation, MP4Box (part of the GPAC project) supported VVC in nightly builds, enabling MP4 file packaging, VVC bitstream dumping, and MPEG-DASH content packaging.
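Purely as an illustration of the kind of encoding step involved (the flags, source file, and bitrate below are placeholders, not the exact commands used to build the dataset), an AVC rendition with the slow preset could be produced from Python like this:

```python
# Illustrative FFmpeg invocation for one AVC rendition (slow preset), called
# from Python; the actual dataset commands and bitrate ladder are not shown
# on the slide, so these parameters are placeholders.
import subprocess

subprocess.run([
    "ffmpeg", "-y",
    "-i", "source.y4m",                 # placeholder source sequence
    "-c:v", "libx264", "-preset", "slow",
    "-b:v", "4000k", "-maxrate", "4400k", "-bufsize", "8000k",
    "-vf", "scale=1920:1080",
    "avc_1080p_4000k.mp4",
], check=True)
```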
Source Video Sequences
Available Representations
- Resolutions up to 7680x4320 (8K)
- Maximum media duration of 322 seconds
- Segment lengths of 4 and 8 seconds
- Publicly available at: http://www.itec.aau.at/ftp/datasets/mmsys22
Highlights and Conclusion
Three main categories of contributions:
- Evaluation frameworks (CAdViSE and LLL-CAdViSE) for VoD and live streaming; directly addresses RQ1.
- Studies on subjective and objective QoE assessments and the impacts of HAS defects on QoE; directly addresses RQ2.
- A comprehensive dataset with up-to-date video technologies, including 8K VVC; directly addresses RQ1.
Future Works
- Support for new protocols and codecs: extend the evaluation frameworks to include emerging standards like WebRTC and VVC.
- Machine learning for QoE: apply machine learning techniques to predict and optimize QoE-based assessments.
- Enhanced quality models: align existing quality models with subjective assessment findings for better prediction accuracy.
- Real-time QoE monitoring: develop tools for real-time QoE monitoring and feedback to enable dynamic adjustments during streaming sessions.
- User-centric QoE personalization: investigate methods for personalizing QoE based on individual user preferences and viewing habits.