Date of Defense

June 10, 2026, 2:00 PM

Location

H1-0058

Document Type

Dissertation Defense

Degree Name

Doctor of Philosophy in Informatics and Computing

College

CIT

Department

Computer Science and Software Engineering

First Advisor

Prof. Mohammad Mehedy Masud

Keywords

Quality of Experience, HTTP Adaptive Streaming, Facial Emotion Recognition, Multimodal Learning, Self-Healing Systems, Concept Drift, Reinforcement Learning, SHAP Explainability.

Abstract

In HTTP adaptive streaming, accurately inferring user Quality of Experience (QoE) remains challenging because of dynamic network conditions, content variations, end-to-end encryption, intrusive advertisement interruptions, and diverse user engagement behaviors. Conventional approaches, such as ITU-T P.1203, fall short in capturing real-time user perception, accounting for ad-related disruptions, or adapting to model drift over time. This dissertation therefore develops a multimodal, facial emotion recognition (FER)-based inference model that predicts QoE under encryption and advertisement conditions while remaining interpretable, robust, resilient to drift, and highly accurate. To this end, a unified framework is designed and implemented, with four main contributions. First, a data collection platform captures multimodal data, including facial recordings, network statistics, user feedback, and ad metadata, from over 120 cities worldwide; multiple public datasets are also collected to benchmark against the proprietary data. Second, a novel QoE prediction model, Adaptive Ensemble for Imbalance Classification using eXplainable Artificial Intelligence (AELIX), addresses class imbalance and provides interpretability through explainable AI, building on a stacked ensemble of random forest and gradient boosting. Third, model robustness is enhanced through concept drift detection and self-healing: QoE Foresight integrates hybrid drift detection techniques (HDDM-A and UADF) with a Double Deep Q-Network (DDQN)-based retraining controller that serves as the self-healing mechanism. Finally, the proposed model is evaluated through extensive experiments, which demonstrate its superiority over traditional approaches.

The proposed AELIX model, which achieves 99.4% prediction accuracy and a 12.4% gain over unimodal baselines, was applied to the collected data and to two public datasets to assess generalizability. QoE Foresight was evaluated on four public datasets (LIVE-NFLX-II, Waterloo, MAWI, and a custom corpus) and compared against more than twenty models, including the ITU-T P.1203 baseline. The system achieved 95.3% QoE prediction accuracy, an R² of 0.9527, 99.4% system uptime, a 55.3% improvement over ITU-T P.1203, and a latency of just 0.12 ms in volatile environments. Ablation studies confirm the individual contributions of multimodal fusion (+12.4%), drift detection (+8.7%), and self-healing (+15.2%). In summary, this dissertation addresses an important and timely problem in QoE inference and provides an effective, comprehensive solution. The work is expected to open new research directions in this area.

Jun 10th, 2:00 PM

Data-Driven QoE Inference for HTTP Adaptive Streaming
