Date of Defense
10-6-2026 2:00 PM
Location
H1-0058
Document Type
Dissertation Defense
Degree Name
Doctor of Philosophy in Informatics and Computing
College
CIT
Department
Computer Science and Software Engineering
First Advisor
Prof. Mohammad Mehedy Masud
Keywords
Quality of Experience, HTTP Adaptive Streaming, Facial Emotion Recognition, Multimodal Learning, Self-Healing Systems, Concept Drift, Reinforcement Learning, SHAP Explainability.
Abstract
In HTTP adaptive streaming, accurately inferring user Quality of Experience (QoE) remains challenging because of dynamic network conditions, content variations, end-to-end encryption, intrusive advertisement interruptions, and diverse user engagement behaviors. Conventional approaches, such as ITU-T P.1203, fall short in capturing real-time user perception, accounting for ad-related disruptions, or adapting to model drift over time. This dissertation therefore aims to develop a multimodal, facial emotion recognition (FER)-based inference model that predicts QoE under encryption and advertisement conditions while remaining interpretable, robust, resilient to drift, and highly accurate. To achieve these objectives, a unified framework is designed and implemented. The proposed framework makes several key contributions. First, a comprehensive data collection platform captures multimodal data, including facial recordings, network statistics, user feedback, and ad metadata, from over 120 cities worldwide. Multiple public datasets are also collected to benchmark against the proprietary data. Second, a novel QoE prediction model named Adaptive Ensemble for Imbalance Classification using eXplainable Artificial Intelligence (AELIX) is presented; it addresses class imbalance and provides interpretability via explainable AI, leveraging a stacked ensemble of random forest and gradient boosting. Third, model robustness is enhanced through concept drift detection and self-healing with QoE Foresight, which integrates hybrid drift detection techniques (HDDM-A and UADF) with a Double Deep Q-Network (DDQN)-based retraining controller as the self-healing mechanism. Finally, the proposed model is evaluated through extensive experiments, demonstrating its superiority over traditional approaches.
The proposed AELIX model was applied to the collected data and two public datasets to assess generalizability, achieving 99.4% prediction accuracy with a 12.4% gain over unimodal baselines. QoE Foresight was evaluated on four datasets (the public LIVE-NFLX-II, Waterloo, and MAWI datasets and a custom corpus) and against more than twenty models, including the baseline ITU-T P.1203. The system achieved 95.3% QoE prediction accuracy, an R² of 0.9527, 99.4% system uptime, a 55.3% improvement over ITU-T P.1203, and a latency of just 0.12 ms in volatile environments. Ablation studies confirm the individual contributions of multimodal fusion (+12.4%), drift detection (+8.7%), and self-healing (+15.2%). In summary, this dissertation addresses an important and timely problem in QoE inference and provides an effective, comprehensive solution. The work is expected to stimulate new initiatives in this area and to open a new line of research.
Included in
Data-Driven QoE Inference for HTTP Adaptive Streaming