Date of Defense
13-11-2025 10:00 AM
Location
Room 0004, H4 Building
Document Type
Dissertation Defense
Degree Name
Doctor of Philosophy in Informatics and Computing
College
College of Information Technology
Department
Information Systems and Security
First Advisor
Farag M. Sallabi
Keywords
Federated Learning, Trustworthiness, Data Quality, Communication efficiency, Security and Privacy
Abstract
Federated Learning (FL) has emerged as a significant advancement in Artificial Intelligence (AI), enabling collaborative model training across distributed devices while preserving data privacy. As FL has grown in importance and found application in many areas, addressing trustworthiness across its various aspects has become crucial. In the FL process, not all client data is relevant to the learning objective, and incorporating updates derived from irrelevant data can harm the model's performance. The selection of training samples strongly influences model quality: datasets with errors, skewed distributions, or low diversity can produce inaccurate and unstable models. To address these issues, a data quality evaluation model has been introduced that assesses the quality of datasets in FL systems and dynamically selects high-quality data samples for training based on intrinsic and contextual data quality dimensions. Additionally, an importance-based interpretable feature selection model and a data quality-based dynamic client selection model employing Nash equilibrium and joint differential privacy (DP) have been designed. Building upon this, FL is shown to face a major challenge of high communication overhead caused by frequent model updates between clients and the central server. To address this, a novel lightweight Hierarchical FL (HFL) framework is proposed that integrates adaptive model pruning, quantization, and optimization of model communication and aggregation frequency. First, a joint model pruning and quantization approach is introduced that dynamically adjusts pruning ratios and quantization levels. Second, a fairness-aware Stackelberg game-based mechanism for optimizing communication and aggregation frequency has been developed, in which clients, edge servers, and the central server collaboratively determine optimal update frequencies to balance communication overhead against convergence speed.
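To illustrate the kind of update compression the framework describes, the following is a minimal sketch of joint pruning and quantization of a client's model update. It assumes simple magnitude-based pruning and uniform symmetric quantization with fixed parameters (`pruning_ratio`, `num_bits` are illustrative names); the dissertation's adaptive scheme adjusts these dynamically per client and round.

```python
import numpy as np

def compress_update(update, pruning_ratio=0.5, num_bits=8):
    """Sketch of joint pruning + quantization of a model update.

    1. Magnitude pruning: zero out the smallest-magnitude fraction
       (pruning_ratio) of the update's entries.
    2. Uniform quantization: map the surviving values onto
       2^(num_bits-1) - 1 symmetric levels.
    """
    flat = update.flatten()
    # Step 1: find the magnitude threshold and prune below it.
    k = int(len(flat) * pruning_ratio)
    if k > 0:
        threshold = np.sort(np.abs(flat))[k - 1]
        flat = np.where(np.abs(flat) <= threshold, 0.0, flat)
    # Step 2: uniformly quantize the surviving values.
    scale = np.abs(flat).max()
    if scale == 0.0:
        return flat.reshape(update.shape)
    levels = 2 ** (num_bits - 1) - 1
    quantized = np.round(flat / scale * levels) / levels * scale
    return quantized.reshape(update.shape)
```

In a real deployment only the nonzero indices, the quantized integer codes, and the scale would be transmitted, which is where the communication savings come from.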
Third, privacy protection is strengthened using Selective Homomorphic Encryption (SHE), and a verifiable model trust assessment is introduced to ensure secure participation of edge devices. However, communication efficiency and quality-aware selection alone cannot guarantee trustworthy FL: the security of both client models and the aggregation process is equally important. Although FL preserves data privacy during distributed training, it remains vulnerable to poisoning attacks due to heterogeneous, non-IID client data and limited client participation. To counter these attacks, a multi-layered defense is proposed that combines game-theoretic aggregation, incentive-aware client regularization, and model-side verification with important-parameter selection and SHE. Extensive experiments validate the proposed approaches, demonstrating their effectiveness in data quality-driven sample and client selection, communication efficiency optimization, and defense against adversarial poisoning within a trustworthy FL paradigm. We hope that the results and discussions in this dissertation will help researchers further improve trustworthy and secure FL systems.
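As a simple illustration of robust aggregation against poisoned updates, the sketch below uses a coordinate-wise trimmed mean, a standard baseline defense; it is not the dissertation's game-theoretic aggregation rule, only an assumed stand-in showing how outlier updates can be excluded before averaging.

```python
import numpy as np

def trimmed_mean_aggregate(updates, trim_frac=0.2):
    """Coordinate-wise trimmed mean over client updates.

    For each parameter, drop the highest and lowest trim_frac of
    client values, then average the remainder. This bounds the
    influence of a small number of poisoned (outlier) updates.
    """
    stacked = np.stack(updates)           # shape: (num_clients, num_params)
    n = stacked.shape[0]
    k = int(n * trim_frac)
    sorted_vals = np.sort(stacked, axis=0)
    if k > 0:
        sorted_vals = sorted_vals[k:n - k]  # trim extremes per coordinate
    return sorted_vals.mean(axis=0)
```

With four honest clients near 1.0 and one poisoned client submitting 100.0, the trimmed mean stays near the honest value, whereas a plain average would be pulled to roughly 20.8.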
TRUSTWORTHY FEDERATED LEARNING FRAMEWORK FOR SECURE, EFFICIENT, AND QUALITY-AWARE DISTRIBUTED AI