Date of Defense
9-4-2026 12:00 PM
Location
F3-035
Document Type
Dissertation Defense
Degree Name
Doctor of Philosophy in Informatics and Computing
College
CIT
Department
Computer Science and Software Engineering
First Advisor
Prof. Amir Ahmad
Keywords
Machine Learning, Binary Classification, Anomaly Detection, One-Class Classification, Explainable Artificial Intelligence, Data Imbalance, Neonatal Disorder, Healthcare.
Abstract
Neonatal disorders such as low birth weight, very low birth weight, extremely low birth weight, preterm birth, and very preterm birth increase the risk of neonatal morbidity and mortality and therefore call for early identification. However, because these conditions are rare in clinical datasets, the data are severely imbalanced, which raises questions about the suitability of binary classification models for detecting them. This thesis therefore proposes a sequential methodological framework for neonatal disorder detection under different assumptions about the availability of labels. First, binary classification experiments analyze the behaviour of commonly used classification algorithms on highly imbalanced neonatal datasets without applying data rebalancing techniques. These experiments establish baseline performance and illustrate the sensitivity of binary classifiers to extreme imbalance. Second, for scenarios in which only normal pregnancy outcomes are available for training, the study adopts one-class classification: models are trained exclusively on normal instances, and deviations from the learned representation are interpreted as abnormal outcomes. To support clinical interpretation of the one-class classification outputs, explainable artificial intelligence techniques are applied through a perturbation-based feature attribution approach that quantifies the influence of individual prenatal and maternal variables on the anomaly scores. Third, recognizing that maternal and neonatal data are often distributed across multiple hospitals or clinical sites, one-class classification is extended to a federated learning setting, in which models are trained locally at each client and only anomaly scores are shared with a central server, which avoids exchanging raw patient data.
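The perturbation-based attribution idea above can be illustrated with a minimal sketch; it is not the thesis's implementation. It assumes a hypothetical stand-in one-class scorer (the squared z-distance from a per-feature profile fitted on normal instances only) and an occlusion-style attribution: each feature of a flagged instance is replaced in turn by its normal-profile mean, and the resulting drop in anomaly score is taken as that feature's influence. The names `fit_profile`, `anomaly_score`, and `perturbation_attribution` are illustrative only.

```python
from statistics import mean, pstdev
from typing import Callable, List, Sequence, Tuple


def fit_profile(normal: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Fit per-feature mean and std on NORMAL instances only (one-class setting)."""
    cols = list(zip(*normal))
    mu = [mean(c) for c in cols]
    sd = [pstdev(c) or 1.0 for c in cols]  # avoid division by zero for constant features
    return mu, sd


def anomaly_score(x: Sequence[float], mu: List[float], sd: List[float]) -> float:
    """Hypothetical stand-in scorer: squared z-distance from the normal profile."""
    return sum(((xi - m) / s) ** 2 for xi, m, s in zip(x, mu, sd))


def perturbation_attribution(
    x: Sequence[float],
    score_fn: Callable[[Sequence[float]], float],
    baseline: Sequence[float],
) -> List[float]:
    """Occlusion-style attribution: for each feature j, replace x[j] with
    baseline[j] and record how much the anomaly score drops."""
    full = score_fn(x)
    attributions = []
    for j in range(len(x)):
        perturbed = list(x)
        perturbed[j] = baseline[j]
        attributions.append(full - score_fn(perturbed))
    return attributions


if __name__ == "__main__":
    normal = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0]]  # toy normal-only training data
    mu, sd = fit_profile(normal)
    flagged = [2.0, 20.0]  # only feature 1 deviates from the normal profile
    attr = perturbation_attribution(flagged, lambda v: anomaly_score(v, mu, sd), mu)
    print(attr)  # feature 0 contributes 0.0; feature 1 carries the whole score
```

Because this stand-in score is a sum of per-feature terms and the baseline is the normal mean, the attributions add up exactly to the anomaly score. For non-additive scorers they generally do not, and the choice of baseline value also changes the result.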
Because anomaly scores generated by different clients cannot be combined directly, several anomaly score aggregation functions are systematically evaluated to characterize their performance across neonatal outcomes, parity groups (parous and nulliparous women), client sizes, and data distribution scenarios. Finally, in the absence of outcome labels, the framework considers fully unsupervised outlier detection, treating adverse neonatal outcomes as statistical deviations from the normal data distribution. The empirical evaluation uses real-world neonatal datasets collected in Al Ain, Abu Dhabi, UAE, as part of the Mutaba’ah mother-and-child cohort, covering low birth weight, very low birth weight, extremely low birth weight, preterm birth, and very preterm birth outcomes for both parous and nulliparous women. The experimental results provide a comparative analysis of binary classification, one-class classification, federated anomaly detection with score aggregation, and unsupervised outlier detection under severe class imbalance. The contributions of this thesis are: (i) a systematic empirical evaluation of supervised classification, one-class classification, and unsupervised anomaly detection methods for neonatal disorder data; (ii) the integration of perturbation-based explainability into one-class classification for neonatal disorder detection; and (iii) a comparative analysis of anomaly score aggregation functions on federated neonatal datasets.
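The score-sharing and aggregation step described above can be sketched under assumptions the abstract does not fix. In this sketch each client's local model scores the same set of query instances and returns only its raw score vector. The server then z-score standardizes each client's scores and combines them per instance with a simple aggregation function (mean, max, or median). These are common choices for combining anomaly scores, not necessarily the functions evaluated in the thesis. The names `standardize`, `aggregate`, and `AGGREGATORS` are hypothetical.

```python
from statistics import mean, median, pstdev
from typing import Callable, Dict, List


def standardize(scores: List[float]) -> List[float]:
    """Z-score one client's raw scores so clients with different score scales become comparable."""
    mu = mean(scores)
    sd = pstdev(scores) or 1.0  # a client with constant scores maps to all zeros
    return [(s - mu) / sd for s in scores]


# Hypothetical aggregation functions; each maps one instance's per-client scores to one value.
AGGREGATORS: Dict[str, Callable[[List[float]], float]] = {
    "mean": mean,
    "max": max,
    "median": median,
}


def aggregate(client_scores: List[List[float]], how: str = "mean") -> List[float]:
    """Server side: client_scores[c][i] is client c's raw score for query instance i.
    Only these score vectors reach the server; no patient records are exchanged."""
    normalized = [standardize(scores) for scores in client_scores]
    combine = AGGREGATORS[how]
    # zip(*normalized) yields, for each instance i, the scores from every client
    return [combine(list(per_instance)) for per_instance in zip(*normalized)]


if __name__ == "__main__":
    client_a = [0.1, 0.2, 0.9]     # toy scores on a 0-1 scale
    client_b = [10.0, 20.0, 90.0]  # same ranking on a much larger scale
    for how in AGGREGATORS:
        print(how, aggregate([client_a, client_b], how))  # instance 2 scores highest under each function
```

Standardizing first keeps a client whose scores span a larger numeric range from dominating the combined value. Among the aggregation functions, max flags an instance if any single client finds it anomalous. Median is less sensitive to one client's outlying scores.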
DEVELOPING MACHINE LEARNING ALGORITHMS FOR HIGHLY IMBALANCED NEONATAL DISORDER DATA