Date of Defense
27-11-2025 12:00 PM
Location
E1-1023
Document Type
Thesis Defense
Degree Name
Master of Science in Software Engineering
College
College of Information Technology
Department
Computer Science and Software Engineering
First Advisor
Prof. Nazar Zaki
Keywords
Automated recruitment; Curriculum vitae (CV) ranking; Large Language Models (LLMs); Semantic embeddings; Fair and explainable AI.
Abstract
Growing application volumes have exposed the limitations of legacy keyword-filtering Applicant Tracking Systems (ATS), which commonly overlook candidate potential and ignore contextual or transferable skills. Advances in Natural Language Processing (NLP) and Large Language Models (LLMs) offer a compelling alternative, supporting context-sensitive, human-like reasoning in candidate evaluation. This thesis systematically evaluates four classes of approaches to automating Curriculum Vitae (CV) to Job Description (JD) matching, without relying on prior annotations or labels at match time: lexical models, embedding-based methods, LLMs, and hybrid ensembles. Using a combination of publicly available datasets and real-world sample data covering three technical roles, human raters established ground-truth rankings against which each method was measured. Lexical models were efficient but correlated poorly with human judgment, while embedding-based models, including SBERT and MPNet, captured semantic similarity but offered no evaluative reasoning. LLMs showed the strongest correlation with human rankings, achieving high accuracy and contextual comprehension, though their results were input-sensitive and computationally costly. The thesis offers empirical insights into prompt engineering, hybrid modeling, and fairness awareness, and identifies a pivotal role for LLMs in transforming recruitment practice. It concludes that, while LLMs can approximate recruiter judgments, hybrid systems outperform them in stability, laying foundations for scalable, transparent, and ethically responsible recruitment technologies.
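The hybrid ensembles described in the abstract can be illustrated with a minimal sketch: a lexical cosine score over bag-of-words token counts, blended with a semantic score that in practice would come from an embedding model such as SBERT or MPNet (here supplied as a plain number). All names, data, and the 0.5 weighting are illustrative assumptions, not the thesis's actual implementation.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Lowercase word tokens; a stand-in for real CV/JD preprocessing.
    return re.findall(r"[a-z]+", text.lower())

def cosine_similarity(a: Counter, b: Counter) -> float:
    # Cosine similarity over bag-of-words counts (the lexical baseline).
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_score(cv: str, jd: str, semantic_score: float, alpha: float = 0.5) -> float:
    # Weighted ensemble of a lexical score and a semantic score; in a real
    # system the semantic score would be cosine similarity between
    # SBERT/MPNet embeddings of the CV and the JD.
    lexical = cosine_similarity(Counter(tokenize(cv)), Counter(tokenize(jd)))
    return alpha * lexical + (1 - alpha) * semantic_score

def rank_cvs(cvs: dict[str, str], jd: str, semantic: dict[str, float]) -> list[str]:
    # Rank candidate IDs by descending hybrid score.
    return sorted(cvs, key=lambda cid: hybrid_score(cvs[cid], jd, semantic[cid]),
                  reverse=True)

# Hypothetical data: a CV sharing JD keywords and carrying a high
# semantic score should rank first.
jd = "python developer with machine learning experience"
cvs = {"a": "senior python developer, machine learning projects",
       "b": "graphic designer with photoshop skills"}
print(rank_cvs(cvs, jd, {"a": 0.9, "b": 0.1}))  # ['a', 'b']
```

The weight `alpha` controls the trade-off the thesis evaluates: higher values favor exact keyword overlap, lower values defer to the semantic model.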
EVALUATING LARGE LANGUAGE MODELS FOR AUTOMATED CV RANKING: A HYBRID EMBEDDING APPROACH FOR ENHANCED RECRUITMENT