Date of Defense

27-11-2025 12:00 PM

Location

E1-1023

Document Type

Thesis Defense

Degree Name

Master of Science in Software Engineering

College

College of Information Technology

Department

Computer Science and Software Engineering

First Advisor

Prof. Nazar Zaki

Keywords

Automated recruitment; Curriculum vitae (CV) ranking; Large Language Models (LLMs); Semantic embeddings; Fair and explainable AI.

Abstract

Growing application volumes have exposed the limitations of legacy keyword-filtering Applicant Tracking Systems (ATS), which commonly overlook candidate potential and ignore contextual or transferable skills. Advances in Natural Language Processing (NLP) and Large Language Models (LLMs) offer a compelling alternative, supporting context-sensitive, human-like reasoning in candidate evaluation. This thesis systematically evaluates four classes of approaches (lexical models, embedding-based methods, LLMs, and hybrid ensembles) for automating Curriculum Vitae (CV) to Job Description (JD) matching without relying on prior annotations or labels at match time. Using a combination of publicly available datasets and real-world sample data covering three technical roles, human raters established ground-truth rankings against which model performance was measured. Lexical models proved efficient but correlated poorly with human judgment, while embedding-based models, including SBERT and MPNet, captured semantic similarity more effectively but offered no evaluative reasoning. LLMs showed the strongest correlation with human rankings, achieving high accuracy and contextual comprehension, although their results were input-sensitive and computationally costly. The thesis offers empirical insights into prompt engineering, hybrid modeling, and fairness awareness, and identifies a pivotal role for LLMs in transforming recruitment practice. It concludes that, while LLMs alone can approximate recruiter judgments, hybrid systems outperform them and provide greater stability, laying foundations for scalable, transparent, and ethically responsible recruitment technologies.
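To make the lexical baseline concrete, the following is a minimal, self-contained sketch of TF-IDF cosine ranking of CVs against a JD, the kind of keyword-overlap scoring the abstract contrasts with embedding and LLM approaches. It is an illustration only, not the thesis's actual implementation; the tokenisation, IDF smoothing, and sample texts are assumptions.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute simple TF-IDF vectors (sparse dicts) for tokenised documents."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}  # smoothed IDF
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]

def cosine(u, v):
    """Cosine similarity between two sparse vectors represented as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_cvs(jd_text, cv_texts):
    """Rank CVs by lexical similarity to a job description, best first."""
    docs = [jd_text.lower().split()] + [cv.lower().split() for cv in cv_texts]
    vecs = tfidf_vectors(docs)
    jd_vec, cv_vecs = vecs[0], vecs[1:]
    scores = [(i, cosine(jd_vec, v)) for i, v in enumerate(cv_vecs)]
    return sorted(scores, key=lambda s: s[1], reverse=True)

# Hypothetical JD and CVs for illustration
jd = "python machine learning engineer with nlp experience"
cvs = [
    "java backend developer spring microservices",
    "python nlp engineer machine learning models",
]
print(rank_cvs(jd, cvs))  # the second CV shares vocabulary with the JD and ranks first
```

Note how purely lexical overlap drives the score: a CV that describes the same skills in different words (e.g. "text analytics" instead of "nlp") would score zero here, which is precisely the failure mode that motivates embedding-based and LLM methods in the thesis.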

EVALUATING LARGE LANGUAGE MODELS FOR AUTOMATED CV RANKING: A HYBRID EMBEDDING APPROACH FOR ENHANCED RECRUITMENT