Date of Defense
3-4-2026 10:00 AM
Location
E1-1038 - Lecture Hall 65 (Male)
Document Type
Dissertation Defense
Degree Name
Doctor of Philosophy in Informatics and Computing
College
College of Information Technology
Department
Computer Science
First Advisor
Munkhjargal Gochoo
Abstract
This dissertation proposes a complete framework for sign language recognition (SLR). The study introduces a meta-AI concept and a cloud-based framework that integrates a spatial detection and recognition model for the sign language alphabet and a spatiotemporal Transformer model for word-level SLR. The framework also integrates a 3D digital twin that acts as a real-time sign language interpreter, enhancing user friendliness and human-computer interaction with the system. The major focus is finding the balance between recognition accuracy and computational efficiency, beginning with the communication between the AI agents, the SLR models, and the digital twin. The aim is to present a complete framework that is efficient and suitable for browser deployment and everyday mobile use without the need for any special SLR hardware. Both SLR models were developed to use standard RGB input, eliminating any hardware requirement beyond a standard RGB camera. The study focuses on a single modality and on computational efficiency while aiming for suitable recognition accuracy from the SLR models. The research examines different Transformer architectures and sampling approaches and systematically refines the design to meet the lightweight and efficiency objectives of the proposed SLR models. The proposed models were benchmarked against state-of-the-art models on diverse SLR datasets and achieved competitive accuracy with substantial gains in computational efficiency.
Title
CLOUD-BASED LIGHTWEIGHT SPATIOTEMPORAL DEEP LEARNING SIGN LANGUAGE RECOGNITION MODEL WITH DIGITAL TWIN AVATAR