Date of Defense

April 3, 2026, 10:00 AM

Location

E1-1038 - Lecture Hall 65 (Male)

Document Type

Dissertation Defense

Degree Name

Doctor of Philosophy in Informatics and Computing

College

College of Information Technology

Department

Computer Science

First Advisor

Munkhjargal Gochoo

Abstract

This dissertation proposes a complete framework for sign language recognition (SLR). The study introduces a Meta AI concept and a cloud-based framework that integrates a spatial detection and recognition model for the sign language alphabet with a spatiotemporal Transformer model for word-level SLR. The framework also incorporates a 3D digital twin that acts as a sign language interpreter in real time, enhancing user friendliness and human-computer interaction with the system. The major focus is balancing recognition accuracy against computational complexity, from the communication among the AI agents, the SLR models, and the digital twin to the efficiency of each component. The aim is a framework efficient enough for browser deployment and day-to-day mobile use without any specialized SLR hardware: both SLR models take standard RGB input, eliminating any hardware requirement beyond an ordinary RGB camera. The study concentrates on a single modality and computational efficiency while maintaining suitable recognition accuracy. The research examines different Transformer architectures and frame-sampling approaches, systematically refining the design to meet the lightweight, efficient objective of the proposed SLR models. The proposed models were benchmarked against state-of-the-art models on diverse SLR datasets, achieving competitive accuracy and substantial gains in computational efficiency.
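As a minimal sketch of the frame-sampling idea mentioned in the abstract (the function name and parameters here are illustrative, not taken from the dissertation), uniform temporal sampling picks a fixed number of evenly spaced frame indices from a variable-length clip, so a lightweight spatiotemporal model always receives a fixed-size input regardless of clip length:

```python
def uniform_sample_indices(num_frames: int, num_samples: int) -> list[int]:
    """Pick `num_samples` evenly spaced frame indices from a clip of
    `num_frames` frames, so the model sees a fixed-length input."""
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    if num_samples >= num_frames:
        # Short clip: repeat the last frame index to pad up to num_samples.
        return list(range(num_frames)) + [num_frames - 1] * (num_samples - num_frames)
    step = num_frames / num_samples
    # Take the centre frame of each of the num_samples equal segments.
    return [int(step * i + step / 2) for i in range(num_samples)]

# Example: a 100-frame sign clip reduced to 8 frames for the model.
print(uniform_sample_indices(100, 8))  # → [6, 18, 31, 43, 56, 68, 81, 93]
```

Sampling a small, fixed number of frames is one common way such models keep computational cost low while still covering the whole temporal span of a sign.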

Title

CLOUD-BASED LIGHTWEIGHT SPATIOTEMPORAL DEEP LEARNING SIGN LANGUAGE RECOGNITION MODEL WITH DIGITAL TWIN AVATAR
