Please use this identifier to cite or link to this item:
https://hdl.handle.net/10356/184626
Title: Target driven visual navigation for a mobile robot using deep reinforcement learning
Authors: Liu, Chengxiao
Keywords: Engineering
Issue Date: 2025
Publisher: Nanyang Technological University
Source: Liu, C. (2025). Target driven visual navigation for a mobile robot using deep reinforcement learning. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/184626

Abstract: Target-driven visual navigation remains a critical challenge for autonomous mobile robots (AMRs) operating in dynamic, unstructured environments. Traditional approaches that rely on pre-built maps or GPS-based localization often fail in GPS-denied indoor spaces or in scenarios requiring adaptation to unseen layouts. This dissertation presents a deep reinforcement learning (DRL) framework that enables AMRs to navigate toward alphanumeric targets using egocentric visual inputs, eliminating the dependency on prior environmental knowledge. The proposed framework integrates three key innovations: (1) zero-shot object detection for robust localization of numeric targets without class-specific training, (2) Transformer-based Optical Character Recognition (TrOCR) for discriminative feature extraction, and (3) Principal Component Analysis (PCA) to enhance numeric differentiation by reducing redundant visual information. Leveraging procedural environment generation via ProcTHOR, diverse corridor configurations with varying lighting, textures, and obstacle layouts are created to promote generalization. The navigation policy is optimized with Proximal Policy Optimization (PPO), combining sparse rewards for target proximity with penalties for inefficient movements. Experimental evaluations show that the proposed model achieves a 70% success rate on target navigation tasks, a substantial improvement over baseline models that lack alphanumeric image recognition capabilities. This performance gap highlights the critical role of integrating visual-textual understanding when navigating toward alphanumeric targets.

URI: https://hdl.handle.net/10356/184626
Schools: School of Electrical and Electronic Engineering
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
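The PCA step described in the abstract can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes TrOCR yields fixed-length embedding vectors, and the dimensions, function name, and data are hypothetical.

```python
import numpy as np

def pca_reduce(embeddings: np.ndarray, n_components: int) -> np.ndarray:
    """Project feature vectors onto their top principal components.

    embeddings: (n_samples, n_features) array, e.g. per-image TrOCR features.
    Returns an (n_samples, n_components) array in which low-variance,
    redundant directions have been discarded.
    """
    # Center the data so principal axes pass through the origin.
    centered = embeddings - embeddings.mean(axis=0)
    # SVD of the centered matrix: rows of vt are the principal axes,
    # ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    # Project onto the leading n_components axes.
    return centered @ vt[:n_components].T

# Example: 8 hypothetical 16-dimensional embeddings reduced to 4 dimensions.
rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 16))
reduced = pca_reduce(feats, 4)
print(reduced.shape)  # (8, 4)
```

The reduced vectors would then serve as the compact numeric-target descriptors fed to the navigation policy.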
Appears in Collections: EEE Theses
Files in This Item:

File | Description | Size | Format
---|---|---|---
LIU_CHENGXIAO_Dissertation_final.pdf (Restricted Access) | final version | 6.74 MB | Adobe PDF
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.