Full metadata record
DC Field | Value | Language
dc.contributor.author | Xue, Chuhui | en_US
dc.identifier.citation | Xue, C. (2022). Accurate and robust detection and recognition of texts in scene. Doctoral thesis, Nanyang Technological University, Singapore. |
dc.description.abstract | Scene text detection and recognition aim to localize texts in natural scene images and output the corresponding character sequences. Automated scene text detection and recognition have attracted increasing interest in the computer vision and deep learning communities due to their wide range of applications in neural machine translation, autonomous driving, etc. Compared with early research that focused on the design of hand-crafted features, modern deep-learning-based techniques have achieved significant improvements on scene text detection and recognition tasks. Such frameworks usually deploy convolutional neural networks (CNN), recurrent neural networks (RNN), or Transformers to extract image features for accurate text detection and recognition. However, automatically detecting and recognizing texts in scenes remains challenging due to the complexity of scene text images. First, texts in scenes exhibit high variability and diversity in appearance due to the complex patterns of texts (e.g., colors, fonts, etc.) and various environments (e.g., lighting, occlusion, etc.). Second, scene texts usually have different lengths, orientations, and shapes, and may suffer from both perspective and curvature distortions. Third, scene images usually have complex backgrounds that may contain patterns similar to texts (e.g., trees, traffic signs, etc.). Any of these factors can lead to incorrect predictions in scene text detection and recognition tasks. In this thesis, we propose several novel techniques for scene text detection and recognition that aim to produce more accurate detection and recognition of scene texts in different orientations, lengths, sizes, and shapes. First, we design a novel scene text detection approach that detects texts through border semantics awareness and bootstrapping. We introduce a bootstrapping technique that samples multiple 'subsections' of a word or text line and thereby effectively relieves the constraint of limited training data. In addition, a semantics-aware text border detection technique is designed which produces four types of text border segments for text detection. Second, we develop a novel multi-scale shape regression network (MSR) for accurate scene text detection. It detects scene texts by predicting dense text boundary points instead of sparse quadrilateral vertices, which often suffer from regression errors when dealing with long text lines. Additionally, the multi-scale network extracts and fuses features at different scales concurrently and seamlessly, which demonstrates superb tolerance to text scale variation. Third, we design a mask-guided multi-task network that reliably detects and rectifies scene texts of arbitrary shapes. The proposed network detects text keypoints and landmark points for accurate text detection and rectification. Fourth, we propose a novel scene text recognition method, I2C2W, that is tolerant to geometric and photometric degradation by decomposing scene text recognition into two inter-connected tasks and leveraging the advances of the Transformer architecture. Extensive experiments show that the proposed techniques can accurately detect and recognize texts with various lengths, orientations, and shapes from natural scene images. | en_US
dc.publisher | Nanyang Technological University | en_US
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). | en_US
dc.subject | Engineering::Computer science and engineering | en_US
dc.title | Accurate and robust detection and recognition of texts in scene | en_US
dc.type | Thesis-Doctor of Philosophy | en_US
dc.contributor.supervisor | Lu Shijian | en_US
dc.contributor.school | School of Computer Science and Engineering | en_US
dc.description.degree | Doctor of Philosophy | en_US
item.fulltext | With Fulltext | -
Appears in Collections: SCSE Theses
Files in This Item:
File | Description | Size | Format
Xue_Chuhui-Thesis.pdf | | 20.77 MB | Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.