Please use this identifier to cite or link to this item:
Title: Fast and accurate vision-based pedestrian detection
Authors: Zhou, Chengju
Keywords: Engineering::Computer science and engineering
Issue Date: 2021
Publisher: Nanyang Technological University
Source: Zhou, C. (2021). Fast and accurate vision-based pedestrian detection. Doctoral thesis, Nanyang Technological University, Singapore.
Abstract: Pedestrian detection is an essential task in applications such as automotive safety, surveillance, and robotics. Accurate vision-based pedestrian detection faces many challenges arising from highly cluttered backgrounds, high intra-class variation, inconsistent illumination, heavy occlusion, and the need to detect small-scale pedestrians. In addition, practical applications often require fast detection on embedded systems with stringent computational resources. This PhD research aims to develop fast and accurate vision-based pedestrian detection using both hand-crafted features and deep learning methods to meet the varied requirements of real-world applications. We first propose a non-deep-learning pedestrian detection framework based on the top-performing Filtered Channel Features (FCF) approach. In contrast to existing works that use many matrix-form filters or a few very large filters, the proposed method exploits binary vector-form filters to build a robust pedestrian feature representation effectively and efficiently. A two-stage induced group cost-sensitive RealBoost is introduced to assign higher costs to misclassified samples that are harder to classify, thereby improving detection of hard samples. Two strategies further improve overall detection speed at the image-pyramid and channel-feature levels. Experimental results on the widely used Caltech benchmark show that the proposed framework achieves much better detection performance and runs about 148x faster than the best reported FCF method. A fast and robust pedestrian detection framework is developed next, which exploits lightweight vector-form decorrelated filters to build a more robust feature representation.
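The group cost-sensitive reweighting idea can be sketched as follows. This is a minimal illustration, not the thesis's exact two-stage induced formulation: the function name, the exponential-margin update, and the per-group cost values are assumptions chosen to show how larger costs make hard samples dominate the weight distribution after a boosting round.

```python
import math

def cost_sensitive_weight_update(weights, margins, costs):
    """One cost-sensitive boosting round (illustrative sketch).

    weights: current sample weights
    margins: y_i * f(x_i), signed confidence of the current ensemble
    costs:   per-sample misclassification costs; harder groups get
             larger costs, so their weights grow faster when misclassified
    """
    new_w = [w * math.exp(-m) * c for w, m, c in zip(weights, margins, costs)]
    total = sum(new_w)
    return [w / total for w in new_w]  # renormalise to a distribution
```

With equal initial weights, a misclassified sample (negative margin) in a high-cost group ends up with far more weight than a correctly classified low-cost one, which is the intended effect of assigning varied costs by sample difficulty.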
A group cost-sensitive BoostLR (Boosting with Loss Regularization) gives higher attention to harder samples during training, enabling controlled generalization and improved detection performance. Experimental results on the INRIA, Caltech, and CityPersons pedestrian detection benchmarks demonstrate that the proposed framework outperforms all state-of-the-art non-deep-learning approaches and runs an order of magnitude faster than existing top-performing FCF methods. We then explore deep learning methods for accurate and fast pedestrian detection. A unified multi-task neural network architecture is proposed to efficiently and effectively fuse semantic segmentation with pedestrian detection. In the proposed architecture, we employ Faster R-CNN as the base detector and attach a lightweight semantic segmentation branch that enables end-to-end hard parameter sharing to improve pedestrian detection while maintaining computational efficiency. A simple anchor-matching strategy alleviates feature misalignment when detecting heavily occluded pedestrians. The proposed multi-task architecture achieves improved pedestrian detection in diverse scenarios at lower computational complexity, and it obtains improved performance even with downsampled input images, which notably reduces the overall computational cost. Experimental results on the well-known CityPersons and Caltech pedestrian detection benchmarks demonstrate that our architecture runs much faster than state-of-the-art pedestrian detection approaches while obtaining competitive accuracy. Faster R-CNN methods obtain top performance on pedestrian detection but incur very high computational complexity, since the cost of the R-CNN stage increases linearly with the number of input proposals.
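Hard parameter sharing, as used in the multi-task architecture, can be illustrated with a toy model in which one shared transform feeds both task heads. The class name and the scalar "weights" below are hypothetical stand-ins for the shared backbone, the detection head, and the segmentation branch; they only show that both outputs depend on the same shared parameters, so either task's training signal would update the common backbone.

```python
class SharedBackboneMultiTask:
    """Toy illustration of hard parameter sharing (hypothetical names).

    One shared feature transform feeds both a detection head and a
    lightweight segmentation head, so supervision from either task
    reaches the same shared weights.
    """

    def __init__(self, w_shared, w_det, w_seg):
        self.w_shared = w_shared  # shared backbone parameter
        self.w_det = w_det        # detection-head parameter
        self.w_seg = w_seg        # segmentation-head parameter

    def forward(self, x):
        feat = [xi * self.w_shared for xi in x]         # shared backbone
        det_score = sum(f * self.w_det for f in feat)   # detection head
        seg_map = [f * self.w_seg for f in feat]        # segmentation head
        return det_score, seg_map
```

Because the segmentation branch is lightweight and shares the backbone, it adds supervision with little extra inference cost, which is the efficiency argument made above.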
To overcome this problem, we propose an R-FCN-based pedestrian detection framework that incorporates semantic segmentation confidence modules into the RPN and R-FCN heads, together with a cascaded R-FCN head. The semantic segmentation confidence modules train a segmentation branch supervised by coarse box-wise annotations designed for the pedestrian detection task; the resulting segmentation confidence is then used as auxiliary classification prior knowledge for RPN proposal selection and R-FCN head prediction. The proposed cascaded R-FCN head progressively refines prediction accuracy with negligible computational overhead. Experimental results on the well-known CityPersons and MOT17Det pedestrian detection benchmarks demonstrate that the proposed framework achieves competitive detection accuracy with about a 3x speedup over state-of-the-art pedestrian detection methods.
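The use of segmentation confidence as an auxiliary prior for proposal selection can be sketched as below. The helper name `seg_confidence`, the `(x0, y0, x1, y1)` box layout, and the multiplicative fusion of scores are illustrative assumptions, not the framework's exact design; the sketch only shows how a per-pixel pedestrian probability map can rescore RPN proposals before selection.

```python
def seg_confidence(seg_map, box):
    """Mean pedestrian-segmentation probability inside a proposal box.

    Hypothetical helper: seg_map is a 2-D list of per-pixel pedestrian
    probabilities; box = (x0, y0, x1, y1), exclusive on the right/bottom.
    """
    x0, y0, x1, y1 = box
    vals = [seg_map[r][c] for r in range(y0, y1) for c in range(x0, x1)]
    return sum(vals) / len(vals)

def rescore_proposals(scores, boxes, seg_map):
    """Fuse RPN classification scores with segmentation confidence,
    using it as an auxiliary prior for proposal selection (sketch)."""
    return [s * seg_confidence(seg_map, b) for s, b in zip(scores, boxes)]
```

Proposals over background regions receive low segmentation confidence and are suppressed, so fewer proposals survive to the R-FCN head, which is one way auxiliary segmentation knowledge can cut the per-proposal cost discussed above.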
DOI: 10.32657/10356/148933
Rights: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections:SCSE Theses

Files in This Item:
File: Thesis_ZHOU_CHENGJU_G1502518L.pdf
Size: 40.15 MB
Format: Adobe PDF


Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.