Title: Human action recognition by embedding silhouettes and visual words
Authors: Saghafi Khadem, Behrouz
Keywords: DRNTU::Engineering::Computer science and engineering::Computer applications::Computer-aided engineering
Issue Date: 2013
Source: Saghafi Khadem, B. (2013). Human action recognition by embedding silhouettes and visual words. Doctoral thesis, Nanyang Technological University, Singapore.
Abstract: With the availability of cheap video recording devices, fast internet access and huge storage spaces, the corpus of accessible video has grown tremendously over the last few years. Processing these videos for end-user tasks such as video retrieval, human-computer interaction (HCI) and biometrics requires automatic understanding of the video content. Human action recognition is one aspect of video understanding that is useful in surveillance, behavioral analysis and HCI. Although this problem has been studied for some years, challenges remain, including cluttered backgrounds, intra-class variance, inter-class similarity and occlusion. In this thesis, we propose three methods for action recognition. First, we propose a novel embedding for learning the manifold of human actions that is optimal with respect to the spatio-temporal correlation distance (SCD) between sequences. Sequences of actions can be compared based on distances between frames; however, comparison based on between-sequence distance is more efficient and effective. In particular, our proposed embedding minimizes the sum of distances between intra-class sequences while maximizing the sum of distances between inter-class points. Action sequences are represented by key postures chosen equidistantly from a semantic period of the action. The projected sequences are compared using SCD or the Hausdorff distance in a nearest-neighbor framework. The method not only outperforms other dimension reduction methods but is also comparable to the state of the art on three public datasets. Moreover, it is robust to a large extent to additive noise, occlusion, shape deformation and changes in viewpoint. Second, we propose an approach for introducing semantic relations into the bag-of-words framework for recognizing human actions. In the standard bag-of-words framework, features are clustered based on their appearance and not their semantic relations. We exploit latent semantic models such as LSA and pLSA, as well as Canonical Correlation Analysis, to find a subspace in which visual words are more semantically distributed. We project the visual words into the computed space and apply k-means to obtain semantically meaningful clusters, which serve as a semantic visual vocabulary and lead to more discriminative histograms for recognizing actions. Our proposed method gives promising results on the challenging KTH action dataset. Finally, we introduce a novel method for combining information from multiple viewpoints. Spatio-temporal features are extracted from each viewpoint and used in a bag-of-words framework. Two codebooks of different sizes are used to form the histograms. The similarity between the computed histograms is captured by the histogram intersection kernel (HIK) as well as an RBF kernel with Chi-Square distance. The resulting kernels are linearly combined using weights that are learned through an optimization process. For greater efficiency, a separate set of optimal weights is calculated for each binary SVM classifier. Our proposed method not only enables us to combine multiple views efficiently but also models the action in multiple spaces using the same features, thereby increasing performance. Several experiments are performed to show the efficiency of the framework as well as of its constituent parts. We obtain a state-of-the-art accuracy of 95.8% on the challenging IXMAS multi-view dataset.
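As an illustration of the kernel combination described at the end of the abstract, the following is a minimal sketch and not the thesis implementation. It computes a histogram intersection kernel and a Chi-Square RBF kernel on bag-of-words histograms and combines them linearly. The function names, the toy histograms, the `gamma` value and the fixed weights are all assumptions made for the example; in the thesis the weights are learned by optimization, separately for each binary SVM.

```python
import numpy as np

def hik_kernel(A, B):
    """Histogram intersection kernel: K[i, j] = sum_k min(A[i, k], B[j, k])."""
    return np.minimum(A[:, None, :], B[None, :, :]).sum(axis=2)

def chi2_rbf_kernel(A, B, gamma=1.0, eps=1e-12):
    """RBF kernel on a Chi-Square distance.

    Uses one common form, d(a, b) = sum_k (a_k - b_k)^2 / (a_k + b_k);
    other conventions include a factor of 1/2. gamma is an assumed value.
    """
    diff = A[:, None, :] - B[None, :, :]
    total = A[:, None, :] + B[None, :, :]
    dist = (diff ** 2 / (total + eps)).sum(axis=2)  # eps guards empty bins
    return np.exp(-gamma * dist)

def combine_kernels(kernels, weights):
    """Weighted linear combination of precomputed kernel matrices."""
    return sum(float(w) * K for w, K in zip(weights, kernels))

# Toy L1-normalised histograms (made-up data): 3 videos, 4 visual words.
H = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.4, 0.6, 0.0, 0.0],
              [0.0, 0.0, 0.5, 0.5]])

K_hik = hik_kernel(H, H)
K_chi = chi2_rbf_kernel(H, H)

# Fixed, hypothetical weights; the thesis learns these instead.
K = combine_kernels([K_hik, K_chi], weights=[0.5, 0.5])
```

A combined matrix such as `K` could then be passed to an SVM that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`.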
DOI: 10.32657/10356/54952
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Theses

Files in This Item:
File: Copy of main_thesis-rev.pdf
Size: 5.29 MB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.