Detecting and recognizing human action in videos
Date of Issue: 2014
School of Electrical and Electronic Engineering
Detecting and recognizing human actions is of great importance to video analytics because of its numerous applications in video surveillance and human-computer interaction. Despite much previous work, fast and reliable action detection and recognition in unconstrained videos remains a challenging problem. First, actions are spatio-temporal patterns characterized by both motion and appearance features. The same type of action may exhibit large variations due to changes in motion speed, scale, viewpoint, and clothing, not to mention partial occlusions. It is thus a challenge to perform robust action matching that is insensitive to such variations, especially when only a limited number of training examples are provided. Second, fast action detection and localization in cluttered and dynamic environments is challenging in its own right. Compared with image-based object detection, which requires only spatial localization, action localization operates in the spatio-temporal video space and is therefore much more time-consuming.

This thesis presents a systematic study of detecting and recognizing human actions in cluttered and dynamic environments. Videos are characterized by spatio-temporal local features, and the proposed methods leverage fast matching of local features to perform action recognition and detection. To capture the intra-class variations of action categories, randomized trees are developed to model the local feature distribution of each category. Such tree-based indexing enables fast local feature matching, and when limited training examples are available, it can be easily extended to index both labelled and unlabelled data samples and perform semi-supervised learning to improve detection performance. Even with only one exemplar query action, the randomized tree indexing approach can still achieve promising results in efficiently detecting similar actions in a large video corpus.
To perform fast spatio-temporal action localization, two approaches are proposed: (1) coarse-to-fine branch-and-bound search and (2) propagative Hough voting. Both methods can significantly reduce the computational cost of action localization, and neither relies on human detection, tracking, or background subtraction. Building on these solutions to the fundamental challenges of action detection and recognition, this thesis also investigates action detection for different application scenarios, such as multi-class action detection, action search with one query example, and online action prediction based on partial video observation. Extensive experiments on benchmark datasets show that the proposed methods can achieve promising results compared with the state of the art.
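As a rough illustration of the voting idea behind Hough-based localization, the sketch below shows a generic spatio-temporal Hough accumulator in Python. It is not the propagative Hough voting method of the thesis. The function name `hough_vote_center`, the match format (feature position, stored offset to the action center, match weight), and the bin sizes are all assumptions made for this example.

```python
from collections import defaultdict


def hough_vote_center(matches, bin_size=(10, 10, 5)):
    """Return the (x, y, t) bin with the highest accumulated vote.

    matches: iterable of ((x, y, t), (dx, dy, dt), weight), where the
    offset points from a matched local feature to the action center.
    This input format is a hypothetical one chosen for illustration.
    """
    accumulator = defaultdict(float)
    for (x, y, t), (dx, dy, dt), weight in matches:
        # Predicted action center = feature position + stored offset.
        center = (x + dx, y + dy, t + dt)
        # Quantize the predicted center into a coarse spatio-temporal bin.
        key = tuple(int(c // b) for c, b in zip(center, bin_size))
        accumulator[key] += weight
    if not accumulator:
        return None, 0.0
    best_bin = max(accumulator, key=accumulator.get)
    return best_bin, accumulator[best_bin]


# Toy data: two features agree on roughly the same center, one does not.
matches = [
    ((40, 50, 10), (10, 5, 2), 1.0),
    ((60, 58, 14), (-8, -2, -1), 0.8),
    ((120, 30, 40), (3, 3, 3), 0.5),
]
best_bin, score = hough_vote_center(matches)
```

Because each matched feature votes independently, a sketch like this needs no person detector, tracker, or background model; the cost grows with the number of matched features rather than with the number of candidate subvolumes.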
DRNTU::Engineering::Electrical and electronic engineering