Hierarchical feature learning for image categorization
Date of Issue: 2015
School of Electrical and Electronic Engineering
Extracting informative, robust, and compact data representations (features) is considered one of the key factors for good performance in computer vision, and image categorization is one of the most fundamental computer vision problems. Traditionally, hand-crafted features such as SIFT and HOG have been widely used. However, these features cannot adapt to the data and must be carefully designed by hand. In contrast, feature learning methods encode data-adaptive information in the representation. They have outperformed hand-crafted features by large margins and brought rapid progress in image categorization. The goal of this thesis is to present several feature learning architectures for object and scene image categorization.

In the first part of the thesis, a discriminative hierarchical feature learning framework will be presented for object image categorization. This work aims to learn non-linear transformation matrices that map image patches to local features. Features learned by existing unsupervised learning methods can hardly capture the differences between classes, and these differences are crucial for object categorization. To capture such information, a discriminative constraint is proposed. It forces local feature patches extracted from the same class to be locally similar, while keeping local feature patches from different classes separable.

In the second part, discriminative and shareable information will be encoded in features for scene image categorization. Unlike object images, scene images do not have a clear foreground/background separation. Some patterns are shared among several classes, and some are class-specific. Others represent noisy data, which is not helpful and should be excluded. To encode such information, an exemplar-based deep discriminative and shareable feature learning framework will be proposed. It learns compact filter banks and hierarchically transforms local image patches into features.
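The discriminative constraint of the first part is described only in words. One common way to express such a constraint is a contrastive-style pairwise penalty, sketched below in Python with NumPy. This formulation is swapped in for illustration and is not the objective used in the thesis. The function name discriminative_penalty and the margin hyperparameter are hypothetical. The penalty pulls same-class local features together by their squared distance. It penalizes different-class features only when they fall closer than the margin.

```python
import numpy as np


def discriminative_penalty(features, labels, margin=1.0):
    """Hypothetical contrastive-style penalty over all pairs of local features.

    features: (n, d) array of local features (e.g. transformed image patches).
    labels:   (n,) class labels of the images the patches were taken from.
    margin:   assumed hyperparameter; different-class pairs closer than this
              distance are penalized.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances between all local features.
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)
    # Same-class pairs (excluding self-pairs): penalize squared distance.
    pull = (dist[same & off_diag] ** 2).sum()
    # Different-class pairs: penalize only when closer than the margin.
    push = (np.maximum(0.0, margin - dist[~same]) ** 2).sum()
    # Average over all ordered pairs of distinct features.
    return (pull + push) / off_diag.sum()


# Usage: the same four features scored under two labellings.
feats = [[0.0, 0.0], [0.1, 0.0], [3.0, 3.0], [3.1, 3.0]]
print(discriminative_penalty(feats, [0, 0, 1, 1]))  # classes well separated: small
print(discriminative_penalty(feats, [0, 1, 0, 1]))  # classes interleaved: large
```

In a full framework, a term like this would be added to a feature learning objective and minimized with respect to the transformation matrices. That coupling is not shown here.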
In the third part, a class of end-to-end neural networks, called convolutional and hierarchical recurrent neural networks (C-HRNNs), will be presented for large-scale object/scene image categorization. In existing convolutional neural networks (CNNs), both convolution and pooling are performed locally on separate image regions. As a result, contextual dependencies between different image regions are not taken into consideration, even though such dependencies carry useful information about the spatial structure of images. In contrast, recurrent neural networks (RNNs) are well known for their ability to learn contextual dependencies in sequential data through recurrent (feedback) connections. In this work, C-HRNNs aim to encode both spatial and scale dependencies among image regions to enhance the global discriminative power of the image representation. CNN layers are first applied to generate mid-level features, and HRNN layers are then applied to learn the spatial and scale dependencies.
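The following NumPy sketch illustrates how recurrent connections can add context on top of CNN features. It works under stated assumptions and is not the C-HRNN architecture. A plain tanh RNN sweeps each row of a feature map from left to right, so the hidden state at a position depends on the regions before it in that row (a spatial dependency). The map is then average-pooled to a coarser scale. A second RNN runs over the per-scale summaries from coarse to fine (a scale dependency). The random feature map stands in for CNN output. The random weights stand in for learned parameters. All function names, sizes, and the single sweep direction are hypothetical choices.

```python
import numpy as np


def rnn_step(x, h, W_in, W_rec, b):
    # Plain (Elman) recurrent update: the new state depends on the input and previous state.
    return np.tanh(W_in @ x + W_rec @ h + b)


def spatial_rnn(fmap, W_in, W_rec, b):
    """Sweep each row of an (H, W, C) feature map left to right.

    Returns an (H, W, D) map of hidden states; the state at column j of a row
    summarizes regions 0..j of that row.
    """
    H, W, _ = fmap.shape
    D = W_rec.shape[0]
    out = np.zeros((H, W, D))
    for i in range(H):
        h = np.zeros(D)
        for j in range(W):
            h = rnn_step(fmap[i, j], h, W_in, W_rec, b)
            out[i, j] = h
    return out


def avg_pool2(fmap):
    # 2x2 average pooling: builds a coarser scale of the same feature map.
    H, W, C = fmap.shape
    cropped = fmap[:H - H % 2, :W - W % 2]
    return cropped.reshape(H // 2, 2, W // 2, 2, C).mean(axis=(1, 3))


def scale_rnn_descriptor(fmap, params, n_scales=2):
    """Hypothetical two-level hierarchy: spatial RNN per scale, then an RNN across scales."""
    W_in, W_rec, b, S_in, S_rec, s_b = params
    scales = [fmap]
    for _ in range(n_scales - 1):
        scales.append(avg_pool2(scales[-1]))
    h = np.zeros(S_rec.shape[0])
    # Coarse-to-fine sweep: each scale's summary is conditioned on the coarser ones.
    for f in reversed(scales):
        summary = spatial_rnn(f, W_in, W_rec, b).mean(axis=(0, 1))
        h = rnn_step(summary, h, S_in, S_rec, s_b)
    return h


# Usage with placeholder sizes and random (untrained) weights.
rng = np.random.default_rng(0)
C, D, S = 8, 6, 4  # CNN channels, spatial-RNN state size, scale-RNN state size
fmap = rng.standard_normal((4, 4, C))  # stand-in for a CNN feature map
params = (
    rng.standard_normal((D, C)) * 0.1, rng.standard_normal((D, D)) * 0.1, np.zeros(D),
    rng.standard_normal((S, D)) * 0.1, rng.standard_normal((S, S)) * 0.1, np.zeros(S),
)
desc = scale_rnn_descriptor(fmap, params)
print(desc.shape)  # (4,)
```

A single left-to-right sweep propagates context in only one direction. A common way to give every position context from the whole image is to sweep in several directions and concatenate the hidden states.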
DRNTU::Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision