Distance learning between image and class for object recognition
Date of Issue: 2013
School of Computer Engineering
Centre for Multimedia and Network Technology
Object recognition is an active research topic in the computer vision community. Recently, a novel Image-to-Class (I2C) distance was proposed for this problem: it classifies images with a simple Naive-Bayes nearest-neighbor (NBNN) classifier yet delivers surprisingly strong performance. This distance opens a new direction that avoids feature quantization and generalizes better than the traditional Image-to-Image (I2I) distance. However, computing it is expensive, since its performance relies heavily on searching for the nearest neighbor (NN) among a large number of training features; moreover, the label information of the training data is not fully exploited, which limits recognition performance. In this thesis, we aim to improve both the recognition performance and the efficiency of the I2C distance, and to extend its range of application. First, we add a training phase that learns a weighted I2C distance to improve recognition performance. We propose a large-margin optimization framework to learn the I2C distance function, modeled as a weighted combination of the distances from every local feature in an image to its NN in a candidate class. The weights associated with the local features of the training set are learned under the constraint that the I2C distance from an image to its own class must be smaller than its distance to any other class. To reduce computation cost, we also propose two NN-search acceleration methods, based on spatial division and on hubness scores, which greatly reduce online testing time while preserving, or even improving, classification accuracy. Second, we propose a distance metric learning method that further improves the I2C distance by learning per-class Mahalanobis metrics.
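The baseline the thesis builds on can be sketched in a few lines. The snippet below is a minimal, unoptimized illustration (not the thesis implementation): the I2C distance of an image to a class is the sum, over the image's local features, of the squared Euclidean distance to the nearest feature in that class's pooled training features, and NBNN picks the class with the smallest I2C distance. The function names and the brute-force NN search are illustrative assumptions.

```python
import numpy as np

def i2c_distance(image_feats, class_feats):
    """Image-to-Class distance: for each local feature of the image,
    find its nearest neighbor in the class feature pool and sum the
    squared Euclidean distances."""
    # pairwise squared distances, shape (n_image_feats, n_class_feats)
    d2 = ((image_feats[:, None, :] - class_feats[None, :, :]) ** 2).sum(-1)
    # nearest-neighbor distance per image feature, summed over the image
    return d2.min(axis=1).sum()

def nbnn_classify(image_feats, class_pools):
    """NBNN classifier: assign the image to the class whose pooled
    training features give the smallest I2C distance."""
    dists = {c: i2c_distance(image_feats, feats)
             for c, feats in class_pools.items()}
    return min(dists, key=dists.get)
```

The brute-force pairwise computation makes the cost concern in the abstract concrete: every test feature is compared against every training feature of every class, which is exactly what the proposed spatial-division and hubness-based accelerations target.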
This Mahalanobis I2C distance adapts to each class by combining the base distance with the metric learned for that class. The multiple per-class metrics are learned jointly by formulating a convex optimization problem, which is solved with an efficient subgradient descent method. For efficiency and scalability to large-scale problems, we also show how to simplify the method to learn a diagonal matrix for each class. Third, we extend object recognition to the multi-label setting and propose a Class-to-Image (C2I) distance, which outperforms the I2C distance for multi-label image classification. However, since a class contains far more local features than a single image, computing the C2I distance is more expensive than computing the I2C distance. Moreover, the label information of the training images can help select the relevant local features for each class and further improve recognition performance. Therefore, to make the C2I distance both faster and more accurate, we propose an optimization algorithm with L_1-norm regularization and a large-margin constraint to learn the C2I distance; it not only reduces the number of local features in each class feature set, but also improves the performance of the C2I distance by exploiting label information. We also apply this C2I distance to object localization, so that it reports not only whether a candidate class appears in a test image, but also where it is located. Together, these three contributions improve the recognition performance and efficiency of the I2C distance and make it applicable to multi-label problems, so that the learned distance between image and class becomes more practical for real-world object recognition applications.
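One plausible reading of the per-class Mahalanobis I2C distance can be sketched as follows: each feature-to-NN difference vector d contributes d^T M_c d, where M_c is the positive semi-definite metric learned for class c. This is an illustrative sketch only; in particular, whether the NN search itself runs under the Euclidean or the learned metric, and how M_c is optimized, are details of the thesis method that this snippet does not reproduce (here the NN is simply taken under M_c itself).

```python
import numpy as np

def mahalanobis_i2c(image_feats, class_feats, M):
    """I2C distance under a class-specific Mahalanobis metric M (PSD):
    each difference vector d between an image feature and a class feature
    contributes d^T M d; the per-feature minimum is summed over the image."""
    # difference vectors, shape (n_image_feats, n_class_feats, dim)
    diffs = image_feats[:, None, :] - class_feats[None, :, :]
    # Mahalanobis squared distances under M, shape (n_image_feats, n_class_feats)
    d2 = np.einsum('ijk,kl,ijl->ij', diffs, M, diffs)
    return d2.min(axis=1).sum()
```

With M set to the identity this reduces to the plain squared-Euclidean I2C distance, and restricting M to a diagonal matrix (per-dimension feature weights) gives the cheaper variant the abstract mentions for large-scale problems.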
DRNTU::Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision