Visual recognition by learning from web data
Date of Issue: 2014
School of Computer Engineering
Centre for Multimedia and Network Technology
With the rapid development of digital cameras, we have witnessed an explosive growth of digital images. Every day, a tremendous number of images together with rich contextual information (e.g., tags, categories and captions) are posted to the Internet. There is increasing interest in exploiting these web images to build intelligent visual recognition systems. While some works have collected large-scale image datasets by crawling images from the Internet, considerable human effort is still required to annotate those images in order to train classifiers for visual recognition. In this thesis, we propose novel learning algorithms for visual recognition that learn from web data, aiming to use as little human effort as possible to annotate the training data. First, considering that web images are usually associated with noisy surrounding textual descriptions, we treat the words in the surrounding text as weak labels and formulate the task of learning from web data as a multi-instance learning (MIL) problem. Observing that relevant images usually contain many true positives, we generalize the traditional MIL constraint on positive bags so that each positive bag contains at least a portion of positive instances. To effectively exploit this constraint on positive bags, we develop a new MIL algorithm called MIL with constrained positive bags (MIL-CPB) for web image retrieval. Observing that the constraints are not always satisfied in MIL-CPB, we propose a progressive scheme to further improve the retrieval performance: we iteratively partition the top-ranked training web images from the current MIL-CPB classifier to construct more confident positive bags, and then use these new bags as training data to learn subsequent MIL-CPB classifiers.
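The generalized constraint on positive bags can be illustrated with a small pseudo-labeling step: given instance scores from the current classifier, each positive bag keeps its top-scoring instances as positives so that at least a fixed portion of the bag is labeled positive, while negative bags contain only negative instances. The following is a minimal Python sketch of that constraint; the bag layout, the `portion` parameter, and the function name are illustrative assumptions, not the thesis's actual formulation, which enforces the constraint inside an optimization problem.

```python
import math

def assign_bag_labels(bags, scores, portion=0.5):
    """Assign instance-level pseudo-labels under an MIL-CPB-style constraint:
    every positive bag must contain at least ceil(portion * |bag|) positive
    instances; negative bags contain only negative instances.

    bags:   dict bag_id -> (list of instance ids, bag label in {+1, -1})
    scores: dict instance id -> current classifier score
    """
    labels = {}
    for bag_id, (instances, bag_label) in bags.items():
        if bag_label < 0:
            # All instances in a negative bag are negative.
            labels[bag_id] = [-1] * len(instances)
            continue
        # At least this many instances in a positive bag must be positive.
        k = math.ceil(portion * len(instances))
        # Rank instances in this bag by current classifier score.
        order = sorted(range(len(instances)),
                       key=lambda i: scores[instances[i]], reverse=True)
        inst_labels = [-1] * len(instances)
        for i in order[:k]:
            inst_labels[i] = 1
        labels[bag_id] = inst_labels
    return labels

# Toy example: one positive bag of four web images, one negative bag of two.
bags = {"b1": (["x1", "x2", "x3", "x4"], 1), "b2": (["x5", "x6"], -1)}
scores = {"x1": 0.9, "x2": 0.1, "x3": 0.8, "x4": 0.2, "x5": 0.3, "x6": 0.7}
labels = assign_bag_labels(bags, scores, portion=0.5)
```

In the toy example, half of the positive bag (`x1` and `x3`, the two highest-scoring instances) is labeled positive, which is the "at least a portion" constraint in action.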
Second, when the web training data are represented with multiple views of features, we propose a co-labeling approach that improves the classifiers learnt from web data by using those multiple views. We model the learning problem on each view as a weakly labeled learning problem, and use the training labels predicted by the classifier trained on one view to help the classifier on another view. Our co-labeling approach not only handles the traditional multi-view semi-supervised learning problem, but can also be applied to other multi-view weakly labeled learning problems such as multi-view MIL. Finally, we observe that there are intrinsic differences between the crawled web training data and the testing images in our daily lives, which is known as the domain adaptation problem. In particular, we study the heterogeneous domain adaptation problem, in which the samples in the source and target domains have different feature representations. We build upon the recent Heterogeneous Feature Augmentation (HFA) method and propose a convex reformulation of HFA, which guarantees a globally optimal solution. We further extend HFA to semi-supervised HFA (SHFA), in which we improve the learnt classifiers by exploiting additional unlabeled data from the target domain. For all the proposed approaches, we conduct extensive experiments on publicly available datasets to demonstrate their effectiveness.
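The label-exchange step of co-labeling can be sketched as follows: each view's classifier predicts labels with confidences on the shared samples, and the samples on which one view is confident become pseudo-labels for the other view. This is a minimal Python sketch; the dictionary-based interface, the confidence `threshold`, and the function name are illustrative assumptions, and a full co-labeling system would retrain each view's classifier on the exchanged labels and iterate.

```python
def co_label_step(preds_a, preds_b, threshold=0.8):
    """One co-labeling round between two views of the same samples.

    preds_a, preds_b: dict sample id -> (predicted label, confidence),
    one dict per view. Each view receives pseudo-labels for the samples
    on which the OTHER view's classifier is confident enough.
    """
    pseudo_for_a = {s: lab for s, (lab, c) in preds_b.items() if c >= threshold}
    pseudo_for_b = {s: lab for s, (lab, c) in preds_a.items() if c >= threshold}
    return pseudo_for_a, pseudo_for_b

# Toy example with a text view and a visual view of unlabeled web images.
preds_text = {"u1": (1, 0.95), "u2": (-1, 0.4)}
preds_visual = {"u1": (1, 0.6), "u3": (-1, 0.9)}
for_text, for_visual = co_label_step(preds_text, preds_visual, threshold=0.8)
```

Here the visual view confidently labels `u3` for the text classifier, while the text view confidently labels `u1` for the visual classifier; each view thus gains training labels it could not produce confidently on its own.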
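The feature-augmentation idea behind HFA maps heterogeneous source and target samples into a common space: a source sample x_s (dimension d_s) becomes [P x_s; x_s; 0] and a target sample x_t (dimension d_t) becomes [Q x_t; 0; x_t], where P and Q are projections into a shared subspace, so both augmented vectors have the same dimension and a single classifier can be trained on them. The sketch below uses hand-picked P and Q for illustration only; in HFA the projections are learned jointly with the classifier rather than fixed in advance.

```python
def _matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def augment_source(x_s, P, dim_target):
    """Map a source sample into the augmented space [P x_s; x_s; 0]."""
    return _matvec(P, x_s) + list(x_s) + [0.0] * dim_target

def augment_target(x_t, Q, dim_source):
    """Map a target sample into the augmented space [Q x_t; 0; x_t]."""
    return _matvec(Q, x_t) + [0.0] * dim_source + list(x_t)

# Toy example: 2-D source features, 3-D target features, 2-D common subspace.
P = [[1.0, 0.0], [0.0, 1.0]]        # illustrative projection for the source
Q = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # illustrative projection for the target
xs_aug = augment_source([1.0, 2.0], P, dim_target=3)
xt_aug = augment_target([3.0, 4.0, 5.0], Q, dim_source=2)
# Both augmented vectors live in the same (2 + 2 + 3)-dimensional space.
```

The zero-padding keeps the original source and target features in disjoint coordinate blocks, while the learned common block P x_s / Q x_t is where knowledge transfers between the two domains.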
DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition