Content-based image retrieval with statistical machine learning.
Date of Issue: 2013
School of Electrical and Electronic Engineering
Content-based image retrieval (CBIR) has attracted intensive attention in the computer vision community over the past decades. Relevance feedback (RF) is a powerful tool for bridging the gap between low-level visual features and high-level semantic concepts in CBIR. Although many algorithms have achieved promising performance in practical applications, CBIR remains an open research topic, mainly because of the difficulty of bridging this semantic gap. This thesis applies statistical machine learning techniques to extend conventional RF methods and improve the performance of CBIR. To alleviate the small-sized training data problem in conventional discriminant-analysis-based RF, i.e., biased discriminant analysis (BDA), a generalized BDA (GBDA) method is developed based on the differential scatter discriminant criterion (DSDC). By redesigning the between-class scatter matrix and integrating the locality preserving principle, GBDA also avoids the Gaussian distribution assumption for the positive feedback samples and the overfitting problem in BDA. Extensive empirical studies show that the new method significantly outperforms BDA and its extensions. To account for the asymmetry of the training data in conventional classification-based RF, i.e., support vector machine (SVM)-based RF, a biased maximum margin analysis (BMMA) method is designed based on the graph embedding framework to separate the positive and negative feedback samples by a maximum margin in the reduced subspace. By introducing a Laplacian regularizer into BMMA, semi-supervised BMMA (SemiBMMA) is also proposed to utilize the information in unlabeled samples for SVM-based RF. Experiments on a real-world image database demonstrate that the proposed scheme, combined with SVM-based RF, better models the RF procedure and reduces the performance degradation caused by the asymmetry of the training data.
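As a rough illustration of the differential scatter idea behind discriminant-analysis-based RF, the following sketch centers both scatter matrices on the mean of the positive feedback samples and keeps the leading eigenvectors of their difference. Because it uses a difference rather than a ratio, no matrix inverse is needed, which is convenient when only a few feedback samples are available. This is not the GBDA method of the thesis: the function name `biased_dsdc_projection`, the trade-off parameter `lam`, the normalization, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def biased_dsdc_projection(pos, neg, n_components=1, lam=1.0):
    """Hypothetical helper: leading eigenvectors of S_neg - lam * S_pos.

    Both scatter matrices are measured around the positive mean, so the
    projection keeps positives compact and pushes negatives away from them.
    """
    mu = pos.mean(axis=0)                 # centroid of the positive samples
    dp = pos - mu
    dn = neg - mu
    s_pos = dp.T @ dp / len(pos)          # spread of positives (to shrink)
    s_neg = dn.T @ dn / len(neg)          # spread of negatives around mu (to grow)
    m = s_neg - lam * s_pos               # differential criterion, symmetric
    eigvals, eigvecs = np.linalg.eigh(m)  # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :n_components]

# Toy data (assumed): negatives are shifted along feature 0 only.
rng = np.random.default_rng(0)
pos = rng.normal(0.0, 0.1, size=(5, 4))
neg = rng.normal(0.0, 0.1, size=(8, 4))
neg[:, 0] += 3.0

W = biased_dsdc_projection(pos, neg, n_components=1)
pos_low, neg_low = pos @ W, neg @ W       # samples in the reduced subspace
```

On this toy data the learned direction is dominated by feature 0, the only feature along which the negative samples are displaced from the positive ones.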
To select the most informative samples for the user to label, a geometric optimum experimental design (GOED) method is proposed, which selects multiple representative samples in the database as the most informative ones. GOED alleviates the small-sized training data problem by leveraging the geometric structure of unlabeled samples in the reproducing kernel Hilbert space (RKHS), and thus further enhances the performance of image retrieval. By minimizing the expected average prediction variance on the test data, GOED has a clear geometric interpretation: it iteratively selects a set of the most representative samples in the database with a globally optimal solution. Moreover, the new method is label-independent and can effectively avoid various potential problems caused by insufficient and inexactly labeled samples in RF. Extensive experiments on both synthetic datasets and a real-world image database have confirmed the advantages of GOED. To exploit the RF log data, conjunctive patches subspace learning (CPSL) with side information is developed. CPSL directly learns a semantic concept subspace from the RF log data using a set of similar and dissimilar pairwise constraints, without any explicit class label information, which is more practical and useful in many real-world applications. CPSL can be formulated as a constrained optimization problem, and an efficient algorithm is presented to solve it with closed-form solutions. Moreover, the new method can also learn a distance metric, and it performs more effectively and efficiently when dealing with high-dimensional data. Extensive empirical studies have demonstrated the effectiveness of CPSL in exploiting the RF log data to improve the performance of CBIR.
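To make the notion of learning a subspace from similar and dissimilar pairwise constraints with a closed-form solution more concrete, here is a minimal sketch that uses only pair indices and no class labels. It is a generic stand-in and not the CPSL algorithm of the thesis: the function name `pairwise_subspace`, the weight `beta`, the difference-of-scatters objective, and the four-point toy dataset are assumptions.

```python
import numpy as np

def pairwise_subspace(X, similar, dissimilar, n_components=1, beta=1.0):
    """Hypothetical helper: subspace from pairwise side information.

    Keeps directions along which dissimilar pairs differ strongly and
    similar pairs differ little, via one symmetric eigendecomposition.
    """
    def pair_scatter(pairs):
        diffs = np.array([X[i] - X[j] for i, j in pairs])
        return diffs.T @ diffs / len(pairs)

    m = pair_scatter(dissimilar) - beta * pair_scatter(similar)
    eigvals, eigvecs = np.linalg.eigh(m)       # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :n_components]  # leading eigenvectors first

# Toy data (assumed): feature 0 carries the concept, feature 1 is nuisance.
X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 0.0], [2.0, 1.0]])
similar = [(0, 1), (2, 3)]      # pairs marked as relevant to each other
dissimilar = [(0, 2), (1, 3)]   # pairs marked as not relevant to each other

W = pairwise_subspace(X, similar, dissimilar)
Z = X @ W                       # 1-D embedding; Euclidean distance in Z acts as a learned metric
```

In the resulting embedding the similar pairs coincide while the dissimilar pairs stay two units apart, so distances in the projected space reflect the side information rather than the raw feature geometry.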
DRNTU::Engineering::Electrical and electronic engineering