Context-aware mobile image recognition and annotation
Date of Issue: 2013
School of Electrical and Electronic Engineering
The growing use of mobile camera phones has led to the proliferation of mobile applications such as mobile city guides, mobile shopping, personalized mobile services, and personal album management. Mobile visual systems, which analyze images taken by mobile devices, have been developed to enable these applications. Two of these applications are particularly important: 1) mobile image recognition, which provides relevant information about scene/landmark images, and 2) mobile image annotation, which uses camera phones to capture images and annotate them. Mobile image recognition and annotation are closely related, as both are based on mobile visual analysis. To enhance the performance of mobile visual systems, it is natural to incorporate mobile domain-specific context information into conventional visual content analysis. In this work, context information includes the location and direction information available on mobile devices, mobile user interaction, etc. However, context information is underutilized in most existing mobile visual systems. These systems mainly use location information provided by GPS (Global Positioning System) to shortlist candidate images located near the position where the query image was taken, and then carry out content analysis within the shortlisted candidates to obtain the final recognition/annotation results. This is insufficient because (i) GPS is unreliable in dense built-up areas, where its errors are large, and (ii) other context information, such as direction (recorded by the digital compass on the mobile device), is not used to further improve recognition. For mobile image recognition, we proposed several approaches based on content analysis, with possible incorporation of context information: 1) A new approach for scene image recognition is proposed that combines generative and discriminative models.
A new image signature is proposed based on the Gaussian Mixture Model (GMM), and its soft relevance value is incorporated into the training of a Fuzzy Support Vector Machine (FSVM). The proposed GMM-FSVM approach is shown to outperform state-of-the-art Bag-of-Words (BoW) methods in recognition performance. 2) A new landmark image recognition method is proposed that incorporates image saliency information into the state-of-the-art Scalable Vocabulary Tree (SVT) approach. Since saliency information emphasizes the foreground landmark object and ignores the cluttered background, the proposed Saliency-Aware Vocabulary Tree (SAVT) algorithm improves recognition performance relative to the baseline SVT approach. 3) We propose a real-valued multi-class AdaBoost algorithm using an exponential loss function (RMAE), which integrates visual content with two types of mobile context: location and direction. RMAE generates SVTs based on content and context analysis, respectively, constructs weak classifiers from them, and then combines the weak classifiers into a final strong classifier that contains both content and context information. For mobile image annotation, we developed a system prototype and proposed several approaches that use content analysis, context analysis, and their integration: 2) To study the effectiveness of context-based image annotation, a new algorithm is proposed that models the tag distributions over the different GPS locations of mobile images. Specifically, the tag distributions are obtained using an enhanced GMM. Based on these distributions, a query image can be associated with tags according to its location, thus achieving context-based image annotation. As part of the contributions, we have also constructed two mobile image databases: i) the Singapore Landmark-40 dataset for recognition, and ii) the NTU Scene-25 dataset for annotation.
The Singapore Landmark-40 dataset consists of 12,338 training images and 1,200 testing images of 40 famous landmarks in Singapore. The NTU Scene-25 dataset consists of 3,916 images in 25 categories of geotagged scenes/landmarks/activities on the NTU campus, and includes various context information such as GPS location and direction. Comprehensive experiments have been carried out on a number of mobile image datasets, and the results show that the proposed mobile image recognition and annotation methods outperform the state-of-the-art methods and show good potential for mobile image sharing based on recognition and annotation.
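The fuzzy SVM half of GMM-FSVM can be illustrated on its own. In a fuzzy SVM, each training sample i carries a membership s_i in [0, 1] that scales its hinge loss, so the primal objective becomes 0.5*||w||^2 + C * sum_i s_i * max(0, 1 - y_i(w.x_i + b)). The following is a minimal sketch under stated assumptions, not the thesis's implementation: it is a linear FSVM trained by plain full-batch subgradient descent, the GMM image signature is not shown, and the membership values stand in for the GMM soft relevance and are assumed to be given. The function names, learning rate, epoch count and toy data are hypothetical.

```python
from typing import List, Sequence, Tuple

def train_fuzzy_linear_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    membership: Sequence[float],
    C: float = 1.0,
    lr: float = 0.01,
    epochs: int = 200,
) -> Tuple[List[float], float]:
    """Hypothetical sketch: minimise 0.5*||w||^2 + C * sum_i s_i * hinge_i
    by full-batch subgradient descent. Labels y must be +1 or -1."""
    dim = len(X[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the 0.5*||w||^2 regulariser
        grad_b = 0.0
        for xi, yi, si in zip(X, y, membership):
            margin = yi * (sum(wk * xk for wk, xk in zip(w, xi)) + b)
            if margin < 1.0:
                # hinge loss is active: its subgradient is scaled by the fuzzy membership s_i
                for k in range(dim):
                    grad_w[k] -= C * si * yi * xi[k]
                grad_b -= C * si * yi
        w = [wk - lr * gk for wk, gk in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def fsvm_predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Sign of the linear decision function."""
    return 1 if sum(wk * xk for wk, xk in zip(w, x)) + b >= 0.0 else -1

# Toy usage: hypothetical 2-D features standing in for GMM image signatures.
X = [(2, 2), (3, 2), (2, 3), (-2, -2), (-3, -2), (-2, -3)]
y = [1, 1, 1, -1, -1, -1]
w, b = train_fuzzy_linear_svm(X, y, membership=[1.0] * 6)
print(fsvm_predict(w, b, (1, 1)), fsvm_predict(w, b, (-1, -1)))  # → 1 -1
```

A sample with membership 0 contributes nothing to the loss, exactly as if it were removed; intermediate values let a doubtful (for example, weakly relevant) training image pull on the decision boundary less than a clean one.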
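The saliency idea behind SAVT can be illustrated with a flat, single-level vocabulary instead of a full vocabulary tree. This sketch rests on assumptions and is not the thesis's algorithm: every local feature is assumed to be already quantized to a visual-word ID and to carry a saliency value in [0, 1]. Each feature then contributes its saliency, rather than a count of 1, to an IDF-weighted, L1-normalized bag-of-words vector, and database images are ranked by L1 distance in the manner of vocabulary-tree scoring. All function names and the toy data are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence

def idf_weights(db_words: Sequence[Sequence[int]], vocab_size: int) -> List[float]:
    """IDF of each visual word: log(N / n_w), n_w = number of database images containing w."""
    doc_freq = [0] * vocab_size
    for words in db_words:
        for w in set(words):
            doc_freq[w] += 1
    n_images = len(db_words)
    return [math.log(n_images / df) if df else 0.0 for df in doc_freq]

def bow_vector(words: Sequence[int], saliency: Sequence[float],
               idf: Sequence[float]) -> Dict[int, float]:
    """Saliency- and IDF-weighted bag of words, normalised to unit L1 norm.
    A background feature with saliency near 0 barely changes the vector."""
    vec: Dict[int, float] = defaultdict(float)
    for w, s in zip(words, saliency):
        vec[w] += s * idf[w]
    total = sum(vec.values())
    return {w: v / total for w, v in vec.items()} if total > 0 else {}

def l1_distance(a: Dict[int, float], b: Dict[int, float]) -> float:
    return sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in set(a) | set(b))

def rank_database(query_words, query_saliency, db_words, db_saliency, vocab_size):
    """Return database image indices sorted from best to worst match."""
    idf = idf_weights(db_words, vocab_size)
    q = bow_vector(query_words, query_saliency, idf)
    dists = [l1_distance(q, bow_vector(w, s, idf)) for w, s in zip(db_words, db_saliency)]
    return sorted(range(len(db_words)), key=lambda i: dists[i])

# Toy example: three landmark images; the query shows landmark 0 (word 0)
# in front of clutter whose words (3, 4, 5) happen to match landmark 1.
db_words = [[0, 1, 2], [3, 4, 5], [6, 7]]
db_saliency = [[1.0] * 3, [1.0] * 3, [1.0] * 2]
query_words = [0, 3, 4, 5]
print(rank_database(query_words, [1.0, 0.05, 0.05, 0.05], db_words, db_saliency, 8))  # → [0, 1, 2]
print(rank_database(query_words, [1.0] * 4, db_words, db_saliency, 8))                  # → [1, 0, 2]
```

With uniform weights the clutter dominates and the wrong landmark ranks first; down-weighting the low-saliency background features lets the single foreground match decide the ranking.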
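For illustration only, RMAE's combination of content and context classifiers can be approximated by the discrete SAMME algorithm, a multi-class AdaBoost variant that also minimizes a multi-class exponential loss. Unlike RMAE, this sketch is not real-valued and does not build SVTs. Instead, it assumes a hypothetical fixed pool of already-trained weak classifiers (for example, one content-based, one location-based, and one direction-based), each represented only by its predicted labels. Function names and toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

def samme(pool_predictions: Sequence[Sequence[int]], labels: Sequence[int],
          n_classes: int, n_rounds: int) -> List[Tuple[int, float]]:
    """Discrete SAMME over a fixed pool of weak classifiers.
    pool_predictions[j][i] is the label weak classifier j assigns to training sample i.
    Returns the chosen (classifier index, weight alpha) pairs."""
    n = len(labels)
    weights = [1.0 / n] * n
    ensemble: List[Tuple[int, float]] = []
    for _ in range(n_rounds):
        # choose the weak classifier with the lowest weighted training error
        best_j, best_err = -1, float("inf")
        for j, preds in enumerate(pool_predictions):
            err = sum(w for w, p, t in zip(weights, preds, labels) if p != t)
            if err < best_err:
                best_j, best_err = j, err
        if best_err >= 1.0 - 1.0 / n_classes:
            break  # nothing beats random guessing among n_classes
        best_err = max(best_err, 1e-10)  # guard against log(0) for a perfect classifier
        alpha = math.log((1.0 - best_err) / best_err) + math.log(n_classes - 1)
        ensemble.append((best_j, alpha))
        # multiply the weights of misclassified samples by exp(alpha), then renormalise
        preds = pool_predictions[best_j]
        weights = [w * math.exp(alpha) if p != t else w
                   for w, p, t in zip(weights, preds, labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def boosted_predict(ensemble, query_predictions, n_classes):
    """query_predictions[j] is weak classifier j's label for the query image."""
    votes = [0.0] * n_classes
    for j, alpha in ensemble:
        votes[query_predictions[j]] += alpha
    return max(range(n_classes), key=lambda k: votes[k])

# Toy pool: content, location and direction classifiers, each wrong somewhere.
labels = [0, 0, 1, 1, 2, 2]
pool = [
    [0, 0, 1, 1, 1, 1],  # content: confuses landmark 2 with landmark 1
    [0, 1, 1, 1, 2, 2],  # location: wrong on sample 1
    [0, 0, 2, 1, 2, 2],  # direction: wrong on sample 2
]
ensemble = samme(pool, labels, n_classes=3, n_rounds=3)
print([j for j, _ in ensemble])  # → [1, 2, 0]
print([boosted_predict(ensemble, [p[i] for p in pool], 3) for i in range(6)])  # → [0, 0, 1, 1, 2, 2]
```

No single weak classifier in the toy pool labels every training sample correctly, but the weighted vote of all three does. This is the intuition behind fusing content with location and direction context in one strong classifier.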
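The context-based annotation step can be sketched with a deliberately simple Gaussian mixture per tag: one equal-weight isotropic 2-D Gaussian centred on each training image that carries the tag, with a fixed bandwidth sigma_m. For brevity, this is a kernel-density form of GMM. It is an assumption, not the enhanced GMM used in the thesis, whose components would be fitted to the data. Tags are ranked by the posterior P(tag | location) ∝ P(tag) · p(location | tag). GPS coordinates are projected to local metres with an equirectangular approximation, which is adequate over a campus-sized area. The class name, the 30 m bandwidth, and the coordinates below are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

EARTH_RADIUS_M = 6_371_000.0

def to_local_metres(lat: float, lon: float, ref_lat: float, ref_lon: float) -> Tuple[float, float]:
    """Equirectangular projection (x east, y north) around a reference point."""
    x = math.radians(lon - ref_lon) * math.cos(math.radians(ref_lat)) * EARTH_RADIUS_M
    y = math.radians(lat - ref_lat) * EARTH_RADIUS_M
    return x, y

class TagLocationModel:
    """Hypothetical per-tag Gaussian mixture over GPS locations."""

    def __init__(self, tagged_images: Sequence[Tuple[float, float, Sequence[str]]],
                 sigma_m: float = 30.0):
        self.ref = tagged_images[0][:2]  # projection origin (lat, lon)
        self.sigma = sigma_m
        self.centres: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
        for lat, lon, tags in tagged_images:
            point = to_local_metres(lat, lon, *self.ref)
            for tag in tags:
                self.centres[tag].append(point)
        n_tag_uses = sum(len(c) for c in self.centres.values())
        # prior P(tag): share of all tag occurrences in the training set
        self.prior = {tag: len(c) / n_tag_uses for tag, c in self.centres.items()}

    def likelihood(self, tag: str, point: Tuple[float, float]) -> float:
        """p(location | tag): average of isotropic 2-D Gaussian densities."""
        var = self.sigma ** 2
        norm = 1.0 / (2.0 * math.pi * var)
        centres = self.centres[tag]
        return sum(norm * math.exp(-((point[0] - cx) ** 2 + (point[1] - cy) ** 2) / (2.0 * var))
                   for cx, cy in centres) / len(centres)

    def annotate(self, lat: float, lon: float, top_k: int = 3) -> List[Tuple[str, float]]:
        """Return up to top_k (tag, posterior probability) pairs for a query location."""
        point = to_local_metres(lat, lon, *self.ref)
        scores = {tag: self.prior[tag] * self.likelihood(tag, point) for tag in self.centres}
        total = sum(scores.values())
        if total == 0.0:
            return []  # query too far from every training image
        ranked = sorted(scores, key=scores.get, reverse=True)
        return [(tag, scores[tag] / total) for tag in ranked[:top_k]]

# Hypothetical geotagged training images: (latitude, longitude, tags).
model = TagLocationModel([
    (1.3480, 103.6830, ["library"]),
    (1.3481, 103.6831, ["library"]),
    (1.3500, 103.6860, ["lake"]),
    (1.3501, 103.6861, ["lake"]),
])
print(model.annotate(1.34805, 103.68305, top_k=1))  # top tag is "library"
```

In this sketch, the bandwidth controls how far a tag's influence spreads. A larger sigma_m tolerates larger GPS errors, but it blurs neighbouring locations together.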
DRNTU::Engineering::Computer science and engineering::Information systems