Title: Exploring effective data representation for saliency detection in image and video
Authors: Ren, Zhixiang
Keywords: DRNTU::Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision
Issue Date: 2013
Abstract: Visual saliency plays an important role in many applications, such as image/video retargeting, automatic photo composition, and vision-based navigation. Visual saliency can guide these applications to focus only on the important regions, thus reducing the complexity of scene analysis. However, current saliency detection methods generate saliency maps of low resolution or quality, which may not satisfy the requirements of some applications. Moreover, compared with the large body of research on static images, saliency models for videos are less well established. In this thesis, we study and propose several models to detect salient objects or regions in images and videos.

To address the low resolution of saliency maps, we improve the current clustering framework by introducing a two-level clustering strategy based on image complexity. We first use the adaptive mean shift algorithm to extract superpixels from the input image, then employ a Gaussian Mixture Model (GMM) to group superpixels by appearance similarity. The saliency value is finally calculated for each cluster using a compactness metric together with modified PageRank propagation. With the superpixel representation and saliency refinement, this region-based method represents the input image in a perceptually meaningful way and highlights salient regions at full resolution with well-defined boundaries. The application of our saliency maps to object recognition shows the potential of the proposed method.

For video saliency detection, motivated by the psychological finding that the human visual system is extremely sensitive to isolated abrupt stimuli and relative movement, we formulate saliency detection as a unified feature reconstruction problem. For temporal saliency, we use patches in neighboring frames to sparsely reconstruct the target patch in the current frame.
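The temporal reconstruction step described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: it assumes patches are already vectorized, solves the l1-regularized reconstruction with plain ISTA (a standard solver choice, not one the abstract specifies), and the function names are hypothetical.

```python
import numpy as np

def soft_threshold(x, t):
    """Entrywise soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sparse_reconstruct(target, dictionary, lam=0.1, n_iter=200):
    """Sparsely reconstruct `target` (d,) from the columns of `dictionary`
    (d, n) by solving min_a 0.5*||target - D a||^2 + lam*||a||_1 with ISTA."""
    D = dictionary
    a = np.zeros(D.shape[1])
    step = 1.0 / (np.linalg.norm(D, 2) ** 2)  # 1 / Lipschitz constant
    for _ in range(n_iter):
        grad = D.T @ (D @ a - target)
        a = soft_threshold(a - step * grad, lam * step)
    return a

def temporal_saliency(target, neighbor_patches, lam=0.1):
    """Abruptness score: reconstruction error plus the sparsity regularizer.
    `neighbor_patches` holds patches from neighboring frames as columns."""
    a = sparse_reconstruct(target, neighbor_patches, lam)
    err = 0.5 * np.linalg.norm(target - neighbor_patches @ a) ** 2
    return err + lam * np.abs(a).sum()
```

A patch well explained by its temporal neighbors gets a low score, while an isolated abrupt patch yields a large reconstruction error and hence high temporal saliency.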
We measure the temporal saliency of a patch by its abruptness, estimated from the reconstruction error and the regularizer term, and by its motion contrast, calculated as the difference of reconstruction coefficients. For spatial saliency, we use the surrounding patches in the same frame to sparsely reconstruct the center patch; here the reconstruction error and regularizer measure the local center-surround contrast. The strong performance of our feature reconstruction in both image and video evaluations supports the plausibility of feature reconstruction as an explanation for visual saliency.

Sparse and low-rank representation has demonstrated great potential in subspace learning, and we develop video saliency detection models based on this technique for different types of camera motion. For moderate camera motion, we jointly estimate the salient foreground motion and the camera motion via robust alignment with sparse and low-rank decomposition. Consecutive frames are transformed and aligned, then decomposed into a low-rank matrix representing the background and a sparse matrix indicating the objects with salient motion. We also incorporate useful spatial information, including global rarity, local center-surround contrast, and location priority, into our model to comprehensively detect spatiotemporal saliency. For large camera motion, our alignment-based model may fail to detect moving objects, so we propose a trajectory representation in the sparse and low-rank decomposition. Under the assumption of orthographic projection, the trajectories from the background lie in a subspace spanned by three basis trajectories, i.e., the rank of the background matrix is 3. We estimate a compact background model based on this rank constraint.
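The decomposition into a low-rank background and a sparse motion layer can be illustrated with a basic robust-PCA sketch. This is a simplified stand-in, not the thesis algorithm: it assumes frames are already aligned and vectorized as the columns of M, and the ADMM scheme, parameter heuristics, and function name are illustrative.

```python
import numpy as np

def low_rank_sparse_decompose(M, lam=None, mu=None, n_iter=200):
    """Decompose M ~ L + S with L low-rank (background) and S sparse
    (salient motion), via a basic ADMM scheme for robust PCA."""
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))        # standard RPCA weight
    if mu is None:
        mu = 0.25 * m * n / np.abs(M).sum()   # common step-size heuristic
    Y = np.zeros_like(M)  # dual variable for the constraint M = L + S
    S = np.zeros_like(M)
    L = np.zeros_like(M)
    for _ in range(n_iter):
        # L-update: singular value thresholding of (M - S + Y/mu)
        U, sig, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * np.maximum(sig - 1.0 / mu, 0.0)) @ Vt
        # S-update: entrywise soft-thresholding of (M - L + Y/mu)
        R = M - L + Y / mu
        S = np.sign(R) * np.maximum(np.abs(R) - lam / mu, 0.0)
        # dual ascent on the residual of M = L + S
        Y += mu * (M - L - S)
    return L, S
```

Entries with large magnitude in S mark pixels whose motion cannot be explained by the low-rank background, i.e. candidates for salient motion.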
Furthermore, to enforce spatial connectivity and motion coherence constraints, a Markov Random Field (MRF) is built for foreground estimation. This model is evaluated on a set of challenging sequences and shows superior performance compared to several state-of-the-art methods.
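The rank-3 background constraint on trajectories can be sketched as follows. This is a non-robust least-squares illustration under stated assumptions: tracked trajectories are stacked as columns of a 2F-by-N matrix (x and y coordinates over F frames), and the function name is hypothetical. The thesis enforces the rank constraint inside the sparse and low-rank decomposition and refines the foreground with an MRF, both of which this sketch omits.

```python
import numpy as np

def trajectory_foreground_scores(W, rank=3):
    """W: (2F, N) matrix of N point trajectories over F frames.
    Under orthographic projection, background trajectories lie
    (approximately) in a rank-3 subspace once the mean trajectory
    (translation component) is removed; a large projection residual
    suggests a foreground trajectory."""
    Wc = W - W.mean(axis=1, keepdims=True)   # remove the mean trajectory
    U, s, Vt = np.linalg.svd(Wc, full_matrices=False)
    B = U[:, :rank]                          # background subspace basis
    resid = Wc - B @ (B.T @ Wc)              # component outside the subspace
    return np.linalg.norm(resid, axis=0)     # one score per trajectory
```

Because this fit uses all trajectories, a dominant foreground could bias the subspace; a robust estimator (as in the thesis model) avoids that.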
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections:SCSE Theses

Files in This Item:
File: Restricted Access | Size: 5.58 MB | Format: Adobe PDF

Page view(s): updated on Nov 30, 2020





Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.