Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/162800
Title: Image segmentation with less manual labeling effort
Authors: Liu, Weide
Keywords: Engineering::Computer science and engineering
Issue Date: 2022
Publisher: Nanyang Technological University
Source: Liu, W. (2022). Image segmentation with less manual labeling effort. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/162800
Abstract: Semantic segmentation is the task of assigning each pixel of an image to a class. With the help of deep learning, fully supervised segmentation has achieved remarkable performance. However, fully supervised learning has a critical intrinsic limitation: it often requires a prohibitively large number of pixel-level annotated images for model training. Collecting labeled data can be notoriously expensive in dense prediction tasks such as semantic segmentation, instance segmentation, and video segmentation. To alleviate, or even free researchers from, the high cost of laborious annotation, this thesis tackles the problem from two aspects: few-shot segmentation and weakly supervised segmentation. Few-shot segmentation learns a network that predicts segmentation masks for novel classes from only a few newly annotated training samples. Weakly supervised segmentation, on the other hand, learns a pixel-level network from weaker annotations, such as bounding boxes, scribbles, image labels, and points, which are much easier to obtain than labels for every pixel in an image. In the first aspect, we aim to improve few-shot segmentation performance with the following innovations. Firstly, we propose a Cross-Reference and Local-Global Condition Network (CRCNet) that concurrently makes predictions for both the support image and the query image in order to mine objects of the same category. To further improve object feature representation, we develop a local-global condition module to capture both global and local relations. Because object appearance varies widely, mining foreground regions in images can take multiple steps, so we also develop a mask refinement module that recurrently refines the prediction of the target object regions.
After that, we propose a Query Guided Network (QGNet) that extracts information from the query itself, independently of the support, to benefit the few-shot segmentation task. We propose a prior extractor that learns query information from unlabeled images with our proposed global-local contrastive learning. With the prior extractor, the extraction of query information is detached from the support branch, which overcomes the limitation imposed by the support and yields more informative query clues for better interaction. In the second aspect, we focus on weakly supervised segmentation, aiming to predict pixel-level masks by learning a network supervised with image-level annotations. The quality of the Class Activation Maps (CAMs) has a crucial impact on the performance of a weakly supervised segmentation model. Weakly supervised image segmentation trained with image-level labels usually suffers from inaccurate coverage of object areas during the generation of the pseudo ground truth. This is because the CAMs are trained with a classification objective and lack the ability to generalize. We aim to improve the quality of CAMs, and thereby weakly supervised segmentation performance, from different aspects. Firstly, we discuss using a bipartite graph to locate the object-activated areas in two images containing common classes. The matching areas are then used to refine the predicted object regions in the CAMs. In particular, we propose the maximum bipartite matching network (MBMNet) to map the paired images with a bipartite graph. We then use the maximum matching algorithm to locate corresponding areas in the paired images. The matching areas are used to enhance the corresponding feature representations. Based on the enhanced feature representations, we can generate better CAMs that cover more object regions.
Finally, we propose a region prototypical network (RPNet) that explores the cross-image object diversity of the training set to enhance the object activated maps for weakly supervised segmentation. Similar object parts across images are identified via region feature comparison. Object confidence is propagated between regions to discover and re-activate new object areas, while background regions are suppressed. Based on the re-activated feature maps, we aim to obtain a more complete pseudo ground truth for weakly supervised segmentation. In summary, with CRCNet and QGNet, we improved few-shot segmentation performance through a cross-reference mechanism and global-local contrastive learning. With our proposed MBMNet and RPNet, we enhanced the object activated maps and improved the performance of weakly supervised segmentation by discovering new object areas. We have achieved new state-of-the-art segmentation performance on public benchmarks for both tasks.
URI: https://hdl.handle.net/10356/162800
DOI: 10.32657/10356/162800
Rights: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Theses

Files in This Item:
File: Liu_Weide_thesis_final.pdf
Size: 22.71 MB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.