Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/180255
Title: ProxyCLIP: proxy attention improves CLIP for open-vocabulary segmentation
Authors: Lan, Mengcheng
Chen, Chaofeng
Ke, Yiping
Wang, Xinjiang
Feng, Litong
Zhang, Wayne
Keywords: Computer and Information Science
Issue Date: 2024
Source: Lan, M., Chen, C., Ke, Y., Wang, X., Feng, L. & Zhang, W. (2024). ProxyCLIP: proxy attention improves CLIP for open-vocabulary segmentation. 2024 European Conference on Computer Vision (ECCV). https://dx.doi.org/10.48550/arXiv.2408.04883
Project: IAF-ICP
Conference: 2024 European Conference on Computer Vision (ECCV)
Abstract: Open-vocabulary semantic segmentation requires models to effectively integrate visual representations with open-vocabulary semantic labels. While Contrastive Language-Image Pre-training (CLIP) models shine in recognizing visual concepts from text, they often struggle with segment coherence due to their limited localization ability. In contrast, Vision Foundation Models (VFMs) excel at acquiring spatially consistent local visual representations, yet they fall short in semantic understanding. This paper introduces ProxyCLIP, an innovative framework designed to harmonize the strengths of both CLIP and VFMs, facilitating enhanced open-vocabulary semantic segmentation. ProxyCLIP leverages the spatial feature correspondence from VFMs as a form of proxy attention to augment CLIP, thereby inheriting the VFMs' robust local consistency and maintaining CLIP's exceptional zero-shot transfer capacity. We propose an adaptive normalization and masking strategy to get the proxy attention from VFMs, allowing for adaptation across different VFMs. Remarkably, as a training-free approach, ProxyCLIP significantly improves the average mean Intersection over Union (mIoU) across eight benchmarks from 40.3 to 44.4, showcasing its exceptional efficacy in bridging the gap between spatial precision and semantic richness for the open-vocabulary segmentation task.
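The abstract describes proxy attention only at a high level. What follows is a minimal illustrative sketch, not the authors' implementation. It assumes that L2-normalized VFM patch features supply the patch-to-patch similarity, and that this similarity replaces CLIP's own attention weights when CLIP value embeddings are aggregated. The function names and the `temperature` and `mask_below` parameters are hypothetical. They stand in for the paper's adaptive normalization and masking strategy, whose exact form is not given in this record.

```python
import math
from typing import List

Matrix = List[List[float]]


def l2_normalize(rows: Matrix) -> Matrix:
    """Scale each feature vector to unit length (zero vectors are left as-is)."""
    out = []
    for r in rows:
        norm = math.sqrt(sum(x * x for x in r)) or 1.0
        out.append([x / norm for x in r])
    return out


def proxy_attention(vfm_feats: Matrix, clip_values: Matrix,
                    temperature: float = 0.1, mask_below: float = 0.0) -> Matrix:
    """Hypothetical sketch: aggregate CLIP value tokens with VFM-derived attention.

    vfm_feats:   N patch features from a vision foundation model (N x d_v).
    clip_values: N value embeddings from CLIP's last attention layer (N x d_c).
    temperature, mask_below: placeholder knobs, NOT the paper's adaptive
    normalization and masking.
    """
    f = l2_normalize(vfm_feats)
    n = len(f)
    dim = len(clip_values[0])
    out = []
    for i in range(n):
        # Cosine similarity between patch i and every patch j in VFM space.
        sims = [sum(a * b for a, b in zip(f[i], f[j])) for j in range(n)]
        # Placeholder masking: patches below the threshold get zero weight.
        logits = [s / temperature if s >= mask_below else -math.inf for s in sims]
        # Numerically stable softmax over the unmasked logits.
        m = max(logits)
        weights = [math.exp(l - m) for l in logits]  # exp(-inf) == 0.0
        z = sum(weights)
        weights = [w / z for w in weights]
        # Weighted sum of CLIP value embeddings.
        out.append([sum(weights[j] * clip_values[j][k] for j in range(n))
                    for k in range(dim)])
    return out
```

With `mask_below=0.5`, patches whose VFM features are orthogonal receive zero weight, so each output token averages only the CLIP values of patches that the VFM considers spatially consistent with it. For example, `proxy_attention([[1, 0], [0, 1], [1, 0]], [[1, 0], [0, 1], [3, 0]], mask_below=0.5)` gives the first and third patches the mean of their two CLIP values.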
URI: https://hdl.handle.net/10356/180255
URL: http://arxiv.org/abs/2408.04883v1
DOI: 10.48550/arXiv.2408.04883
DOI (Related Dataset): 10.21979/N9/YY8L5O
Schools: College of Computing and Data Science
Research Centres: S-Lab
Rights: © 2024 ECCV. All rights reserved. This article may be downloaded for personal use only. Any other use requires prior permission of the copyright holder.
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections: CCDS Conference Papers

Files in This Item:
ProxyCLIP.pdf (3.6 MB, Adobe PDF)

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.