Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/170129
Title: Geographic mapping with unsupervised multi-modal representation learning from VHR images and POIs
Authors: Bai, Lubin
Huang, Weiming
Zhang, Xiuyuan
Du, Shihong
Cong, Gao
Wang, Haoyu
Liu, Bo
Keywords: Engineering::Computer science and engineering
Issue Date: 2023
Source: Bai, L., Huang, W., Zhang, X., Du, S., Cong, G., Wang, H. & Liu, B. (2023). Geographic mapping with unsupervised multi-modal representation learning from VHR images and POIs. ISPRS Journal of Photogrammetry and Remote Sensing, 201, 193-208. https://dx.doi.org/10.1016/j.isprsjprs.2023.05.006
Project: IAF-PP 
Journal: ISPRS Journal of Photogrammetry and Remote Sensing
Abstract: Most supervised geographic mapping methods based on very-high-resolution (VHR) images are designed for a specific task, leading to high label dependency and limited task generality. Additionally, the lack of socio-economic information in VHR images limits their applicability to social/human-related geographic studies. To resolve these two issues, we propose an unsupervised multi-modal geographic representation learning framework (MMGR) using both VHR images and points-of-interest (POIs) to learn representations (regional vector embeddings) that carry both the physical and socio-economic properties of the geographies. MMGR employs an intra-modal and an inter-modal contrastive learning module: the former mines visual features by contrasting different augmentations of the same VHR image, while the latter fuses physical and socio-economic features by contrasting VHR image and POI features. Extensive experiments are performed in two study areas (Shanghai and Wuhan, China) and on three related yet distinct geographic mapping tasks (i.e., mapping urban functional distributions, population density, and gross domestic product) to verify the superiority of MMGR. The results demonstrate that MMGR considerably outperforms seven competitive baselines on all three tasks, indicating its effectiveness in fusing VHR images and POIs for multiple geographic mapping tasks. Furthermore, MMGR is a competent pre-training method that helps image encoders understand multi-modal geographic information, and it can be further strengthened by fine-tuning even with only a few labeled samples. The source code is released at https://github.com/bailubin/MMGR.
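Note: The following is a minimal, illustrative sketch of the intra-modal and inter-modal contrastive objectives described in the abstract, assuming standard InfoNCE-style losses. It is not the authors' released implementation (see the GitHub link above); all variable names, dimensions, and the equal weighting of the two terms are hypothetical placeholders.

    # Sketch of intra-/inter-modal contrastive objectives (InfoNCE-style).
    # NOT the released MMGR code; names and dimensions are placeholders.
    import torch
    import torch.nn.functional as F

    def info_nce(z_a, z_b, temperature=0.07):
        """Symmetric InfoNCE loss: matching rows of z_a and z_b are positives,
        all other rows in the batch serve as negatives."""
        z_a = F.normalize(z_a, dim=1)
        z_b = F.normalize(z_b, dim=1)
        logits = z_a @ z_b.t() / temperature          # (B, B) similarity matrix
        targets = torch.arange(z_a.size(0), device=z_a.device)
        return 0.5 * (F.cross_entropy(logits, targets) +
                      F.cross_entropy(logits.t(), targets))

    # Hypothetical batch of B regions with d-dimensional embeddings:
    B, d = 32, 128
    img_view1 = torch.randn(B, d)   # embedding of one VHR image augmentation
    img_view2 = torch.randn(B, d)   # embedding of another augmentation of the same region
    poi_embed = torch.randn(B, d)   # embedding of the same region's POIs

    # Intra-modal term: contrast two augmentations of the same VHR image.
    loss_intra = info_nce(img_view1, img_view2)
    # Inter-modal term: contrast VHR image features against POI features of the same region.
    loss_inter = info_nce(img_view1, poi_embed)

    loss = loss_intra + loss_inter   # relative weighting of the terms is a design choice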
URI: https://hdl.handle.net/10356/170129
ISSN: 0924-2716
DOI: 10.1016/j.isprsjprs.2023.05.006
Schools: School of Computer Science and Engineering 
Rights: © 2023 International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS). Published by Elsevier B.V. All rights reserved.
Fulltext Permission: none
Fulltext Availability: No Fulltext
Appears in Collections:SCSE Journal Articles

