Conjuring Semantic Similarity

arXiv:2410.16431v4 Announce Type: replace
Abstract: The semantic similarity between sample expressions measures the distance between their latent ‘meaning’. These meanings are themselves typically represented by
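The abstract is cut off above, but the standard setup it gestures at — meanings represented as embedding vectors, with similarity measured by the angle between them — can be sketched briefly. The toy vectors and the `cosine_similarity` helper below are illustrative assumptions, not details from the paper.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors:
    1.0 for identical directions, near 0 for unrelated ones."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy 3-d "meaning" vectors (purely illustrative).
king = np.array([0.9, 0.1, 0.3])
queen = np.array([0.85, 0.15, 0.35])
apple = np.array([0.1, 0.9, 0.2])

print(cosine_similarity(king, queen))  # close to 1: similar meanings
print(cosine_similarity(king, apple))  # much smaller: dissimilar meanings
```

In practice the embeddings would come from a trained language model rather than being hand-written, but the similarity computation itself is this simple.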

GAIR: Location-Aware Self-Supervised Contrastive Pre-Training with Geo-Aligned Implicit Representations

arXiv:2503.16683v2 Announce Type: replace-cross
Abstract: The Vision Transformer (ViT) has been widely used in computer vision tasks with excellent results, providing representations for a whole image or for image patches. However, ViT lacks detailed localized image representations at arbitrary positions when applied to geospatial tasks that involve multiple geospatial data modalities, such as overhead remote sensing (RS) data, ground-level imagery, and geospatial vector data. Here, high-resolution localized representations are vital for modeling geospatial relationships and alignments across modalities. We propose to solve this representation problem with an implicit neural representation (INR) module that extends ViT with Neural Implicit Local Interpolation, producing a continuous RS image representation covering arbitrary locations in the RS image. Building on the INR module, we introduce GAIR, a novel location-aware self-supervised learning (SSL) objective that integrates overhead RS data, street-view (SV) imagery, and their geolocation metadata. GAIR uses three factorized neural encoders to project the different modalities into a shared embedding space; the INR module further aligns these representations geographically, and the encoders are trained with contrastive learning objectives on unlabeled data. We evaluate GAIR across 9 geospatial tasks and 22 datasets spanning RS image-based, SV image-based, and location embedding-based benchmarks. Experimental results demonstrate that GAIR outperforms state-of-the-art geo-foundation models (GeoFM) and alternative SSL training objectives (e.g., MoCo V3 and MAE) that do not use fine-grained geo-aligned spatial representations. Our results highlight the effectiveness of GAIR in learning generalizable geospatial representations across tasks, spatial scales, and temporal contexts. The project code is available at https://github.com/zpl99/GAIR.
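The abstract only summarizes GAIR's INR module and contrastive objective, so the following is a hedged sketch of the two underlying ideas rather than the paper's implementation: querying a grid of ViT patch features at a continuous location (here plain bilinear interpolation stands in for the learned Neural Implicit Local Interpolation), and aligning paired RS/SV embeddings with a symmetric InfoNCE loss standing in for GAIR's contrastive objectives. All function names, shapes, and the temperature value are assumptions.

```python
import numpy as np

def query_patch_grid(feats, x, y):
    """Bilinearly interpolate a ViT patch-feature grid feats of shape
    (H, W, D) at a continuous location (x, y) in grid coordinates.
    A hand-written stand-in for a learned implicit interpolation module."""
    H, W, _ = feats.shape
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0
    top = (1 - wx) * feats[y0, x0] + wx * feats[y0, x1]
    bot = (1 - wx) * feats[y1, x0] + wx * feats[y1, x1]
    return (1 - wy) * top + wy * bot

def info_nce(rs_emb, sv_emb, tau=0.07):
    """Symmetric InfoNCE over a batch of paired embeddings (N, D):
    the matched RS/SV pair in each row is the positive, every other
    pairing in the batch is a negative."""
    rs = rs_emb / np.linalg.norm(rs_emb, axis=1, keepdims=True)
    sv = sv_emb / np.linalg.norm(sv_emb, axis=1, keepdims=True)
    logits = rs @ sv.T / tau

    def ce_diag(l):
        # cross-entropy against the diagonal (matched pairs),
        # via a numerically stable log-softmax over each row
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))

    # average the RS→SV and SV→RS directions
    return 0.5 * (ce_diag(logits) + ce_diag(logits.T))
```

A real implementation would backpropagate this loss through the three modality encoders; the sketch only shows the forward computation, which is enough to see why geographically matched pairs are pulled together and mismatched ones pushed apart.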


Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK; registration number 16808844.