{"paper":{"title":"GeoViSTA: Geospatial Vision-Tabular Transformer for Multimodal Environment Representation","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"GeoViSTA creates transferable geospatial embeddings by jointly modeling imagery and tabular socioeconomic data with cross-attention.","cross_cats":["cs.CV"],"primary_cat":"cs.LG","authors_text":"Ashok Veeraraghavan, Guha Balakrishnan, Sadeer Al-Kindi, Yuhao Liu","submitted_at":"2026-05-14T05:46:07Z","abstract_excerpt":"Large-scale pretraining on Earth observation imagery has yielded powerful representations of the natural and built environment. However, most existing geospatial foundation models do not directly model the structured socioeconomic covariates typically stored in tabular form. This modality gap limits their ability to capture the complete total environment, which is critical for reasoning about complex environmental, social, and health-related outcomes. In this work, we propose GeoViSTA (Geospatial Vision-Tabular Transformer), a vision-tabular architecture that learns unified geospatial embeddin"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"jointly modeling the physical environment alongside structured socioeconomic context yields highly transferable representations for holistic geospatial inference","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That bilateral cross-attention and geography-aware attention can effectively align irregular tabular tokens with image patches and that the self-supervised masked autoencoding objective produces embeddings that generalize to downstream tasks without significant modality misalignment or information loss.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"GeoViSTA learns unified geospatial embeddings from co-registered 
imagery and tabular data via bilateral cross-attention and joint masked autoencoding, yielding better linear probing performance on mortality and fire hazard prediction tasks.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"GeoViSTA creates transferable geospatial embeddings by jointly modeling imagery and tabular socioeconomic data with cross-attention.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"db93278e3545824c5007160c3fc6b584b015ec2ebc2e51502ab334981d61b692"},"source":{"id":"2605.14406","kind":"arxiv","version":1},"verdict":{"id":"40ad17cf-294e-494a-b924-b89575dab21c","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T02:19:22.595715Z","strongest_claim":"jointly modeling the physical environment alongside structured socioeconomic context yields highly transferable representations for holistic geospatial inference","one_line_summary":"GeoViSTA learns unified geospatial embeddings from co-registered imagery and tabular data via bilateral cross-attention and joint masked autoencoding, yielding better linear probing performance on mortality and fire hazard prediction tasks.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That bilateral cross-attention and geography-aware attention can effectively align irregular tabular tokens with image patches and that the self-supervised masked autoencoding objective produces embeddings that generalize to downstream tasks without significant modality misalignment or information loss.","pith_extraction_headline":"GeoViSTA creates transferable geospatial embeddings by jointly modeling imagery and tabular socioeconomic data with cross-attention."},"references":{"count":33,"sample":[{"doi":"","year":2021,"title":"Skilful precipitation nowcasting using deep generative models of 
radar,","work_id":"585cd8bc-f7d9-40b7-bb37-38ba93eec105","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2025,"title":"Downscaling Extreme Precipitation With Wasserstein Regularized Diffusion,","work_id":"b3f6151d-77dc-4bb3-bd99-4f44b0c10569","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2017,"title":"Designsafe: New cyberinfrastructure for natural hazards engineering,","work_id":"65e3c977-abb0-45a9-88e9-be85cc5410bb","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Debris segmentation using post-hurricane aerial imagery,","work_id":"1524d882-5c61-42f3-87fd-0d5099921571","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2018,"title":"Air pollution and cardiovascular disease: JACC state-of-the-art review,","work_id":"a0f041d6-4025-4c68-a85a-e8c9e76ed083","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":33,"snapshot_sha256":"9983d1b953623b960da3845159942b169c7d8bfb7165f58dcff81efc1f2f17b8","internal_anchors":1},"formal_canon":{"evidence_count":2,"snapshot_sha256":"1ad350fe93aa6aa5751a625e8b6f4ec47796930204fb60a7bd21e2544a18f6c0"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}