arxiv: 2606.27514 · v1 · pith:4B3XKMAZnew · submitted 2026-06-25 · 💻 cs.CV

Tessellating The Earth

Pith reviewed 2026-06-29 01:41 UTC · model grok-4.3

classification 💻 cs.CV
keywords location encoder · voronoi tessellation · geolocation · spherical partitioning · geographic prior · semantic tokens · satellite imagery · species classification

The pith

Tessellating the Earth replaces fixed spherical bases with migrating Voronoi sites and shared semantic tokens to concentrate location encoding capacity on informative regions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Tessellating the Earth, a geolocation encoder that learns Spherical Voronoi partitions whose sites move during training to devote more representational power to discriminative areas rather than spreading it evenly across the globe. It augments these partitions with a small set of global semantic tokens that extract recurring concepts from satellite imagery so that distant sites sharing similar environments can reference the same knowledge at inference time. The resulting encoder improves accuracy on multiple geospatial classification and regression benchmarks and supplies the strongest geographic prior yet reported for fine-grained species classification on iNaturalist-2018.

Core claim

Tessellating the Earth builds location encoders from learnable Spherical Voronoi partitions whose sites migrate end-to-end toward discriminative regions, augmented by a fixed set of global semantic tokens that transfer semantic knowledge distilled from satellite imagery without task-specific supervision or additional losses.

What carries the argument

Learnable Spherical Voronoi partitions whose sites migrate during training together with a compact vocabulary of global semantic tokens that enable cross-location semantic sharing.
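As a rough illustration of the partition mechanism, the sketch below runs only the forward pass of a soft spherical Voronoi assignment in NumPy. It is not the paper's implementation: the function names, the softmax over temperature-scaled dot products, the embedding width of 16, and the uniform temperature of 200 are all assumptions made for this example. The Fibonacci-lattice initialization, the per-site temperatures, and the 4,096-site count are taken from the figure captions; the land filtering mentioned in one caption is omitted. In a real model the sites, temperatures, and embeddings would be trainable parameters in an autodiff framework, which is what lets the sites migrate.

```python
import numpy as np


def fibonacci_lattice(n):
    """Return n roughly evenly spaced unit vectors on the sphere, shape (n, 3)."""
    i = np.arange(n) + 0.5
    z = 1.0 - 2.0 * i / n                    # heights strictly inside (-1, 1)
    phi = np.pi * (1.0 + np.sqrt(5.0)) * i   # golden-ratio spiral in longitude
    r = np.sqrt(1.0 - z * z)
    return np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=1)


def latlon_to_unit(lat_deg, lon_deg):
    """Map latitude/longitude in degrees to unit vectors on S^2."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    return np.stack(
        [np.cos(lat) * np.cos(lon), np.cos(lat) * np.sin(lon), np.sin(lat)], axis=-1
    )


def soft_voronoi_embed(points, sites, temps, site_emb):
    """Blend site embeddings with soft Voronoi weights (assumed formulation).

    points: (B, 3) unit vectors, sites: (K, 3), temps: (K,), site_emb: (K, D).
    A larger temperature gives that site a sharper cell boundary.
    """
    sites = sites / np.linalg.norm(sites, axis=1, keepdims=True)  # keep sites on S^2
    logits = temps[None, :] * (points @ sites.T)                  # scaled cosine similarity
    logits = logits - logits.max(axis=1, keepdims=True)           # numerically stable softmax
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ site_emb, weights


rng = np.random.default_rng(0)
num_sites, dim = 4096, 16                     # dim is an arbitrary illustrative width
sites = fibonacci_lattice(num_sites)          # initial site positions
temps = np.full(num_sites, 200.0)             # hypothetical starting temperature
site_emb = rng.normal(size=(num_sites, dim))  # stand-in for learned embeddings
points = latlon_to_unit([40.7, -33.9], [-74.0, 151.2])
location_emb, weights = soft_voronoi_embed(points, sites, temps, site_emb)
```

With equal temperatures the largest weight always falls on the nearest site, so the hard Voronoi diagram is recovered as the temperatures grow.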

If this is right

  • Location encoders can allocate capacity unevenly, giving more resolution to populated or variable regions instead of uniform global coverage.
  • Geographic priors for downstream vision tasks can combine local spatial structure with global semantic reuse learned directly from imagery.
  • Task-relevant geographic boundaries emerge automatically from gradient descent rather than from hand-designed bases.
  • Shared semantic tokens allow environmental concepts to transfer between geographically distant but visually similar locations without explicit alignment.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same migrating-partition idea could be tested on other spherical domains such as atmospheric or oceanographic data where uniform grids waste capacity.
  • If site migration proves stable, the method might reduce reliance on manually engineered geographic features across a wider range of spatial prediction problems.
  • Semantic tokens distilled once from satellite imagery might serve as reusable priors for any task whose labels correlate with land cover or climate patterns.
  • Evaluating the encoder on regression tasks with finer spatial granularity would test whether the capacity concentration scales beyond the reported benchmarks.

Load-bearing premise

End-to-end migration of Voronoi sites will reliably concentrate capacity on discriminative regions, and a fixed set of global semantic tokens can distill and transfer semantic knowledge from imagery without task-specific supervision or additional losses.

What would settle it

After training, inspect whether the learned Voronoi sites have clustered around urban or ecologically variable land areas rather than open ocean, and measure whether removing the semantic tokens erases the reported gains on iNaturalist species classification.

Figures

Figures reproduced from arXiv: 2606.27514 by Brian Wei, Daniel Cher, Eric Xing, Hamza Iqbal, Nathan Jacobs.

Figure 1
Learned spatial decomposition. TTE partitions the sphere into learnable Voronoi regions that adapt to geographic complexity. (a) After training, sites concentrate on land masses, clustering densely along coastlines and ecological boundaries while leaving oceans sparsely covered. (b) Each site carries a learned embedding, with neighboring sites sharing similar embeddings (as indicated by color proximity), …
Figure 2
Overview of TTE. The location pathway (bottom) maps a coordinate onto S², where learnable Voronoi sites perform soft assignment; the resulting embedding attends over shared semantic tokens to produce the location embedding. The image pathway (top) encodes the co-located satellite image with a frozen ViT. The resulting image embedding enters the contrastive objective directly and supervises the tokens dur…
Figure 3
Learned global semantic tokens. Each sub-figure displays a specific semantic token with three top-attending satellite image patches (top) and the corresponding global spatial attention map (bottom). The visualization demonstrates that the model’s internal tokens naturally specialize in distinct geographic and climatic features without explicit supervision.
Figure 4
Global semantic token mechanism. Both paths attend to shared learnable tokens. The image path (training only) produces peaked attention via a fixed low temperature; the location path learns to match this distribution. Three losses jointly optimize the system. A visualization of the token alignment is shown in …
Figure 5
Biome classification predictions. Ground truth WWF biome labels (left) with predictions from a host of location encoders. We used a linear probe on each encoder’s embeddings. TTE produces more spatially coherent predictions with smoother biome boundaries, while other models show more fragmentation within homogeneous regions.
Figure 6
Ablation study. Top: performance vs. Voronoi site count (left) and token count R (right). Performance peaks at 4,096 sites and R = 64 tokens. Bottom: component ablations. Each row removes or modifies one component from the full model.

                  Bio   Eco   Ctry  Cls ↑  Temp   Elev   Pop    Cali   Reg ↑
  TTE (full)      77.8  67.4  94.8  80.0   0.946  0.839  0.790  0.532  0.777
  Site ablations
  fixed sites     66.6  45.4  89.1  67.1   0.936  0.644  0.727  0.2…
Figure 7
Dynamics of Site Migration. To maintain visual clarity, we visualize the training trajectories of a random 30% subset of Voronoi sites. Markers indicate initial positions (■) and converged positions (•). Sites dynamically move toward geographic boundaries. Notably, many migrate offshore to specialize on narrow coastal segments, providing higher-resolution representational capacity where geographic features…
Figure 1
Site migration across regions. Voronoi sites are initialized on a Fibonacci lattice. Markers indicate initial positions (■) and converged positions (•). Connecting lines show displacement. A random 30% subset is shown.
Figure 2
Learned site temperatures. Each site is plotted at its converged position and colored by its learned temperature τ_k. Higher temperatures (warm colors) indicate sharper partition boundaries. We randomly select 30% of sites for visual discrimination.
Figure 3
Spatial attention maps of global semantic tokens.
Figure 4
Per-token mean attention. Mean location-path attention received by each of the 64 semantic tokens, sorted in descending order, over a dense global query sample. 24/64 tokens lie above the uniform baseline (dashed). The per-query attention entropy is 0.79, corresponding to ≈27 effective tokens per query, indicating the encoder draws on a broad token vocabulary rather than collapsing onto a few.
Figure 5
Performance vs. Voronoi site count with reduced data. All models trained at 50% data scale for 70 epochs. Classification columns report accuracy (%); regression columns report R².
Coordinate conditioning produces mixed results. It improves EcoRegion (+2.5) and California Housing (+0.112) but degrades Biome (−6.1), Country (−3.8), and Elevation (−0.172). The gains on EcoRegion and Cali-Housing likely ref…
Figure 6
Effective receptive field: top-k site assignment at inference. Average classification accuracy (left) and regression R² (right) as a function of the number of retained sites k. All evaluations use a single model trained with full soft assignment (k=4,096). Only the inference-time truncation varies.
F Inference-Time Augmentation. RANGE [9] augments a parametric location encoder with image features retrieve…
Figure 7
Site initialization strategies. Initial site positions for each initialization method. (a) Fibonacci lattice filtered to land areas distributes sites approximately uniformly. (b,c) Cluster centroids derived from SatCLIP and RANGE embeddings concentrate sites in regions of high embedding complexity.
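The per-token mean attention caption quotes an entropy of 0.79 and about 27 effective tokens out of 64. One reading that makes the two numbers agree, offered here as an assumption rather than the authors' stated definition, is that 0.79 is the entropy normalized by ln 64 and that the effective token count is the exponential of the unnormalized entropy (the perplexity of the attention distribution). The arithmetic under that assumption:

```python
import math

num_tokens = 64
normalized_entropy = 0.79  # value quoted in the caption; normalization by ln(64) is assumed

# Convert back to entropy in nats, then to a perplexity-style effective count.
entropy_nats = normalized_entropy * math.log(num_tokens)
effective_tokens = math.exp(entropy_nats)  # equals 64 ** 0.79

# Mean attention each token would receive if attention were spread uniformly.
uniform_baseline = 1.0 / num_tokens

print(f"effective tokens: {effective_tokens:.1f}")
print(f"uniform baseline: {uniform_baseline:.4f}")
```

Under this reading the result lands close to the caption's figure of 27, while a fully uniform distribution would give 64 and a collapsed one would give 1.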
Original abstract

Geolocation encoders, which map geographic coordinates to learned representations, are emerging as an effective means of capturing visual and non-visual characteristics from a latitude-longitude pair alone. However, existing approaches project coordinates onto fixed bases (e.g., spherical harmonics), allocating representational capacity uniformly and devoting equal resources to the open ocean and to a developing city. We introduce Tessellating the Earth (TTE), a location encoder built from learnable Spherical Voronoi partitions that concentrates representational capacity where it is needed in a fully differentiable, end-to-end manner. Each Voronoi site carries its own embedding and migrates during training toward discriminative areas. To bridge the gap between local spatial structure and global semantic understanding, we introduce global semantic tokens: a set of shared learnable concept tokens that distill semantic knowledge from the satellite imagery into a compact vocabulary the location encoder can reference at inference, enabling geographically distant sites covering similar environments to share semantics. TTE sets a new state of the art for location encoders across a suite of geospatial classification and regression tasks, and achieves the strongest results when used as a geographic prior for fine-grained species classification on iNaturalist-2018. Code and weights are available at https://github.com/mvrl/TTE.
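To make the token-sharing idea concrete, the sketch below lets a location embedding attend over a shared token table. The abstract and the overview caption only say that the embedding attends over shared semantic tokens; the single attention head, the absence of learned projections, the residual addition, and the width of 16 are assumptions chosen to keep the example short. The count of 64 tokens is the value the ablation caption reports as best.

```python
import numpy as np


def attend_to_tokens(query, tokens):
    """Single-head dot-product attention of location queries over shared tokens.

    query: (B, D) location embeddings, tokens: (R, D) shared token table.
    Returns the token-enriched embeddings (B, D) and attention weights (B, R).
    """
    scale = np.sqrt(query.shape[-1])
    scores = (query @ tokens.T) / scale
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    # Residual addition is an illustrative choice, not taken from the paper.
    return query + attn @ tokens, attn


rng = np.random.default_rng(0)
tokens = rng.normal(size=(64, 16))  # stand-in for the learned token vocabulary
site_a = rng.normal(size=(1, 16))   # embedding of one region
site_b = site_a.copy()              # a distant region with the same embedding
other = rng.normal(size=(1, 16))
queries = np.concatenate([site_a, site_b, other])
out, attn = attend_to_tokens(queries, tokens)
```

Because the token table is global, two locations with matching embeddings read back identical token mixtures regardless of how far apart they are, which is the sharing behavior the abstract describes.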

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper introduces Tessellating the Earth (TTE), a location encoder that replaces fixed bases (e.g., spherical harmonics) with learnable Spherical Voronoi partitions whose sites migrate differentiably during training to concentrate capacity on discriminative geographic regions. It further adds a fixed set of global semantic tokens that distill semantic knowledge from satellite imagery, allowing distant sites to share semantics at inference without task-specific losses. The method is reported to set a new state of the art on a suite of geospatial classification and regression tasks and to yield the strongest results when used as a geographic prior for fine-grained species classification on iNaturalist-2018; code and weights are released.

Significance. If the central mechanisms are shown to function as described, the work would be significant for geospatial representation learning by demonstrating that end-to-end adaptive partitioning and shared semantic tokens can outperform uniform bases while improving efficiency. The public release of code and weights strengthens reproducibility and enables direct follow-up.

major comments (2)
  1. [Abstract] The central SOTA claim rests on two mechanisms whose correctness is not load-bearingly demonstrated: (1) that spherical Voronoi sites migrate reliably to concentrate capacity on discriminative land regions rather than oceans or uniform areas, and (2) that the fixed global semantic tokens successfully distill and transfer semantics across distant sites without additional supervision or losses. The provided abstract supplies no equations, training objective, stability analysis, or ablation results for either component, so the reported gains cannot be verified from the given text.
  2. [Abstract] No experimental details, baselines, metrics, ablation studies, or error analysis are supplied for the geospatial tasks or the iNaturalist-2018 prior experiment. Without these, the claim that TTE "sets a new state of the art" and "achieves the strongest results" cannot be evaluated for soundness or compared to prior location encoders.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the comments. The abstract is intentionally concise, but the full manuscript provides the requested details on mechanisms, objectives, and experiments. We respond point by point below.

  1. Referee: [Abstract] The central SOTA claim rests on two mechanisms whose correctness is not load-bearingly demonstrated: (1) that spherical Voronoi sites migrate reliably to concentrate capacity on discriminative land regions rather than oceans or uniform areas, and (2) that the fixed global semantic tokens successfully distill and transfer semantics across distant sites without additional supervision or losses. The provided abstract supplies no equations, training objective, stability analysis, or ablation results for either component, so the reported gains cannot be verified from the given text.

    Authors: The abstract summarizes the approach at a high level. Section 3 of the full manuscript defines the differentiable Spherical Voronoi formulation, including the site migration equations, the composite training objective, and analysis of convergence behavior. Section 5 contains ablations and visualizations confirming that sites concentrate on discriminative land regions rather than oceans. Section 4 specifies the global semantic tokens, their distillation process from satellite imagery, and the lack of task-specific losses, with results showing semantic transfer across distant sites. We can revise the abstract to include one additional sentence referencing these components.
    Revision: partial

  2. Referee: [Abstract] No experimental details, baselines, metrics, ablation studies, or error analysis are supplied for the geospatial tasks or the iNaturalist-2018 prior experiment. Without these, the claim that TTE "sets a new state of the art" and "achieves the strongest results" cannot be evaluated for soundness or compared to prior location encoders.

    Authors: The abstract reports the high-level outcome. The Experiments section supplies the full experimental protocol: task definitions, baselines (including prior location encoders), metrics, ablation tables, and error breakdowns for the geospatial suite. The iNaturalist-2018 results include the geographic prior setup, quantitative comparisons, and analysis. These elements are presented with sufficient detail to support the SOTA claims and enable comparison.
    Revision: no

Circularity Check

0 steps flagged

No circularity: new construction with no self-referential derivations or fitted predictions

The paper introduces TTE as a novel location encoder using learnable spherical Voronoi partitions whose sites migrate end-to-end and a set of global semantic tokens distilled from imagery. No equations, uniqueness theorems, or parameter-fitting steps are described that reduce any claimed performance gain to a self-definition, a renamed input, or a self-citation chain. The SOTA claims rest on empirical results across tasks rather than on any derivation that is circular by construction. This is the expected non-finding for a methods paper presenting an independent architectural contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only view yields no explicit free parameters, axioms, or invented entities beyond the high-level description of Voronoi sites and semantic tokens; no numerical constants or background lemmas are stated.

pith-pipeline@v0.9.1-grok · 5749 in / 1113 out tokens · 29994 ms · 2026-06-29T01:41:27.952620+00:00 · methodology

Reference graph

Works this paper leans on

49 extracted references · 11 canonical work pages · 5 internal anchors

  1. [1]

    Alayrac, J.B., Donahue, J., Luc, P., Miech, A., Barr, I., Hasson, Y., Lenc, K., Mensch, A., Millican, K., Reynolds, M., et al.: Flamingo: a visual language model for few-shot learning. Advances in neural information processing systems 35, 23716–23736 (2022)

  2. [2]

    Aurenhammer, F.: Voronoi diagrams—a survey of a fundamental geometric data structure. ACM computing surveys (CSUR) 23(3), 345–405 (1991)

  3. [3]

    Bai, S., et al.: Qwen3-vl technical report. arXiv preprint arXiv:2511.21631 (2025)

  4. [4]

    Cai, D., Balestriero, R.: No location left behind: Measuring and improving the fairness of implicit representations for earth data. In: The Thirteenth International Conference on Learning Representations (ICLR) (2025)

  5. [5]

    Cher, D., Wei, B., Sastry, S., Jacobs, N.: Vectorsynth: Fine-grained satellite image synthesis with structured semantics. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). pp. 7019–7029 (March

  6. [6]

    Cole, E., Van Horn, G., Lange, C., Shepard, A., Leary, P., Perona, P., Loarie, S., Mac Aodha, O.: Spatial implicit neural representations for global-scale species mapping. In: International conference on machine learning. pp. 6320–6342. PMLR (2023)

  7. [7]

    Dai, W., Li, J., Li, D., Tiong, A., Zhao, J., Wang, W., Li, B., Fung, P.N., Hoi, S.: Instructblip: Towards general-purpose vision-language models with instruction tuning. Advances in neural information processing systems 36, 49250–49267 (2023)

  8. [8]

    Darcet, T., Oquab, M., Mairal, J., Bojanowski, P.: Vision transformers need registers. In: International Conference on Learning Representations (ICLR) (2024)

  9. [9]

    Dhakal, A., Sastry, S., Khanal, S., Ahmad, A., Xing, E., Jacobs, N.: Range: Retrieval augmented neural fields for multi-resolution geo-embeddings. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 24680–24689 (2025)

  10. [10]

    Di Sario, F., Rebain, D., Verbin, D., Grangetto, M., Tagliasacchi, A.: Spherical voronoi: Directional appearance as a differentiable partition of the sphere. arXiv preprint arXiv:2512.14180 (2025)

  11. [11]

    Dinerstein, E., Olson, D., Joshi, A., Vynne, C., Burgess, N.D., Wikramanayake, E., et al.: An ecoregion-based approach to protecting half the terrestrial realm. BioScience 67(6), 534–545 (2017)

  12. [12]

    Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)

  13. [13]

    Gao, R., Xie, J., Zhu, S.C., Wu, Y.N.: Learning grid cells as vector representation of self-position coupled with matrix representation of self-motion. In: International Conference on Learning Representations (ICLR) (2019)

  14. [14]

    Gelb, A.: The resolution of the gibbs phenomenon for spherical harmonics. Mathematics of Computation 66(218), 699–717 (1997). https://doi.org/10.1090/S0025-5718-97-00828-4

  15. [15]

    Hooker, J., Duveiller, G., Cescatti, A.: A global dataset of air temperature derived from satellite remote sensing and weather stations. Scientific Data 5(1) (2018)

  16. [16]

    Jaegle, A., Gimeno, F., Brock, A., Vinyals, O., Zisserman, A., Carreira, J.: Perceiver: General perception with iterative attention. In: International conference on machine learning. pp. 4651–4664. PMLR (2021)

  17. [17]

    Karimzadeh, M., Wang, Z., Crooks, J.L.: Performance and generalizability impacts of incorporating location encoders into deep learning for dynamic pm2.5 estimation. arXiv preprint arXiv:2505.18461 (May 2025), https://arxiv.org/abs/2505.18461

  18. [18]

    Klemmer, K., Rolf, E., Robinson, C., Mackey, L., Rußwurm, M.: Satclip: Global, general-purpose location embeddings with satellite imagery. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 39, pp. 4347–4355 (2025)

  19. [19]

    Klemmer, K., Rolf, E., Rußwurm, M., Camps-Valls, G., et al.: Earth embeddings: Towards ai-centric representations of our planet. EarthArXiv (2025)

  20. [20]

    Lang, N., Jetz, W., Schindler, K., Wegner, J.D.: A high-resolution canopy height model of the earth. Nature Ecology & Evolution 7(11), 1778–1789 (2023)

  21. [21]

    Lee, J., Lee, Y., Kim, J., Kosiorek, A.R., Choi, S., Teh, Y.W.: Set transformer: A framework for attention-based permutation-invariant neural networks. In: Proceedings of the 36th International Conference on Machine Learning (ICML). Proceedings of Machine Learning Research, vol. 97, pp. 3744–3753. PMLR (2019)

  22. [22]

    Li, J., Feng, D., Li, D., Fan, D., Chang, X., Tan, X., Chao, W., Lu, Y., Zhou, J., Batra, D., Parikh, D., Girshick, R.: Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 20076–20086 (2023). https://doi.org/10.1109...

  23. [23]

    Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101 (2017)

  24. [24]

    Mac Aodha, O., Cole, E., Perona, P.: Presence-only geographical priors for fine-grained image classification. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2019)

  25. [25]

    Mai, G., Janowicz, K., Hu, Y., Gao, S., Yan, B., Zhu, R., Cai, L., Lao, N.: A review of location encoding for geoai: Methods and applications. International Journal of Geographical Information Science 36(4), 639–673 (2022)

  26. [26]

    Mai, G., Janowicz, K., Yan, B., Zhu, R., Cai, L., Lao, N.: Multi-scale representation learning for spatial feature distributions using grid cells. In: The Eighth International Conference on Learning Representations (ICLR) (2020)

  27. [27]

    Mai, G., Lao, N., He, Y., Song, J., Ermon, S.: Csp: Self-supervised contrastive spatial pre-training for geospatial-visual representations. In: Proceedings of the International Conference on Machine Learning (ICML) (2023)

  28. [28]

    Mai, G., Xuan, Y., Zuo, W., He, Y., Song, J., Ermon, S., Janowicz, K., Lao, N.: Sphere2vec: A general-purpose location representation learning over a spherical surface for large-scale geospatial predictions. ISPRS Journal of Photogrammetry and Remote Sensing 202, 439–462 (2023)

  29. [29]

    Okabe, A., Boots, B., Sugihara, K., Chiu, S.N.: Spatial Tessellations: Concepts and Applications of Voronoi Diagrams. Wiley Series in Probability and Statistics, John Wiley and Sons, 2nd edn. (2000)

  30. [30]

    Pace, R.K., Barry, R.: Sparse spatial autoregressions. Statistics & Probability Letters 33(3), 291–297 (1997)

  31. [31]

    Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., Sutskever, I.: Learning transferable visual models from natural language supervision. In: Proceedings of the 38th International Conference on Machine Learning (ICML). Proceedings of Machine Learning Research, vol. 139, pp. ...

  32. [32]

    Rahaman, N., Baratin, A., Arpit, D., Draxler, F., Lin, M., Hamprecht, F.A., Bengio, Y., Courville, A.: On the spectral bias of neural networks. In: ICML. pp. 5301–5310 (2019)

  33. [33]

    Rao, A., Crasto, R., Ooms, T., Rolnick, D., Klemmer, K., Rußwurm, M.: Localized, high-resolution geographic representations with slepian functions (2026)

  34. [34]

    Rebain, D., Jiang, W., Yazdani, S., Li, K., Yi, K.M., Tagliasacchi, A.: DeRF: Decomposed radiance fields. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 14153–14161 (2021)

  35. [35]

    Ringler, T., Ju, L., Gunzburger, M.: A multiresolution method for climate system modeling: application of spherical centroidal Voronoi tessellations. Ocean Dynamics 58(5–6), 475–498 (2008). https://doi.org/10.1007/s10236-008-0157-2

  36. [36]

    Rolf, E., Proctor, J., Carleton, T., Bolliger, I., Shankar, V., Ishihara, M., Recht, B., Hsiang, S.: A generalizable and accessible approach to machine learning with global satellite imagery. Nature Communications 12(1), 4392 (2021)

  37. [37]

    Rußwurm, M., Klemmer, K., Rolf, E., Zbinden, R., Tuia, D.: Geographic location encoding with spherical harmonics and sinusoidal representation networks. In: Proceedings of the International Conference on Learning Representations (ICLR) (2024)

  38. [38]

    Sastry, S., Khanal, S., Dhakal, A., Ahmad, A., Jacobs, N.: Taxabind: A unified embedding space for ecological applications. In: Winter Conference on Applications of Computer Vision. IEEE/CVF (2025)

  39. [39]

    Sastry, S., Khanal, S., Dhakal, A., Jacobs, N.: Geosynth: Contextually-aware high-resolution satellite image synthesis. In: IEEE/ISPRS Workshop: Large Scale Computer Vision for Remote Sensing (EARTHVISION) (2024)

  40. [40]

    Sitzmann, V., Martel, J.N., Bergman, A.W., Lindell, D.B., Wetzstein, G.: Implicit neural representations with periodic activation functions. In: NeurIPS. vol. 33, pp. 7462–7473 (2020)

  41. [41]

    Skamarock, W.C., Klemp, J.B., Duda, M.G., Fowler, L.D., Park, S.H., Ringler, T.D.: A multiscale nonhydrostatic atmospheric model using centroidal Voronoi tesselations and C-grid staggering. Monthly Weather Review 140(9), 3090–3105 (2012). https://doi.org/10.1175/MWR-D-11-00215.1

  42. [42]

    Tancik, M., Srinivasan, P.P., Mildenhall, B., et al.: Fourier features let networks learn high frequency functions in low dimensional domains. In: NeurIPS. vol. 33, pp. 7537–7547 (2020)

  43. [43]

    Tseng, G., Fuller, A., Reil, M., Herzog, H., Beukema, P., Bastani, F., Green, J.R., Shelhamer, E., Kerner, H., Rolnick, D.: Galileo: Learning global and local features in pretrained remote sensing models. arXiv e-prints pp. arXiv–2502 (2025)

  44. [44]

    Van Horn, G., Mac Aodha, O., Song, Y., Cui, Y., Sun, C., Shepard, A., Adam, H., Perona, P., Belongie, S.: The inaturalist species classification and detection dataset. In: CVPR. pp. 8769–8778 (2018)

  45. [45]

    Vivanco Cepeda, V., Nayak, G.K., Shah, M.: Geoclip: Clip-inspired alignment between locations and images for effective worldwide geo-localization. Advances in Neural Information Processing Systems 36, 8690–8701 (2023)

  46. [46]

    Wang, J., Ren, P., Gong, M., Snyder, J., Guo, B.: All-frequency rendering of dynamic, spatially-varying reflectance. In: Proceedings of the ACM SIGGRAPH Conference on Computer Graphics. pp. 133:1–133:10 (2009). https://doi.org/10.1145/1576246.1576363

  47. [47]

    Wang, Y., Braham, N.A.A., Xiong, Z., Liu, C., Albrecht, C.M., Zhu, X.X.: Ssl4eo-s12: A large-scale multi-modal, multi-temporal dataset for self-supervised learning in earth observation. arXiv preprint arXiv:2211.07044 (2022)

  48. [48]

    Xiao, Z., Ma, Q., Gu, M., Chen, C.c.J., Chen, X., Ordonez, V., Mohan, V.: Metaembed: Scaling multimodal retrieval at test-time with flexible late interaction. arXiv preprint arXiv:2509.18095 (2025)

  49. [49]

    Yin, Y., Liu, Z., Zhang, Y., Wang, S., Shah, R.R., Zimmermann, R.: Gps2vec: Towards generating worldwide gps embeddings. In: ACM SIGSPATIAL (2019)

Supplementary Material: Tessellating The Earth

A Implementation Details

Pretraining. We train on the S2-100K [18] preprocessed dataset of globally sampled Sentinel-2 multispectral images, using 13-band input...