COP-GEN: Latent Diffusion Transformer for Copernicus Earth Observation Data
Pith reviewed 2026-05-15 16:37 UTC · model grok-4.3
The pith
COP-GEN models cross-modal Earth observation relationships as conditional distributions to generate diverse, physically consistent samples across sensors.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
COP-GEN is a multimodal latent diffusion transformer that models the joint distribution of heterogeneous EO modalities at native spatial resolutions by parameterizing cross-modal mappings as conditional distributions, enabling flexible any-to-any conditional generation including zero-shot modality translation without task-specific retraining, while producing diverse yet physically consistent realizations that capture meaningful cross-modal structure and adapt output uncertainty to the conditioning information.
What carries the argument
The multimodal latent diffusion transformer that parameterizes cross-modal mappings as conditional distributions for joint modeling of Earth observation modalities.
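A passage quoted further down this page mentions "independent timestep control per modality". One common way to turn a single joint diffusion model into an any-to-any conditional generator is to give every modality its own timestep: conditioning modalities stay clean, generated ones follow the sampler, and unused ones are held at pure noise. The sketch below only illustrates that bookkeeping. The modality names, `T_MAX`, and `timestep_plan` are hypothetical, and whether COP-GEN implements conditioning exactly this way is not confirmed by this page.

```python
from typing import Dict, Iterable

T_MAX = 1000  # assumed number of diffusion steps (hypothetical constant)
MODALITIES = ("optical", "radar", "dem", "landcover")  # hypothetical names


def timestep_plan(condition_on: Iterable[str], generate: Iterable[str], t: int) -> Dict[str, int]:
    """Assign one diffusion timestep per modality for a single denoising call."""
    condition_on, generate = set(condition_on), set(generate)
    if condition_on & generate:
        raise ValueError("a modality cannot be both conditioned on and generated")
    unknown = (condition_on | generate) - set(MODALITIES)
    if unknown:
        raise ValueError(f"unknown modalities: {sorted(unknown)}")
    if not 0 <= t <= T_MAX:
        raise ValueError("t must lie in [0, T_MAX]")

    plan = {}
    for name in MODALITIES:
        if name in condition_on:
            plan[name] = 0        # clean input, acts as conditioning
        elif name in generate:
            plan[name] = t        # currently being denoised by the sampler
        else:
            plan[name] = T_MAX    # pure noise, effectively ignored
    return plan


# Optical-to-radar translation at sampler step 500:
plan = timestep_plan(condition_on=["optical"], generate=["radar"], t=500)
```

Because the same function covers any split of modalities, no task-specific retraining is needed to switch from, say, optical-to-radar to radar-plus-DEM-to-optical in this scheme.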
Load-bearing premise
That the multi-temporal Sentinel-2 benchmark represents the full joint distribution of heterogeneous Earth observation modalities and that visual plus quantitative checks confirm physical consistency of the generated samples.
What would settle it
A new expanded benchmark or physical-law violation test showing that COP-GEN samples fall outside plausible reflectance or backscatter ranges or cover less of the real manifold than the strongest competing method.
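A test of this kind could be sketched as follows. The first helper measures how many generated values fall inside a plausible physical interval, and the second measures how much of the real per-band value range the generated samples span. Both definitions are assumptions made for illustration: the page does not give the paper's exact formula for reflectance-range coverage, and the bounds used here (reflectance scaled to [0, 1]) are a placeholder.

```python
from typing import Sequence, Tuple


def fraction_in_range(values: Sequence[float], bounds: Tuple[float, float]) -> float:
    """Share of values lying inside the closed interval `bounds`."""
    if not values:
        raise ValueError("values must not be empty")
    lo, hi = bounds
    inside = sum(1 for v in values if lo <= v <= hi)
    return inside / len(values)


def range_coverage(generated: Sequence[float], real: Sequence[float]) -> float:
    """Share of the real [min, max] interval spanned by generated values (one band)."""
    r_lo, r_hi = min(real), max(real)
    if r_hi == r_lo:
        raise ValueError("real values have zero range")
    # Clip the generated interval to the real one before measuring its width.
    g_lo = max(min(generated), r_lo)
    g_hi = min(max(generated), r_hi)
    return max(0.0, g_hi - g_lo) / (r_hi - r_lo)


# Toy band values, assuming reflectance scaled to [0, 1] (placeholder bounds).
plausible = fraction_in_range([0.1, 0.5, 1.2, -0.1], (0.0, 1.0))
spanned = range_coverage([0.2, 0.4], [0.0, 1.0])
```

A model could score well on one check and badly on the other: samples confined to a narrow but valid interval are plausible yet cover little of the real range.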
Original abstract
Earth observation applications increasingly rely on data from multiple sensors, including optical, radar, elevation, and land-cover. Relationships between modalities are fundamental for data integration but are inherently non-injective: identical conditioning information can correspond to multiple physically plausible observations, and should be parametrised as conditional distributions. Deterministic models, by contrast, collapse toward conditional means and fail to represent the uncertainty and variability required for tasks such as data completion and cross-sensor translation. We introduce COP-GEN, a multimodal latent diffusion transformer that models the joint distribution of heterogeneous EO modalities at their native spatial resolutions. By parameterising cross-modal mappings as conditional distributions, COP-GEN enables flexible any-to-any conditional generation, including zero-shot modality translation without task-specific retraining. Experiments show that COP-GEN generates diverse yet physically consistent realisations while maintaining strong peak fidelity across optical, radar, and elevation modalities. Qualitative and quantitative analyses demonstrate that the model captures meaningful cross-modal structure and adapts its output uncertainty as conditioning information increases. We release a stochastic benchmark built from multi-temporal Sentinel-2 observations that enables distribution-level comparison of generative EO models. On this benchmark, COP-GEN covers 90% of the real observation manifold and 63% of its per-band reflectance range, while the strongest competing method collapses to 2.8% and 18%, respectively. These results highlight the importance of stochastic generative modeling for EO and motivate evaluation protocols beyond single-reference, pointwise metrics. Website: https://miquel-espinosa.github.io/cop-gen
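The abstract's point about deterministic models collapsing toward conditional means can be illustrated with a toy example. It assumes a single conditioning input with two equally likely outcomes (the values 0.2 and 0.8 are made up) and shows that the prediction minimising mean squared error sits between the two modes, where no real observation lies.

```python
import random
import statistics


def sample_target(rng: random.Random) -> float:
    """Two equally likely outcomes for the same conditioning input (toy values)."""
    centre = 0.2 if rng.random() < 0.5 else 0.8
    return rng.gauss(centre, 0.02)


rng = random.Random(0)
targets = [sample_target(rng) for _ in range(10_000)]


def mse(prediction: float) -> float:
    """Mean squared error of a constant prediction against all targets."""
    return statistics.fmean((y - prediction) ** 2 for y in targets)


# The constant prediction that minimises MSE is the sample mean.
mse_optimal = statistics.fmean(targets)

# Distance from that prediction to the nearest target actually observed.
gap = min(abs(y - mse_optimal) for y in targets)
```

Here `mse_optimal` lands near 0.5 and `gap` stays well above zero, so the "best" deterministic answer is one that never occurs in the data. A model that samples from the conditional distribution can instead return values near either mode.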
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces COP-GEN, a multimodal latent diffusion transformer that models the joint distribution of heterogeneous Copernicus Earth observation modalities (optical, radar, elevation, land-cover) at native resolutions. It parameterizes cross-modal mappings as conditional distributions to enable any-to-any generation, including zero-shot translation, and releases a multi-temporal Sentinel-2 benchmark on which it reports coverage of 90% of the real observation manifold and 63% of its per-band reflectance range, versus 2.8% and 18% for the strongest baseline.
Significance. If the multimodal claims are substantiated, the work would meaningfully advance stochastic generative modeling for EO by capturing uncertainty and variability in cross-sensor tasks, where deterministic approaches collapse to conditional means. The released benchmark protocol is a constructive contribution that shifts evaluation beyond pointwise metrics.
major comments (3)
- [Abstract, §4] Abstract and §4 (Experiments): The headline quantitative results (90% manifold coverage, 63% reflectance range) are obtained exclusively on the optical multi-temporal Sentinel-2 benchmark. No equivalent coverage or range metrics are reported for cross-modal generation (e.g., optical-to-radar or optical-to-DEM), leaving the central any-to-any multimodal claim without direct quantitative support.
- [§3] §3 (Model Architecture): The description of the latent diffusion transformer does not specify how heterogeneous modalities are tokenized or conditioned at native resolutions, nor the precise form of the training loss and sampling schedule. These omissions make it impossible to assess whether the reported physical consistency arises from the architecture or from benchmark-specific tuning.
- [§4.2] §4.2 (Quantitative Evaluation): The manifold-coverage and reflectance-range metrics are defined only for the optical marginal; the paper provides no ablation or extension showing that the same metrics remain high when the model is conditioned on or generates non-optical modalities, which directly tests the multimodal joint-distribution claim.
minor comments (2)
- [Abstract] The abstract states that the model 'adapts its output uncertainty as conditioning information increases,' but no quantitative plot or table quantifies this adaptation (e.g., variance vs. number of conditioning bands).
- [§4.1] Figure captions and §4.1 should explicitly state the number of samples drawn per conditioning input when computing coverage statistics, as this affects the interpretation of the 90% figure.
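To make the sample-count concern concrete, here is a minimal sketch of one widely used distribution-level recall, the k-nearest-neighbour estimate of Kynkäänniemi et al. (2019). A real point counts as covered when it falls inside the k-NN ball of at least one generated sample. Whether the paper's manifold-coverage figure is defined exactly this way is not stated on this page, so the function names and the choice `k=3` are assumptions.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def knn_radii(points: List[Point], k: int) -> List[float]:
    """Distance from each point to its k-th nearest neighbour within `points`."""
    radii = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        radii.append(dists[k - 1])
    return radii


def coverage(real: List[Point], generated: List[Point], k: int = 3) -> float:
    """Fraction of real points inside the k-NN ball of some generated point."""
    if len(generated) <= k:
        raise ValueError("need more than k generated samples")
    radii = knn_radii(generated, k)
    covered = 0
    for r in real:
        if any(math.dist(r, g) <= rad for g, rad in zip(generated, radii)):
            covered += 1
    return covered / len(real)


# Toy example: generated samples sit in one cluster, real data has two modes.
generated = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05)]
real = [(0.05, 0.0), (10.0, 10.0)]
collapsed = coverage(real, generated)
```

Both the ball radii and the chance of hitting a given mode depend on how many samples are drawn per conditioning input, which is why that number needs to be reported alongside any coverage percentage.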
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback, which helps clarify the scope of our quantitative claims and the need for additional architectural details. We address each major comment point by point below, indicating revisions where appropriate to strengthen the manuscript's support for the multimodal claims.
- Referee: [Abstract, §4] Abstract and §4 (Experiments): The headline quantitative results (90% manifold coverage, 63% reflectance range) are obtained exclusively on the optical multi-temporal Sentinel-2 benchmark. No equivalent coverage or range metrics are reported for cross-modal generation (e.g., optical-to-radar or optical-to-DEM), leaving the central any-to-any multimodal claim without direct quantitative support.
  Authors: The referee correctly notes that the manifold coverage and reflectance range metrics are reported specifically for the optical Sentinel-2 benchmark, as these metrics are defined with respect to the multi-temporal optical observation distribution. Cross-modal results in the manuscript are supported by qualitative visualizations and per-modality fidelity metrics (e.g., PSNR/SSIM for radar and DEM) demonstrating physical consistency and diversity. To better substantiate the any-to-any claim, we will revise the manuscript to include additional quantitative results for cross-modal tasks using adapted distribution-level metrics where feasible, and we will explicitly state the scope of the 90% figure as applying to the optical marginal.
  Revision: yes
- Referee: [§3] §3 (Model Architecture): The description of the latent diffusion transformer does not specify how heterogeneous modalities are tokenized or conditioned at native resolutions, nor the precise form of the training loss and sampling schedule. These omissions make it impossible to assess whether the reported physical consistency arises from the architecture or from benchmark-specific tuning.
  Authors: We agree that §3 requires expanded detail for full reproducibility and to allow assessment of the architecture's contribution. In the revised manuscript we will add: (i) modality-specific tokenization at native resolutions using patch embeddings (16×16 for optical/radar, adjusted for DEM/land-cover); (ii) conditioning via cross-attention in the transformer backbone enabling any-to-any mappings; (iii) the training loss as the standard diffusion noise-prediction objective with modality-weighted terms; and (iv) the sampling schedule (linear beta schedule, 1000 timesteps, DDIM inference). These additions will be integrated into the main text rather than left to supplementary material.
  Revision: yes
- Referee: [§4.2] §4.2 (Quantitative Evaluation): The manifold-coverage and reflectance-range metrics are defined only for the optical marginal; the paper provides no ablation or extension showing that the same metrics remain high when the model is conditioned on or generates non-optical modalities, which directly tests the multimodal joint-distribution claim.
  Authors: We acknowledge that the current evaluation focuses the manifold metrics on the optical marginal due to the availability of multi-temporal references for that modality. We will add an ablation study in the revised §4.2 that reports results when the model is conditioned on non-optical inputs (radar, DEM) for optical generation and vice versa, using the same or suitably adapted coverage and range metrics. This will provide direct quantitative support for the joint-distribution modeling across modalities.
  Revision: yes
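The simulated response to the §3 comment names a linear beta schedule with 1000 timesteps and a noise-prediction objective with modality-weighted terms. The sketch below shows what those pieces look like in their textbook form. The beta endpoints (1e-4 and 0.02) are the usual DDPM defaults and are an assumption here, as are the function names and the example weights; none of these values is confirmed by the page.

```python
import math
from typing import Dict, List


def linear_betas(num_steps: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Linearly spaced noise variances (endpoints are assumed DDPM defaults)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bars(betas: List[float]) -> List[float]:
    """Cumulative product of (1 - beta_t), i.e. the signal fraction kept at step t."""
    out, running = [], 1.0
    for b in betas:
        running *= 1.0 - b
        out.append(running)
    return out


def add_noise(x0: List[float], eps: List[float], alpha_bar: float) -> List[float]:
    """Forward process: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + s * e for x, e in zip(x0, eps)]


def weighted_noise_loss(
    eps_true: Dict[str, List[float]],
    eps_pred: Dict[str, List[float]],
    weights: Dict[str, float],
) -> float:
    """Sum over modalities of weight * mean squared error on the predicted noise."""
    total = 0.0
    for name, target in eps_true.items():
        pred = eps_pred[name]
        mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)
        total += weights[name] * mse
    return total


betas = linear_betas()
abars = alpha_bars(betas)
loss = weighted_noise_loss(
    {"optical": [1.0, 0.0], "radar": [0.0]},   # true noise (toy values)
    {"optical": [0.0, 0.0], "radar": [2.0]},   # predicted noise (toy values)
    {"optical": 1.0, "radar": 0.5},            # hypothetical modality weights
)
```

Stating these quantities explicitly in the manuscript would let a reader separate the effect of the architecture from the effect of schedule and weighting choices, which is the concern behind the referee's comment.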
Circularity Check
No circularity in derivation chain; claims rest on independent benchmark
The paper defines COP-GEN as a latent diffusion transformer adapted for multimodal EO data and evaluates it on a separately released multi-temporal Sentinel-2 benchmark. No equations reduce by construction to fitted parameters from the same data, no predictions are statistically forced by input fits, and no load-bearing self-citations or uniqueness theorems collapse the central claims to tautologies. The 90%/63% coverage numbers are presented as empirical measurements on an external benchmark rather than derived from the model's definition. Minor related-work citations do not carry the central multimodal claims.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: COP-GEN is a multimodal latent diffusion transformer that models the joint distribution of heterogeneous EO modalities at their native spatial resolutions... unified transformer diffusion backbone... independent timestep control per modality
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: We release a stochastic benchmark... COP-GEN covers 90% of the real observation manifold and 63% of its per-band reflectance range
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.