pith. sign in

arxiv: 2605.15558 · v1 · pith:SAV53V4Znew · submitted 2026-05-15 · 📡 eess.IV · cs.CV

Text-RSIR: A Text-Guided Framework for Efficient Remote Sensing Image Transmission and Reconstruction

Pith reviewed 2026-05-19 19:55 UTC · model grok-4.3

classification 📡 eess.IV cs.CV
keywords remote sensingimage transmissiontext-guided reconstructioncross-modal learningdata compressionsatellite imageryimage restorationefficient communication
0
0 comments X p. Extension
pith:SAV53V4Z Add to your LaTeX paper What is a Pith Number?
\usepackage{pith}
\pithnumber{SAV53V4Z}

Prints a linked pith:SAV53V4Z badge after your title and writes the identifier into PDF metadata. Compiles on arXiv with no extra files. Learn more

The pith

A text-guided system transmits remote sensing images at roughly 2% of original data volume while reconstructing them to PSNRs of 16-27 dB.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes sending low-resolution remote sensing images paired with compact textual descriptions instead of full high-resolution data. An onboard generator creates spatial and semantic summaries, while a ground station uses a text-conditioned model to restore details through cross-modal learning. This reduces transmitted volume to roughly 2% of the original. Results on Alsat-2B, UC Merced Land Use, and Aerial Image datasets show PSNR values of 16.36 dB, 26.87 dB, and 27.41 dB respectively. Such efficiency would matter for bandwidth-limited scenarios where full imagery cannot be sent.

Core claim

The central claim is that a text-guided remote sensing image transmission system replaces complete high-resolution data with low-resolution images accompanied by compact textual descriptions, reducing the transmitted data volume to approximately 2% of the original size, and achieves reconstruction PSNRs of 16.36 dB, 26.87 dB, and 27.41 dB on the Alsat-2B, UC Merced Land Use, and Aerial Image datasets through cross-modal learning in a text-conditioned image restoration model.

What carries the argument

The onboard text generator producing spatial and semantic summaries combined with the ground-based text-conditioned image restoration model that uses cross-modal learning to recover details from low-resolution inputs.

If this is right

  • Transmission data volume drops to approximately 2% of the full high-resolution image size.
  • Reconstruction achieves PSNRs of 16.36 dB on Alsat-2B, 26.87 dB on UC Merced Land Use, and 27.41 dB on Aerial Image datasets.
  • The restored images maintain semantic coherence for tasks such as land cover analysis.
  • The framework supports efficient image transfer for environmental monitoring and urban mapping under limited bandwidth.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could be combined with adaptive coding to send only the text component during severe bandwidth drops.
  • Extending the text generator to handle multi-spectral bands might preserve additional diagnostic information without increasing payload size.
  • Deployment on actual satellite links would need to account for transmission errors in the text stream that could degrade restoration.

Load-bearing premise

Compact textual descriptions produced by an onboard generator contain enough spatial and semantic information for a restoration model to recover fine details and maintain coherence from low-resolution images.

What would settle it

Applying the full pipeline to a fourth remote sensing dataset and measuring whether average reconstruction PSNR falls below 15 dB or land-cover classification accuracy drops sharply compared with full-resolution transmission.

Figures

Figures reproduced from arXiv: 2605.15558 by Hao Yang, Man-On Pun, Peifeng Ma, Xianping Ma.

Figure 1
Figure 1. Figure 1: Comparison between conventional and text-guided RS image transfer. (a) Conventional HR image downlink. (b) Proposed transmission of LR imagery with textual descriptors for efficient and information-preserving data delivery. the framework of remote sensing image super-resolution (RSSR), which aims to recover HR imagery from its LR counterpart by learning a mapping between LR and HR image spaces. Existing RS… view at source ↗
Figure 2
Figure 2. Figure 2: Overall architecture of the proposed TGRSIT. The system transmits lightweight LR–text pairs from satellite to ground, where Text-RSIR reconstructs HR imagery guided by semantic information. Image Decoding Multimodal Encoding TGISR CLR (𝐋) SRI TC (𝐭) The image is a bird’s eye view of a city, featuring a large body of water, a beach, and a roadway... CNN VE (𝐗!) TE (𝐓) CNN Image Encoder CLIP Text Encoder CLI… view at source ↗
Figure 3
Figure 3. Figure 3: Overview of the proposed Text-RSIR. TTSA denotes Top-𝑘 Token Selective Attention, and WSA represents Window￾based Self-Attention within the TTST architecture. images and contextual knowledge from the captions, form￾ing the foundation for multimodal SR. 3.3. Multimodal Encoding In this process, the CLR image is processed by both a CNN Image Encoder and a CLIP Image Encoder to generate the CNN VE and SGMs, r… view at source ↗
Figure 4
Figure 4. Figure 4: Overview of the SG process. The MetaNet and MHSA blocks are shown in dashed boxes, indicating that they contain learnable parameters, which are specified in square brackets following their names. The TE is generated using the CLIP Text Encoder as follows: T = TECLIP(t), (5) where t denotes the caption of the HR image generated by the VLM, TECLIP denotes Text Encoder, and T represents the resulting TE. 3.4.… view at source ↗
Figure 5
Figure 5. Figure 5: Visualization of results from Text-RSIR and other methods across all four datasets. Subfigures are labeled (a)–(d) [PITH_FULL_IMAGE:figures/full_fig_p012_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Visualization of Text-RSIR results across all four datasets, including both the TCs and heatmap visualizations of the SGMs. The TCs often provide additional details that may be overlooked by the SGMs, while the SGMs from different iterations in TGISR often attend to different image regions. Subfigures are labeled (a)–(d) [PITH_FULL_IMAGE:figures/full_fig_p013_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Visualization of the data scalability experiment results on the UC Merced Land Use Dataset [PITH_FULL_IMAGE:figures/full_fig_p014_7.png] view at source ↗
read the original abstract

High-resolution remote sensing imagery is critical for environmental monitoring, urban mapping, and land cover analysis, but its transmission is often hindered by limited bandwidth and high communication costs. Conventional pipelines transmit full-resolution pixel data, resulting in redundant and inefficient delivery. This paper proposes a text-guided remote sensing image transmission system that replaces complete high-resolution data with low-resolution images accompanied by compact textual descriptions. An onboard text generator produces spatial and semantic summaries, reducing the transmitted data volume to approximately 2\% of the original size. For ground-based reconstruction, a text-conditioned image restoration model is introduced, which leverages cross-modal learning to recover fine spatial details and maintain semantic coherence. Experimental results on the Alsat-2B, UC Merced Land Use, and Aerial Image datasets demonstrate that the proposed framework achieves reconstruction PSNRs of 16.36 dB, 26.87 dB, and 27.41 dB, respectively, enabling efficient and information-preserving image transfer for remote sensing applications. The implementation will be made publicly available at \href{https://github.com/haoyangofficial/textrssr}{GitHub}.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes Text-RSIR, a text-guided framework for remote sensing image transmission and reconstruction. It replaces full high-resolution pixel data with low-resolution images plus compact textual descriptions generated onboard (claimed to reduce transmitted volume to ~2% of original), then uses a ground-based text-conditioned restoration network leveraging cross-modal learning to recover details and semantic coherence. Experiments on Alsat-2B, UC Merced Land Use, and Aerial Image datasets report reconstruction PSNRs of 16.36 dB, 26.87 dB, and 27.41 dB.

Significance. If the central sufficiency assumption holds and reconstruction quality proves adequate for downstream remote-sensing tasks, the approach could enable substantial bandwidth reduction in bandwidth-constrained satellite or UAV scenarios. The public GitHub release of code would further strengthen reproducibility. However, the notably lower PSNR on Alsat-2B already signals that performance may be scene-dependent and insufficient for applications requiring fine spatial fidelity.

major comments (3)
  1. [Abstract and §4] Abstract and §4 (Experimental Results): The reported PSNR of 16.36 dB on Alsat-2B is low enough to question whether compact textual descriptions plus low-resolution input can recover the fine-grained spatial structures (sub-pixel textures, small-object geometries) that dominate remote-sensing utility; no ablation isolating the text component versus low-resolution input alone is presented to support the sufficiency claim.
  2. [§3.1] §3.1 (Onboard Text Generator): The description of the text generator as producing 'spatial and semantic summaries' lacks any mechanism (e.g., dense patch-level captions, geometric tokens, or explicit spatial encoding) that would demonstrably bridge the modality gap between discrete text and continuous high-frequency image content; without such detail or quantitative validation, the 2% data-volume claim rests on an unverified assumption.
  3. [§4.3] §4.3 (Comparison and Baselines): No quantitative comparison to standard compression baselines (e.g., JPEG2000, learned codecs) or to a low-resolution-only reconstruction model is provided; this omission makes it impossible to isolate the incremental benefit of the text guidance and to assess whether the framework truly preserves information beyond what a low-resolution image alone could achieve.
minor comments (2)
  1. [Abstract] The abstract states that implementation will be made publicly available but provides no link or commit hash; this should be added for reproducibility.
  2. [Figures in §4] Figure captions and axis labels in the results section use inconsistent font sizes and lack error bars or standard-deviation shading, reducing clarity of the quantitative claims.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment below and outline the revisions we will make to the manuscript.

read point-by-point responses
  1. Referee: [Abstract and §4] Abstract and §4 (Experimental Results): The reported PSNR of 16.36 dB on Alsat-2B is low enough to question whether compact textual descriptions plus low-resolution input can recover the fine-grained spatial structures (sub-pixel textures, small-object geometries) that dominate remote-sensing utility; no ablation isolating the text component versus low-resolution input alone is presented to support the sufficiency claim.

    Authors: We acknowledge that the PSNR of 16.36 dB on Alsat-2B is lower than on the other datasets and may indicate limitations in recovering fine spatial details for certain complex scenes. The framework prioritizes semantic preservation for downstream tasks such as land-use classification over pixel-perfect fidelity. To directly address the absence of an ablation study, we will add a new experiment in the revised §4 comparing the full text-conditioned model against a low-resolution-only baseline (e.g., bicubic upsampling followed by a standard restoration network). This will isolate the contribution of the text component. revision: yes

  2. Referee: [§3.1] §3.1 (Onboard Text Generator): The description of the text generator as producing 'spatial and semantic summaries' lacks any mechanism (e.g., dense patch-level captions, geometric tokens, or explicit spatial encoding) that would demonstrably bridge the modality gap between discrete text and continuous high-frequency image content; without such detail or quantitative validation, the 2% data-volume claim rests on an unverified assumption.

    Authors: The onboard text generator is based on a fine-tuned vision-language model that produces concise captions emphasizing object categories, approximate spatial layouts, and semantic attributes derived from remote-sensing-specific training data. While it does not employ dense patch-level captions, the generated text incorporates relational descriptors (e.g., 'building cluster in upper-left quadrant') to help bridge the modality gap. We will expand §3.1 with additional implementation details, example text outputs, and a quantitative breakdown of transmitted data sizes (low-resolution image plus text) to substantiate the ~2% claim. revision: yes

  3. Referee: [§4.3] §4.3 (Comparison and Baselines): No quantitative comparison to standard compression baselines (e.g., JPEG2000, learned codecs) or to a low-resolution-only reconstruction model is provided; this omission makes it impossible to isolate the incremental benefit of the text guidance and to assess whether the framework truly preserves information beyond what a low-resolution image alone could achieve.

    Authors: We agree that direct comparisons to established baselines are necessary to quantify the benefit of text guidance. In the revised manuscript we will augment §4.3 with results for JPEG2000 at comparable bit rates, a learned compression baseline, and a low-resolution-only reconstruction model. These additions will clarify the incremental value of the text-conditioned restoration network. revision: yes

Circularity Check

0 steps flagged

No significant circularity; framework claims rest on experimental validation rather than self-referential derivation

full rationale

The paper introduces a text-guided transmission and reconstruction pipeline for remote sensing images, replacing full-resolution data with low-resolution images plus compact textual summaries generated onboard. Reconstruction relies on a cross-modal restoration network whose performance is demonstrated via reported PSNR values on three public datasets (Alsat-2B, UC Merced, Aerial Image). No equations, uniqueness theorems, or fitted parameters are shown to reduce by construction to the inputs; the 2% data-volume claim and PSNR figures are presented as empirical outcomes rather than tautological predictions. Self-citations, if present, are not load-bearing for the central sufficiency assumption, which is instead tested through standard dataset experiments. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Based solely on the abstract, the framework relies on standard assumptions in multimodal machine learning but introduces no explicit free parameters, new axioms, or invented entities; cross-modal learning is treated as a domain assumption.

axioms (1)
  • domain assumption Cross-modal learning can effectively combine textual and visual information for image reconstruction in remote sensing
    Invoked in the description of the text-conditioned image restoration model that leverages cross-modal learning.

pith-pipeline@v0.9.0 · 5735 in / 1410 out tokens · 47395 ms · 2026-05-19T19:55:47.947996+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

67 extracted references · 67 canonical work pages · 3 internal anchors

  1. [1]

    A. N. Netravali and B. G. Haskell , title =. Springer , year =

  2. [2]

    Wang and A

    Z. Wang and A. Bovik and H. Sheikh and E. Simoncelli , title =. IEEE Transactions on Image Processing , volume =. 2004 , publisher =

  3. [3]

    Kingma and J

    D. Kingma and J. Ba , title =. International Conference on Learning Representations (ICLR) , year =

  4. [4]

    Zhao and O

    H. Zhao and O. Gallo and I. Frosio and J. Kautz , title =. IEEE Transactions on Computational Imaging , volume =

  5. [5]

    A New Public

    Achraf Djerida and Khelifa Djerriri and Moussa Sofiane Karoui and Mohammed El Amin larabi , archivePrefix =. A New Public. 2103.12547 , year =

  6. [6]

    PyTorch: An Imperative Style, High-Performance Deep Learning Library

    A. Paszke and S. Gross and F. Massa and A. Lerer and J. Bradbury and G. Chanan and T. Killeen and Z. Lin and N. Gimelshein and L. Antiga and A. Desmaison and A. Köpf and E. Yang and Z. DeVito and M. Raison and A. Tejani and S. Chilamkurthy and B. Steiner and L. Fang and J. Bai and S. Chintala , archivePrefix =. 1912.01703 , year =

  7. [7]

    Liu and C

    H. Liu and C. Li and Y. Li and Y. Lee , title =

  8. [8]

    Xiao and Q

    Y. Xiao and Q. Yuan and K. Jiang and J. He and C. -W. Lin and L. Zhang , title =. IEEE Transactions on Image Processing , volume =

  9. [9]

    Vaswani and N

    A. Vaswani and N. Shazeer and N. Parmar and J. Uszkoreit and L. Jones and A. Gomez and. Attention Is All You Need , booktitle =

  10. [10]

    Ravi and H

    S. Ravi and H. Larochelle , title =. International Conference on Learning Representations (ICLR) , year =

  11. [11]

    Chen and K

    B. Chen and K. Chen and M. Yang and Z. Zou and Z. Shi , eprint=

  12. [12]

    Radford and J

    A. Radford and J. W. Kim and C. Hallacy and A. Ramesh and G. Goh and S. Agarwal and G. Sastry and A. Askell and P. Mishkin and J. Clark and G. Krueger and I. Sutskever , title =

  13. [13]

    Sui and X

    J. Sui and X. Ma and X. Zhang and M.-O. Pun , title =. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing , year=

  14. [14]

    Xiao and Q

    Y. Xiao and Q. Yuan and K. Jiang and Y. Chen and Q. Zhang and C.-W. Lin , journal=. Frequency-Assisted Mamba for Remote Sensing Image Super-Resolution , year=

  15. [15]

    Yang and S

    Y. Yang and S. Newsam , title =. ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM GIS) , year =

  16. [16]

    Xia and J

    G.-S. Xia and J. Hu and F. Hu and B. Shi and X. Bai and Y. Zhong and L. Zhang and X. Lu , journal =

  17. [17]

    IEEE Transactions on Geoscience and Remote Sensing , year=

    Mitigating texture bias: A remote sensing super-resolution method focusing on high-frequency texture reconstruction , author=. IEEE Transactions on Geoscience and Remote Sensing , year=

  18. [18]

    IEEE Transactions on Geoscience and Remote Sensing , volume=

    Two-Stage Spatial-Frequency Joint Learning for Large-Factor Remote Sensing Image Super-Resolution , author=. IEEE Transactions on Geoscience and Remote Sensing , volume=

  19. [19]

    Keys , title =

    R. Keys , title =. IEEE Transactions on Acoustics, Speech, and Signal Processing , volume =

  20. [20]

    Image Super-Resolution Using Deep Convolutional Networks

    C. Dong and C. C. Loy and K. He and X. Tang , title =. 1501.00092 , archivePrefix=

  21. [21]

    Lim and S

    B. Lim and S. Son and H. Kim and S. Nah and K. M. Lee , title =. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops , year =

  22. [22]

    Ledig and L

    C. Ledig and L. Theis and F. Huszar and J. Caballero and A. Cunningham and A. Acosta and A. Aitken and A. Tejani and J. Totz and Z. Wang and W. Shi , title =. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , year =

  23. [23]

    Lei and Z

    S. Lei and Z. Shi and Z. Zou , title =. IEEE Transactions on Geoscience and Remote Sensing , volume =

  24. [24]

    Li and Y

    H. Li and Y. Yang and M. Chang and H. Feng and Z. Xu and Q. Li and Y. Chen , title =. 2104.14951 , archivePrefix=

  25. [25]

    Kawar and M

    B. Kawar and M. Elad and S. Ermon and J. Song , title =. 2201.11793 , year =

  26. [26]

    Wu and R

    T. Wu and R. Zhao and M. Lv and Z. Jia and L. Li and M. Liu , title =. IEEE Transactions on Geoscience and Remote Sensing , volume =

  27. [27]

    Yang and J

    J. Yang and J. Wright and T. S. Huang and L. Yu , title =. IEEE Transactions on Image Processing , volume =. 2010 , publisher =

  28. [28]

    Yuan and X

    Y. Yuan and X. Meng and W. Sun and G. Yang and L. Wang and J. Peng and Y. Wang , title =. Remote Sensing , volume =

  29. [29]

    J. F. Hu and T. Z. Huang and L. J. Deng , title =. 2109.02079 , year =

  30. [30]

    K. V. Gandikota and P. Chandramouli , title =. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , year =

  31. [31]

    Jain and A

    U. Jain and A. Wilson and V. Gulshan , title =. Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS) , year =

  32. [32]

    Fuller and K

    A. Fuller and K. Millard and J. R. Green , title =. Proceedings of the 37th Conference on Neural Information Processing Systems (NeurIPS) , year =

  33. [33]

    Z. A. Pierrat and others , title =. New Phytologist , year =

  34. [34]

    A. Y. A. Abdelmajeed and R. Juszczak , title =. Remote Sensing , volume =. 2024 , article =

  35. [35]

    Li and Y

    J. Li and Y. Meng and C. Tao and Z. Zhang and X. Yang and Z. Wang and X. Wang and L. Li and W. Zhang , title =. IEEE Transactions on Geoscience and Remote Sensing , year =

  36. [36]

    Guo and Z

    M. Guo and Z. Zhang and H. Liu and Y. Huang , title =. Remote Sensing , volume =. 2022 , article =

  37. [37]

    Kong and Y

    J. Kong and Y. Ryu and S. Jeong and Z. Zhong and W. Choi and J. Kim and K. Lee and J. Lim and K. Jang and J. Chun and K.-M. Kim and R. Houborg , title =. ISPRS Journal of Photogrammetry and Remote Sensing , volume =. 2023 , pages =

  38. [38]

    Liu and L

    D. Liu and L. Zhong and H. Wu and S. Li and Y. Li , title =. Scientific Reports , volume =

  39. [39]

    Wang and B

    P. Wang and B. Bayram and E. Sertel , title =. Earth-Science Reviews , year =

  40. [40]

    Rombach and A

    R. Rombach and A. Blattmann and D. Lorenz and P. Esser and B. Ommer , title =. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , pages =

  41. [41]

    Liu and D

    F. Liu and D. Chen and Z. Guan and X. Zhou and J. Zhu and Q. Ye and L. Fu and J. Zhou , title =. IEEE Transactions on Geoscience and Remote Sensing , volume =

  42. [42]

    Berg and F.-F

    Olga Russakovsky and Jia Deng and Hao Su and Jonathan Krause and Sanjeev Satheesh and Sean Ma and Zhiheng Huang and Andrej Karpathy and Aditya Khosla and Michael Bernstein and Alexander C. Berg and F.-F. Li , Title =. 2015 , journal =

  43. [43]

    D. K. Mahanta and T. K. Bhoi and J. Komal and I. Samal and A. Mastinu , title =. Plant Stress , year =

  44. [44]

    Huang and X

    Q. Huang and X. Lu and F. Chen and Q. Zhang and H. Zhang , title =. Remote Sensing , year =

  45. [45]

    Mahanta and T

    S. Mahanta and T. R. Mohanty , title =. Vigyan Varta , year =

  46. [46]

    Rolla and A

    J. Rolla and A. Khuller and K. An and R. Emberson and E. Fielding and L. Schultz and K. Miner , title =. AGU Advances , year =

  47. [47]

    Zhang and Z

    W. Zhang and Z. Tan and Q. Lv and J. Li and B. Zhu and Y. Liu , title =. Remote Sensing , year =

  48. [48]

    Zhang and X

    W. Zhang and X. Yang and Z. Yuan and Z. Chen and Y. Xu , title =. Remote Sensing , year =

  49. [49]

    Vision-Language Modeling Meets Remote Sensing: Models, Datasets and Perspectives , author=

  50. [50]

    When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning , author=

  51. [51]

    Referring Remote Sensing Image Segmentation via Bidirectional Alignment Guided Joint Prediction , author=

  52. [52]

    Yao and N

    K. Yao and N. Xu and R. Yang and Y. Xu and Z. Gao and T. Kitrungrotsakul and Y. Ren and P. Zhang and J. Wang and N. Wei and C. Li , archivePrefix=. 2503.11070 , year=

  53. [53]

    Ma and Q

    X. Ma and Q. Wu and X. Zhao and X. Zhang and M.-O. Pun and B. Huang , journal=. 2024 , volume=

  54. [54]

    Ma and X

    X. Ma and X. Zhang and M.-O. Pun and B. Huang , journal=. A Unified Framework With Multimodal Fine-Tuning for Remote Sensing Semantic Segmentation , year=

  55. [55]

    Rong and M

    F. Rong and M. Lan and Q. Zhang and L. Zhang , archivePrefix=. 2503.07266 , year=

  56. [56]

    Wang and W

    Y. Wang and W. Yu and P. Ghamisi , archivePrefix=. Change Captioning in Remote Sensing: Evolution to. 2501.08114 , year=

  57. [57]

    MsEdF: A Multi-stream Encoder-decoder Framework for Remote Sensing Image Captioning

    S. Das and R. Sharma , archivePrefix=. 2502.09282 , year=

  58. [58]

    He and J

    Y. He and J. Zhu and Y. Li and X. Zhang and C. Qiu and J. Wang and Q. Huang and K. Yang , archivePrefix=. Enhancing Remote Sensing Vision-Language Models Through. 2507.16716 , year=

  59. [59]

    Yang and Z

    H. Yang and Z. Jiang and D. Ma and Q. Wang , year =. Multimodal Difference Augmentation Learning for Remote Sensing Change Detection , journal =

  60. [60]

    and Marcellin, Michael W

    Taubman, David S. and Marcellin, Michael W. , year =

  61. [61]

    2017 , number =

    Image Data Compression , author =. 2017 , number =

  62. [62]

    Impact of

    Zabala, Alaitz and Vitulli, Raffaele and Pons, Xavier , journal =. Impact of. 2012 , volume =

  63. [63]

    International Conference on Learning Representations (ICLR) , year =

    Variational Image Compression with a Scale Hyperprior , author =. International Conference on Learning Representations (ICLR) , year =

  64. [64]

    Advances in Neural Information Processing Systems (NeurIPS) , year =

    Joint Autoregressive and Hierarchical Priors for Learned Image Compression , author =. Advances in Neural Information Processing Systems (NeurIPS) , year =

  65. [65]

    arXiv preprint , year =

    Task-Oriented Image Transmission for Scene Classification in Unmanned Aerial Systems , author =. arXiv preprint , year =. 2112.10948 , archiveprefix=

  66. [66]

    2022 , institution =

  67. [67]

    2412.15304 , archivePrefix=

    Savitha Viswanadh Kandala and Pramuka Medaranga and Ambuj Varshney , year=. 2412.15304 , archivePrefix=