arxiv: 2604.12315 · v1 · submitted 2026-04-14 · 💻 cs.CV · cs.MM


GTPBD-MM: A Global Terraced Parcel and Boundary Dataset with Multi-Modality

Zhiwei Zhang, Xingyuan Zeng, Xinkai Kong, Kunquan Zhang, Haoyuan Liang, Bohan Shi, Juepeng Zheng, Jianxi Huang, Yutong Lu, Haohuan Fu


Pith reviewed 2026-05-10 15:51 UTC · model grok-4.3

classification 💻 cs.CV cs.MM

keywords: terraced parcels, multimodal dataset, parcel boundary extraction, remote sensing, digital elevation model, agricultural monitoring, image-text-DEM


The pith

Adding text descriptions and elevation data to satellite images improves terraced parcel boundary extraction in complex mountainous terrain.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces GTPBD-MM, the first global multimodal benchmark that pairs high-resolution optical imagery of terraced farmland with structured text descriptions and digital elevation model data. It proposes the ETTerra network as a baseline that fuses these three inputs to extract parcel boundaries. Experiments show that the added semantic and geometric information produces more accurate, coherent, and structurally consistent results than image-only methods, especially where terrain steps, irregular edges, and regional differences make visual cues insufficient. This fills a gap left by prior benchmarks that focus on flat, regular farmland scenes.

Core claim

The authors establish that textual semantics and terrain geometry supply complementary cues beyond visual appearance, enabling more accurate, coherent, and structurally consistent delineation of terraced parcels under aligned image-text-DEM conditions.

What carries the argument

The Elevation and Text guided Terraced parcel network (ETTerra), a multimodal fusion model that incorporates optical imagery, structured text, and DEM data to guide boundary extraction in stepped terrain.

Load-bearing premise

The collected image-text-DEM data is well aligned and representative of global terraced heterogeneity, and the network fuses the modalities without introducing new biases or artifacts.

What would settle it

A controlled test on held-out terraced regions where models using the full multimodal inputs show no accuracy or coherence gain over strong image-only baselines would falsify the claim of complementary benefits.
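One way such a controlled test could be scored is a paired comparison of per-tile scores on the held-out regions. The sketch below is only an illustration under stated assumptions: the function name `paired_bootstrap_gain`, the choice of a percentile bootstrap, and the 95% interval are hypothetical and do not come from the paper. It takes one score per tile for the image-only baseline and one for the multimodal model, and reports the mean gain with a bootstrap interval.

```python
import random
from statistics import mean


def paired_bootstrap_gain(image_only, multimodal, n_resamples=2000, seed=0):
    """Mean per-tile gain (multimodal - image_only) with a 95% percentile bootstrap interval.

    Both inputs are lists of per-tile scores (for example IoU) in the same tile order.
    This helper is hypothetical and not part of the paper.
    """
    if len(image_only) != len(multimodal) or not image_only:
        raise ValueError("score lists must be non-empty and the same length")
    # Pair the scores tile by tile so regional difficulty cancels out.
    diffs = [m - b for b, m in zip(image_only, multimodal)]
    rng = random.Random(seed)
    n = len(diffs)
    # Resample tiles with replacement and record the mean gain of each resample.
    resampled_means = sorted(
        mean(rng.choice(diffs) for _ in range(n)) for _ in range(n_resamples)
    )
    lower = resampled_means[int(0.025 * n_resamples)]
    upper = resampled_means[int(0.975 * n_resamples) - 1]
    return mean(diffs), lower, upper
```

Under this reading, an interval that contains zero on held-out regions would be consistent with the "no gain" outcome described above, while an interval that sits clearly above zero would support the complementarity claim.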

Figures

Figures reproduced from arXiv: 2604.12315 by Bohan Shi, Haohuan Fu, Haoyuan Liang, Jianxi Huang, Juepeng Zheng, Kunquan Zhang, Xingyuan Zeng, Xinkai Kong, Yutong Lu, Zhiwei Zhang.

**Figure 2.** Overview of GTPBD-MM. Top: unified multimodal design with aligned modalities, hierarchical annotations, and …

**Figure 3.** Dataset statistics of GTPBD-MM. (a) Regional- and Country-level area distribution. (b) Word cloud of text descriptions.

**Figure 4.** Overview of our proposed Elevation and Text …

**Figure 5.** Qualitative comparison of different methods on GTPBD-MM. Red boxes highlight typical semantic confusion in …

**Figure 6.** Edge-level error analysis of different methods on …

**Figure 8.** More representative cases of GTPBD-MM from different regions. From top to bottom, the samples are collected from …

**Figure 9.** Visual comparison between representative agricultural parcel datasets and GTPBD-MM. Existing datasets mainly …

**Figure 10.** More boundary visualization results on representative regions. For each case, we compare the predictions of Ours, …

**Figure 11.** More object-level error visualization results on representative regions. For each case, we compare the predictions …

Original abstract

Agricultural parcel extraction plays an important role in remote sensing-based agricultural monitoring, supporting parcel surveying, precision management, and ecological assessment. However, existing public benchmarks mainly focus on regular and relatively flat farmland scenes. In contrast, terraced parcels in mountainous regions exhibit stepped terrain, pronounced elevation variation, irregular boundaries, and strong cross-regional heterogeneity, making parcel extraction a more challenging problem that jointly requires visual recognition, semantic discrimination, and terrain-aware geometric understanding. Although recent studies have advanced visual parcel benchmarks and image-text farmland understanding, a unified benchmark for complex terraced parcel extraction under aligned image-text-DEM settings remains absent. To fill this gap, we present GTPBD-MM, the first multimodal benchmark for global terraced parcel extraction. Built upon GTPBD, GTPBD-MM integrates high-resolution optical imagery, structured text descriptions, and DEM data, and supports systematic evaluation under Image-only, Image+Text, and Image+Text+DEM settings. We further propose Elevation and Text guided Terraced parcel network (ETTerra), a multimodal baseline for terraced parcel delineation. Extensive experiments demonstrate that textual semantics and terrain geometry provide complementary cues beyond visual appearance alone, yielding more accurate, coherent, and structurally consistent delineation results in complex terraced scenes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces GTPBD-MM, the first multimodal benchmark for global terraced parcel extraction. Built upon the prior GTPBD dataset, it integrates high-resolution optical imagery, structured text descriptions, and DEM data. The paper also proposes the ETTerra network as a baseline and, from experiments under Image-only, Image+Text, and Image+Text+DEM settings, claims that textual semantics and terrain geometry provide cues complementary to visual appearance, yielding more accurate, coherent, and structurally consistent parcel delineation in complex terraced scenes.

Significance. If the multimodal gains prove robust after proper controls, this work supplies a needed benchmark for challenging non-flat agricultural scenes that existing flat-farmland datasets overlook, and it offers an initial demonstration of how semantic and geometric modalities can improve boundary extraction in remote sensing. The dataset itself could support follow-on research in terrain-aware and text-guided agricultural monitoring.

major comments (3)

[Methods] Methods section on ETTerra: the fusion mechanism (early/late/attention-based) between image, text, and DEM inputs is not specified in sufficient detail to determine whether it preserves fine boundary geometry or risks introducing new artifacts, which directly bears on the complementarity claim.
[Experiments] Experiments section and associated tables: ablation comparisons across modality settings do not report controls that hold total parameter count, model capacity, and optimization schedule fixed, leaving open the possibility that observed gains arise from increased capacity rather than complementary cues from text and DEM.
[Dataset] Dataset construction section: it is not stated whether the structured text descriptions are human-annotated or LLM-generated, nor are quantitative alignment metrics between image-text-DEM triples provided, which is load-bearing for verifying that the multimodal data isolates modality contributions without misalignment bias.
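To make the early/late distinction in the first comment concrete, here is a minimal generic sketch. It is not ETTerra's fusion mechanism, which the report says is unspecified; the function names `early_fusion` and `late_fusion`, the plain nested-list representation, and the weighted average are all assumptions made for illustration. Early fusion joins the inputs before any prediction is made, while late fusion combines score maps that each modality branch has already produced.

```python
def early_fusion(image, dem):
    """Early fusion: concatenate per-pixel image channels and elevation into one feature vector.

    `image` is a 2D grid of channel tuples, `dem` a 2D grid of elevations of the same shape.
    A single downstream model would then consume the fused features.
    """
    return [
        [list(pixel) + [elevation] for pixel, elevation in zip(image_row, dem_row)]
        for image_row, dem_row in zip(image, dem)
    ]


def late_fusion(score_maps, weights):
    """Late fusion: weighted average of per-modality boundary score maps of identical shape.

    Each map is assumed to come from a separate branch (image, text-conditioned, DEM).
    """
    total = sum(weights)
    height, width = len(score_maps[0]), len(score_maps[0][0])
    return [
        [
            sum(w * score_map[r][c] for w, score_map in zip(weights, score_maps)) / total
            for c in range(width)
        ]
        for r in range(height)
    ]
```

The referee's concern maps onto this split: late averaging can blur thin boundaries when branches disagree by a pixel or two, whereas early concatenation leaves that trade-off to the downstream model. An attention-based scheme would sit between the two and is not sketched here.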

minor comments (2)

[Abstract] Abstract: the phrase 'extensive experiments demonstrate' would be strengthened by referencing specific quantitative tables or metrics (e.g., IoU or boundary F-score deltas) rather than remaining qualitative.
[Introduction] Notation: the acronym ETTerra is introduced without an explicit expansion on first use in the main text.
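For readers unfamiliar with the metrics named in the Abstract comment, the following sketch shows one common way to compute IoU and a tolerance-based boundary F-score on binary masks. It is an assumption-laden illustration: the paper's exact metric definitions are not given here, boundary F-score variants differ in how they extract and match edges, and the helper names `iou`, `boundary_pixels`, and `boundary_f1` are hypothetical.

```python
def iou(pred, truth):
    """Intersection over union of two binary masks given as equal-sized 2D lists."""
    inter = union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    return inter / union if union else 1.0


def boundary_pixels(mask):
    """Foreground pixels with a 4-neighbour that is background or outside the grid."""
    height, width = len(mask), len(mask[0])
    edge = set()
    for r in range(height):
        for c in range(width):
            if not mask[r][c]:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if not (0 <= rr < height and 0 <= cc < width) or not mask[rr][cc]:
                    edge.add((r, c))
                    break
    return edge


def boundary_f1(pred, truth, tol=1):
    """F1 between boundary pixel sets, matching within a Chebyshev distance of `tol`."""
    pred_edge, truth_edge = boundary_pixels(pred), boundary_pixels(truth)
    if not pred_edge and not truth_edge:
        return 1.0
    if not pred_edge or not truth_edge:
        return 0.0

    def near(point, others):
        return any(
            max(abs(point[0] - q[0]), abs(point[1] - q[1])) <= tol for q in others
        )

    precision = sum(near(p, truth_edge) for p in pred_edge) / len(pred_edge)
    recall = sum(near(t, pred_edge) for t in truth_edge) / len(truth_edge)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Reporting the difference in these two numbers between the Image-only and Image+Text+DEM settings is the kind of quantitative delta the comment asks for. The all-pairs matching in `near` is quadratic and only suitable for small tiles.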

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major point below and will make the indicated revisions to strengthen the manuscript.


Referee: [Methods] Methods section on ETTerra: the fusion mechanism (early/late/attention-based) between image, text, and DEM inputs is not specified in sufficient detail to determine whether it preserves fine boundary geometry or risks introducing new artifacts, which directly bears on the complementarity claim.

Authors: We agree that the current description of the fusion mechanism in ETTerra is insufficiently detailed. In the revised manuscript we will expand the Methods section to explicitly describe the fusion strategy (including whether it is early, late, or attention-based), provide the relevant architectural equations or diagrams, and discuss design choices intended to preserve fine boundary geometry.

Revision: yes

Referee: [Experiments] Experiments section and associated tables: ablation comparisons across modality settings do not report controls that hold total parameter count, model capacity, and optimization schedule fixed, leaving open the possibility that observed gains arise from increased capacity rather than complementary cues from text and DEM.

Authors: This is a valid methodological concern. We will revise the Experiments section and tables to report parameter counts for each setting and to include additional controls that keep backbone capacity and optimization schedule fixed (e.g., by freezing the image encoder and varying only the fusion modules). These controls will be added to isolate the contribution of the text and DEM modalities.

Revision: yes

Referee: [Dataset] Dataset construction section: it is not stated whether the structured text descriptions are human-annotated or LLM-generated, nor are quantitative alignment metrics between image-text-DEM triples provided, which is load-bearing for verifying that the multimodal data isolates modality contributions without misalignment bias.

Authors: We acknowledge the omission. In the revised Dataset construction section we will explicitly state the annotation procedure used for the structured text descriptions and will add quantitative alignment metrics (e.g., similarity scores or overlap statistics) between the image, text, and DEM triples to demonstrate data quality and minimize concerns about misalignment.

Revision: yes
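As one possible reading of the "overlap statistics" mentioned in the last response, the sketch below scores how well each image tile's geographic extent matches the extent of its paired DEM tile. Everything here is an assumption for illustration: the `(min_x, min_y, max_x, max_y)` extent format, the helper names `extent_iou` and `alignment_report`, and the 0.95 flagging threshold are hypothetical, and the paper may use a different alignment measure entirely. Text alignment would need a separate similarity score and is not covered.

```python
def extent_iou(a, b):
    """IoU of two axis-aligned extents given as (min_x, min_y, max_x, max_y)."""
    overlap_x = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    overlap_y = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = overlap_x * overlap_y
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def alignment_report(triples, threshold=0.95):
    """Summarise image/DEM extent overlap for (tile_id, image_extent, dem_extent) triples.

    Tiles whose overlap falls below `threshold` (a hypothetical cut-off) are flagged.
    """
    if not triples:
        raise ValueError("at least one tile is required")
    scores = {tile_id: extent_iou(img, dem) for tile_id, img, dem in triples}
    return {
        "mean_iou": sum(scores.values()) / len(scores),
        "min_iou": min(scores.values()),
        "flagged": sorted(t for t, s in scores.items() if s < threshold),
    }
```

A summary like this only checks footprint agreement. It says nothing about sub-pixel registration or resolution mismatch between the optical imagery and the DEM, which would need their own checks.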

Circularity Check

0 steps flagged

No significant circularity; empirical dataset and baseline evaluation


The paper introduces the GTPBD-MM dataset and ETTerra baseline network, with central claims supported by empirical comparisons across Image-only, Image+Text, and Image+Text+DEM settings. No mathematical derivation chain, equations, or predictions exist that reduce by construction to fitted inputs or self-referential definitions. No self-citation load-bearing steps, uniqueness theorems imported from prior author work, or ansatzes smuggled via citation are invoked to justify the complementarity result. The evaluation relies on standard experimental metrics rather than any renaming of known results or self-definitional constructs, making the derivation self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The central claim rests on the creation of a new dataset and fusion model; no explicit free parameters are stated in the abstract, but standard computer vision assumptions about data alignment, annotation quality, and multimodal fusion are invoked without independent verification.

axioms (2)

domain assumption: Remote sensing imagery, text descriptions, and DEM data can be reliably aligned and annotated for terraced parcels across global regions.
Invoked implicitly when presenting the multimodal benchmark as usable for systematic evaluation.

domain assumption: Multimodal fusion in semantic segmentation improves boundary coherence in heterogeneous terrain.
Underlying the ETTerra baseline and the claim of complementary cues.

invented entities (2)

GTPBD-MM dataset (no independent evidence)
purpose: Provide the first aligned multimodal benchmark for terraced parcel extraction.
Newly constructed resource; no external independent validation cited in abstract.

ETTerra network (no independent evidence)
purpose: Multimodal baseline model that fuses elevation and text cues for terraced parcel delineation.
Proposed architecture whose performance claims depend on internal experiments.

