GTPBD-MM: A Global Terraced Parcel and Boundary Dataset with Multi-Modality
Pith reviewed 2026-05-10 15:51 UTC · model grok-4.3
The pith
Adding text descriptions and elevation data to satellite images improves terraced parcel boundary extraction in complex mountainous terrain.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors establish that textual semantics and terrain geometry supply complementary cues beyond visual appearance, enabling more accurate, coherent, and structurally consistent delineation of terraced parcels under aligned image-text-DEM conditions.
What carries the argument
The Elevation and Text guided Terraced parcel network (ETTerra), a multimodal fusion model that incorporates optical imagery, structured text, and DEM data to guide boundary extraction in stepped terrain.
Load-bearing premise
The collected image-text-DEM data is well aligned and representative of global terraced heterogeneity, and the network fuses the modalities without introducing new biases or artifacts.
What would settle it
The claim of complementary benefits would be falsified by a controlled test on held-out terraced regions in which models using the full multimodal inputs show no gain in accuracy or coherence over strong image-only baselines.
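One way such a controlled comparison could be scored is a paired bootstrap over per-tile IoU values from the same held-out tiles. The sketch below is illustrative only: the function name `paired_bootstrap_gain` and the example scores are hypothetical and are not taken from the paper.

```python
import numpy as np

def paired_bootstrap_gain(iou_baseline, iou_multimodal, n_resamples=10_000, seed=0):
    """Return the mean per-tile IoU gain and a 95% bootstrap confidence interval."""
    baseline = np.asarray(iou_baseline, dtype=float)
    multimodal = np.asarray(iou_multimodal, dtype=float)
    if baseline.shape != multimodal.shape or baseline.ndim != 1:
        raise ValueError("expected two 1-D arrays of equal length")
    diffs = multimodal - baseline  # paired: same tile, two models
    rng = np.random.default_rng(seed)
    # Resample tiles with replacement and recompute the mean gain each time.
    idx = rng.integers(0, diffs.size, size=(n_resamples, diffs.size))
    means = diffs[idx].mean(axis=1)
    low, high = np.percentile(means, [2.5, 97.5])
    return float(diffs.mean()), float(low), float(high)

# Hypothetical per-tile scores, made up purely to exercise the function.
baseline_scores = np.array([0.61, 0.58, 0.70, 0.66, 0.59, 0.63])
multimodal_scores = np.array([0.64, 0.60, 0.71, 0.70, 0.62, 0.65])
gain, low, high = paired_bootstrap_gain(baseline_scores, multimodal_scores)
print(f"mean gain={gain:.3f}, 95% CI=[{low:.3f}, {high:.3f}]")
```

Under this setup, a confidence interval that includes zero on held-out regions would be consistent with the "no gain" outcome described above, while an interval clearly above zero would count as evidence for the multimodal benefit.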
Original abstract
Agricultural parcel extraction plays an important role in remote sensing-based agricultural monitoring, supporting parcel surveying, precision management, and ecological assessment. However, existing public benchmarks mainly focus on regular and relatively flat farmland scenes. In contrast, terraced parcels in mountainous regions exhibit stepped terrain, pronounced elevation variation, irregular boundaries, and strong cross-regional heterogeneity, making parcel extraction a more challenging problem that jointly requires visual recognition, semantic discrimination, and terrain-aware geometric understanding. Although recent studies have advanced visual parcel benchmarks and image-text farmland understanding, a unified benchmark for complex terraced parcel extraction under aligned image-text-DEM settings remains absent. To fill this gap, we present GTPBD-MM, the first multimodal benchmark for global terraced parcel extraction. Built upon GTPBD, GTPBD-MM integrates high-resolution optical imagery, structured text descriptions, and DEM data, and supports systematic evaluation under Image-only, Image+Text, and Image+Text+DEM settings. We further propose Elevation and Text guided Terraced parcel network (ETTerra), a multimodal baseline for terraced parcel delineation. Extensive experiments demonstrate that textual semantics and terrain geometry provide complementary cues beyond visual appearance alone, yielding more accurate, coherent, and structurally consistent delineation results in complex terraced scenes.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces GTPBD-MM, the first multimodal benchmark for global terraced parcel extraction, integrating high-resolution optical imagery, structured text descriptions, and DEM data built upon the prior GTPBD dataset. It proposes the ETTerra network as a baseline model and claims through experiments that textual semantics and terrain geometry provide complementary cues to visual appearance, resulting in more accurate, coherent, and structurally consistent parcel delineation in complex terraced scenes under Image-only, Image+Text, and Image+Text+DEM settings.
Significance. If the multimodal gains prove robust after proper controls, this work supplies a needed benchmark for challenging non-flat agricultural scenes that existing flat-farmland datasets overlook, and it offers an initial demonstration of how semantic and geometric modalities can improve boundary extraction in remote sensing. The dataset itself could support follow-on research in terrain-aware and text-guided agricultural monitoring.
major comments (3)
- [Methods] Methods section on ETTerra: the fusion mechanism (early/late/attention-based) between image, text, and DEM inputs is not specified in sufficient detail to determine whether it preserves fine boundary geometry or risks introducing new artifacts, which directly bears on the complementarity claim.
- [Experiments] Experiments section and associated tables: ablation comparisons across modality settings do not report controls that hold total parameter count, model capacity, and optimization schedule fixed, leaving open the possibility that observed gains arise from increased capacity rather than complementary cues from text and DEM.
- [Dataset] Dataset construction section: it is not stated whether the structured text descriptions are human-annotated or LLM-generated, nor are quantitative alignment metrics between image-text-DEM triples provided, which is load-bearing for verifying that the multimodal data isolates modality contributions without misalignment bias.
minor comments (2)
- [Abstract] Abstract: the phrase 'extensive experiments demonstrate' would be strengthened by referencing specific quantitative tables or metrics (e.g., IoU or boundary F-score deltas) rather than remaining qualitative.
- [Introduction] Notation: the acronym ETTerra is introduced without an explicit expansion on first use in the main text.
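For readers unfamiliar with the metrics named in the abstract comment, the following is a minimal NumPy sketch of mask IoU and a tolerance-based boundary F-score. It assumes binary parcel masks and 4-connectivity; the paper's exact metric definitions are not given here, so the function names and the tolerance convention are assumptions.

```python
import numpy as np

def iou(pred, truth):
    """Intersection over union of two binary parcel masks."""
    pred, truth = np.asarray(pred, bool), np.asarray(truth, bool)
    union = np.logical_or(pred, truth).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(np.logical_and(pred, truth).sum() / union)

def dilate(mask, radius):
    """Grow a boolean mask by `radius` pixels (4-connectivity)."""
    for _ in range(radius):
        p = np.pad(mask, 1, constant_values=False)
        mask = (p[1:-1, 1:-1] | p[:-2, 1:-1] | p[2:, 1:-1]
                | p[1:-1, :-2] | p[1:-1, 2:])
    return mask

def boundary(mask):
    """Foreground pixels with at least one background 4-neighbour.
    Pixels on the image edge count as boundary because of the False padding."""
    mask = np.asarray(mask, bool)
    p = np.pad(mask, 1, constant_values=False)
    interior = p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    return mask & ~interior

def boundary_f_score(pred, truth, tolerance=1):
    """F-score between boundaries, matching pixels within `tolerance` steps."""
    pb, tb = boundary(pred), boundary(truth)
    if pb.sum() == 0 and tb.sum() == 0:
        return 1.0
    if pb.sum() == 0 or tb.sum() == 0:
        return 0.0
    precision = (pb & dilate(tb, tolerance)).sum() / pb.sum()
    recall = (tb & dilate(pb, tolerance)).sum() / tb.sum()
    if precision + recall == 0:
        return 0.0
    return float(2 * precision * recall / (precision + recall))

# Toy example: a 4x4 parcel and a prediction shifted one pixel to the right.
truth = np.zeros((8, 8), dtype=bool)
truth[2:6, 2:6] = True
pred = np.zeros((8, 8), dtype=bool)
pred[2:6, 3:7] = True
print(iou(pred, truth), boundary_f_score(pred, truth, tolerance=1))
```

Reporting both numbers is useful because they can disagree: a one-pixel shift lowers IoU noticeably, yet a boundary score with a one-pixel tolerance treats the same prediction as correct.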
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. We address each major point below and will make the indicated revisions to strengthen the manuscript.
- Referee: [Methods] Methods section on ETTerra: the fusion mechanism (early/late/attention-based) between image, text, and DEM inputs is not specified in sufficient detail to determine whether it preserves fine boundary geometry or risks introducing new artifacts, which directly bears on the complementarity claim.
  Authors: We agree that the current description of the fusion mechanism in ETTerra is insufficiently detailed. In the revised manuscript we will expand the Methods section to explicitly describe the fusion strategy (including whether it is early, late, or attention-based), provide the relevant architectural equations or diagrams, and discuss design choices intended to preserve fine boundary geometry.
  Revision: yes
- Referee: [Experiments] Experiments section and associated tables: ablation comparisons across modality settings do not report controls that hold total parameter count, model capacity, and optimization schedule fixed, leaving open the possibility that observed gains arise from increased capacity rather than complementary cues from text and DEM.
  Authors: This is a valid methodological concern. We will revise the Experiments section and tables to report parameter counts for each setting and to include additional controls that keep backbone capacity and optimization schedule fixed (e.g., by freezing the image encoder and varying only the fusion modules). These controls will be added to isolate the contribution of the text and DEM modalities.
  Revision: yes
- Referee: [Dataset] Dataset construction section: it is not stated whether the structured text descriptions are human-annotated or LLM-generated, nor are quantitative alignment metrics between image-text-DEM triples provided, which is load-bearing for verifying that the multimodal data isolates modality contributions without misalignment bias.
  Authors: We acknowledge the omission. In the revised Dataset construction section we will explicitly state the annotation procedure used for the structured text descriptions and will add quantitative alignment metrics (e.g., similarity scores or overlap statistics) between the image, text, and DEM triples to demonstrate data quality and minimize concerns about misalignment.
  Revision: yes
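As an illustration of what an "overlap statistic" between annotations and DEM data might look like, the sketch below measures the fraction of annotated boundary pixels that fall on steep DEM cells. It rests on an assumption not stated in the source: that terraced parcel boundaries tend to coincide with steep risers. The function names, the threshold, and the toy staircase DEM are all hypothetical.

```python
import numpy as np

def slope_magnitude(dem, cell_size=1.0):
    """Gradient magnitude of a DEM (elevation units per ground unit)."""
    dz_dy, dz_dx = np.gradient(np.asarray(dem, dtype=float), cell_size)
    return np.hypot(dz_dx, dz_dy)

def boundary_slope_overlap(boundary_mask, dem, slope_threshold, cell_size=1.0):
    """Fraction of boundary pixels lying on cells steeper than the threshold."""
    boundary_mask = np.asarray(boundary_mask, dtype=bool)
    if boundary_mask.sum() == 0:
        raise ValueError("boundary mask is empty")
    steep = slope_magnitude(dem, cell_size) > slope_threshold
    return float((boundary_mask & steep).sum() / boundary_mask.sum())

# Toy staircase DEM: elevation rises by 10 every 4 columns (hypothetical data).
dem = np.tile(10.0 * (np.arange(16) // 4), (16, 1))

# Hypothetical boundary annotation placed on the columns flanking each riser.
boundary_mask = np.zeros((16, 16), dtype=bool)
boundary_mask[:, [3, 4, 7, 8, 11, 12]] = True

aligned = boundary_slope_overlap(boundary_mask, dem, slope_threshold=1.0)
# Simulate a two-pixel registration error by shifting the DEM sideways.
misaligned = boundary_slope_overlap(
    boundary_mask, np.roll(dem, 2, axis=1), slope_threshold=1.0
)
print(aligned, misaligned)
```

A statistic of this kind would drop sharply when the DEM is misregistered against the annotations, which is the sort of evidence the referee asks for. Whether it suits the actual dataset depends on DEM resolution relative to terrace width, which the source does not state.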
Circularity Check
No significant circularity; empirical dataset and baseline evaluation
The paper introduces the GTPBD-MM dataset and the ETTerra baseline network, with central claims supported by empirical comparisons across Image-only, Image+Text, and Image+Text+DEM settings. There is no mathematical derivation chain, equation, or prediction that reduces by construction to fitted inputs or self-referential definitions. The complementarity result is not justified by load-bearing self-citations, by uniqueness theorems imported from prior work by the same authors, or by ansatzes smuggled in via citation. The evaluation relies on standard experimental metrics rather than on renamed known results or self-definitional constructs, so the argument is self-contained and can be checked against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Remote sensing imagery, text descriptions, and DEM data can be reliably aligned and annotated for terraced parcels across global regions.
- domain assumption: Multimodal fusion in semantic segmentation improves boundary coherence in heterogeneous terrain.
invented entities (2)
- GTPBD-MM dataset: no independent evidence
- ETTerra network: no independent evidence