Geometric Coastline Localization using Vision-Language Models
Pith reviewed 2026-06-27 14:08 UTC · model grok-4.3
The pith
Coastline extraction improves when formulated as direct polyline prediction by a vision-language model rather than pixel segmentation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Formulating the task as geometric boundary localization and training a vision-language model to output polylines directly produces better global geometric alignment with reference coastlines than mask-based segmentation approaches, as measured by reduced Hausdorff and Earth Mover's distances on the NZCCD under strict one-pixel boundary supervision.
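To make the metric in this claim concrete, here is a minimal sketch of a symmetric Hausdorff distance between two polylines. It is not the paper's evaluation code: the `densify` helper, the 1-pixel sampling step, and the toy coordinates are assumptions made for illustration, and distances are in pixel units rather than metres.

```python
import math

def densify(polyline, step):
    """Resample a polyline [(x, y), ...] so neighbouring samples are at most `step` apart."""
    points = [polyline[0]]
    for (x0, y0), (x1, y1) in zip(polyline, polyline[1:]):
        length = math.hypot(x1 - x0, y1 - y0)
        n = max(1, math.ceil(length / step))
        for i in range(1, n + 1):
            t = i / n
            points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return points

def directed_hausdorff(a, b):
    """Largest distance from any point in `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Symmetric Hausdorff distance: the worse of the two directed distances."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

# Toy polylines (hypothetical values, not taken from NZCCD).
reference = densify([(0.0, 0.0), (100.0, 0.0)], step=1.0)
predicted = densify([(0.0, 3.0), (50.0, 3.0), (100.0, 8.0)], step=1.0)
print(hausdorff(reference, predicted))  # prints 8.0
```

The result is driven entirely by the single worst deviation (the endpoint that drifts 8 pixels from the reference), which is why Hausdorff distance is read as a measure of worst-case geometric alignment.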
What carries the argument
CoastlineVLM-7B, a vision-language model built on the GeoChat-7B/LLaVA-1.5 architecture that jointly performs presence detection, proxy-type classification, and coastline grounding, and outputs the coastline as a polyline.
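The three joint outputs can be pictured as one structured record. The sketch below is purely illustrative: the source does not specify the model's reply format, so the JSON schema, the field names, and the `parse_reply` helper are all hypothetical.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class CoastlinePrediction:
    """Hypothetical container for the three joint outputs."""
    present: bool                      # coastline presence detection
    proxy_type: Optional[str] = None   # e.g. "vegetation line", "dune toe", "cliff edge"
    polyline: List[Tuple[float, float]] = field(default_factory=list)

def parse_reply(reply: str) -> CoastlinePrediction:
    """Parse an assumed JSON reply; the real model's output format may differ."""
    data = json.loads(reply)
    if not data.get("present", False):
        return CoastlinePrediction(present=False)
    points = [(float(x), float(y)) for x, y in data["polyline"]]
    if len(points) < 2:
        raise ValueError("a polyline needs at least two vertices")
    return CoastlinePrediction(True, data.get("proxy_type"), points)

example = parse_reply(
    '{"present": true, "proxy_type": "vegetation line", "polyline": [[10, 20], [30.5, 22]]}'
)
print(example)
```

Keeping presence, proxy type, and geometry in one record mirrors the joint formulation: a tile with no coastline yields an empty polyline rather than a spurious boundary.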
If this is right
- Geometry-based evaluation metrics should be prioritized over IoU when assessing coastline localization quality.
- Direct polyline output aligns the learning objective with the geomorphic proxies used in coastal monitoring.
- Vision-language models can incorporate semantic reasoning for proxy classification alongside geometric grounding.
- Output representation choice affects how well automated methods match operational coastal change workflows.
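The first point can be illustrated with a toy case. Under one-pixel boundary supervision, a prediction that misses the reference by two pixels and one that misses by thirty both score an IoU of zero, while a geometric distance separates them. The sketch below uses made-up horizontal boundaries on a 64-pixel-wide tile; it is an illustration of the argument, not a reproduction of the paper's experiments.

```python
import math

def iou(a, b):
    """Intersection over Union of two pixel sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two pixel sets."""
    def directed(src, dst):
        return max(min(math.dist(p, q) for q in dst) for p in src)
    return max(directed(a, b), directed(b, a))

width = 64
reference = [(x, 10) for x in range(width)]   # one-pixel-wide reference boundary
near_miss = [(x, 12) for x in range(width)]   # shifted by 2 pixels
far_miss = [(x, 40) for x in range(width)]    # shifted by 30 pixels

for name, pred in [("near", near_miss), ("far", far_miss)]:
    print(name, iou(reference, pred), hausdorff(reference, pred))
# near 0.0 2.0
# far 0.0 30.0
```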
Where Pith is reading between the lines
- The same direct-polyline formulation could apply to other linear boundary tasks such as river banks or road edges in remote sensing.
- Adding textual prompts describing expected proxy types might further improve generalization across different coastal environments.
- If the VLM's proxy classification proves robust, it could enable automated proxy selection without manual post-processing steps.
Load-bearing premise
That geometry-based metrics such as Hausdorff distance and Earth Mover's Distance are more appropriate than pixel-overlap metrics for judging operational coastline quality, and that the model's detection, classification, and grounding steps generalize beyond the NZCCD training distribution.
What would settle it
Evaluation on an independent coastal dataset from another region where the VLM shows no reduction in Hausdorff or EMD distances, or where IoU better correlates with field-measured change accuracy.
Original abstract
Coastline detection in remote sensing imagery is commonly formulated as a pixel-wise segmentation problem, where the final coastline is extracted from a predicted mask through post-processing. This formulation relegates coastline geometry, the primary representation used in coastal change analysis, to a secondary artifact rather than the learning objective. In practice, coastlines are defined by geomorphic proxies such as vegetation lines, dune toes, or cliff edges, rather than an instantaneous land-water boundary often used in pixel-based segmentation approaches. In this work, we revisit coastline extraction from a representation perspective and formulate the task as geometric boundary localization. We use the New Zealand Coastal Change Dataset (NZCCD) and high-resolution aerial imagery from Land Information New Zealand (LINZ) to develop CoastlineVLM-7B, a vision-language model (VLM) built on the GeoChat-7B/LLaVA-1.5 architecture that jointly performs coastline presence detection, proxy-type classification, and coastline grounding. The model directly predicts a coastline as a polyline rather than a dense segmentation mask. We evaluate CoastlineVLM-7B against segmentation baselines under strict one-pixel boundary supervision. Results show that geometry-based metrics are more suitable for assessing coastline localization quality than pixel-overlap metrics such as Intersection over Union (IoU). CoastlineVLM-7B improves global geometric alignment with reference coastlines, reducing Hausdorff distance from 37.74 m to 31.84 m and Earth Mover's Distance from 21.12 m to 17.32 m. These results indicate that output representation is a critical design choice in coastline extraction, and that geometry-oriented learning, combined with the semantic reasoning capabilities of vision-language models, aligns well with how coastlines are defined and evaluated in operational coastal monitoring.
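As a quick worked check on the size of the reported gains, the relative reductions can be computed from the four figures quoted in the abstract. Only those four numbers come from the source; the helper name is an assumption.

```python
def relative_reduction(baseline, improved):
    """Fractional reduction of `improved` relative to `baseline`."""
    return (baseline - improved) / baseline

# Values quoted in the abstract, in metres.
hausdorff_gain = relative_reduction(37.74, 31.84)
emd_gain = relative_reduction(21.12, 17.32)
print(f"Hausdorff: {hausdorff_gain:.1%}, EMD: {emd_gain:.1%}")
# Hausdorff: 15.6%, EMD: 18.0%
```

Both reductions are in the 15 to 18 percent range, which is the effect size the referee's request for variance estimates would need to be set against.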
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that reformulating coastline detection as direct geometric polyline prediction via a 7B-parameter VLM (CoastlineVLM-7B, built on GeoChat/LLaVA) yields superior global alignment to reference coastlines compared to segmentation baselines, with Hausdorff distance reduced from 37.74 m to 31.84 m and EMD from 21.12 m to 17.32 m on NZCCD/LINZ imagery under one-pixel boundary supervision. It further argues that geometry-based metrics are more suitable than IoU for evaluation and that output representation is a critical design choice, enabled by the VLM's joint detection-classification-grounding capabilities.
Significance. If the central empirical comparison can be made robust, the work provides evidence that aligning the learning objective with polyline geometry (rather than post-processed masks) better matches operational coastal monitoring practices that rely on geomorphic proxies. The concrete metric gains on held-out high-resolution aerial imagery and the use of semantic reasoning in a VLM are strengths that could inform representation choices in remote-sensing tasks.
major comments (3)
- [Experiments] Experiments section: the segmentation baselines are not described with parameter counts, architectures, or training protocols matched to the 7B VLM (including its vision encoder and language-based proxy steps), so the reported Hausdorff/EMD gains cannot be isolated to the polyline formulation versus differences in model scale or pretraining. This directly undermines the claim that output representation is the critical factor.
- [Results] Results section: no statistical significance testing, confidence intervals, or variance estimates accompany the metric improvements (37.74 m to 31.84 m Hausdorff; 21.12 m to 17.32 m EMD), and baseline hyperparameter/training details are absent, weakening the quantitative support for the central claim.
- [Discussion] Discussion section: the assertion that geometry-based metrics are more suitable than pixel-overlap metrics rests on qualitative argument rather than a quantitative validation (e.g., correlation with operational coastal-change metrics or expert utility), which is load-bearing for the recommendation to prefer them.
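One way the quantitative validation requested in the third comment could look is a rank correlation between each candidate metric and an operational error measure. The sketch below implements Spearman's rank correlation in plain Python. The per-image numbers are made-up placeholders, not results from the paper, and `transect_error` is a hypothetical stand-in for whatever operational measure is available.

```python
import math

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed values."""
    return pearson(ranks(x), ranks(y))

# Placeholder per-image values (illustrative only, NOT from the paper).
transect_error = [1.2, 3.4, 2.1, 5.0, 4.2]   # hypothetical operational error, metres
hausdorff_m = [10.0, 30.0, 22.0, 41.0, 35.0]
iou_score = [0.30, 0.31, 0.12, 0.25, 0.28]

print("Hausdorff vs operational error:", spearman(hausdorff_m, transect_error))
print("IoU vs operational error:", spearman(iou_score, transect_error))
```

A metric that tracks operational usefulness should show a strong monotone relationship with the operational error (positive for distances, negative for overlap scores); reporting both correlations side by side would turn the qualitative argument into a testable one.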
minor comments (2)
- [Methods] Clarify in the methods how the VLM's polyline output is evaluated under the same 'strict one-pixel boundary supervision' applied to segmentation masks.
- The abstract and results would benefit from an explicit statement of the number of test images and any cross-validation procedure used for the held-out evaluation.
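For the first minor comment, one plausible way to put a polyline and a segmentation mask on the same footing is to rasterize the polyline into a one-pixel-wide set of pixels. The sketch below does this with Bresenham's line algorithm. Whether the paper evaluates this way is not stated in the source; the `rasterize` helper and the rounding of vertices to the pixel grid are assumptions.

```python
def bresenham(x0, y0, x1, y1):
    """Integer pixels on the line from (x0, y0) to (x1, y1), endpoints included."""
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    pixels = []
    while True:
        pixels.append((x0, y0))
        if x0 == x1 and y0 == y1:
            return pixels
        e2 = 2 * err
        if e2 >= dy:   # step along x
            err += dy
            x0 += sx
        if e2 <= dx:   # step along y
            err += dx
            y0 += sy

def rasterize(polyline):
    """Set of pixels covered by a polyline, one pixel wide (vertices rounded to the grid)."""
    pixels = set()
    for (x0, y0), (x1, y1) in zip(polyline, polyline[1:]):
        pixels.update(bresenham(round(x0), round(y0), round(x1), round(y1)))
    return pixels

boundary = rasterize([(0.0, 0.0), (3.0, 0.0), (3.0, 2.0)])
print(sorted(boundary))
# [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2)]
```

Once both outputs are pixel sets of the same width, the same distance and overlap metrics can be applied to each without favouring either representation.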
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which help clarify the presentation of our results. We respond to each major comment below and indicate planned revisions.
Point-by-point responses
- Referee: [Experiments] Experiments section: the segmentation baselines are not described with parameter counts, architectures, or training protocols matched to the 7B VLM (including its vision encoder and language-based proxy steps), so the reported Hausdorff/EMD gains cannot be isolated to the polyline formulation versus differences in model scale or pretraining. This directly undermines the claim that output representation is the critical factor.
  Authors: We will expand the Experiments section to fully describe the segmentation baselines, including their architectures (e.g., specific CNN variants), approximate parameter counts, training protocols, hyperparameters, and how they were adapted to the one-pixel boundary supervision on NZCCD/LINZ data. While exact matching of scale and pretraining to the 7B VLM is not feasible given the architectural differences (standard segmentation networks versus a VLM with language-based proxy classification), we will explicitly discuss these differences as a potential confounding factor and clarify that the comparison highlights the benefit of direct polyline prediction enabled by the VLM's joint capabilities. This addresses the isolation concern without overstating the representation effect alone.
  Revision: partial
- Referee: [Results] Results section: no statistical significance testing, confidence intervals, or variance estimates accompany the metric improvements (37.74 m to 31.84 m Hausdorff; 21.12 m to 17.32 m EMD), and baseline hyperparameter/training details are absent, weakening the quantitative support for the central claim.
  Authors: We will add the baseline hyperparameter and training details in the revised Experiments section. For the metric improvements, we will include confidence intervals computed over the test set images and, where computationally feasible, report results from multiple training seeds or cross-validation folds to provide variance estimates. If full statistical significance testing (e.g., paired t-tests) cannot be performed without additional runs, we will note this limitation explicitly while still reporting the observed differences with the available data.
  Revision: yes
- Referee: [Discussion] Discussion section: the assertion that geometry-based metrics are more suitable than pixel-overlap metrics rests on qualitative argument rather than a quantitative validation (e.g., correlation with operational coastal-change metrics or expert utility), which is load-bearing for the recommendation to prefer them.
  Authors: We will strengthen the Discussion by adding references to how operational coastal monitoring (e.g., in NZCCD documentation) prioritizes geomorphic proxies and geometric representations over instantaneous pixel boundaries. We will also attempt a quantitative check by correlating Hausdorff/EMD values with known coastal change indicators in the dataset; if this correlation analysis is limited by data availability, we will acknowledge the primarily qualitative basis of the argument and frame the preference for geometry metrics as aligned with domain practice rather than as a fully validated superiority claim.
  Revision: partial
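The confidence intervals "computed over the test set images" promised in the second response could be obtained with a paired bootstrap over per-image metric differences. The sketch below is one such approach under stated assumptions: the function name, the 2000 resamples, the percentile interval, and the per-image values are all illustrative and not taken from the paper.

```python
import random
import statistics

def paired_bootstrap_ci(baseline, model, n_resamples=2000, alpha=0.05, seed=0):
    """Mean per-image improvement (baseline - model) with a percentile bootstrap interval."""
    if len(baseline) != len(model):
        raise ValueError("both methods must be scored on the same images")
    diffs = [b - m for b, m in zip(baseline, model)]
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        # Resample images with replacement, keeping each image's pair together.
        sample = rng.choices(diffs, k=len(diffs))
        means.append(statistics.fmean(sample))
    means.sort()
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return statistics.fmean(diffs), lower, upper

# Placeholder per-image Hausdorff distances in metres (NOT results from the paper).
baseline_hd = [40.1, 35.2, 38.7, 36.0, 39.5, 37.3]
model_hd = [33.0, 31.5, 34.9, 29.8, 35.1, 30.6]

mean_gain, low, high = paired_bootstrap_ci(baseline_hd, model_hd)
print(f"mean improvement {mean_gain:.2f} m, 95% CI [{low:.2f}, {high:.2f}] m")
```

Resampling paired differences, rather than each method's scores independently, removes image-to-image difficulty from the variance estimate; an interval that excludes zero would support the claimed improvement without requiring additional training runs.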
Circularity Check
No circularity: purely empirical comparison with no derivations or self-referential reductions
Full rationale
The paper contains no equations, derivations, or parameter-fitting steps that could reduce to inputs by construction. It reports direct empirical results from training and evaluating CoastlineVLM-7B on held-out NZCCD imagery, measuring Hausdorff and EMD distances against segmentation baselines. The central claim rests on these held-out metric comparisons rather than any self-definitional loop, fitted-input prediction, or self-citation chain. Model architecture is referenced to external GeoChat/LLaVA work with no overlapping authors, and no uniqueness theorems or ansatzes are invoked. This is a standard empirical ML evaluation paper whose results are falsifiable on external data.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: fine-tuned VLMs can reliably perform joint classification and spatial grounding tasks when trained on paired image-text data.
- Domain assumption: one-pixel boundary supervision is sufficient to train and evaluate polyline outputs against reference coastlines.