GeoContra: From Fluent GIS Code to Verifiable Spatial Analysis with Geography-Grounded Repair

Rongbo Xiao; Yihan Zhang; Yinhao Xiao

arxiv: 2605.00782 · v1 · submitted 2026-05-01 · 💻 cs.SE · cs.AI

Pith reviewed 2026-05-09 19:25 UTC · model grok-4.3

keywords: GIS, LLM, spatial analysis, code verification, repair framework, geospatial contracts, Python workflows, correctness evaluation

The pith

GeoContra enforces geospatial contracts on LLM-generated GIS code to verify and repair spatial analysis, raising spatial correctness by an average of 26.6 percent across 11 open models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents GeoContra, a framework that turns fluent but often invalid LLM-produced Python GIS scripts into reliable ones by defining each task as an executable geospatial contract. These contracts specify natural-language questions, data schemas, CRS metadata, spatial predicates, topology rules, metrics, required operations, and forbidden shortcuts. Generated programs face static rule inspection, runtime validation, and semantic verification, and any violations are sent into a bounded repair loop that iterates until the code complies or the round limit is reached. On 7,079 real tasks spanning 15 Boston-area zones and 9 task families, the approach lifts spatial correctness from 47.6 percent to 77.5 percent on DeepSeek-V4 and from 57.7 percent to 81.5 percent on Kimi-K2.5, and delivers a 26.6 percent average gain across 11 open models. The work addresses the problem that executable GIS code can still produce geographically impossible results such as negative travel times or mismatched coordinate systems.

Core claim

GeoContra represents each geospatial task as an executable contract that encodes natural-language questions, schemas, CRS metadata, expected outputs, spatial predicates, topology constraints, metrics, required operations, and forbidden shortcuts; generated programs then undergo static rule inspection, runtime validation, and semantic verification, with violations fed into a bounded repair loop that produces corrected code.
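To make the list of contract components concrete, here is a minimal Python sketch of what such a contract record could look like. The class name `GeoContract`, its field names, the two helper methods, and the example task are all hypothetical; the paper's actual contract schema is not reproduced on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoContract:
    """Hypothetical record holding the contract components named above."""

    question: str                       # natural-language task statement
    schema: dict[str, str]              # column name -> expected type
    crs: str                            # CRS metadata, e.g. an EPSG code
    expected_output: str                # shape or type of the answer
    spatial_predicates: tuple[str, ...] = ()
    topology_constraints: tuple[str, ...] = ()
    metrics: tuple[str, ...] = ()
    required_operations: tuple[str, ...] = ()
    forbidden_shortcuts: tuple[str, ...] = ()

    def missing_operations(self, used: set[str]) -> list[str]:
        """Required operations that the generated program never used."""
        return [op for op in self.required_operations if op not in used]

    def forbidden_used(self, used: set[str]) -> list[str]:
        """Forbidden shortcuts that the generated program did use."""
        return [op for op in self.forbidden_shortcuts if op in used]


# Illustrative task only; it is not taken from the paper's benchmark.
contract = GeoContract(
    question="How many schools lie within 500 m of a park?",
    schema={"school_id": "int", "geometry": "Point"},
    crs="EPSG:26986",  # a metric CRS covering the Boston area
    expected_output="int",
    spatial_predicates=("dwithin",),
    metrics=("distance_m",),
    required_operations=("to_crs", "buffer"),
    forbidden_shortcuts=("buffer_in_degrees",),
)

print(contract.missing_operations({"buffer"}))
print(contract.forbidden_used({"buffer", "buffer_in_degrees"}))
```

Keeping the contract as plain data means the same record can feed a static checker, a runtime validator, and a repair prompt without any of them depending on the others.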

What carries the argument

The geospatial contract, which encodes task specifications and geographic rules to drive verification and guide the repair loop.
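The control flow of a bounded repair loop can be sketched in a few lines. This is a sketch under stated assumptions, not the paper's implementation: `bounded_repair`, `toy_verify`, `toy_repair`, the violation labels, and the default of three rounds are all hypothetical, and the repair step here is a string edit standing in for an LLM call.

```python
from typing import Callable

Verifier = Callable[[str], list[str]]
Repairer = Callable[[str, list[str]], str]


def bounded_repair(code: str, verify: Verifier, repair: Repairer,
                   max_rounds: int = 3) -> tuple[str, list[str], int]:
    """Verify, then repair at most max_rounds times.

    Returns the final code, the violations still present, and the
    number of repair rounds actually used.
    """
    violations = verify(code)
    rounds = 0
    while violations and rounds < max_rounds:
        code = repair(code, violations)
        violations = verify(code)  # the full check suite runs again
        rounds += 1
    return code, violations, rounds


def toy_verify(code: str) -> list[str]:
    """Hypothetical checks: one required call, one forbidden shortcut."""
    found = []
    if "to_crs(" not in code:
        found.append("missing_reprojection")
    if "abs(" in code:
        found.append("forbidden_shortcut_abs")
    return found


def toy_repair(code: str, violations: list[str]) -> str:
    """Stand-in for an LLM repair call; fixes one violation per round."""
    first = violations[0]
    if first == "missing_reprojection":
        return "gdf = gdf.to_crs(26986)\n" + code
    if first == "forbidden_shortcut_abs":
        return code.replace("abs(", "(")
    return code


final, remaining, used = bounded_repair(
    "t = abs(travel_time)", toy_verify, toy_repair)
print(used, remaining)
```

Because the loop is bounded, it can end with violations still present; a caller would need to treat a non-empty `remaining` list as a failed task rather than as a repaired one.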

If this is right

LLM-generated GIS scripts that pass the contract checks satisfy the encoded geographic rules and avoid common invalid outputs such as negative travel times or CRS violations.
The verification and repair process applies uniformly to both closed and open-source models, producing measurable correctness gains on thousands of tasks.
Fluent code production can be converted into verifiable spatial analysis by catching missing predicates and brittle output casts before their results are accepted.
The bounded repair loop systematically corrects violations without requiring manual intervention for each error.
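As an illustration of the kind of runtime check implied by the first point, the sketch below flags negative travel times and a CRS mismatch on plain Python records. The function name `runtime_violations`, the `travel_time_min` field, and the violation labels are assumptions made for this example; the paper's validators are not shown on this page.

```python
import math


def runtime_violations(rows: list[dict], declared_crs: str,
                       expected_crs: str) -> list[str]:
    """Return labels for outputs that are executable but implausible."""
    violations = []
    if declared_crs != expected_crs:
        violations.append(f"crs_mismatch: {declared_crs} != {expected_crs}")
    for i, row in enumerate(rows):
        t = row.get("travel_time_min")
        if not isinstance(t, (int, float)) or math.isnan(t):
            violations.append(f"row {i}: missing_travel_time")
        elif t < 0:
            violations.append(f"row {i}: negative_travel_time")
    return violations


# Made-up output rows for illustration.
rows = [
    {"travel_time_min": 12.5},
    {"travel_time_min": -3.0},  # geographically impossible
    {},                         # field missing entirely
]
for label in runtime_violations(rows, "EPSG:4326", "EPSG:26986"):
    print(label)
```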

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The contract-based method could be adapted to other rule-heavy domains such as financial modeling or sensor data pipelines where domain-specific invariants must be enforced.
Automatically deriving or learning contract components from existing GIS datasets might reduce the manual effort needed to define new task families.
Widespread adoption would shift GIS analyst workflows toward specifying contracts rather than writing or debugging code directly.
The observed gains suggest that iterative feedback loops can partially compensate for LLMs' persistent weaknesses in maintaining spatial consistency.

Load-bearing premise

The geospatial contracts are assumed to comprehensively capture all relevant geographic rules, topology constraints, and plausibility checks so that detected violations match real-world invalidity and the repair loop can always produce correct programs.

What would settle it

A case in which a program passes every static, runtime, and semantic check yet produces an output that violates an unencoded geographic constraint, such as an impossible spatial relationship in a new city not covered by the Boston test set.
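One way to look for such a case is to run an independent audit alongside the contract checks and list the outputs that pass the first but fail the second. The sketch below assumes a contract that only encodes non-negative travel time and an audit that adds a speed ceiling; both checks, the 130 km/h threshold, and the sample records are hypothetical.

```python
from typing import Callable

Check = Callable[[dict], bool]


def passes(checks: list[Check], output: dict) -> bool:
    """True when the output satisfies every check in the list."""
    return all(check(output) for check in checks)


def false_passes(outputs: list[dict], contract_checks: list[Check],
                 audit_checks: list[Check]) -> list[dict]:
    """Outputs accepted by the contract but rejected by the audit."""
    return [o for o in outputs
            if passes(contract_checks, o) and not passes(audit_checks, o)]


def non_negative_time(o: dict) -> bool:
    """Rule assumed to be encoded in the contract."""
    return o["travel_time_min"] >= 0


def plausible_speed(o: dict, max_kmh: float = 130.0) -> bool:
    """Rule assumed to be missing from the contract (arbitrary ceiling)."""
    t = o["travel_time_min"]
    if t <= 0:
        return False
    return o["distance_km"] / (t / 60.0) <= max_kmh


outputs = [
    {"distance_km": 10.0, "travel_time_min": 15.0},  # 40 km/h, fine
    {"distance_km": 10.0, "travel_time_min": 0.5},   # 1200 km/h
    {"distance_km": 10.0, "travel_time_min": -5.0},  # caught by contract
]
print(false_passes(outputs, [non_negative_time], [plausible_speed]))
```

Any record returned by `false_passes` is a contract-compliant output that the audit considers invalid, which is exactly the gap the falsification test targets.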

Figures

Figures reproduced from arXiv: 2605.00782 by Rongbo Xiao, Yihan Zhang, Yinhao Xiao.

**Figure 1.** Overview of GeoContra. The pipeline begins with a natural-language GIS task and …

**Figure 2.** Detailed verification and repair engine. The static checker inspects AST structure, …

**Figure 3.** Spatial-correctness gains by task family and closed-model family. Each cell is Geo…

**Figure 4.** Open-model spatial correctness. Each model has 300 LLM-only and 300 GeoContra …

**Figure 5.** Average final violations per task after the last generation or repair round.
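The Figure 2 caption says the static checker inspects AST structure. A minimal sketch of that kind of inspection, using Python's standard `ast` module, is given here. Which calls count as required or forbidden is an assumption for illustration, as are the function names `called_names` and `static_violations`; the paper's actual rules are not listed on this page.

```python
import ast


def called_names(source: str) -> set[str]:
    """Collect the names of all functions and methods called in source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Attribute):   # e.g. gdf.to_crs(...)
                names.add(func.attr)
            elif isinstance(func, ast.Name):      # e.g. eval(...)
                names.add(func.id)
    return names


def static_violations(source: str, required: set[str],
                      forbidden: set[str]) -> list[str]:
    """Compare the calls found in source with hypothetical contract rules."""
    used = called_names(source)
    out = [f"missing_required:{n}" for n in sorted(required - used)]
    out += [f"forbidden_call:{n}" for n in sorted(forbidden & used)]
    return out


# The script is parsed only, never executed, so geopandas need not be installed.
sample = """
import geopandas as gpd
parks = gpd.read_file("parks.geojson")
zones = parks.buffer(0.005)
"""
print(static_violations(sample, {"to_crs", "buffer"}, {"eval"}))
```

Because the script is only parsed, this check runs before any data is loaded, which is what distinguishes it from the runtime and semantic stages.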

Original abstract

Reliable spatial analysis in GIScience requires preserving coordinate semantics, topology, units, and geographic plausibility. Current LLM-based GIS systems generate fluent scripts but rarely enforce these geographic rules at scale. We present GeoContra, a verification and repair framework for LLM-driven Python GIS workflows. It represents each task as an executable geospatial contract, including natural-language questions, schemas, CRS metadata, expected outputs, spatial predicates, topology, metrics, required operations, and forbidden shortcuts. Generated programs undergo static rule inspection, runtime validation, and semantic verification, with violations fed back into a bounded repair loop. Evaluated on 7,079 real geospatial tasks across 15 Boston-area zones, 9 task families, and 11 open-source models (600 runs each), GeoContra improves spatial correctness on closed models from 47.6% to 77.5% for DeepSeek-V4 and from 57.7% to 81.5% for Kimi-K2.5. Across 11 open models, average correctness rises by 26.6%. GeoContra turns fluent code production into verifiable spatial analysis, catching negative travel times, CRS/field-schema violations, missing predicates, and brittle output casts that otherwise yield executable but geographically invalid results.
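The abstract does not say whether the 26.6% average is a percentage-point difference or a relative improvement. For the two closed models, where both endpoints are given, the two readings can be computed directly. The helper name `gain` is hypothetical, and the figures are the ones quoted in the abstract.

```python
def gain(before: float, after: float) -> tuple[float, float]:
    """Return (percentage-point gain, relative gain in percent)."""
    points = round(after - before, 1)
    relative = round(100 * (after - before) / before, 1)
    return points, relative


print("DeepSeek-V4:", gain(47.6, 77.5))  # (29.9, 62.8)
print("Kimi-K2.5:  ", gain(57.7, 81.5))  # (23.8, 41.2)
```

The two readings differ by a wide margin for the same pair of numbers, so the open-model average is hard to compare against the closed-model results without knowing which one is meant.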

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

GeoContra gets real lifts in LLM GIS code correctness via contracts and repair, but the gains rest on contracts whose completeness isn't independently checked.

GeoContra wraps GIS tasks in executable contracts that bundle natural-language questions, schemas, CRS info, spatial predicates, topology rules, and forbidden shortcuts. Generated Python then runs through static inspection, runtime checks, and semantic verification, with failures fed into a bounded repair loop. On 7,079 tasks spanning 15 Boston zones and 9 families, it lifts average correctness by 26.6% across 11 open models, and raises the closed models DeepSeek-V4 and Kimi-K2.5 from 47.6% to 77.5% and from 57.7% to 81.5%, respectively.

The scale and the concrete error types caught (CRS violations, negative travel times, missing predicates, brittle output casts) are the strongest parts here. The evaluation setup with hundreds of runs per model gives the numbers some weight, and the idea of grounding repair in geographic semantics rather than generic syntax is a clear step beyond plain prompting or post-hoc linting.

The main soft spot is the contracts. The framework assumes they capture the relevant geographic constraints for these tasks, yet the abstract and reported results give no sign of external validation against OGC/ISO standards or expert review. If a contract omits a rule, programs that pass all stages can still be executable but wrong in practice. That makes the correctness metric relative to the contracts rather than a measure of absolute geographic validity. Task selection criteria and how ground truth was set also need more detail to judge reproducibility.

This is aimed at people building or using LLM tools for spatial analysis in planning or environmental work. The empirical footprint is large enough and the engineering is concrete enough that it deserves a serious referee rather than a desk reject, though reviewers will press on contract construction and whether the repair loop introduces new geographic errors.

Referee Report

1 major / 2 minor

Summary. The manuscript introduces GeoContra, a verification and repair framework for LLM-generated Python GIS code. Each task is represented as an executable geospatial contract (natural-language question, schema, CRS metadata, spatial predicates, topology, metrics, required operations, and forbidden shortcuts). Generated programs undergo static rule inspection, runtime validation, and semantic verification, with violations fed into a bounded repair loop. On 7,079 real geospatial tasks spanning 15 Boston-area zones, 9 task families, and 11 models (600 runs each), the framework raises spatial correctness from 47.6% to 77.5% for DeepSeek-V4, from 57.7% to 81.5% for Kimi-K2.5, and by 26.6% on average across open models.

Significance. If the contracts comprehensively encode geographic constraints and the evaluation is robust, GeoContra offers a practical path to reliable LLM-driven spatial analysis by catching executable but invalid outputs (e.g., CRS violations, negative travel times, missing predicates). The large-scale empirical evaluation on real-world tasks is a clear strength, providing concrete, reproducible evidence of improvement that could influence both GIS software engineering and LLM application design.

major comments (1)

[Abstract and Evaluation] The central claim that GeoContra improves true spatial correctness rests on the assumption that the defined geospatial contracts comprehensively capture all relevant geographic rules, topology constraints, and plausibility checks for the 9 task families and 15 zones. No details are given on contract derivation, independent validation against OGC/ISO standards, or expert review, nor on whether the repair loop can introduce new geographic errors. This makes the reported 26.6% average lift dependent on an untested completeness assumption; an error analysis or external validation of contract coverage is required to confirm that gains reflect geographic validity rather than contract compliance.

minor comments (2)

Clarify the exact criteria used to select the 7,079 tasks and how ground-truth correctness was independently established.
Specify the 11 models (including whether DeepSeek-V4 and Kimi-K2.5 are treated as closed or open) and list the 9 task families with brief definitions.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the scale and reproducibility of our evaluation. We address the major comment below and commit to targeted revisions that strengthen the manuscript without altering its core claims.

Referee: The central claim that GeoContra improves true spatial correctness rests on the assumption that the defined geospatial contracts comprehensively capture all relevant geographic rules, topology constraints, and plausibility checks for the 9 task families and 15 zones. No details are given on contract derivation, independent validation against OGC/ISO standards, or expert review, nor on whether the repair loop can introduce new geographic errors. This makes the reported 26.6% average lift dependent on an untested completeness assumption; an error analysis or external validation of contract coverage is required to confirm that gains reflect geographic validity rather than contract compliance.

Authors: We agree that contract completeness is central to interpreting the reported gains. Contracts were derived directly from the natural-language task descriptions, required outputs, schemas, and CRS metadata supplied with each of the 7,079 real tasks; for every task family we encoded the minimal set of spatial predicates, topology relations, metric constraints, and forbidden operations needed to match the intended geographic semantics. We will add a new subsection (Section 3.2) that documents this derivation process, including one concrete example per task family and a mapping of each contract element to the corresponding OGC Simple Features and ISO 19107 concepts. While we did not commission an external expert audit, the contracts were cross-validated against the ground-truth outputs and against common GIS failure modes (CRS mismatches, negative travel times, missing topology predicates) that the evaluation explicitly measures. We will also insert an error-analysis subsection (Section 5.4) that tabulates violation categories before and after each repair iteration; because the repair loop is bounded and re-executes the full static-runtime-semantic suite after every edit, new geographic errors are rejected rather than accepted. These additions will make explicit that the 26.6% lift corresponds to elimination of the very executable-but-invalid outputs the contracts were designed to catch.

Revision: partial
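The tabulation the authors describe for Section 5.4 could take a shape like the following sketch, which counts violation categories per repair round and flags categories that first appear after repair begins. The function names, category labels, and the three-round history are invented for illustration; no data from the paper is used.

```python
from collections import Counter


def violation_table(history: list[list[str]]) -> dict[str, list[int]]:
    """Map each category to its count in every round (round 0 = pre-repair)."""
    counts = [Counter(round_labels) for round_labels in history]
    categories = sorted(set().union(*counts))
    return {c: [cnt[c] for cnt in counts] for c in categories}


def introduced_by_repair(table: dict[str, list[int]]) -> list[str]:
    """Categories absent before repair that show up in a later round."""
    return [c for c, row in table.items() if row[0] == 0 and any(row[1:])]


# Made-up violation labels for one task across three verification passes.
history = [
    ["crs_mismatch", "negative_travel_time", "crs_mismatch"],  # round 0
    ["negative_travel_time", "invalid_geometry"],               # round 1
    [],                                                         # round 2
]
table = violation_table(history)
for category, row in table.items():
    print(f"{category:22s} {row}")
print("introduced:", introduced_by_repair(table))
```

A non-empty `introduced_by_repair` result would be direct evidence on the referee's question of whether repair edits create new geographic errors, even when those errors are later removed.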

Circularity Check

0 steps flagged

Empirical engineering framework with no derivation chain or self-referential reductions

The paper describes a verification-and-repair system for LLM-generated GIS code, evaluated empirically on 7,079 independent real-world tasks drawn from Boston zones and task families. No equations, fitted parameters, or mathematical predictions appear; correctness gains are measured directly against the same contract-based checks used in the repair loop, but the evaluation tasks and models are external to any internal fitting. No self-citation is invoked as a load-bearing uniqueness theorem or ansatz, and the framework's contract definitions are presented as engineering choices rather than derived from prior results by the same authors. The reported improvements therefore rest on observable pass/fail rates on held-out tasks rather than any reduction to the inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the assumption that a finite set of contract predicates can capture geographic validity and that the repair loop terminates with valid code. No numerical free parameters are mentioned; the main invented entity is the geospatial contract itself.

axioms (1)

domain assumption: Geospatial contracts can be written to fully specify required coordinate semantics, topology, units, and plausibility constraints for a given task.
Invoked when the framework claims to catch all violations that produce geographically invalid results.

invented entities (1)

executable geospatial contract (no independent evidence)
purpose: Structured representation that bundles natural-language questions, schemas, CRS metadata, spatial predicates, and forbidden operations for verification
New artifact introduced by the paper to enable the verification and repair pipeline.
