Historical Knowledge Graphs for Global Maritime Estimated Time of Arrival

Neofytos Dimitriou

arxiv: 2605.18408 · v1 · pith:ZKMLQ2DZnew · submitted 2026-05-18 · 💻 cs.CV

Pith reviewed 2026-05-20 10:47 UTC · model grok-4.3

classification 💻 cs.CV

keywords AIS data · knowledge graph · estimated time of arrival · vessel trajectories · travel time prediction · speed distributions · maritime forecasting · global maritime

The pith

A knowledge graph built only from historical AIS data predicts global vessel arrival times with a median RMSE of about 23 minutes per segment and 31 minutes per trajectory on held-out data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that a global maritime knowledge graph can be assembled directly from Automatic Identification System records to forecast travel times between arbitrary locations without weather, currents, or other external inputs. Trajectories are first cleaned and segmented, then the graph records speed distributions broken down by vessel type, time of travel, and direction. Queries use a priority hierarchy with fallback rules to produce arrival estimates. Such forecasts matter for coordinating port arrivals, cutting idle time, and lowering fuel use and emissions across shipping routes. On held-out data the method reports median RMSE values of 22.75 minutes at segment level and 30.90 minutes at full-trajectory level.

Core claim

By preprocessing noisy AIS messages into segmented trajectories with a Gaussian-mixture-model pipeline, iteratively populating a graph of 5,433 geohash-3 nodes and 12,334 edges with speed distributions stratified by vessel type, time of travel, and direction, and retrieving predictions through a hierarchical priority-based query system with principled fallbacks, the resulting structure delivers median RMSE of 22.75 minutes on segments and 30.90 minutes on trajectories on a temporally held-out test set, and comparable figures on an external test set.

What carries the argument

The historical maritime knowledge graph that stores stratified speed distributions and answers travel-time queries via hierarchical fallback rules.
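
The store-and-fallback idea can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `SpeedGraph` class, the order of the fallback levels in `LEVELS`, the `min_samples` sparsity threshold, the use of the median as the point estimate, and the `time_bin` label (the summary does not define "time of travel"). It is not the author's implementation.

```python
from collections import defaultdict
from statistics import median

# Hypothetical fallback order, most specific stratum first. The paper's
# actual priority levels are not given in this summary.
LEVELS = [
    ("vessel_type", "time_bin", "direction"),
    ("vessel_type", "direction"),
    ("vessel_type",),
    (),
]


class SpeedGraph:
    """Toy edge store: observed speeds (knots) per edge, grouped by stratum."""

    def __init__(self, min_samples=3):
        self.min_samples = min_samples  # assumed threshold for "enough data"
        self.speeds = defaultdict(list)

    @staticmethod
    def _key(edge, level, attrs):
        return (edge, level, tuple(attrs[name] for name in level))

    def add(self, edge, speed_kn, **attrs):
        # Record the observation under every fallback level at once.
        for level in LEVELS:
            self.speeds[self._key(edge, level, attrs)].append(speed_kn)

    def query_speed(self, edge, **attrs):
        # Walk from the most specific stratum to the least specific one
        # and stop at the first that has enough samples.
        for priority, level in enumerate(LEVELS):
            samples = self.speeds.get(self._key(edge, level, attrs), [])
            if len(samples) >= self.min_samples:
                return median(samples), priority
        return None, None  # edge not observed often enough at any level

    def travel_time_hours(self, edge, distance_nm, **attrs):
        speed, priority = self.query_speed(edge, **attrs)
        if speed is None or speed <= 0:
            return None, priority
        return distance_nm / speed, priority
```

Returning the priority level alongside the estimate lets a caller see how far the query had to relax before it found enough history, which is one plausible way to flag low-confidence predictions.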

If this is right

Global travel-time prediction is feasible using only publicly available AIS records.
Just-in-time arrival planning at ports becomes practical at worldwide scale.
Vessel speed optimization can reduce fuel consumption and associated emissions.
A single graph structure supports queries between any pair of locations without retraining.
The same historical baseline can serve as a foundation for later integration of dynamic factors.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The segmentation and stratification approach could be adapted to other movement domains that produce noisy location traces, such as road or rail networks.
Adding a lightweight real-time correction layer on top of the static graph might narrow errors further without discarding the historical core.
Port operators could test the predictions against live schedules to measure direct operational savings in waiting time.

Load-bearing premise

Historical speed distributions grouped only by vessel type, time of travel, and direction remain representative of future conditions even when weather, currents, or vessel loading differ from past averages.

What would settle it

A clear rise in prediction error during periods of unusual weather or for vessels carrying atypical loads compared with the historical strata used to build the graph would indicate that the distributions no longer suffice.

Figures

Figures reproduced from arXiv: 2605.18408 by Neofytos Dimitriou.

**Figure 2.** (Left) Pairwise relationships of displacement, time

Original abstract

Accurate vessel estimated-time-of-arrival forecasts are critical for port operations and decarbonization, yet global-scale travel-time prediction remains difficult without costly contextual data. Herein, I present a methodology for constructing a historical maritime knowledge graph using only Automatic Identification System (AIS) data. First, segmented trajectories are extracted from noisy AIS data using a Gaussian-mixture-model-based preprocessing pipeline. The graph is then constructed by iteratively processing the trajectories and storing speed distributions stratified by vessel type, time of travel, and direction of travel; the resulting global graph comprises 5,433 geohash-3 nodes and 12,334 edges. The graph can be queried to retrieve travel-time predictions between any two locations via a hierarchical, priority-based system that uses historical statistics with principled fallback. On a temporally held-out test set, median RMSE is 22.75 min (segment-level) and 30.90 min (trajectory-level), with 69.1% of trajectories within 20% of actual arrival time. On a second external test set, median RMSE is 27.36 min (segment-level) and 37.46 min (trajectory-level), with 62.1% of trajectories within 20%. These results corroborate the promise of our method, enabling global travel-time prediction and providing a strong foundation for just-in-time arrival planning and emissions reduction.
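
The two kinds of headline figures, a median RMSE and a share of trajectories within 20% of the actual time, can be computed along the following lines. This is a sketch under stated assumptions: the abstract does not say over which groups the median is taken or exactly how "within 20%" is measured, so the grouping in `median_rmse` and the relative-error rule in `share_within` are illustrative choices, and all function names are hypothetical.

```python
from math import sqrt
from statistics import median


def rmse_minutes(predicted, actual):
    """RMSE between predicted and actual durations, both in minutes."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return sqrt(squared / len(actual))


def median_rmse(groups):
    """Median of per-group RMSEs; each group is a (predicted, actual) pair."""
    return median(rmse_minutes(p, a) for p, a in groups)


def share_within(predicted, actual, tolerance=0.20):
    """Fraction of trips whose absolute error is at most tolerance * actual."""
    hits = sum(abs(p - a) <= tolerance * a for p, a in zip(predicted, actual))
    return hits / len(actual)
```

A median over groups is far less sensitive to a few very bad predictions than a single pooled RMSE, so the two summaries should not be compared directly.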

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper turns AIS data into a geohash knowledge graph of speed distributions and reports usable RMSE numbers on held-out and external tests, but it rests on the assumption that past patterns by type and time will hold without weather or currents.

The paper builds a global graph from AIS trajectories in which geohash-3 nodes are connected by edges that store speed distributions stratified by vessel type, time of travel, and direction. The author segments the data with a GMM pipeline, ends up with 5,433 nodes and 12,334 edges, and queries the graph with a hierarchical fallback system to produce ETA predictions. On a temporal hold-out the method gets a median RMSE of 22.75 minutes at segment level and 30.90 minutes at trajectory level, with 69.1 percent of trajectories within 20 percent of actual arrival time; the external set shows 27.36 and 37.46 minutes respectively, with 62.1 percent within 20 percent. Those numbers are concrete, and the construction is new as a maritime application.

Referee Report

3 major / 2 minor

Summary. The paper claims to enable accurate global vessel estimated-time-of-arrival (ETA) prediction using only Automatic Identification System (AIS) data by constructing a historical maritime knowledge graph. Segmented trajectories are first extracted via a Gaussian-mixture-model preprocessing pipeline. The graph (5,433 geohash-3 nodes, 12,334 edges) stores empirical speed distributions stratified by vessel type, time of travel, and direction; predictions are obtained via a hierarchical priority-based query system with principled fallbacks. On a temporally held-out test set the method reports median RMSE of 22.75 min (segment-level) and 30.90 min (trajectory-level) with 69.1% of trajectories within 20% of actual arrival time; comparable figures (27.36 min / 37.46 min, 62.1% within 20%) are given on a second external test set.

Significance. If the reported performance generalizes, the work is significant because it shows that global-scale maritime ETA forecasting is feasible from publicly available AIS data alone, without costly contextual inputs such as weather or currents. This supplies a practical foundation for just-in-time port arrivals and associated decarbonization benefits. Concrete quantitative results on both temporally held-out and external validation sets, together with the pragmatic design of stratified historical statistics and fallback mechanisms, constitute clear strengths.

major comments (3)

[Abstract / preprocessing pipeline] Abstract and preprocessing description: the accuracy of the Gaussian-mixture-model-based trajectory segmentation is never quantified (no precision, recall, or error-rate metrics; no comparison against manual or alternative segmentations). Because the speed distributions that populate the knowledge-graph edges are derived directly from these segments, uncharacterized segmentation errors are load-bearing for the reliability of the reported RMSE values.
[Evaluation methodology] Evaluation section: details on the temporal hold-out procedure are absent (e.g., exact split dates, confirmation that no trajectory spans the train/test boundary, or verification that graph construction uses only past data). Without these safeguards it is impossible to rule out leakage, which directly undermines the generalization claims supported by the held-out RMSE figures.
[Graph construction and query system] Method and discussion: speed distributions are conditioned solely on vessel type, time of travel, and direction. The manuscript provides no sensitivity analysis or discussion of how unmodeled factors (weather, currents, vessel loading, traffic) affect the distributions. Because the headline RMSE numbers already embed any mismatch between historical and test-period conditions, the external-set results alone do not establish robustness under varying regimes.

minor comments (2)

[Abstract] The precise definition of 'time of travel' used for stratification (departure time, segment midpoint, etc.) is not stated; a short clarification would remove ambiguity.
[Prediction query system] A schematic or pseudocode describing the hierarchical priority-based query and fallback logic would substantially improve reproducibility and reader understanding.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their detailed and constructive feedback. We address each major comment below, indicating where revisions will be made to improve the manuscript.

Referee: [Abstract / preprocessing pipeline] Abstract and preprocessing description: the accuracy of the Gaussian-mixture-model-based trajectory segmentation is never quantified (no precision, recall, or error-rate metrics; no comparison against manual or alternative segmentations). Because the speed distributions that populate the knowledge-graph edges are derived directly from these segments, uncharacterized segmentation errors are load-bearing for the reliability of the reported RMSE values.

Authors: We agree that formal quantification of segmentation accuracy is important for validating the downstream speed distributions. The GMM pipeline was selected for its ability to distinguish stationary and moving states in noisy AIS streams, but quantitative metrics against manual or alternative segmentations were not included in the original submission. In the revised manuscript we will add a dedicated evaluation subsection reporting precision, recall, and boundary-error statistics on a manually annotated sample of trajectories.
revision: yes

Referee: [Evaluation methodology] Evaluation section: details on the temporal hold-out procedure are absent (e.g., exact split dates, confirmation that no trajectory spans the train/test boundary, or verification that graph construction uses only past data). Without these safeguards it is impossible to rule out leakage, which directly undermines the generalization claims supported by the held-out RMSE figures.

Authors: The temporal hold-out was performed with a fixed cutoff date chosen so that every test trajectory lies entirely after the training period and no trajectory crosses the boundary; the knowledge graph was built exclusively from training data. We acknowledge that these procedural details were omitted from the original text. The revised evaluation section will state the exact split date, describe the boundary-check procedure, and confirm that only past data were used for graph construction.
revision: yes

Referee: [Graph construction and query system] Method and discussion: speed distributions are conditioned solely on vessel type, time of travel, and direction. The manuscript provides no sensitivity analysis or discussion of how unmodeled factors (weather, currents, vessel loading, traffic) affect the distributions. Because the headline RMSE numbers already embed any mismatch between historical and test-period conditions, the external-set results alone do not establish robustness under varying regimes.

Authors: We recognize that conditioning only on vessel type, time, and direction leaves unmodeled influences such as weather, currents, and traffic unexamined. The reported RMSE values therefore reflect average historical conditions, and the external test set supplies limited evidence of robustness. A full sensitivity analysis would require additional contextual data sources outside the AIS-only scope of this study. In the revision we will expand the discussion to explicitly note these limitations and outline directions for future work that could incorporate such variables.
revision: partial
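
The leakage safeguards discussed in the second exchange (a fixed cutoff, no trajectory spanning the boundary, graph built from past data only) might be checked with something like the sketch below. The field names `start` and `end`, the function names, and the choice to discard boundary-spanning trajectories rather than truncate them are all assumptions; the actual split date and procedure are not stated in this summary.

```python
from datetime import datetime


def temporal_split(trajectories, cutoff):
    """Split trajectories by a fixed cutoff and set aside any that straddle it.

    Each trajectory is a dict with 'start' and 'end' datetimes
    (hypothetical field names).
    """
    train, test, dropped = [], [], []
    for traj in trajectories:
        if traj["end"] < cutoff:
            train.append(traj)    # entirely before the cutoff
        elif traj["start"] >= cutoff:
            test.append(traj)     # entirely on or after the cutoff
        else:
            dropped.append(traj)  # spans the boundary: used in neither set
    return train, test, dropped


def assert_no_leakage(train, test):
    """Fail loudly unless all training data ends before any test data starts."""
    if train and test:
        latest_train = max(t["end"] for t in train)
        earliest_test = min(t["start"] for t in test)
        assert latest_train < earliest_test, "train and test periods overlap"
```

Reporting the size of `dropped` alongside the split date would let a reader judge how much data the boundary check removes.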

Circularity Check

0 steps flagged

No significant circularity; performance metrics measured on independent held-out sets

The paper constructs a knowledge graph by extracting segmented trajectories from AIS data via GMM preprocessing and populating edges with empirical speed distributions stratified solely by vessel type, time of travel, and direction. Travel-time predictions are obtained by querying this graph using a hierarchical priority-based system with fallback to historical statistics. The reported median RMSE values (22.75 min segment-level and 30.90 min trajectory-level on the temporally held-out set; 27.36 min and 37.46 min on the external set) along with the 69.1% and 62.1% within-20% figures are computed directly from comparisons against ground-truth arrival times in those independent test sets. No derivation step reduces by construction to its own inputs, no fitted parameters are relabeled as predictions, and no self-citation or uniqueness claim is invoked to justify the central results; the evaluation remains externally falsifiable against the held-out data.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The approach rests on the premise that AIS-derived historical speeds capture the dominant variability in travel times; no new physical entities are postulated and the only free choices are the geohash granularity and stratification bins.

free parameters (1)

geohash-3 resolution
Chosen granularity for nodes; affects coverage and prediction granularity but no fitted value is stated.
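
For readers unfamiliar with the node granularity, the sketch below encodes a latitude/longitude pair with the standard public geohash scheme. A 3-character geohash uses 15 bits (8 for longitude, 7 for latitude), so each cell spans 1.40625 degrees on both axes. The function name `geohash_encode` is hypothetical, and how the paper maps AIS positions to nodes beyond "geohash-3" is not described in this summary.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=3):
    """Encode a position as a geohash string of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, value = 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[value])
            bits, value = 0, 0
    return "".join(chars)
```

For example, `geohash_encode(57.64911, 10.40744)` falls in cell `u4p`; every position inside that 1.40625-degree cell would map to the same node.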

axioms (1)

domain assumption: AIS data contains sufficient positional and timestamp information to reconstruct representative trajectories after GMM-based cleaning.
Invoked in the preprocessing pipeline description.

pith-pipeline@v0.9.0 · 5765 in / 1218 out tokens · 44105 ms · 2026-05-20T10:47:17.835918+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Passage: The graph is then constructed by iteratively processing the trajectories and storing speed distributions stratified by vessel type, time of travel, and direction of travel; the resulting global graph comprises 5,433 geohash-3 nodes and 12,334 edges.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Passage: Travel-time predictions leverage hierarchical priority levels for speed estimation, progressively relaxing specificity when historical data is sparse

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

15 extracted references · 15 canonical work pages

[1]

and Smith, A.B

Jones, C.D. and Smith, A.B. and Roberts, E.F. Article Title. Proceedings Title. 2003

work page 2003
[2]

Maritime Policy & Management , volume =

Shuo Jiang and Lei Liu and Peng Peng and Mengqiao Xu and Ran Yan , title =. Maritime Policy & Management , volume =. 2025 , publisher =

work page 2025
[3]

Flexible Services and Manufacturing Journal , year=

El Mekkaoui, Sara and Benabbou, Loubna and Berrado, Abdelaziz , title=. Flexible Services and Manufacturing Journal , year=

work page
[4]

and Michaelides, Michalis P

Evmides, Nicos and Aslam, Sheraz and Ramez, Tzioyntmprian T. and Michaelides, Michalis P. and Herodotou, Herodotos , TITLE =. Journal of Marine Science and Engineering , VOLUME =. 2024 , NUMBER =

work page 2024
[5]

Estimated Time of Arrival Using Historical Vessel Tracking Data , year=

Alessandrini, Alfredo and Mazzarella, Fabio and Vespe, Michele , journal=. Estimated Time of Arrival Using Historical Vessel Tracking Data , year=

work page
[6]

2021 , issn =

Vessel estimated time of arrival prediction system based on a path-finding algorithm , journal =. 2021 , issn =. doi:https://doi.org/10.1016/j.martra.2021.100012 , url =

work page doi:10.1016/j.martra.2021.100012 2021
[7]

CoRR , volume=

Deqing Zhai and Xiuju Fu and Xiaofeng Yin and Haiyan Xu and Wanbing Zhang and Ning Li , title=. CoRR , volume=. 2022 , cdate=

work page 2022
[8]

Proceedings of the 30th ACM International Conference on Information & Knowledge Management , year=

ETA Prediction with Graph Neural Networks in Google Maps , author=. Proceedings of the 30th ACM International Conference on Information & Knowledge Management , year=

work page
[9]

Eason, B

G. Eason, B. Noble, and I. N. Sneddon, ``On certain integrals of Lipschitz-Hankel type involving products of Bessel functions,'' Phil. Trans. Roy. Soc. London, vol. A247, pp. 529--551, April 1955

work page 1955
[10]

Clerk Maxwell, A Treatise on Electricity and Magnetism, 3rd ed., vol

J. Clerk Maxwell, A Treatise on Electricity and Magnetism, 3rd ed., vol. 2. Oxford: Clarendon, 1892, pp.68--73

work page
[11]

I. S. Jacobs and C. P. Bean, ``Fine particles, thin films and exchange anisotropy,'' in Magnetism, vol. III, G. T. Rado and H. Suhl, Eds. New York: Academic, 1963, pp. 271--350

work page 1963
[12]

Elissa, ``Title of paper if known,'' unpublished

K. Elissa, ``Title of paper if known,'' unpublished

work page
[13]

Nicole, ``Title of paper with only first word capitalized,'' J

R. Nicole, ``Title of paper with only first word capitalized,'' J. Name Stand. Abbrev., in press

work page
[14]

Yorozu, M

Y. Yorozu, M. Hirano, K. Oka, and Y. Tagawa, ``Electron spectroscopy studies on magneto-optical media and plastic substrate interface,'' IEEE Transl. J. Magn. Japan, vol. 2, pp. 740--741, August 1987 [Digests 9th Annual Conf. Magnetics Japan, p. 301, 1982]

work page 1987
[15]

Young, The Technical Writer's Handbook

M. Young, The Technical Writer's Handbook. Mill Valley, CA: University Science, 1989

work page 1989
