
arxiv: 2605.18408 · v1 · pith:ZKMLQ2DZnew · submitted 2026-05-18 · 💻 cs.CV

Historical Knowledge Graphs for Global Maritime Estimated Time of Arrival

Pith reviewed 2026-05-20 10:47 UTC · model grok-4.3

classification 💻 cs.CV
keywords AIS data · knowledge graph · estimated time of arrival · vessel trajectories · travel time prediction · speed distributions · maritime forecasting · global maritime

The pith

A knowledge graph built only from historical AIS data predicts global vessel arrival times with a median RMSE of about 23 minutes per segment and 31 minutes per trajectory on held-out data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that a global maritime knowledge graph can be assembled directly from Automatic Identification System records to forecast travel times between arbitrary locations without weather, currents, or other external inputs. Trajectories are first cleaned and segmented, then the graph records speed distributions broken down by vessel type, time of travel, and direction. Queries use a priority hierarchy with fallback rules to produce arrival estimates. Such forecasts matter for coordinating port arrivals, cutting idle time, and lowering fuel use and emissions across shipping routes. On held-out data the method reports median RMSE values of 22.75 minutes at segment level and 30.90 minutes at full-trajectory level.

Core claim

Noisy AIS messages are preprocessed into segmented trajectories with a Gaussian-mixture-model pipeline. A graph of 5,433 geohash-3 nodes and 12,334 edges is then populated iteratively with speed distributions stratified by vessel type, time of travel, and direction, and predictions are retrieved through a hierarchical priority-based query system with principled fallbacks. The resulting structure delivers median RMSE of 22.75 minutes on segments and 30.90 minutes on trajectories on a temporally held-out test set, and 27.36 and 37.46 minutes respectively on an external test set.

What carries the argument

The historical maritime knowledge graph that stores stratified speed distributions and answers travel-time queries via hierarchical fallback rules.
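
As a rough illustration of how such a structure might work, the sketch below stores speed samples per edge under several stratum keys and answers a query by walking from the most specific stratum to the least specific one. The class name `SpeedGraph`, the fallback order, the `min_samples` threshold, and the use of the median as the summary statistic are all assumptions made for illustration; the paper's actual priority hierarchy is not spelled out in this summary.

```python
from collections import defaultdict
from statistics import median


class SpeedGraph:
    """Toy historical graph: each edge holds speed samples (knots) per stratum."""

    def __init__(self, min_samples=3):
        # Assumed threshold: a stratum is trusted only with this many samples.
        self.min_samples = min_samples
        # edge -> stratum key -> list of observed speeds
        self.edges = defaultdict(lambda: defaultdict(list))

    @staticmethod
    def _keys(vessel_type, time_bin, direction):
        # Hypothetical fallback order, most specific first.
        return [
            (vessel_type, time_bin, direction),
            (vessel_type, None, direction),
            (vessel_type, None, None),
            (None, None, None),
        ]

    def add(self, edge, vessel_type, time_bin, direction, speed_knots):
        # Store the observation once under every level of the hierarchy.
        for key in set(self._keys(vessel_type, time_bin, direction)):
            self.edges[edge][key].append(speed_knots)

    def speed(self, edge, vessel_type, time_bin, direction):
        """Return (median speed, fallback level) or (None, None) if no data."""
        strata = self.edges.get(edge, {})
        for level, key in enumerate(self._keys(vessel_type, time_bin, direction)):
            samples = strata.get(key, [])
            if len(samples) >= self.min_samples:
                return median(samples), level
        return None, None

    def travel_minutes(self, edge, distance_nm, vessel_type, time_bin, direction):
        """Convert the retrieved speed into a travel time over distance_nm."""
        speed, level = self.speed(edge, vessel_type, time_bin, direction)
        if not speed:
            return None, level
        return 60.0 * distance_nm / speed, level
```

Returning the fallback level alongside the estimate lets a caller see how specific the supporting statistics were, which is one simple way to expose confidence without extra modelling.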

If this is right

  • Global travel-time prediction is feasible using only publicly available AIS records.
  • Just-in-time arrival planning at ports becomes practical at worldwide scale.
  • Vessel speed optimization can reduce fuel consumption and associated emissions.
  • A single graph structure supports queries between any pair of locations without retraining.
  • The same historical baseline can serve as a foundation for later integration of dynamic factors.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The segmentation and stratification approach could be adapted to other movement domains that produce noisy location traces, such as road or rail networks.
  • Adding a lightweight real-time correction layer on top of the static graph might narrow errors further without discarding the historical core.
  • Port operators could test the predictions against live schedules to measure direct operational savings in waiting time.

Load-bearing premise

Historical speed distributions grouped only by vessel type, time of travel, and direction remain representative of future conditions even when weather, currents, or vessel loading differ from past averages.

What would settle it

A clear rise in prediction error during periods of unusual weather or for vessels carrying atypical loads compared with the historical strata used to build the graph would indicate that the distributions no longer suffice.

Figures

Figures reproduced from arXiv: 2605.18408 by Neofytos Dimitriou.

Figure 1: Spatial visualization of the average per-node errors of the test set trajectories.
Figure 2: (Left) Pairwise relationships of displacement, time
Original abstract

Accurate vessel estimated-time-of-arrival forecasts are critical for port operations and decarbonization, yet global-scale travel-time prediction remains difficult without costly contextual data. Herein, I present a methodology for constructing a historical maritime knowledge graph using only Automatic Identification System (AIS) data. First, segmented trajectories are extracted from noisy AIS data using a Gaussian-mixture-model-based preprocessing pipeline. The graph is then constructed by iteratively processing the trajectories and storing speed distributions stratified by vessel type, time of travel, and direction of travel; the resulting global graph comprises 5,433 geohash-3 nodes and 12,334 edges. The graph can be queried to retrieve travel-time predictions between any two locations via a hierarchical, priority-based system that uses historical statistics with principled fallback. On a temporally held-out test set, median RMSE is 22.75 min (segment-level) and 30.90 min (trajectory-level), with 69.1% of trajectories within 20% of actual arrival time. On a second external test set, median RMSE is 27.36 min (segment-level) and 37.46 min (trajectory-level), with 62.1% of trajectories within 20%. These results corroborate the promise of our method, enabling global travel-time prediction and providing a strong foundation for just-in-time arrival planning and emissions reduction.
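
The abstract does not define its metrics precisely, so the sketch below shows one plausible reading: an RMSE is computed per group of predictions (for example, per trajectory), the median is taken across groups, and a trajectory counts as "within 20%" when its absolute error is at most 20% of the actual travel time. The function names and the grouping convention are assumptions, not the paper's definitions.

```python
import math
from statistics import median


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length lists of minutes."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))


def median_rmse(groups):
    """Median of per-group RMSE; groups is a list of (predicted, actual) pairs."""
    return median(rmse(p, a) for p, a in groups)


def share_within(predicted_totals, actual_totals, tolerance=0.20):
    """Fraction of trajectories whose error is within tolerance of the actual time."""
    if not actual_totals or len(predicted_totals) != len(actual_totals):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(
        abs(p - a) <= tolerance * a
        for p, a in zip(predicted_totals, actual_totals)
    )
    return hits / len(actual_totals)
```

Under this reading, a median RMSE is robust to a handful of very poor trajectories, which is worth keeping in mind when comparing it with a mean-based error.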

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper claims to enable accurate global vessel estimated-time-of-arrival (ETA) prediction using only Automatic Identification System (AIS) data by constructing a historical maritime knowledge graph. Segmented trajectories are first extracted via a Gaussian-mixture-model preprocessing pipeline. The graph (5,433 geohash-3 nodes, 12,334 edges) stores empirical speed distributions stratified by vessel type, time of travel, and direction; predictions are obtained via a hierarchical priority-based query system with principled fallbacks. On a temporally held-out test set the method reports median RMSE of 22.75 min (segment-level) and 30.90 min (trajectory-level) with 69.1% of trajectories within 20% of actual arrival time; comparable figures (27.36 min / 37.46 min, 62.1% within 20%) are given on a second external test set.

Significance. If the reported performance generalizes, the work is significant because it shows that global-scale maritime ETA forecasting is feasible from publicly available AIS data alone, without costly contextual inputs such as weather or currents. This supplies a practical foundation for just-in-time port arrivals and associated decarbonization benefits. Concrete quantitative results on both temporally held-out and external validation sets, together with the pragmatic design of stratified historical statistics and fallback mechanisms, constitute clear strengths.

major comments (3)
  1. [Abstract / preprocessing pipeline] Abstract and preprocessing description: the accuracy of the Gaussian-mixture-model-based trajectory segmentation is never quantified (no precision, recall, or error-rate metrics; no comparison against manual or alternative segmentations). Because the speed distributions that populate the knowledge-graph edges are derived directly from these segments, uncharacterized segmentation errors are load-bearing for the reliability of the reported RMSE values.
  2. [Evaluation methodology] Evaluation section: details on the temporal hold-out procedure are absent (e.g., exact split dates, confirmation that no trajectory spans the train/test boundary, or verification that graph construction uses only past data). Without these safeguards it is impossible to rule out leakage, which directly undermines the generalization claims supported by the held-out RMSE figures.
  3. [Graph construction and query system] Method and discussion: speed distributions are conditioned solely on vessel type, time of travel, and direction. The manuscript provides no sensitivity analysis or discussion of how unmodeled factors (weather, currents, vessel loading, traffic) affect the distributions. Because the headline RMSE numbers already embed any mismatch between historical and test-period conditions, the external-set results alone do not establish robustness under varying regimes.
minor comments (2)
  1. [Abstract] The precise definition of 'time of travel' used for stratification (departure time, segment midpoint, etc.) is not stated; a short clarification would remove ambiguity.
  2. [Prediction query system] A schematic or pseudocode describing the hierarchical priority-based query and fallback logic would substantially improve reproducibility and reader understanding.
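
For orientation on major comment 1, the sketch below shows one way a Gaussian mixture could separate stationary from moving AIS messages and cut a track into segments. It assumes the mixture is fitted to speed over ground with exactly two components and that a segment is a run of consecutive "moving" messages; none of these choices is confirmed by the paper, and the plain-Python EM loop stands in for whatever implementation the author used.

```python
import math


def _pdf(x, mean, var):
    """Density of a 1-D Gaussian."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(values, iters=50):
    """Fit a two-component 1-D Gaussian mixture with plain EM."""
    lo, hi = min(values), max(values)
    means = [lo, hi]  # initialise the components at the extremes
    spread = max((hi - lo) / 4.0, 1e-3)
    variances = [spread ** 2, spread ** 2]
    weights = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each value
        resp = []
        for v in values:
            dens = [w * _pdf(v, m, s2) for w, m, s2 in zip(weights, means, variances)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and (floored) variances
        for k in range(2):
            nk = sum(r[k] for r in resp) or 1e-12
            weights[k] = nk / len(values)
            means[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            var = sum(r[k] * (v - means[k]) ** 2 for r, v in zip(resp, values)) / nk
            variances[k] = max(var, 1e-4)
    return weights, means, variances


def label_moving(speeds, params):
    """True where the higher-mean (assumed 'moving') component is more likely."""
    weights, means, variances = params
    moving = 0 if means[0] > means[1] else 1
    labels = []
    for v in speeds:
        dens = [w * _pdf(v, m, s2) for w, m, s2 in zip(weights, means, variances)]
        labels.append(dens[moving] >= dens[1 - moving])
    return labels


def split_segments(labels):
    """Return (start, end) index pairs, end exclusive, for each moving run."""
    segments, start = [], None
    for i, is_moving in enumerate(labels):
        if is_moving and start is None:
            start = i
        elif not is_moving and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(labels)))
    return segments
```

Comparing the boundaries produced by `split_segments` against a manually annotated sample is the kind of check the comment asks for.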

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their detailed and constructive feedback. We address each major comment below, indicating where revisions will be made to improve the manuscript.

  1. Referee: [Abstract / preprocessing pipeline] Abstract and preprocessing description: the accuracy of the Gaussian-mixture-model-based trajectory segmentation is never quantified (no precision, recall, or error-rate metrics; no comparison against manual or alternative segmentations). Because the speed distributions that populate the knowledge-graph edges are derived directly from these segments, uncharacterized segmentation errors are load-bearing for the reliability of the reported RMSE values.

    Authors: We agree that formal quantification of segmentation accuracy is important for validating the downstream speed distributions. The GMM pipeline was selected for its ability to distinguish stationary and moving states in noisy AIS streams, but quantitative metrics against manual or alternative segmentations were not included in the original submission. In the revised manuscript we will add a dedicated evaluation subsection reporting precision, recall, and boundary-error statistics on a manually annotated sample of trajectories. revision: yes

  2. Referee: [Evaluation methodology] Evaluation section: details on the temporal hold-out procedure are absent (e.g., exact split dates, confirmation that no trajectory spans the train/test boundary, or verification that graph construction uses only past data). Without these safeguards it is impossible to rule out leakage, which directly undermines the generalization claims supported by the held-out RMSE figures.

    Authors: The temporal hold-out was performed with a fixed cutoff date chosen so that every test trajectory lies entirely after the training period and no trajectory crosses the boundary; the knowledge graph was built exclusively from training data. We acknowledge that these procedural details were omitted from the original text. The revised evaluation section will state the exact split date, describe the boundary-check procedure, and confirm that only past data were used for graph construction. revision: yes

  3. Referee: [Graph construction and query system] Method and discussion: speed distributions are conditioned solely on vessel type, time of travel, and direction. The manuscript provides no sensitivity analysis or discussion of how unmodeled factors (weather, currents, vessel loading, traffic) affect the distributions. Because the headline RMSE numbers already embed any mismatch between historical and test-period conditions, the external-set results alone do not establish robustness under varying regimes.

    Authors: We recognize that conditioning only on vessel type, time, and direction leaves unmodeled influences such as weather, currents, and traffic unexamined. The reported RMSE values therefore reflect average historical conditions, and the external test set supplies limited evidence of robustness. A full sensitivity analysis would require additional contextual data sources outside the AIS-only scope of this study. In the revision we will expand the discussion to explicitly note these limitations and outline directions for future work that could incorporate such variables. revision: partial
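
The boundary check described in the second response can be sketched in a few lines. The version below assumes each trajectory is a tuple of identifier, start time, and end time, and that trajectories spanning the cutoff are discarded rather than truncated; the function name and the dates in the usage note are hypothetical, since the actual split date is not given.

```python
from datetime import datetime


def temporal_split(trajectories, cutoff):
    """Split (traj_id, start, end) tuples around a cutoff timestamp.

    Trajectories ending before the cutoff go to training, those starting at or
    after it go to testing, and any that span the cutoff are set aside so that
    no test trajectory overlaps the training period.
    """
    train, test, dropped = [], [], []
    for traj_id, start, end in trajectories:
        if end < cutoff:
            train.append(traj_id)
        elif start >= cutoff:
            test.append(traj_id)
        else:
            dropped.append(traj_id)  # crosses the train/test boundary
    return train, test, dropped
```

With a hypothetical cutoff of `datetime(2024, 1, 1)`, a voyage running from 30 December to 2 January would land in `dropped`, and the graph would be built from the `train` identifiers only.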

Circularity Check

0 steps flagged

No significant circularity; performance metrics measured on independent held-out sets


The paper constructs a knowledge graph by extracting segmented trajectories from AIS data via GMM preprocessing and populating edges with empirical speed distributions stratified solely by vessel type, time of travel, and direction. Travel-time predictions are obtained by querying this graph using a hierarchical priority-based system with fallback to historical statistics. The reported median RMSE values (22.75 min segment-level and 30.90 min trajectory-level on the temporally held-out set; 27.36 min and 37.46 min on the external set) along with the 69.1% and 62.1% within-20% figures are computed directly from comparisons against ground-truth arrival times in those independent test sets. No derivation step reduces by construction to its own inputs, no fitted parameters are relabeled as predictions, and no self-citation or uniqueness claim is invoked to justify the central results; the evaluation remains externally falsifiable against the held-out data.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The approach rests on the premise that AIS-derived historical speeds capture the dominant variability in travel times; no new physical entities are postulated and the only free choices are the geohash granularity and stratification bins.

free parameters (1)
  • geohash-3 resolution
    Chosen granularity for nodes; affects coverage and prediction granularity but no fitted value is stated.
axioms (1)
  • domain assumption: AIS data contains sufficient positional and timestamp information to reconstruct representative trajectories after GMM-based cleaning.
    Invoked in the preprocessing pipeline description.
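
To make the geohash-3 free parameter concrete, the sketch below implements the standard public geohash encoding, which interleaves longitude and latitude bits and emits one base-32 character per five bits. The function name is an assumption, and nothing here is taken from the paper's code; it only shows how a position would map to a three-character node identifier.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=3):
    """Encode a latitude/longitude pair as a geohash of the given length."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars, bits, value = [], 0, 0
    use_lon = True  # geohash bits alternate, starting with longitude
    while len(chars) < precision:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2.0
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            value <<= 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # five bits make one base-32 character
            chars.append(_BASE32[value])
            bits, value = 0, 0
    return "".join(chars)
```

At precision 3 the alphabet allows at most 32^3 = 32,768 cells worldwide, so the 5,433 reported nodes would presumably be the subset of cells that AIS traffic actually touches.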

pith-pipeline@v0.9.0 · 5765 in / 1218 out tokens · 44105 ms · 2026-05-20T10:47:17.835918+00:00 · methodology


