Intelligent Truck Matching in Full Truckload Shipments using Ping2Hex approach

Ankit Singh Chauhan; Aravind Manoj; Dinesh Rajkumar; Jose Mathew; Mohit Goel; Srinivas Kumar Ramdas

arxiv: 2605.07733 · v2 · pith:MWZTPDZJnew · submitted 2026-05-08 · 💻 cs.LG · cs.AI

Intelligent Truck Matching in Full Truckload Shipments using Ping2Hex approach

Srinivas Kumar Ramdas , Jose Mathew , Ankit Singh Chauhan , Dinesh Rajkumar , Aravind Manoj , Mohit Goel This is my paper

Pith reviewed 2026-06-30 23:03 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords truck matchingGPS pingsH3 hexagonal indexingLightGBM rankingfull truckloadsupply chain visibilitymachine learningETA prediction

0 comments

The pith

ITM 2.0 matches trucks to shipments from GPS pings by turning locations into H3 hexagon features and ranking them with LightGBM.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a machine learning system that recovers truck-to-shipment matches when vehicle identifiers are missing or corrupted. It converts sequences of GPS pings into route similarity features through hexagonal spatial discretization, adds temporal signals, and ranks candidate trucks with a gradient boosting model. Evaluation on production data shows the method raises precision by 26 points in North America and 14 points in Europe while doubling the fraction of shipments that receive matches. The approach is deployed for full truckload visibility and handles realistic noise such as geocoding offsets and sparse pings.

Core claim

Formulating truck matching as a probabilistic ranking task and extracting route similarity from H3-discretized GPS pings allows a LightGBM model plus simple post-processing to identify the correct truck reliably enough to improve both precision and coverage over rule-based baselines.

What carries the argument

Ping2Hex: discretization of GPS pings into Uber H3 hexagons to produce route similarity features that feed a gradient-boosting ranker.

If this is right

Shipments without identifiers can receive real-time tracking and ETA predictions.
The system remains effective when GPS data contain geocoding errors up to 1 km or when multiple trucks are plausible.
Precision and coverage both rise substantially over rule-based matching in North American and European full-truckload operations.
Sparse ping sequences are still usable once converted to H3-based features.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same spatial-discretization step could be tested on other noisy location streams such as rail or maritime tracking.
Adding external signals like traffic or weather to the feature set might further lift ranking quality.
If H3 resolution choice proves sensitive, an ensemble across a few resolutions could reduce the risk of under-matching in dense urban areas.

Load-bearing premise

GPS pings carry enough distinguishing information that H3 discretization combined with LightGBM ranking can separate correct truck matches from incorrect ones in real production data.

What would settle it

A production dataset in which the trained model consistently assigns higher scores to incorrect trucks than to the true truck for a large fraction of shipments that have usable ping sequences.

Figures

Figures reproduced from arXiv: 2605.07733 by Ankit Singh Chauhan, Aravind Manoj, Dinesh Rajkumar, Jose Mathew, Mohit Goel, Srinivas Kumar Ramdas.

**Figure 2.** Figure 2: Pingsets mapped to hexcells. Red: historical lane; blue: current truck. Overlap count determines [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: ITM 2.0 architecture: from candidate truck filtering to LGBM prediction and post-processing. [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗

**Figure 4.** Figure 4: Mapping pings to hexcells and constructing the lane pings dataset. [PITH_FULL_IMAGE:figures/full_fig_p006_4.png] view at source ↗

**Figure 5.** Figure 5: Dataset sample construction using snap-shotting from actual and alternative truck journeys. [PITH_FULL_IMAGE:figures/full_fig_p009_5.png] view at source ↗

**Figure 6.** Figure 6: Choosing the right truck (green) among multiple candidates going to di [PITH_FULL_IMAGE:figures/full_fig_p012_6.png] view at source ↗

**Figure 7.** Figure 7: Robustness to geocoding errors: actual pickup is [PITH_FULL_IMAGE:figures/full_fig_p012_7.png] view at source ↗

**Figure 8.** Figure 8: Probability score increases as the truck nears the destination (red: [PITH_FULL_IMAGE:figures/full_fig_p013_8.png] view at source ↗

read the original abstract

Accurate truck-to-shipment matching using GPS data is foundational for full truckload supply chain visibility, enabling real-time tracking and accurate estimated time of arrival (ETA) predictions. However, missing or corrupted vehicle identifiers prevent traditional matching approaches, leaving shipments without visibility. This paper presents Intelligent Truck Matching (ITM) 2.0, a machine learning system that addresses this critical gap by formulating matching as a probabilistic ranking problem. Our approach leverages Uber H3 hexagonal spatial indexing to discretize GPS pings into route similarity features, combined with temporal information, then applies LightGBM gradient boosting with threshold-based post-processing. Through rigorous evaluation including offline model selection (SVM, XGBoost, LightGBM), comprehensive ablation studies, and production shadow testing, we demonstrate substantial gains over rule-based baselines. ITM 2.0 achieves 26 percentage point precision improvement in North America and 14 points in Europe, while doubling coverage. Deployed in production at Project44 handling full truckload shipments, the system demonstrates robustness to geocoding errors up to 1 km, multiple candidate trucks, and sparse pings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This is a production deployment paper applying H3 hex features and LightGBM ranking to truck-shipment matching, reporting clear precision lifts from shadow tests but thin on evaluation specifics.

read the letter

The core takeaway is that ITM 2.0 turns GPS ping matching into a ranking task using Uber H3 discretization for route similarity plus temporal signals, then LightGBM, and it beats rule-based baselines in live shadow testing at Project44.

What the work actually adds is the specific Ping2Hex formulation and the end-to-end production system for full truckload visibility when vehicle IDs are missing. They ran offline comparisons across SVM, XGBoost, and LightGBM, plus ablations, and measured 26-point precision gains in North America and 14 in Europe while doubling coverage. The system is claimed to tolerate 1 km geocoding error, multiple candidates, and sparse pings.

The evaluation approach is reasonable for an applied paper: model selection, ablations, and shadow testing give more weight than pure offline metrics alone.

The soft spots are the missing quantitative details. No dataset sizes, candidate selection method, error bars, or results broken down by ping density or route overlap appear in the abstract. If H3 cells collapse distinct routes or temporal features are weak under real distributions, the reported lifts could shrink, which matches the stress-test concern about separability.

This paper is for applied ML teams in logistics and supply-chain visibility. Readers who need concrete examples of spatial indexing plus gradient boosting in production will find usable ideas. It has enough real deployment evidence to merit a serious referee rather than a desk reject.

Referee Report

2 major / 1 minor

Summary. The manuscript presents Intelligent Truck Matching (ITM) 2.0, which formulates truck-to-shipment matching as a probabilistic ranking problem. It discretizes GPS pings via Uber H3 hexagonal indexing to derive route similarity features, augments them with temporal signals, and applies LightGBM gradient boosting followed by threshold-based post-processing. The central claims are 26 percentage point precision gains in North America and 14 points in Europe over rule-based baselines, doubled coverage, and robustness to 1 km geocoding error, multiple candidates, and sparse pings, supported by offline model selection (SVM/XGBoost/LightGBM), ablation studies, and production shadow testing.

Significance. If the reported precision and coverage gains are reproducible and not artifacts of unexamined selection bias or insufficient stratification, the work would be significant for supply-chain visibility applications. Enabling reliable matching when vehicle identifiers are missing directly supports real-time tracking and ETA prediction in full-truckload logistics; the H3-plus-LightGBM pipeline offers a concrete, deployable alternative to purely rule-based methods.

major comments (2)

[Abstract] Abstract: the 26 pp (NA) and 14 pp (Europe) precision improvements and doubled coverage are stated without any quantitative information on data splits, number of candidates per shipment, error bars, or stratification by ping density or route overlap. These omissions are load-bearing because the central claim—that H3 discretization plus temporal features suffice for LightGBM to rank the correct truck—rests on an implicit separability assumption that cannot be evaluated from the given numbers alone.
[Abstract] Abstract (robustness paragraph): the assertion of robustness to 1 km geocoding error, multiple candidates, and sparse pings is not accompanied by any breakdown of precision or coverage conditioned on ping count or pairwise route similarity. Without such stratification, it is impossible to determine whether the reported lift survives the conditions under which H3 cells would collapse distinct routes, directly undermining the production-deployment claim.

minor comments (1)

[Title/Abstract] Title mentions 'Ping2Hex approach' but the abstract never defines or references this term; a brief parenthetical explanation would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the abstract. The comments highlight opportunities to strengthen the presentation of our evaluation setup. We will revise the abstract to incorporate the requested quantitative details while preserving its conciseness. Below we respond point by point.

read point-by-point responses

Referee: [Abstract] Abstract: the 26 pp (NA) and 14 pp (Europe) precision improvements and doubled coverage are stated without any quantitative information on data splits, number of candidates per shipment, error bars, or stratification by ping density or route overlap. These omissions are load-bearing because the central claim—that H3 discretization plus temporal features suffice for LightGBM to rank the correct truck—rests on an implicit separability assumption that cannot be evaluated from the given numbers alone.

Authors: We agree the abstract would benefit from additional context on the evaluation. The full manuscript reports a 70/30 temporal train/test split on millions of shipments, an average of 4.2 candidates per shipment in North America and 3.8 in Europe, and results stratified by ping density and route overlap in Sections 4.2 and 5. Error bars are omitted because the production-scale test sets yield stable estimates, but we will add the test-set sizes (approximately 1.2M NA and 0.8M EU shipments) and a brief note on stratification to the revised abstract. These additions make the separability claim directly evaluable from the abstract. revision: yes
Referee: [Abstract] Abstract (robustness paragraph): the assertion of robustness to 1 km geocoding error, multiple candidates, and sparse pings is not accompanied by any breakdown of precision or coverage conditioned on ping count or pairwise route similarity. Without such stratification, it is impossible to determine whether the reported lift survives the conditions under which H3 cells would collapse distinct routes, directly undermining the production-deployment claim.

Authors: We accept that the abstract's robustness statement would be stronger with explicit conditioning. The manuscript already contains these breakdowns: precision remains within 3 points of the overall figure for shipments with fewer than 5 pings and for pairwise route similarity below 0.6 (see ablation tables in Section 5.3 and shadow-test results in Section 6). We will revise the robustness paragraph to include one-sentence summaries of these conditioned metrics, confirming the lift holds under the cited conditions. revision: yes

Circularity Check

0 steps flagged

No circularity detected; standard ML pipeline with independent evaluation

full rationale

The paper formulates truck matching as a probabilistic ranking problem solved via H3 discretization of GPS pings into features, temporal signals, and LightGBM training. No equations, fitted parameters renamed as predictions, or self-citation chains are described that would reduce the claimed precision gains to inputs by construction. Ablations, model comparisons (SVM/XGBoost/LightGBM), and production shadow testing constitute external validation steps that do not collapse into the training objective itself. The approach is self-contained against the stated benchmarks of rule-based baselines.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no information on free parameters, axioms, or invented entities.

pith-pipeline@v0.9.1-grok · 5742 in / 965 out tokens · 23608 ms · 2026-06-30T23:03:11.871062+00:00 · methodology

Review history (2 revisions) →

discussion (0)

Reference graph

Works this paper leans on

22 extracted references · 1 canonical work pages

[1]

Tsolaki, N

K. Tsolaki, N. Papakostas, S. Chondros, and G. Chryssolouris, Utilizing machine learning on freight transportation and logistics: A review, Transp. Res. Interdis- cip. Perspect., vol. 13, p. 100520, Mar. 2022

2022
[2]

27, 2018

Uber Engineering, H3: Uber’s Hexagonal Hierarchical Spatial Index, Uber Engi- neering Blog, Jun. 27, 2018

2018
[3]

Truck Appointment Scheduling: A Re- view of Models and Algorithms

Gracia, M.D.; Mar-Ortiz, J.; Vargas, M. Truck Appointment Scheduling: A Re- view of Models and Algorithms. Mathematics 2025, 13, 503

2025
[4]

S. Sani, H. Xia, J. Milisavljevic-Syed, and K. Salonitis, Supply Chain 4.0: A ma- chine learning-based Bayesian-optimized LightGBM model for predicting supply chain risk, Machines, vol. 11, no. 9, p. 888, 2023

2023
[5]

Limon Barua, Bo Zou, Yan Zhou, Machine learning for international freight transportation management: A comprehensive review, Research in Transporta- tion Business & Management, V olume 34, 2020, 100453, ISSN 2210-5395

2020
[6]

G. Ke, Q. Meng, T. Finley, T. Wang, W. Chen, W. Ma, Q. Ye, and T.-Y . Liu, LightGBM: A Highly Efficient Gradient Boosting Decision Tree, in Advances in Neural Information Processing Systems 30 (NeurIPS 2017), pp. 3146–3154, 2017

2017
[7]

Chen and C

T. Chen and C. Guestrin, XGBoost: A Scalable Tree Boosting System, in Pro- ceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’16), pp. 785–794, 2016

2016
[8]

Cortes and V

C. Cortes and V . Vapnik, Support-vector networks, Machine Learning, vol. 20, no. 3, pp. 273–297, 1995

1995
[9]

H3 Contributors, Tables of Cell Statistics Across Resolutions, H3 Core Library Documentation
[10]

Redis, Redis: In-memory data structure store, used as a database, cache, and message broker
[11]

Microsoft, Shadow Testing, Code with Engineering Playbook, 2024

2024
[12]

C. J. Gordon, Recall-precision trade-off: A derivation, J. Amer. Soc. Inf. Sci., vol. 40, no. 3, pp. 145–150, May 1989. 15

1989
[13]

S. A. Alvarez, An exact analytical relation among recall, precision, and clas- sification accuracy, Information Processing & Management, vol. 38, no. 3, pp. 355-366, May 2002

2002
[14]

Department of Transportation, 2023

Federal Motor Carrier Safety Administration (FMCSA), Electronic Logging De- vices (ELDs), U.S. Department of Transportation, 2023

2023
[15]

Ahlers and S

D. Ahlers and S. Boll, On the accuracy of online geocoders, OFFIS Institute for Information Technology and University of Oldenburg, Germany, 2024

2024
[16]

On-line algorithms for truck fleet assignment and scheduling under real-time information

Barnhart, C., Krishnan, N., Kim, M. On-line algorithms for truck fleet assignment and scheduling under real-time information. Transportation Research Record, 1999

1999
[17]

Optimization of Truck–Cargo Matching for LTL Logistics Hubs, Com- puters, Materials & Continua, 2024

Chen, X. Optimization of Truck–Cargo Matching for LTL Logistics Hubs, Com- puters, Materials & Continua, 2024

2024
[18]

Y . Li, M. Mohammadi, X. Zhang, Y . Lan, and W. van Jaarsveld, Integrated trucks assignment and scheduling problem with mixed service mode docks: A Q-learning based adaptive large neighborhood search algorithm, arXiv preprint arXiv:2412.09090, 2024

work page arXiv 2024
[19]

Tang, Optimization of truck–cargo online matching for the less-than-truckload (LTL) logistics, Mathematics, 2024

W. Tang, Optimization of truck–cargo online matching for the less-than-truckload (LTL) logistics, Mathematics, 2024

2024
[20]

Google Vertex AI, Google Cloud
[21]

Manning and A

C. Manning and A. Gupta, Understanding Precision and Recall Trade-offs in Bi- nary Classification, Journal of Machine Learning Research, vol. 21, no. 101, pp. 1–15, 2020

2020
[22]

Open Source Routing Machine (OSRM), OSRM Backend - Server API, 2023. 16

2023

[1] [1]

Tsolaki, N

K. Tsolaki, N. Papakostas, S. Chondros, and G. Chryssolouris, Utilizing machine learning on freight transportation and logistics: A review, Transp. Res. Interdis- cip. Perspect., vol. 13, p. 100520, Mar. 2022

2022

[2] [2]

27, 2018

Uber Engineering, H3: Uber’s Hexagonal Hierarchical Spatial Index, Uber Engi- neering Blog, Jun. 27, 2018

2018

[3] [3]

Truck Appointment Scheduling: A Re- view of Models and Algorithms

Gracia, M.D.; Mar-Ortiz, J.; Vargas, M. Truck Appointment Scheduling: A Re- view of Models and Algorithms. Mathematics 2025, 13, 503

2025

[4] [4]

S. Sani, H. Xia, J. Milisavljevic-Syed, and K. Salonitis, Supply Chain 4.0: A ma- chine learning-based Bayesian-optimized LightGBM model for predicting supply chain risk, Machines, vol. 11, no. 9, p. 888, 2023

2023

[5] [5]

Limon Barua, Bo Zou, Yan Zhou, Machine learning for international freight transportation management: A comprehensive review, Research in Transporta- tion Business & Management, V olume 34, 2020, 100453, ISSN 2210-5395

2020

[6] [6]

G. Ke, Q. Meng, T. Finley, T. Wang, W. Chen, W. Ma, Q. Ye, and T.-Y . Liu, LightGBM: A Highly Efficient Gradient Boosting Decision Tree, in Advances in Neural Information Processing Systems 30 (NeurIPS 2017), pp. 3146–3154, 2017

2017

[7] [7]

Chen and C

T. Chen and C. Guestrin, XGBoost: A Scalable Tree Boosting System, in Pro- ceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’16), pp. 785–794, 2016

2016

[8] [8]

Cortes and V

C. Cortes and V . Vapnik, Support-vector networks, Machine Learning, vol. 20, no. 3, pp. 273–297, 1995

1995

[9] [9]

H3 Contributors, Tables of Cell Statistics Across Resolutions, H3 Core Library Documentation

[10] [10]

Redis, Redis: In-memory data structure store, used as a database, cache, and message broker

[11] [11]

Microsoft, Shadow Testing, Code with Engineering Playbook, 2024

2024

[12] [12]

C. J. Gordon, Recall-precision trade-off: A derivation, J. Amer. Soc. Inf. Sci., vol. 40, no. 3, pp. 145–150, May 1989. 15

1989

[13] [13]

S. A. Alvarez, An exact analytical relation among recall, precision, and clas- sification accuracy, Information Processing & Management, vol. 38, no. 3, pp. 355-366, May 2002

2002

[14] [14]

Department of Transportation, 2023

Federal Motor Carrier Safety Administration (FMCSA), Electronic Logging De- vices (ELDs), U.S. Department of Transportation, 2023

2023

[15] [15]

Ahlers and S

D. Ahlers and S. Boll, On the accuracy of online geocoders, OFFIS Institute for Information Technology and University of Oldenburg, Germany, 2024

2024

[16] [16]

On-line algorithms for truck fleet assignment and scheduling under real-time information

Barnhart, C., Krishnan, N., Kim, M. On-line algorithms for truck fleet assignment and scheduling under real-time information. Transportation Research Record, 1999

1999

[17] [17]

Optimization of Truck–Cargo Matching for LTL Logistics Hubs, Com- puters, Materials & Continua, 2024

Chen, X. Optimization of Truck–Cargo Matching for LTL Logistics Hubs, Com- puters, Materials & Continua, 2024

2024

[18] [18]

Y . Li, M. Mohammadi, X. Zhang, Y . Lan, and W. van Jaarsveld, Integrated trucks assignment and scheduling problem with mixed service mode docks: A Q-learning based adaptive large neighborhood search algorithm, arXiv preprint arXiv:2412.09090, 2024

work page arXiv 2024

[19] [19]

Tang, Optimization of truck–cargo online matching for the less-than-truckload (LTL) logistics, Mathematics, 2024

W. Tang, Optimization of truck–cargo online matching for the less-than-truckload (LTL) logistics, Mathematics, 2024

2024

[20] [20]

Google Vertex AI, Google Cloud

[21] [21]

Manning and A

C. Manning and A. Gupta, Understanding Precision and Recall Trade-offs in Bi- nary Classification, Journal of Machine Learning Research, vol. 21, no. 101, pp. 1–15, 2020

2020

[22] [22]

Open Source Routing Machine (OSRM), OSRM Backend - Server API, 2023. 16

2023