CADENCE: Predicting Realized MAPF Execution Time Beyond Sum of Costs
Pith reviewed 2026-06-28 06:26 UTC · model grok-4.3
The pith
Primitive motion burden in MAPF plans predicts real execution time better than Sum of Costs alone.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Primitive motion burden supplies the strongest additional signal beyond Sum of Costs for predicting realized MAPF execution time. In both scenario-held-out ridge regression and trial-level mixed-effects models, these features reduce mean absolute error by 48.6 to 59.8 percent and root mean squared error by 44.2 to 61.4 percent relative to Sum-of-Costs-only baselines. Interaction-aware coordination features produce smaller, less uniform improvements that appear most clearly in the mixed-effects analysis.
What carries the argument
Primitive motion burden, which counts basic motion demands in a plan such as makespan, number of turns, consecutive moves, and start-stop transitions.
If this is right
- Plans that minimize primitive motion burden will tend to finish sooner in hardware than plans with equal Sum of Costs but higher motion burden.
- Offline planners can use these features to rank candidate solutions by expected real-world duration before any robot begins moving.
- Sum of Costs remains informative but incomplete once motion-burden terms are added to the prediction model.
- Interaction coordination features offer modest extra value mainly when modeling variation across repeated trials of the same plan.
Where Pith is reading between the lines
- Warehouses could embed motion-burden terms directly into MAPF cost functions to bias search toward plans that run faster on the floor.
- The same feature set might transfer to other continuous-motion robots if their primitive actions share similar start-stop and turning costs.
- Dynamic environments with moving obstacles could increase the relative value of the coordination features that showed smaller gains in the static testbed.
Load-bearing premise
The fifteen scenarios and seven-by-seven differential-drive workcell are representative enough that the identified predictive features will generalize to other robot teams and environments.
What would settle it
Replicate the 480-trial corpus on a different robot type or in a substantially larger or more dynamic workspace and measure whether primitive motion burden still produces comparable error reductions over Sum of Costs.
Figures
read the original abstract
Multi-Agent Path Finding (MAPF) algorithms are increasingly used to plan motion for robot teams in industrial warehouses and robotic shared workspaces, but standard MAPF algorithm evaluation metrics, such as Sum of Costs (SoC), makespan, and planner runtime, can obscure how planner choices translate into realistic execution performance. We present CADENCE (Coordination and Action-Driven Estimation for Networked Continuous Execution), a hardware study of this evaluation gap on a fixed 7 by 7 workcell with seven differential drive robots, asking which features available before execution can best predict final wall-clock completion time. We compare SoC, total planned travel cost, primitive motion burden (how much basic motion the plan requires, such as makespan, turns, consecutive moves, and start-stop transitions), and interaction aware coordination structure (how much inter-robot coordination the plan induces, such as dependency links, interacting robot pairs, dependency depth, and crowding exposure). To test this, we generate 120 plans across 15 scenarios -- 5 Empty, 5 Medium Random, and 5 Bottleneck and execute each plan four times, yielding a 480 trial hardware corpus. Using both a scenario-held -- out ridge model and a trial-level mixed-effects model, we find that SoC alone is informative but incomplete, while primitive motion burden gives the strongest improvement, reducing held out error by about 48.6%-59.8% in MAE and 44.2%-61.4% in RMSE relative to SoC-only models. Interaction-aware coordination features add smaller, less uniform gains, most clearly in the mixed-effects analysis. Across both models and uncertainty checks, primitive motion burden is the most reliable additional signal beyond SoC, suggesting that much of the execution time gap is already visible in the offline plan before any robot starts moving.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces CADENCE, an empirical hardware study examining the gap between standard MAPF metrics (primarily Sum of Costs) and realized wall-clock execution time on a 7x7 differential-drive robot workcell. Using 120 plans across 15 scenarios (5 Empty, 5 Medium Random, 5 Bottleneck) executed in 480 trials, the authors compare SoC against primitive motion burden features (makespan, turns, consecutive moves, start-stop transitions) and interaction-aware coordination features (dependency links, interacting pairs, dependency depth, crowding). Scenario-held-out ridge regression and trial-level mixed-effects models show that adding primitive motion burden yields the largest gains, reducing held-out MAE by 48.6-59.8% and RMSE by 44.2-61.4% relative to SoC-only baselines, while coordination features provide smaller, less consistent benefits.
Significance. If the quantitative improvements hold under broader validation, the work demonstrates that offline plan features—especially primitive motion burden—capture a substantial fraction of execution-time variance that SoC misses, with direct implications for MAPF algorithm selection and evaluation in warehouse robotics. The 480-trial hardware corpus and dual-model approach (ridge + mixed-effects) constitute a reproducible empirical contribution that moves beyond purely simulated SoC analysis.
major comments (2)
- [Evaluation / Results] Evaluation section (scenario-held-out results): the reported 48.6-59.8% MAE reduction is obtained from within-type held-out splits across only three narrow scenario classes sharing identical robot kinematics, workcell geometry, and motion-primitive library. No cross-type ablation (train on Empty+Medium, test on Bottleneck) is reported, so the claim that primitive motion burden is “the most reliable additional signal” rests on an untested transfer assumption; any systematic hardware-specific bias in turn or start-stop timing will be learned and inflate apparent predictive power.
- [Methods] Methods (model fitting and feature definitions): the abstract and results cite precise error reductions from ridge and mixed-effects models, yet the manuscript provides insufficient detail on exact feature vector construction for “primitive motion burden,” ridge regularization choice, or statistical significance testing of the improvement; without these, the central quantitative claim cannot be independently verified or reproduced from the 120-plan corpus.
minor comments (2)
- [Experimental Setup] Table or figure presenting the 480-trial corpus should explicitly list the number of plans per scenario type and the exact held-out partition sizes to clarify the scenario-held-out protocol.
- [Feature Definitions] Notation for interaction-aware features (e.g., “dependency depth”) should be defined with a short equation or pseudocode in the feature section to avoid ambiguity when readers attempt to replicate the coordination-structure model.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments on our manuscript. We address each major comment below with clarifications and indicate planned revisions where the manuscript can be strengthened.
read point-by-point responses
-
Referee: [Evaluation / Results] Evaluation section (scenario-held-out results): the reported 48.6-59.8% MAE reduction is obtained from within-type held-out splits across only three narrow scenario classes sharing identical robot kinematics, workcell geometry, and motion-primitive library. No cross-type ablation (train on Empty+Medium, test on Bottleneck) is reported, so the claim that primitive motion burden is “the most reliable additional signal” rests on an untested transfer assumption; any systematic hardware-specific bias in turn or start-stop timing will be learned and inflate apparent predictive power.
Authors: The scenario-held-out evaluation tests generalization to unseen scenarios within each of the three classes (Empty, Medium Random, Bottleneck), which were selected to span a representative range of congestion levels in the fixed workcell. The consistent error reductions across these held-out scenarios support that primitive motion burden captures execution variance beyond SoC in this setting. We do not claim cross-environment transfer and agree that hardware-specific biases could influence results; the study scope is limited to this robot platform and geometry. We will revise the manuscript to explicitly delimit the generalization claims and add a limitations paragraph discussing potential biases and the value of future cross-type or cross-platform validation. revision: partial
-
Referee: [Methods] Methods (model fitting and feature definitions): the abstract and results cite precise error reductions from ridge and mixed-effects models, yet the manuscript provides insufficient detail on exact feature vector construction for “primitive motion burden,” ridge regularization choice, or statistical significance testing of the improvement; without these, the central quantitative claim cannot be independently verified or reproduced from the 120-plan corpus.
Authors: We agree that additional methodological detail is required for reproducibility. The revised manuscript will expand the Methods section to include: exact mathematical definitions and computation of each primitive motion burden feature (makespan, number of turns, consecutive moves, start-stop transitions); the ridge regression implementation details including how the regularization parameter was selected (e.g., via nested cross-validation on the training folds); and the statistical tests performed to assess significance of the MAE/RMSE improvements (paired tests across scenarios). These changes will allow independent verification from the released plan corpus. revision: yes
Circularity Check
No circularity: empirical regression on held-out hardware data with no derivation reducing to inputs by construction
full rationale
The paper is an empirical hardware study that collects 480 execution trials across 15 scenarios, extracts plan features (SoC, primitive motion burden, coordination metrics), and fits ridge and mixed-effects regression models to predict wall-clock time. Evaluation uses scenario-held-out splits. No equations, ansatzes, or self-citations are presented that would make any reported improvement (e.g., 48.6-59.8% MAE reduction) equivalent to the input data by construction. The approach is standard supervised learning on external measurements and remains self-contained.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Multi-agent path finding with delay probabilities,
H. Ma, T. K. S. Kumar, and S. Koenig, “Multi-agent path finding with delay probabilities,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 31, no. 1, 2017
2017
-
[2]
Persistent and robust execution of mapf schedules in warehouses,
W. H ¨onig, S. Kiesel, A. Tinka, J. W. Durham, and N. Ayanian, “Persistent and robust execution of mapf schedules in warehouses,”IEEE Robotics and Automation Letters, vol. 4, no. 2, pp. 1125–1131, 2019
2019
-
[3]
Bidirectional temporal plan graph: Enabling switchable passing orders for more efficient multi-agent path finding plan execution,
Y . Su, R. Veerapaneni, and J. Li, “Bidirectional temporal plan graph: Enabling switchable passing orders for more efficient multi-agent path finding plan execution,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 16, 2024, pp. 17 559–17 566
2024
-
[4]
Multi-agent pathfinding: Definitions, variants, and benchmarks,
R. Stern, N. Sturtevant, A. Felner, S. Koenig, H. Ma, T. Walker, J. Li, D. Atzmon, L. Cohen, T. K. S. Kumar, R. Bartak, and E. Boyarski, “Multi-agent pathfinding: Definitions, variants, and benchmarks,” in Proceedings of the Annual Symposium on Combinatorial Search, 2019
2019
-
[5]
Conflict-based search for optimal multi-agent pathfinding,
G. Sharon, R. Stern, A. Felner, and N. R. Sturtevant, “Conflict-based search for optimal multi-agent pathfinding,”Artificial Intelligence, vol. 219, pp. 40–66, 2015
2015
-
[6]
Coordinating hundreds of cooperative, autonomous vehicles in warehouses,
P. R. Wurman, R. D’Andrea, and M. Mountz, “Coordinating hundreds of cooperative, autonomous vehicles in warehouses,”AI Magazine, vol. 29, no. 1, pp. 9–20, 2008
2008
-
[7]
Analyzing planner design trade-offs for mapf under adg-based realistic execution,
J. Yan, Z. Li, W. Kang, S. F. Smith, and J. Li, “Analyzing planner design trade-offs for mapf under adg-based realistic execution,” 2025
2025
-
[8]
Which mapf model works best for automated warehousing?
S. Varambally, J. Li, and S. Koenig, “Which mapf model works best for automated warehousing?”Proceedings of the International Symposium on Combinatorial Search, vol. 15, no. 1, pp. 190–198, 2022
2022
-
[9]
Traffic flow optimisation for lifelong multi-agent path finding,
Z. Chen, D. Harabor, J. Li, and P. J. Stuckey, “Traffic flow optimisation for lifelong multi-agent path finding,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 18, 2024, pp. 20 674– 20 682
2024
-
[10]
P3gasus: Pre-planned path execution graphs for multi-agent systems at ultra-large scale,
T. Duhan, C. He, and G. Sartoretti, “P3gasus: Pre-planned path execution graphs for multi-agent systems at ultra-large scale,”IEEE Robotics and Automation Letters, vol. 11, no. 2, pp. 1274–1281, 2026
2026
-
[11]
Improved heuristics for multi-agent path finding with conflict-based search,
J. Li, A. Felner, E. Boyarski, H. Ma, and S. Koenig, “Improved heuristics for multi-agent path finding with conflict-based search,” in Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019, pp. 442–449
2019
-
[12]
Eecbs: A bounded-suboptimal search for multi-agent path finding,
J. Li, W. Ruml, and S. Koenig, “Eecbs: A bounded-suboptimal search for multi-agent path finding,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 14, 2021, pp. 12 353–12 362
2021
-
[13]
Lacam: Search- based algorithm for quick multi-agent pathfinding,
K. Okumura, M. Machida, X. Defago, and Y . Tamura, “Lacam: Search- based algorithm for quick multi-agent pathfinding,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 10, 2023, pp. 11 655–11 662
2023
-
[14]
Engineering lacam*: Towards real-time, large-scale, and near-optimal multi-agent pathfinding,
K. Okumura, R. Kuroiwa, and Y . Tamura, “Engineering lacam*: Towards real-time, large-scale, and near-optimal multi-agent pathfinding,” inIn- ternational Conference on Autonomous Agents and Multiagent Systems, 2024
2024
-
[15]
Engineering LaCAM: Towards real-time, large-scale, and near-optimal multi-agent pathfinding,
K. Okumura, “Engineering LaCAM: Towards real-time, large-scale, and near-optimal multi-agent pathfinding,” 2024, presented at AAMAS-24. [Online]. Available: https://arxiv.org/abs/2308.04292
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.