CADENCE: Predicting Realized MAPF Execution Time Beyond Sum of Costs

Abhishek S; Badrikanath Praharaj; Sreeram MV

arxiv: 2606.04746 · v2 · pith:FI4O2Z7Inew · submitted 2026-06-03 · 💻 cs.RO

CADENCE: Predicting Realized MAPF Execution Time Beyond Sum of Costs

Abhishek S , Badrikanath Praharaj , Sreeram MV This is my paper

Pith reviewed 2026-06-28 06:26 UTC · model grok-4.3

classification 💻 cs.RO

keywords MAPFexecution time predictionmulti-agent path findingrobot team planninghardware evaluationprimitive motion burdensum of costs

0 comments

The pith

Primitive motion burden in MAPF plans predicts real execution time better than Sum of Costs alone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests which pre-execution features from multi-agent path plans best forecast actual wall-clock completion time for robot teams. Standard metrics like Sum of Costs capture some but not all of the gap between planned and realized performance. Across 480 hardware trials with seven differential-drive robots, adding primitive motion burden features such as turns and start-stop actions cuts prediction error by roughly half compared with Sum of Costs models alone. Coordination features that track robot interactions add smaller and less consistent gains. The result suggests planners can already see much of the execution penalty in the offline plan before any robot moves.

Core claim

Primitive motion burden supplies the strongest additional signal beyond Sum of Costs for predicting realized MAPF execution time. In both scenario-held-out ridge regression and trial-level mixed-effects models, these features reduce mean absolute error by 48.6 to 59.8 percent and root mean squared error by 44.2 to 61.4 percent relative to Sum-of-Costs-only baselines. Interaction-aware coordination features produce smaller, less uniform improvements that appear most clearly in the mixed-effects analysis.

What carries the argument

Primitive motion burden, which counts basic motion demands in a plan such as makespan, number of turns, consecutive moves, and start-stop transitions.

If this is right

Plans that minimize primitive motion burden will tend to finish sooner in hardware than plans with equal Sum of Costs but higher motion burden.
Offline planners can use these features to rank candidate solutions by expected real-world duration before any robot begins moving.
Sum of Costs remains informative but incomplete once motion-burden terms are added to the prediction model.
Interaction coordination features offer modest extra value mainly when modeling variation across repeated trials of the same plan.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Warehouses could embed motion-burden terms directly into MAPF cost functions to bias search toward plans that run faster on the floor.
The same feature set might transfer to other continuous-motion robots if their primitive actions share similar start-stop and turning costs.
Dynamic environments with moving obstacles could increase the relative value of the coordination features that showed smaller gains in the static testbed.

Load-bearing premise

The fifteen scenarios and seven-by-seven differential-drive workcell are representative enough that the identified predictive features will generalize to other robot teams and environments.

What would settle it

Replicate the 480-trial corpus on a different robot type or in a substantially larger or more dynamic workspace and measure whether primitive motion burden still produces comparable error reductions over Sum of Costs.

Figures

Figures reproduced from arXiv: 2606.04746 by Abhishek S, Badrikanath Praharaj, Sreeram MV.

**Figure 1.** Figure 1: Hardware OptiTrack trace. Color-coded trajectories for Robots 1–7 on a representative 7 × 7 bottleneck execution. Circles mark starts; squares mark goals. This question matters because physical execution introduces structure that scalar plan costs suppress. Robots must preserve the same-agent action order, satisfy cross-robot precedence constraints, pass through locally contested spaces, and often incur st… view at source ↗

**Figure 2.** Figure 2: Hardware platform and scenario-family overview. This paper makes three contributions: • A real hardware execution time corpus for offline MAPF plans on a fixed multi-robot platform, built with an interaction-regime-stratified scenario library. • A disciplined three-layer decomposition of plan-side predictors of realized execution time. We separate plan descriptors into three analytically distinct layers: … view at source ↗

**Figure 3.** Figure 3: Executor dependency-wait diagnostic. In one representative hardware trial, dependency-gated waiting accounts for a visible share of several robots’ wall-clock finish times. This illustrates the execution mechanism rather than a corpus-level result. B. Pre-execution feature extraction All predictors are computed before hardware execution from the MAPF plan alone; no hardware timing outcome enters the extra… view at source ↗

**Figure 4.** Figure 4: SoC-only model. Each point is a plan mean; the dashed line is the SoC-only model. The vertical spread shows why SoC alone is not a complete predictor of hardware time. In the ridge model, SoC reduces held-out MAE from [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗

read the original abstract

Multi-Agent Path Finding (MAPF) algorithms are increasingly used to plan motion for robot teams in industrial warehouses and robotic shared workspaces, but standard MAPF algorithm evaluation metrics, such as Sum of Costs (SoC), makespan, and planner runtime, can obscure how planner choices translate into realistic execution performance. We present CADENCE (Coordination and Action-Driven Estimation for Networked Continuous Execution), a hardware study of this evaluation gap on a fixed 7 by 7 workcell with seven differential drive robots, asking which features available before execution can best predict final wall-clock completion time. We compare SoC, total planned travel cost, primitive motion burden (how much basic motion the plan requires, such as makespan, turns, consecutive moves, and start-stop transitions), and interaction aware coordination structure (how much inter-robot coordination the plan induces, such as dependency links, interacting robot pairs, dependency depth, and crowding exposure). To test this, we generate 120 plans across 15 scenarios -- 5 Empty, 5 Medium Random, and 5 Bottleneck and execute each plan four times, yielding a 480 trial hardware corpus. Using both a scenario-held -- out ridge model and a trial-level mixed-effects model, we find that SoC alone is informative but incomplete, while primitive motion burden gives the strongest improvement, reducing held out error by about 48.6%-59.8% in MAE and 44.2%-61.4% in RMSE relative to SoC-only models. Interaction-aware coordination features add smaller, less uniform gains, most clearly in the mixed-effects analysis. Across both models and uncertainty checks, primitive motion burden is the most reliable additional signal beyond SoC, suggesting that much of the execution time gap is already visible in the offline plan before any robot starts moving.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Primitive motion burden features cut prediction error for real MAPF execution time by roughly half over SoC in this 7-robot hardware study, but the single workcell and lack of cross-type tests limit how far the result travels.

read the letter

The main thing to know is that adding counts of turns, start-stops, and consecutive moves to Sum of Costs improves held-out MAE and RMSE by 48-60% on their 480-trial dataset, while the coordination features (dependency depth, crowding) add smaller and less consistent gains. The paper ran 120 plans across 15 scenarios in three classes on seven differential-drive robots in a fixed 7x7 cell, then fit a scenario-held-out ridge regressor and a mixed-effects model at the trial level.

What stands out is the hardware corpus itself. Most MAPF papers stop at simulated costs; here they actually executed the plans four times each and measured wall-clock time. That gives a concrete signal that SoC alone misses a lot of the real duration, and the motion-burden features capture more of it than the interaction terms in this setup.

The soft spot is exactly the one the stress-test flags. All 15 scenarios share the same robots, same grid size, same motion-primitive library, and same workcell geometry. The held-out splits therefore cannot reveal whether the motion-burden advantage is an artifact of how this particular hardware translates turns and stops into time. No train-on-Empty-Medium/test-on-Bottleneck split or any other-robot experiment is mentioned, so the claim that primitive motion burden is “the most reliable additional signal” rests on an untested transfer assumption. Feature definitions and exact fitting details are also thin in the abstract, though the full text presumably supplies them.

This paper is for people who deploy MAPF planners on real warehouse robots and want better offline predictors of completion time. A reader already working on industrial multi-robot systems could pick up the feature list and test it on their own hardware without much trouble. It is not reshaping planning theory, but the empirical comparison is honest and the data collection is non-trivial.

I would send it to peer review. The hardware results are worth referee scrutiny even if the generalization questions are obvious; a revised version with at least one cross-type or cross-robot check would be stronger, but the current version is already a usable data point.

Referee Report

2 major / 2 minor

Summary. The paper introduces CADENCE, an empirical hardware study examining the gap between standard MAPF metrics (primarily Sum of Costs) and realized wall-clock execution time on a 7x7 differential-drive robot workcell. Using 120 plans across 15 scenarios (5 Empty, 5 Medium Random, 5 Bottleneck) executed in 480 trials, the authors compare SoC against primitive motion burden features (makespan, turns, consecutive moves, start-stop transitions) and interaction-aware coordination features (dependency links, interacting pairs, dependency depth, crowding). Scenario-held-out ridge regression and trial-level mixed-effects models show that adding primitive motion burden yields the largest gains, reducing held-out MAE by 48.6-59.8% and RMSE by 44.2-61.4% relative to SoC-only baselines, while coordination features provide smaller, less consistent benefits.

Significance. If the quantitative improvements hold under broader validation, the work demonstrates that offline plan features—especially primitive motion burden—capture a substantial fraction of execution-time variance that SoC misses, with direct implications for MAPF algorithm selection and evaluation in warehouse robotics. The 480-trial hardware corpus and dual-model approach (ridge + mixed-effects) constitute a reproducible empirical contribution that moves beyond purely simulated SoC analysis.

major comments (2)

[Evaluation / Results] Evaluation section (scenario-held-out results): the reported 48.6-59.8% MAE reduction is obtained from within-type held-out splits across only three narrow scenario classes sharing identical robot kinematics, workcell geometry, and motion-primitive library. No cross-type ablation (train on Empty+Medium, test on Bottleneck) is reported, so the claim that primitive motion burden is “the most reliable additional signal” rests on an untested transfer assumption; any systematic hardware-specific bias in turn or start-stop timing will be learned and inflate apparent predictive power.
[Methods] Methods (model fitting and feature definitions): the abstract and results cite precise error reductions from ridge and mixed-effects models, yet the manuscript provides insufficient detail on exact feature vector construction for “primitive motion burden,” ridge regularization choice, or statistical significance testing of the improvement; without these, the central quantitative claim cannot be independently verified or reproduced from the 120-plan corpus.

minor comments (2)

[Experimental Setup] Table or figure presenting the 480-trial corpus should explicitly list the number of plans per scenario type and the exact held-out partition sizes to clarify the scenario-held-out protocol.
[Feature Definitions] Notation for interaction-aware features (e.g., “dependency depth”) should be defined with a short equation or pseudocode in the feature section to avoid ambiguity when readers attempt to replicate the coordination-structure model.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments on our manuscript. We address each major comment below with clarifications and indicate planned revisions where the manuscript can be strengthened.

read point-by-point responses

Referee: [Evaluation / Results] Evaluation section (scenario-held-out results): the reported 48.6-59.8% MAE reduction is obtained from within-type held-out splits across only three narrow scenario classes sharing identical robot kinematics, workcell geometry, and motion-primitive library. No cross-type ablation (train on Empty+Medium, test on Bottleneck) is reported, so the claim that primitive motion burden is “the most reliable additional signal” rests on an untested transfer assumption; any systematic hardware-specific bias in turn or start-stop timing will be learned and inflate apparent predictive power.

Authors: The scenario-held-out evaluation tests generalization to unseen scenarios within each of the three classes (Empty, Medium Random, Bottleneck), which were selected to span a representative range of congestion levels in the fixed workcell. The consistent error reductions across these held-out scenarios support that primitive motion burden captures execution variance beyond SoC in this setting. We do not claim cross-environment transfer and agree that hardware-specific biases could influence results; the study scope is limited to this robot platform and geometry. We will revise the manuscript to explicitly delimit the generalization claims and add a limitations paragraph discussing potential biases and the value of future cross-type or cross-platform validation. revision: partial
Referee: [Methods] Methods (model fitting and feature definitions): the abstract and results cite precise error reductions from ridge and mixed-effects models, yet the manuscript provides insufficient detail on exact feature vector construction for “primitive motion burden,” ridge regularization choice, or statistical significance testing of the improvement; without these, the central quantitative claim cannot be independently verified or reproduced from the 120-plan corpus.

Authors: We agree that additional methodological detail is required for reproducibility. The revised manuscript will expand the Methods section to include: exact mathematical definitions and computation of each primitive motion burden feature (makespan, number of turns, consecutive moves, start-stop transitions); the ridge regression implementation details including how the regularization parameter was selected (e.g., via nested cross-validation on the training folds); and the statistical tests performed to assess significance of the MAE/RMSE improvements (paired tests across scenarios). These changes will allow independent verification from the released plan corpus. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical regression on held-out hardware data with no derivation reducing to inputs by construction

full rationale

The paper is an empirical hardware study that collects 480 execution trials across 15 scenarios, extracts plan features (SoC, primitive motion burden, coordination metrics), and fits ridge and mixed-effects regression models to predict wall-clock time. Evaluation uses scenario-held-out splits. No equations, ansatzes, or self-citations are presented that would make any reported improvement (e.g., 48.6-59.8% MAE reduction) equivalent to the input data by construction. The approach is standard supervised learning on external measurements and remains self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no equations, model specifications, or derivations from which free parameters, axioms, or invented entities can be extracted.

pith-pipeline@v0.9.1-grok · 5870 in / 1146 out tokens · 35183 ms · 2026-06-28T06:26:53.595044+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

15 extracted references · 1 canonical work pages

[1]

Multi-agent path finding with delay probabilities,

H. Ma, T. K. S. Kumar, and S. Koenig, “Multi-agent path finding with delay probabilities,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 31, no. 1, 2017

2017
[2]

Persistent and robust execution of mapf schedules in warehouses,

W. H ¨onig, S. Kiesel, A. Tinka, J. W. Durham, and N. Ayanian, “Persistent and robust execution of mapf schedules in warehouses,”IEEE Robotics and Automation Letters, vol. 4, no. 2, pp. 1125–1131, 2019

2019
[3]

Bidirectional temporal plan graph: Enabling switchable passing orders for more efficient multi-agent path finding plan execution,

Y . Su, R. Veerapaneni, and J. Li, “Bidirectional temporal plan graph: Enabling switchable passing orders for more efficient multi-agent path finding plan execution,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 16, 2024, pp. 17 559–17 566

2024
[4]

Multi-agent pathfinding: Definitions, variants, and benchmarks,

R. Stern, N. Sturtevant, A. Felner, S. Koenig, H. Ma, T. Walker, J. Li, D. Atzmon, L. Cohen, T. K. S. Kumar, R. Bartak, and E. Boyarski, “Multi-agent pathfinding: Definitions, variants, and benchmarks,” in Proceedings of the Annual Symposium on Combinatorial Search, 2019

2019
[5]

Conflict-based search for optimal multi-agent pathfinding,

G. Sharon, R. Stern, A. Felner, and N. R. Sturtevant, “Conflict-based search for optimal multi-agent pathfinding,”Artificial Intelligence, vol. 219, pp. 40–66, 2015

2015
[6]

Coordinating hundreds of cooperative, autonomous vehicles in warehouses,

P. R. Wurman, R. D’Andrea, and M. Mountz, “Coordinating hundreds of cooperative, autonomous vehicles in warehouses,”AI Magazine, vol. 29, no. 1, pp. 9–20, 2008

2008
[7]

Analyzing planner design trade-offs for mapf under adg-based realistic execution,

J. Yan, Z. Li, W. Kang, S. F. Smith, and J. Li, “Analyzing planner design trade-offs for mapf under adg-based realistic execution,” 2025

2025
[8]

Which mapf model works best for automated warehousing?

S. Varambally, J. Li, and S. Koenig, “Which mapf model works best for automated warehousing?”Proceedings of the International Symposium on Combinatorial Search, vol. 15, no. 1, pp. 190–198, 2022

2022
[9]

Traffic flow optimisation for lifelong multi-agent path finding,

Z. Chen, D. Harabor, J. Li, and P. J. Stuckey, “Traffic flow optimisation for lifelong multi-agent path finding,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 18, 2024, pp. 20 674– 20 682

2024
[10]

P3gasus: Pre-planned path execution graphs for multi-agent systems at ultra-large scale,

T. Duhan, C. He, and G. Sartoretti, “P3gasus: Pre-planned path execution graphs for multi-agent systems at ultra-large scale,”IEEE Robotics and Automation Letters, vol. 11, no. 2, pp. 1274–1281, 2026

2026
[11]

Improved heuristics for multi-agent path finding with conflict-based search,

J. Li, A. Felner, E. Boyarski, H. Ma, and S. Koenig, “Improved heuristics for multi-agent path finding with conflict-based search,” in Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019, pp. 442–449

2019
[12]

Eecbs: A bounded-suboptimal search for multi-agent path finding,

J. Li, W. Ruml, and S. Koenig, “Eecbs: A bounded-suboptimal search for multi-agent path finding,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 14, 2021, pp. 12 353–12 362

2021
[13]

Lacam: Search- based algorithm for quick multi-agent pathfinding,

K. Okumura, M. Machida, X. Defago, and Y . Tamura, “Lacam: Search- based algorithm for quick multi-agent pathfinding,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 10, 2023, pp. 11 655–11 662

2023
[14]

Engineering lacam*: Towards real-time, large-scale, and near-optimal multi-agent pathfinding,

K. Okumura, R. Kuroiwa, and Y . Tamura, “Engineering lacam*: Towards real-time, large-scale, and near-optimal multi-agent pathfinding,” inIn- ternational Conference on Autonomous Agents and Multiagent Systems, 2024

2024
[15]

Engineering LaCAM: Towards real-time, large-scale, and near-optimal multi-agent pathfinding,

K. Okumura, “Engineering LaCAM: Towards real-time, large-scale, and near-optimal multi-agent pathfinding,” 2024, presented at AAMAS-24. [Online]. Available: https://arxiv.org/abs/2308.04292

work page arXiv 2024

[1] [1]

Multi-agent path finding with delay probabilities,

H. Ma, T. K. S. Kumar, and S. Koenig, “Multi-agent path finding with delay probabilities,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 31, no. 1, 2017

2017

[2] [2]

Persistent and robust execution of mapf schedules in warehouses,

W. H ¨onig, S. Kiesel, A. Tinka, J. W. Durham, and N. Ayanian, “Persistent and robust execution of mapf schedules in warehouses,”IEEE Robotics and Automation Letters, vol. 4, no. 2, pp. 1125–1131, 2019

2019

[3] [3]

Bidirectional temporal plan graph: Enabling switchable passing orders for more efficient multi-agent path finding plan execution,

Y . Su, R. Veerapaneni, and J. Li, “Bidirectional temporal plan graph: Enabling switchable passing orders for more efficient multi-agent path finding plan execution,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 16, 2024, pp. 17 559–17 566

2024

[4] [4]

Multi-agent pathfinding: Definitions, variants, and benchmarks,

R. Stern, N. Sturtevant, A. Felner, S. Koenig, H. Ma, T. Walker, J. Li, D. Atzmon, L. Cohen, T. K. S. Kumar, R. Bartak, and E. Boyarski, “Multi-agent pathfinding: Definitions, variants, and benchmarks,” in Proceedings of the Annual Symposium on Combinatorial Search, 2019

2019

[5] [5]

Conflict-based search for optimal multi-agent pathfinding,

G. Sharon, R. Stern, A. Felner, and N. R. Sturtevant, “Conflict-based search for optimal multi-agent pathfinding,”Artificial Intelligence, vol. 219, pp. 40–66, 2015

2015

[6] [6]

Coordinating hundreds of cooperative, autonomous vehicles in warehouses,

P. R. Wurman, R. D’Andrea, and M. Mountz, “Coordinating hundreds of cooperative, autonomous vehicles in warehouses,”AI Magazine, vol. 29, no. 1, pp. 9–20, 2008

2008

[7] [7]

Analyzing planner design trade-offs for mapf under adg-based realistic execution,

J. Yan, Z. Li, W. Kang, S. F. Smith, and J. Li, “Analyzing planner design trade-offs for mapf under adg-based realistic execution,” 2025

2025

[8] [8]

Which mapf model works best for automated warehousing?

S. Varambally, J. Li, and S. Koenig, “Which mapf model works best for automated warehousing?”Proceedings of the International Symposium on Combinatorial Search, vol. 15, no. 1, pp. 190–198, 2022

2022

[9] [9]

Traffic flow optimisation for lifelong multi-agent path finding,

Z. Chen, D. Harabor, J. Li, and P. J. Stuckey, “Traffic flow optimisation for lifelong multi-agent path finding,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 18, 2024, pp. 20 674– 20 682

2024

[10] [10]

P3gasus: Pre-planned path execution graphs for multi-agent systems at ultra-large scale,

T. Duhan, C. He, and G. Sartoretti, “P3gasus: Pre-planned path execution graphs for multi-agent systems at ultra-large scale,”IEEE Robotics and Automation Letters, vol. 11, no. 2, pp. 1274–1281, 2026

2026

[11] [11]

Improved heuristics for multi-agent path finding with conflict-based search,

J. Li, A. Felner, E. Boyarski, H. Ma, and S. Koenig, “Improved heuristics for multi-agent path finding with conflict-based search,” in Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019, pp. 442–449

2019

[12] [12]

Eecbs: A bounded-suboptimal search for multi-agent path finding,

J. Li, W. Ruml, and S. Koenig, “Eecbs: A bounded-suboptimal search for multi-agent path finding,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 14, 2021, pp. 12 353–12 362

2021

[13] [13]

Lacam: Search- based algorithm for quick multi-agent pathfinding,

K. Okumura, M. Machida, X. Defago, and Y . Tamura, “Lacam: Search- based algorithm for quick multi-agent pathfinding,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 10, 2023, pp. 11 655–11 662

2023

[14] [14]

Engineering lacam*: Towards real-time, large-scale, and near-optimal multi-agent pathfinding,

K. Okumura, R. Kuroiwa, and Y . Tamura, “Engineering lacam*: Towards real-time, large-scale, and near-optimal multi-agent pathfinding,” inIn- ternational Conference on Autonomous Agents and Multiagent Systems, 2024

2024

[15] [15]

Engineering LaCAM: Towards real-time, large-scale, and near-optimal multi-agent pathfinding,

K. Okumura, “Engineering LaCAM: Towards real-time, large-scale, and near-optimal multi-agent pathfinding,” 2024, presented at AAMAS-24. [Online]. Available: https://arxiv.org/abs/2308.04292

work page arXiv 2024