From Optimization to Prediction: Transformer-Based Path-Flow Estimation to the Traffic Assignment Problem

Alexander Skabardonis; Mostafa Ameli; Sulthana Shams; Van Anh Le

arXiv: 2510.19889 · v2 · submitted 2025-10-22 · 💻 cs.LG · cs.AI · math.OC

Pith reviewed 2026-05-18 04:12 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · math.OC

keywords traffic assignment problem · path flow estimation · transformer neural network · deep learning · equilibrium prediction · transportation networks · data-driven methods · multi-class traffic

The pith

A Transformer neural network predicts equilibrium path flows for traffic assignment problems orders of magnitude faster than optimization solvers while adapting to new demands and network changes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that a Transformer-based deep learning model can directly output the path-level flows that satisfy user equilibrium without running iterative mathematical programs. Traditional solvers become too slow for large networks because complexity grows with the number of origin-destination pairs, limiting how many scenarios planners can test. The data-driven approach trains on equilibrium solutions from conventional methods and then generalizes to unseen demands and modified networks, focusing on path flows to capture detailed correlations between trips that link-level models miss. If this holds, traffic assignment becomes fast enough for real-time what-if analyses and multi-class networks without repeated expensive recalculations.
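
For reference, the user-equilibrium condition that the predicted path flows are meant to satisfy can be written in its standard textbook (Wardrop) form. The symbols below are assumptions introduced here for illustration; the paper's own notation is not shown on this page.

```latex
% Wardrop user-equilibrium conditions in path-flow form (standard statement).
% Assumed notation: W = set of OD pairs, K_w = path set of OD pair w,
% f_k^w = flow on path k of OD pair w, c_k^w(f) = cost of that path,
% u^w = minimum path cost for OD pair w, d^w = demand of OD pair w.
\begin{align}
  f_k^w \,\bigl(c_k^w(f) - u^w\bigr) &= 0, & \forall k \in K_w,\ \forall w \in W, \\
  c_k^w(f) - u^w &\ge 0, & \forall k \in K_w,\ \forall w \in W, \\
  \sum_{k \in K_w} f_k^w &= d^w, & \forall w \in W, \\
  f_k^w &\ge 0, & \forall k \in K_w,\ \forall w \in W.
\end{align}
```

In words: every used path of an OD pair has the minimum cost for that pair, and the path flows of each pair sum to its demand.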

Core claim

The central claim is that a Transformer architecture trained on equilibrium path-flow solutions from standard optimizers can accurately predict those same flows for new origin-destination demand patterns and altered network structures, including in multi-class settings. This replaces the slow non-linear optimization process with a single forward pass through the network, cutting computation time by orders of magnitude on tested networks such as Sioux Falls and Eastern Massachusetts while preserving detailed path and trip information.

What carries the argument

The Transformer architecture trained to map origin-destination demands and network features directly onto equilibrium path-flow vectors, using self-attention to model correlations across different origin-destination pairs at the path level.
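
A minimal sketch of this idea follows, assuming one token per OD pair, a single self-attention layer, and a per-OD softmax over candidate paths. The function name `predict_path_flows`, the parameter names, and the random weights are hypothetical; this is not the paper's architecture, only an illustration of how attention across OD pairs can feed a path-level output that conserves demand.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def predict_path_flows(demand, od_features, params):
    """Hypothetical one-layer sketch: OD tokens -> self-attention -> path shares.

    demand:      (n_od,) trips per OD pair
    od_features: (n_od, d_in) per-OD input features (assumed inputs)
    """
    x = od_features @ params["W_in"]            # (n_od, d_model) token embeddings
    q = x @ params["W_q"]
    k = x @ params["W_k"]
    v = x @ params["W_v"]
    scores = q @ k.T / np.sqrt(k.shape[1])      # (n_od, n_od) OD-to-OD affinities
    attn = softmax(scores, axis=-1)             # each OD pair attends to all others
    h = x + attn @ v                            # residual connection
    logits = h @ params["W_out"]                # (n_od, n_paths)
    shares = softmax(logits, axis=-1)           # path shares per OD pair sum to 1
    return demand[:, None] * shares             # path flows sum to the OD demand

# Untrained random weights, used only to show shapes and demand conservation.
rng = np.random.default_rng(0)
n_od, d_in, d_model, n_paths = 4, 3, 8, 3
params = {
    "W_in": rng.normal(size=(d_in, d_model)),
    "W_q": rng.normal(size=(d_model, d_model)),
    "W_k": rng.normal(size=(d_model, d_model)),
    "W_v": rng.normal(size=(d_model, d_model)),
    "W_out": rng.normal(size=(d_model, n_paths)),
}
demand = np.array([100.0, 50.0, 80.0, 20.0])
od_features = rng.normal(size=(n_od, d_in))
flows = predict_path_flows(demand, od_features, params)
```

The softmax output head is a design choice of this sketch: it guarantees non-negative path flows that sum to each OD demand, whatever the weights are. Whether the paper enforces conservation this way is not stated here.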

If this is right

Traffic assignment calculations that once took hours finish in seconds, allowing many more planning scenarios to be evaluated.
Multi-class user equilibria can be estimated in one pass without running separate optimizations for each user class.
A single trained model accommodates changes in demand or road network layout by processing new inputs without retraining or re-optimizing.
Path-level predictions supply richer trip and flow details than traditional link-level outputs, improving accuracy for management applications.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The learned demand-to-flow mapping could be extended to time-varying or stochastic demands to address dynamic traffic assignment.
The same architecture might transfer to other network equilibrium problems such as power flow or communication routing.
Occasional verification runs with a conventional solver could be combined with the fast predictions to create a hybrid system that scales while controlling error.

Load-bearing premise

That equilibrium solutions generated by conventional solvers on a limited collection of training networks and demands contain enough variety for the model to generalize accurately to completely new demand patterns and network modifications.

What would settle it

Apply the trained model to a large real-world network whose demand levels or origin-destination pairs lie outside the training distribution, then compare the predicted path flows and resulting total travel times against those produced by a full equilibrium solver; large systematic discrepancies would show the generalization has failed.
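
The comparison described above could be sketched as follows, assuming link travel times follow the standard BPR function (the page does not say which cost function the paper uses) and that paths are given by a link-path incidence matrix. All names and the two-link toy network are hypothetical.

```python
import numpy as np

def link_flows(path_flows, incidence):
    # incidence[a, p] = 1 if link a lies on path p, else 0.
    return incidence @ path_flows

def bpr_time(v, t0, cap, alpha=0.15, beta=4.0):
    # Standard BPR link travel time; its use here is an assumption.
    return t0 * (1.0 + alpha * (v / cap) ** beta)

def total_travel_time(path_flows, incidence, t0, cap):
    v = link_flows(path_flows, incidence)
    return float(v @ bpr_time(v, t0, cap))

def relative_gap(path_flows, incidence, t0, cap, od_of_path):
    """Share of total cost above the cheapest path of each OD pair.

    The minimum is taken over the enumerated paths only, so this is a gap
    on a restricted path set. It is zero at user equilibrium on that set.
    """
    v = link_flows(path_flows, incidence)
    c = incidence.T @ bpr_time(v, t0, cap)      # cost of each path
    u = np.full(od_of_path.max() + 1, np.inf)
    np.minimum.at(u, od_of_path, c)             # cheapest path cost per OD pair
    total = path_flows @ c
    return float((total - path_flows @ u[od_of_path]) / total)

# Toy network: one OD pair, demand 10, two identical parallel links (one path each).
incidence = np.eye(2)
t0 = np.array([1.0, 1.0])
cap = np.array([5.0, 5.0])
od_of_path = np.array([0, 0])

solver_flows = np.array([5.0, 5.0])     # equilibrium by symmetry
predicted_flows = np.array([6.0, 4.0])  # stand-in for an imperfect model output

tt_solver = total_travel_time(solver_flows, incidence, t0, cap)
tt_pred = total_travel_time(predicted_flows, incidence, t0, cap)
gap_solver = relative_gap(solver_flows, incidence, t0, cap, od_of_path)
gap_pred = relative_gap(predicted_flows, incidence, t0, cap, od_of_path)
```

On a real test, `predicted_flows` would come from the trained model on out-of-distribution demands. A relative gap that stays well above zero, or a total travel time that drifts systematically from the solver's, would be the kind of discrepancy the paragraph describes.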

Original abstract

The traffic assignment problem is essential for traffic flow analysis, traditionally solved using mathematical programs under the Equilibrium principle. These methods become computationally prohibitive for large-scale networks due to non-linear growth in complexity with the number of OD pairs. This study introduces a novel data-driven approach using deep neural networks, specifically leveraging the Transformer architecture, to predict equilibrium path flows directly. By focusing on path-level traffic distribution, the proposed model captures intricate correlations between OD pairs, offering a more detailed and flexible analysis compared to traditional link-level approaches. The Transformer-based model drastically reduces computation time, while adapting to changes in demand and network structure without the need for recalculation. Numerical experiments are conducted on the Manhattan-like synthetic network, the Sioux Falls network, and the Eastern-Massachusetts network. The results demonstrate that the proposed model is orders of magnitude faster than conventional optimization. It efficiently estimates path-level traffic flows in multi-class networks, reducing computational costs and improving prediction accuracy by capturing detailed trip and flow information. The model also adapts flexibly to varying demand and network conditions, supporting traffic management and enabling rapid 'what-if' analyses for enhanced transportation planning and policy-making.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper introduces a Transformer-based neural network to directly predict equilibrium path flows for the traffic assignment problem, trained on solutions from conventional optimization solvers. It claims orders-of-magnitude reductions in computation time compared to traditional methods, improved accuracy via path-level modeling (especially for multi-class networks), and the ability to adapt to new OD demands and network structure changes without retraining or re-solving the equilibrium problem. Experiments are reported on a Manhattan-like synthetic network and on the Sioux Falls and Eastern Massachusetts networks.

Significance. If the generalization and accuracy claims are substantiated with quantitative metrics and out-of-distribution tests, the work could enable rapid what-if analyses and real-time applications in large-scale transportation networks where repeated equilibrium solves are prohibitive. The shift from link-level to path-level prediction is a potentially useful distinction for capturing OD correlations.

major comments (3)

[Abstract] Abstract and Experiments section: The central claims of 'orders of magnitude faster' computation and 'improving prediction accuracy' are asserted without any reported quantitative metrics (e.g., MAE or RMSE on path flows), runtime tables, baseline solver comparisons, or validation-set performance numbers, leaving the performance advantages unsupported by visible evidence.
[Experiments] Experiments and Methodology sections: Generalization to 'changes in demand and network structure without the need for recalculation' is claimed, yet no explicit out-of-distribution protocol, hold-out demand vectors, or modified-topology test cases are described; training appears limited to equilibrium solutions on the three fixed networks, raising questions about extrapolation reliability.
[Methodology] Overall approach: Because the model is trained to reproduce path-flow outputs already computed by traditional equilibrium solvers, any claimed speedup is an approximation trade-off rather than a fundamental replacement; the manuscript does not quantify the accuracy-speedup Pareto frontier or bound the approximation error for unseen inputs.
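
The error metrics requested in the first comment can be sketched as below. The function name `path_flow_errors` and the flow values are hypothetical placeholders for illustration, not numbers from the paper.

```python
import numpy as np

def path_flow_errors(predicted, reference):
    """MAE and RMSE between predicted and solver-computed path flows."""
    predicted = np.asarray(predicted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    diff = predicted - reference
    mae = float(np.mean(np.abs(diff)))          # mean absolute error
    rmse = float(np.sqrt(np.mean(diff ** 2)))   # root mean squared error
    return {"mae": mae, "rmse": rmse}

# Placeholder path flows (vehicles per path), for illustration only.
reference = np.array([5.0, 5.0, 12.0, 8.0])
predicted = np.array([6.0, 4.0, 11.0, 9.0])
errors = path_flow_errors(predicted, reference)
```

Both metrics are in flow units. RMSE penalizes large errors on individual paths more heavily than MAE, so reporting both shows whether the error is spread evenly or concentrated on a few paths.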

minor comments (1)

[Methodology] Notation for multi-class OD demands and path sets could be clarified with an explicit table of symbols to aid reproducibility.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript. We address each of the major comments below and have made revisions to the manuscript to provide additional quantitative evidence, clarify the generalization experiments, and discuss the approximation aspects more thoroughly.

Referee: [Abstract] Abstract and Experiments section: The central claims of 'orders of magnitude faster' computation and 'improving prediction accuracy' are asserted without any reported quantitative metrics (e.g., MAE or RMSE on path flows), runtime tables, baseline solver comparisons, or validation-set performance numbers, leaving the performance advantages unsupported by visible evidence.

Authors: We acknowledge that the quantitative metrics supporting the central claims could be more explicitly presented. We have added a new table in the Experiments section that reports MAE and RMSE on path flows, along with runtime tables comparing our model to baseline solvers and validation-set performance numbers. This table substantiates the orders-of-magnitude speedup and accuracy improvements. The abstract has been revised to include references to these metrics.
revision: yes

Referee: [Experiments] Experiments and Methodology sections: Generalization to 'changes in demand and network structure without the need for recalculation' is claimed, yet no explicit out-of-distribution protocol, hold-out demand vectors, or modified-topology test cases are described; training appears limited to equilibrium solutions on the three fixed networks, raising questions about extrapolation reliability.

Authors: We have added an explicit description of the out-of-distribution protocol in the revised Experiments and Methodology sections. This includes details on hold-out demand vectors and modified-topology test cases. We now report quantitative results from these tests to demonstrate the model's adaptation to new demands and network structures without retraining.
revision: yes

Referee: [Methodology] Overall approach: Because the model is trained to reproduce path-flow outputs already computed by traditional equilibrium solvers, any claimed speedup is an approximation trade-off rather than a fundamental replacement; the manuscript does not quantify the accuracy-speedup Pareto frontier or bound the approximation error for unseen inputs.

Authors: We recognize that our method learns to approximate the outputs of traditional solvers. To address this, we have included a new subsection quantifying the accuracy-speedup Pareto frontier through experiments with varying model capacities and training data sizes. We also bound the approximation error using metrics on unseen inputs and discuss the practical implications of this trade-off.
revision: yes
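
One way to read the accuracy-speedup Pareto frontier mentioned in this exchange is as the set of model configurations that no other configuration beats on both runtime and error. The sketch below uses hypothetical configuration names and made-up numbers; none of them come from the paper.

```python
def pareto_front(points):
    """Return names of (runtime, error) points not dominated by any other point.

    A point is dominated if another point is no worse on both runtime and
    error and strictly better on at least one of them.
    """
    front = []
    for name, runtime, error in points:
        dominated = any(
            (r <= runtime and e <= error) and (r < runtime or e < error)
            for _, r, e in points
        )
        if not dominated:
            front.append(name)
    return front

# Made-up placeholder values (seconds, error); not results from the paper.
candidates = [
    ("small", 0.01, 0.30),
    ("medium", 0.05, 0.12),
    ("large", 0.20, 0.10),
    ("wasteful", 0.25, 0.15),
]
front = pareto_front(candidates)
```

Here `wasteful` is dropped because `large` is both faster and more accurate; the remaining three each represent a different trade-off between speed and error.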

Circularity Check

0 steps flagged

No significant circularity; standard supervised approximation of external solver outputs

The paper presents a data-driven Transformer trained on path-flow solutions produced by conventional equilibrium solvers (e.g., on Manhattan-like, Sioux Falls, and Eastern-Massachusetts networks) and then applied to new demand vectors or network modifications. This is an empirical surrogate-modeling setup whose labels come from an independent optimization procedure; the learned mapping is not defined in terms of itself, nor does any quoted step reduce a claimed prediction to a fitted input by algebraic construction. No equations are shown that equate the Transformer output to the training generator, no load-bearing self-citation chain is invoked to justify uniqueness or ansatz choices, and the generalization claims rest on explicit numerical experiments rather than internal re-labeling. The approach is therefore self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The approach depends on the premise that a neural network can learn the equilibrium mapping from data produced by existing solvers; this introduces many free parameters in model architecture and training while assuming the generated training distribution covers future scenarios.

free parameters (2)

Transformer architecture hyperparameters
Number of layers, attention heads, embedding dimension, and learning-rate schedule are chosen during model development and fitted to the generated equilibrium data.
Training dataset construction parameters
Choice of OD demand samples, network perturbations, and number of optimization runs used to create the supervised training set.

axioms (1)

Domain assumption: A sufficiently expressive neural network can approximate the mapping from OD demand vectors to equilibrium path-flow vectors.
This is the central modeling assumption that justifies replacing the optimization solver with a learned predictor.

pith-pipeline@v0.9.0 · 5745 in / 1437 out tokens · 48915 ms · 2026-05-18T04:12:08.093460+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: "Transformer-based model ... predicts equilibrium path flows directly ... captures intricate correlations between OD pairs"

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "The model learns this mapping in a fully data-driven manner"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.