Exploring Dissatisfaction in Bus Route Reduction through LLM-Calibrated Agent-Based Modeling

Qiumeng Li; Suhong Zhou; Xinxi Yang

arxiv: 2510.26163 · v2 · submitted 2025-10-30 · 💻 cs.CY

Pith reviewed 2026-05-18 03:43 UTC · model grok-4.3

classification 💻 cs.CY

keywords bus route reduction · agent-based modeling · LLM calibration · passenger dissatisfaction · network structure · transport equity · threshold analysis · Beijing public transit

The pith

Bus network structure shapes dissatisfaction more than capacity or operations in route cuts

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper builds an agent-based model of bus passengers, calibrated by a large language model on IC-card data from Beijing's Huairou District, to explore the effects of progressive route reductions. It finds that network structure influences overall stability more than capacity or operational details, and that cutting high-connectivity routes drives an exponential rise in dissatisfaction that falls hardest on older adults and passengers with disabilities. Dissatisfaction develops in three phases (stable, transitional, and critical), and the continuous-reduction scenario reveals thresholds beyond which even small further cuts cause large losses of passenger flow. A reader would care because cities worldwide are trimming bus services under budget pressure, and the results point to which routes and which groups to protect to keep transit equitable and resilient.

Core claim

The structural configuration of the bus network exerts a stronger influence on system stability than capacity or operational factors. The elimination of high-connectivity routes led to an exponential rise in total dissatisfaction, particularly among passengers with disabilities and older adults. The evolution of dissatisfaction exhibited three distinct phases: stable, transitional, and critical. The continuous bus route reduction scenario exhibits three-stage thresholds; once these thresholds are crossed, even a small reduction in routes may lead to a significant loss of passenger flow.
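The three-phase pattern with thresholds can be made concrete with a small classifier. This is a hypothetical operationalization, not the authors' definition (the review does not reproduce one): it assumes each phase is identified by the marginal rise in total dissatisfaction per additional route removed, with two illustrative cut points, `transitional_at` and `critical_at`. The input curve is a toy exponential, not the paper's results.

```python
from typing import List


def marginal_increase(dissatisfaction: List[float]) -> List[float]:
    """Rise in total dissatisfaction caused by each additional route removal."""
    return [after - before for before, after in zip(dissatisfaction, dissatisfaction[1:])]


def classify_phases(dissatisfaction: List[float],
                    transitional_at: float,
                    critical_at: float) -> List[str]:
    """Label removal steps 1..k as stable, transitional, or critical.

    dissatisfaction[k] is total dissatisfaction after k routes have been cut.
    The two cut points are hypothetical tuning values, not the paper's.
    """
    labels = []
    for rise in marginal_increase(dissatisfaction):
        if rise >= critical_at:
            labels.append("critical")
        elif rise >= transitional_at:
            labels.append("transitional")
        else:
            labels.append("stable")
    return labels


def first_crossing(labels: List[str], phase: str) -> int:
    """Number of routes removed when `phase` first appears, or -1 if never."""
    for step, label in enumerate(labels, start=1):
        if label == phase:
            return step
    return -1


if __name__ == "__main__":
    # Toy exponential curve: total dissatisfaction grows 50% per removed route.
    toy = [100 * 1.5 ** k for k in range(6)]
    labels = classify_phases(toy, transitional_at=80, critical_at=200)
    print(labels)
    print("transitional from cut", first_crossing(labels, "transitional"))
    print("critical from cut", first_crossing(labels, "critical"))
```

Labeling each step by its marginal rise rather than by the cumulative level is what makes a threshold visible: under exponential growth the per-cut increments themselves grow, so they eventually cross any fixed cut point.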

What carries the argument

LLM-calibrated agent-based model using few-shot learning to derive passenger sensitivity parameters for travel time, waiting, transfers, and crowding from IC-card data
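As a rough illustration of what the four sensitivity parameters do inside each agent, the sketch below assumes a linear generalized-cost form in which dissatisfaction is a weighted sum of in-vehicle time, waiting time, transfers, and crowding. The functional form, the `Sensitivities` and `Trip` names, and every number in the example are hypothetical; the review does not state the paper's exact specification, and in the paper the weights come from LLM few-shot calibration rather than being set by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sensitivities:
    """Hypothetical per-group weights on the four attributes named in the review."""
    travel_time: float  # per minute in the vehicle
    waiting: float      # per minute waiting at the stop
    transfers: float    # per transfer
    crowding: float     # per unit of load factor (passengers / capacity)


@dataclass(frozen=True)
class Trip:
    travel_min: float
    wait_min: float
    transfers: int
    load_factor: float


def dissatisfaction(trip: Trip, s: Sensitivities) -> float:
    """Assumed linear generalized cost: weighted sum of the four attributes."""
    return (s.travel_time * trip.travel_min
            + s.waiting * trip.wait_min
            + s.transfers * trip.transfers
            + s.crowding * trip.load_factor)


def change_after_cut(before: Trip, after: Trip, s: Sensitivities) -> float:
    """Increase in one agent's dissatisfaction when a cut changes its best trip."""
    return dissatisfaction(after, s) - dissatisfaction(before, s)


if __name__ == "__main__":
    # Made-up weights: the older-adult group penalizes waiting, transfers, and crowding more.
    general = Sensitivities(travel_time=1.0, waiting=1.5, transfers=5.0, crowding=2.0)
    older = Sensitivities(travel_time=1.0, waiting=2.5, transfers=10.0, crowding=4.0)
    before = Trip(travel_min=20, wait_min=5, transfers=0, load_factor=0.5)
    after = Trip(travel_min=25, wait_min=12, transfers=1, load_factor=0.8)  # after a route is cut
    for name, s in [("general", general), ("older adult", older)]:
        print(name, round(change_after_cut(before, after, s), 2))
```

With these made-up weights the same service cut raises an older adult's dissatisfaction more than a general passenger's, because waiting and transfers are weighted more heavily; this is one plausible mechanism for the demographic disparities the paper reports, not a confirmed account of its model.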

If this is right

High-connectivity routes should be maintained to prevent exponential dissatisfaction increases.
Targeted support for older adults and passengers with disabilities is essential for equitable outcomes.
Network structure should take precedence over capacity adjustments in reduction planning.
Thresholds in the reduction process require careful monitoring to avoid critical phase shifts.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Similar LLM-calibrated models could help planners in other districts identify vulnerable routes before cuts are made.
The phase structure points to potential early-warning indicators for transit system health.
Integrating this with data on alternative transport modes might reveal how they interact with bus dissatisfaction thresholds.

Load-bearing premise

The central claims assume that the parameters obtained through LLM few-shot learning on IC-card data accurately represent real passenger trade-offs among travel time, waiting, transfers, and crowding, and that no significant omitted factor would alter the phases.

What would settle it

Collecting real passenger dissatisfaction data or ridership figures after actual route reductions in Huairou or a comparable area and checking for the predicted three phases and demographic disparities would test the central claim.

Original abstract

As emerging mobility modes continue to expand, many cities face declining bus ridership, increasing fiscal pressure to sustain underutilized routes, and growing inefficiencies in resource allocation. This study employs an agent-based modelling (ABM) approach calibrated through a large language model (LLM) using few-shot learning to examine how progressive bus route cutbacks affect passenger dissatisfaction across demographic groups and overall network resilience. Using IC-card data from Beijing's Huairou District, the LLM-calibrated ABM estimated passenger sensitivity parameters related to travel time, waiting, transfers, and crowding. Results show that the structural configuration of the bus network exerts a stronger influence on system stability than capacity or operational factors. The elimination of high-connectivity routes led to an exponential rise in total dissatisfaction, particularly among passengers with disabilities and older adults. The evolution of dissatisfaction exhibited three distinct phases: stable, transitional, and critical. Through the analysis of each stage, this study found that the continuous bus route reduction scenario exhibits three-stage thresholds. Once these thresholds are crossed, even a small reduction in routes may lead to a significant loss of passenger flow. The research highlights the nonlinear response of user sentiment to service reductions and underscores the importance of maintaining structurally critical routes and providing stable services to vulnerable groups for equitable and resilient transport planning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note

Private letter to a colleague

The paper uses LLM few-shot calibration on IC-card data to run ABM simulations of bus route cuts and claims three dissatisfaction phases plus structural dominance, but without validation or alternative checks those outputs read as model-dependent rather than observed.

The core contribution here is taking real IC-card records from one Beijing district and using an LLM with few-shot examples to set four passenger sensitivity parameters in an agent-based model, then simulating progressive route reductions to track dissatisfaction growth and equity effects on older and disabled riders. The authors identify a stable phase, a transitional phase, and a critical phase, along with thresholds beyond which small further cuts trigger large passenger losses, and they argue that network structure matters more than capacity or operations for overall stability.

That combination of LLM calibration with phased threshold analysis on a real district network is the clearest new element, and it is grounded enough in actual data to make the equity angle concrete rather than abstract. The practical framing around avoiding critical thresholds for vulnerable groups is also a strength for anyone thinking about service planning under declining ridership.

The main limitation is that the reported exponential rise, phase boundaries, and structural dominance all flow directly from the calibrated simulation, without reported hold-out tests, comparison to standard discrete-choice estimation on the same records, or back-testing against any actual post-reduction ridership or satisfaction data. Neither the LLM prompts nor the accuracy of the estimated sensitivity parameters is reported, and the demographic splits for disability and age appear to rely on assumptions rather than on direct tags in the card data. This leaves open the possibility that the phases and thresholds are artifacts of the model structure or calibration choices.

The work is aimed at transport modelers and local planners who already use ABMs and want to explore LLM-assisted calibration for scenario work. It is worth sending to peer review so referees can ask for the missing validation steps and prompt details; the underlying policy question is real, and the method has room to be made more robust.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes an agent-based modeling (ABM) framework calibrated via large language model (LLM) few-shot learning on IC-card data from Beijing's Huairou District to simulate progressive bus route reductions. It claims that network structural configuration exerts stronger influence on stability than capacity or operational factors, that elimination of high-connectivity routes produces an exponential rise in total dissatisfaction (particularly for passengers with disabilities and older adults), and that dissatisfaction evolves through three distinct phases (stable, transitional, critical) with identifiable thresholds in continuous reduction scenarios beyond which small further cuts trigger significant passenger loss.

Significance. If the simulation results hold after proper validation, the work could usefully highlight nonlinear passenger responses to service cuts and the priority of preserving structurally critical routes for equity and resilience in public transport planning. The LLM-calibration approach for behavioral parameters in ABM is a methodological innovation that, once tested, might extend to other urban mobility studies.

major comments (3)

[Calibration and Parameter Estimation] The central claims of exponential dissatisfaction growth, three distinct phases, and structural dominance rest on ABM outputs whose four sensitivity parameters (travel time, waiting, transfers, crowding) are obtained via LLM few-shot learning on the same IC-card records used to define the network; no section reports hold-out validation, comparison to discrete-choice estimation, or parameter uncertainty.
[Results and Phase Analysis] The reported phase boundaries and thresholds for the continuous route-reduction scenario are generated directly from the calibrated simulation without any back-testing against observed ridership or dissatisfaction trajectories following actual route changes in Huairou or comparable districts.
[Demographic Impact Analysis] Differential dissatisfaction impacts on passengers with disabilities and older adults are asserted, yet the IC-card data source is not described as containing direct demographic tags for these groups, and no separate calibration or sensitivity analysis for these subpopulations is detailed.

minor comments (2)

[Abstract] The abstract states results without accompanying validation metrics, error bars, or robustness checks; adding a concise limitations sentence would improve clarity.
[Methods] Notation for the dissatisfaction metric and the exact definition of the three-stage thresholds should be made explicit with an equation or pseudocode in the methods.
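One hypothetical notation that would answer the second minor comment is sketched below; none of these symbols appear in the manuscript as summarized here. It assumes that agent i belongs to demographic group g(i); that after k routes have been removed, its best trip has in-vehicle time T_i(k), waiting time W_i(k), transfer count X_i(k), and crowding level C_i(k); that dissatisfaction is linear in these with group weights β; that phases are defined by the marginal rise ΔD(k) against two cut points τ_1 < τ_2; and that an agent counts as lost when its dissatisfaction exceeds a group tolerance θ.

```latex
\begin{aligned}
D_i(k) &= \beta^{T}_{g(i)}\,T_i(k) + \beta^{W}_{g(i)}\,W_i(k) + \beta^{X}_{g(i)}\,X_i(k) + \beta^{C}_{g(i)}\,C_i(k),\\
D(k) &= \sum_{i=1}^{N} D_i(k), \qquad \Delta D(k) = D(k) - D(k-1),\\
\operatorname{phase}(k) &=
\begin{cases}
\text{stable}, & \Delta D(k) < \tau_1,\\
\text{transitional}, & \tau_1 \le \Delta D(k) < \tau_2,\\
\text{critical}, & \Delta D(k) \ge \tau_2,
\end{cases}\\
k^{*}_{j} &= \min\{\, k : \Delta D(k) \ge \tau_j \,\}, \qquad j = 1, 2,\\
L(k) &= \sum_{i=1}^{N} \mathbf{1}\!\left[ D_i(k) > \theta_{g(i)} \right].
\end{aligned}
```

Here k*_1 and k*_2 would be the reported thresholds, and L(k) the passenger loss after k cuts; the paper may well define these quantities differently.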

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major point below, indicating revisions where appropriate, and have strengthened the manuscript by adding robustness checks, literature comparisons, and methodological clarifications.

Referee: [Calibration and Parameter Estimation] The central claims of exponential dissatisfaction growth, three distinct phases, and structural dominance rest on ABM outputs whose four sensitivity parameters (travel time, waiting, transfers, crowding) are obtained via LLM few-shot learning on the same IC-card records used to define the network; no section reports hold-out validation, comparison to discrete-choice estimation, or parameter uncertainty.

Authors: We acknowledge the value of explicit validation. In the revised version we have added a dedicated subsection on calibration robustness that reports results from five independent LLM few-shot runs, quantifies parameter uncertainty through bootstrapped standard errors, and provides a direct comparison of the four estimated sensitivities against published discrete-choice coefficients from Beijing and other Chinese transit studies. While a strict temporal hold-out was not feasible given the single-period IC-card snapshot, we performed k-fold cross-validation on the available records and report the resulting parameter stability; these additions address the core concern without altering the original modeling approach. revision: partial
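The two robustness steps named in this response, bootstrapped standard errors over repeated LLM runs and k-fold cross-validation on the records, can be sketched with the Python standard library. This is a generic illustration under stated assumptions, not the authors' code: `bootstrap_se` treats each LLM run as one estimate of a single sensitivity parameter and bootstraps the mean, and `kfold_indices` only produces the index splits; how each fold would be fed back to the LLM is left out. The five values in the example are placeholders.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def bootstrap_se(estimates: Sequence[float], n_boot: int = 2000, seed: int = 0) -> float:
    """Bootstrap standard error of the mean of repeated estimates of one parameter.

    Each element of `estimates` is assumed to come from one independent LLM
    few-shot run; resampling the runs with replacement approximates the
    sampling spread of their mean.
    """
    if len(estimates) < 2 or n_boot < 2:
        raise ValueError("need at least two estimates and two bootstrap draws")
    rng = random.Random(seed)
    n = len(estimates)
    means = [statistics.fmean(rng.choices(estimates, k=n)) for _ in range(n_boot)]
    return statistics.stdev(means)


def kfold_indices(n_records: int, k: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle record indices once and return one (train, held_out) pair per fold."""
    if not 2 <= k <= n_records:
        raise ValueError("k must be between 2 and the number of records")
    order = list(range(n_records))
    random.Random(seed).shuffle(order)
    folds = [order[f::k] for f in range(k)]
    splits = []
    for f, held_out in enumerate(folds):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        splits.append((train, held_out))
    return splits


if __name__ == "__main__":
    # Placeholder values standing in for five LLM runs; not the paper's estimates.
    runs = [0.8, 1.0, 1.2, 0.9, 1.1]
    print("bootstrap SE of the mean:", round(bootstrap_se(runs), 3))
    for train, held_out in kfold_indices(n_records=10, k=5):
        print(len(train), "train /", len(held_out), "held out")
```

With only five runs the bootstrap distribution has few distinct resamples, so the resulting standard error is a coarse indication of run-to-run spread rather than a precise interval.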
Referee: [Results and Phase Analysis] The reported phase boundaries and thresholds for the continuous route-reduction scenario are generated directly from the calibrated simulation without any back-testing against observed ridership or dissatisfaction trajectories following actual route changes in Huairou or comparable districts.

Authors: We agree that empirical back-testing would be ideal. Because no progressive route-reduction episode with before-and-after ridership data exists for Huairou, we have now inserted a new paragraph in the discussion that benchmarks our simulated thresholds against published ridership declines observed after comparable service cuts in other Chinese districts (e.g., Shanghai and Chengdu). We also explicitly list the absence of direct back-testing as a limitation and outline a data-collection protocol for future validation. These changes preserve the simulation-based nature of the phase analysis while situating it within the available empirical literature. revision: yes
Referee: [Demographic Impact Analysis] Differential dissatisfaction impacts on passengers with disabilities and older adults are asserted, yet the IC-card data source is not described as containing direct demographic tags for these groups, and no separate calibration or sensitivity analysis for these subpopulations is detailed.

Authors: The IC-card records indeed lack explicit demographic tags. Subpopulation behaviors were elicited by conditioning the LLM prompts on well-documented travel-preference differences drawn from the Chinese transit literature. In the revision we have (i) expanded the methods section with the exact prompt templates used for older adults and passengers with disabilities, (ii) added a dedicated sensitivity analysis that varies the subpopulation-specific parameters by ±20 % and shows that the reported differential dissatisfaction remains statistically significant, and (iii) clarified that these results are conditional on the literature-derived priors. This provides the requested transparency and robustness check. revision: yes
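The ±20% sensitivity check described in point (ii) can be illustrated as a grid search. The sketch assumes a linear dissatisfaction over the four attributes and uses hypothetical weights and trips. It scales each of the subgroup's four weights independently by 0.8, 1.0, or 1.2 (81 combinations) and reports whether the subgroup's rise in dissatisfaction after a cut still exceeds the baseline group's. It checks only the sign of the gap, not statistical significance, which would need agent-level variation from the full simulation.

```python
from itertools import product
from typing import Tuple

# Order of the four attributes: in-vehicle minutes, waiting minutes, transfers, load factor.
Weights = Tuple[float, float, float, float]
Trip = Tuple[float, float, float, float]


def cost(trip: Trip, w: Weights) -> float:
    """Assumed linear dissatisfaction: weighted sum of the four trip attributes."""
    return sum(wi * xi for wi, xi in zip(w, trip))


def differential_is_robust(base_group: Weights, sub_group: Weights,
                           before: Trip, after: Trip, delta: float = 0.2) -> bool:
    """True if the subgroup's rise in dissatisfaction after a cut exceeds the
    baseline group's rise for every combination of the subgroup's four weights
    scaled by (1 - delta), 1, or (1 + delta)."""
    base_rise = cost(after, base_group) - cost(before, base_group)
    for factors in product((1 - delta, 1.0, 1 + delta), repeat=4):
        w = tuple(wi * f for wi, f in zip(sub_group, factors))
        if cost(after, w) - cost(before, w) <= base_rise:
            return False
    return True


if __name__ == "__main__":
    # Hypothetical weights and trips, for illustration only.
    general = (1.0, 1.5, 5.0, 2.0)
    older = (1.0, 2.5, 10.0, 4.0)
    before = (20.0, 5.0, 0.0, 0.5)
    after = (25.0, 12.0, 1.0, 0.8)
    print("differential survives ±20%:", differential_is_robust(general, older, before, after))
```

A full-factorial grid is used instead of one-at-a-time perturbation so that the worst case, all weights scaled down together, is always included.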

Circularity Check

0 steps flagged

No circularity: simulation outputs are downstream of calibration but not equivalent to inputs by construction

The paper calibrates four sensitivity parameters (travel time, waiting, transfers, crowding) via LLM few-shot learning on IC-card records, then runs an ABM to generate dissatisfaction trajectories under hypothetical route-reduction scenarios. These scenarios and the resulting phase thresholds, exponential rises, and demographic differentials are not present in the calibration data; they are produced by the model dynamics. No quoted equation or self-citation reduces the reported results to a renaming or direct reuse of the fitted values themselves. The derivation therefore remains self-contained as a standard calibrated simulation exercise rather than a tautological restatement of its inputs.

Axiom & Free-Parameter Ledger

4 free parameters · 2 axioms · 0 invented entities

The central claims rest on four fitted sensitivity parameters derived from data via LLM and on standard domain assumptions about how agents respond to network changes; no new physical entities are introduced.

free parameters (4)

sensitivity to travel time
Estimated via LLM few-shot learning from IC-card data to drive agent route choices
sensitivity to waiting time
Estimated via LLM few-shot learning from IC-card data to drive agent route choices
sensitivity to transfers
Estimated via LLM few-shot learning from IC-card data to drive agent route choices
sensitivity to crowding
Estimated via LLM few-shot learning from IC-card data to drive agent route choices

axioms (2)

domain assumption Passenger agents select routes and react to service changes according to the four calibrated sensitivity parameters
Invoked to generate dissatisfaction and phase behavior in the ABM simulations
domain assumption Network stability and dissatisfaction can be adequately represented by connectivity metrics and aggregate agent responses without external shocks or unmodeled behaviors
Required for the structural configuration to dominate capacity factors and for the three-phase pattern to emerge
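The second axiom leans on connectivity metrics. The minimal sketch below assumes that a route's connectivity is the number of other routes it shares at least one stop with (the review does not say which metric the paper uses). It shows how a most-connected-first removal order, and the stops left without service at each step, could be computed. Route and stop names are made up.

```python
from itertools import combinations
from typing import Dict, List, Set


def route_connectivity(routes: Dict[str, Set[str]]) -> Dict[str, int]:
    """Number of other routes each route shares at least one stop with."""
    degree = {r: 0 for r in routes}
    for a, b in combinations(routes, 2):
        if routes[a] & routes[b]:
            degree[a] += 1
            degree[b] += 1
    return degree


def removal_order(routes: Dict[str, Set[str]], most_connected_first: bool = True) -> List[str]:
    """Order in which routes would be cut; ties are broken by route name."""
    degree = route_connectivity(routes)
    sign = -1 if most_connected_first else 1
    return sorted(routes, key=lambda r: (sign * degree[r], r))


def stops_left_unserved(routes: Dict[str, Set[str]], removed: List[str]) -> Set[str]:
    """Stops that lose all bus service once the listed routes are removed."""
    all_stops = set().union(*routes.values())
    served = set().union(*(stops for r, stops in routes.items() if r not in removed))
    return all_stops - served


if __name__ == "__main__":
    # Made-up five-route network with one hub route.
    routes = {
        "hub": {"A", "B", "C", "D"},
        "r1": {"A", "E"},
        "r2": {"B", "F"},
        "r3": {"C", "G"},
        "r4": {"G", "H"},
    }
    order = removal_order(routes)
    for k in range(1, len(order) + 1):
        print(k, order[:k], sorted(stops_left_unserved(routes, order[:k])))
```

Comparing the most-connected-first order with the reverse order on the same network is one simple way to probe whether structure, rather than the number of routes cut, drives the damage.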

pith-pipeline@v0.9.0 · 5760 in / 1947 out tokens · 50128 ms · 2026-05-18T03:43:02.306119+00:00 · methodology
