D$^3$-Subsidy: Online and Sequential Driver Subsidy Decision-Making for Large-Scale Ride-Hailing Market

Haijiao Wang; Hongyang Zhang; Jintao Ke; Laoming Zhang; Rui Su; Siyuan Feng; Taijie Chen; Zhaofeng Ma

arXiv: 2605.20036 · v2 · pith:ZTZQH5GM · submitted 2026-05-19 · 💻 cs.LG


Taijie Chen, Rui Su, Siyuan Feng, Laoming Zhang, Hongyang Zhang, Haijiao Wang, Zhaofeng Ma, Jintao Ke

Pith reviewed 2026-05-21 07:22 UTC · model grok-4.3

classification 💻 cs.LG

keywords ride-hailing subsidy, diffusion model, online sequential decision, city-scale control, supply-demand balancing, Lagrangian dual, cap compliance, multi-city transfer


The pith

Prefix-conditioned diffusion generates future trajectories from fixed history to set city-level driver subsidies that respect caps and lift both rides and GMV.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents D³-Subsidy as a hierarchical diffusion-based controller for online, sequential subsidy decisions at city scale in ride-hailing platforms. It uses a prefix-conditioned diffusion model to sample plausible future trajectories solely from immutable historical observations, then decodes those plans into low-dimensional control signals and maps them to order-level incentives via a Lagrangian-dual construction that bakes in subsidy-rate caps. A sympathetic reader would care because the approach promises responsive supply-demand balancing without per-order optimization, multi-city transfer via pretraining, and measurable gains in completed rides and gross merchandise value while keeping violations inside operational bounds. Offline tests and a live A/B experiment are offered as evidence that the method meets the three simultaneous requirements of shock responsiveness, strict caps, and low-latency execution.

Core claim

D³-Subsidy is a hierarchical diffusion-based framework for deployable city-wide subsidy control that bridges the train-inference gap with a prefix-conditioned diffusion model sampling plausible future trajectories from immutable historical observations; these plans are decoded by a context-conditioned inverse module into low-dimensional city-level signals and then mapped to fine-grained incentives through a Lagrangian-dual-derived construction that directly embeds subsidy-rate caps, all supported by multi-city pretraining and parameter-efficient fine-tuning for transfer across heterogeneous cities.
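The core claim names parameter-efficient fine-tuning but does not say which method is used. One common choice is a LoRA-style adapter. This is an assumption, not confirmed as the paper's method. It freezes the weight pretrained on many cities and trains only a small low-rank correction for each city. The class `LoRALinear` and its `rank`/`alpha` parameters below are hypothetical illustrations.

```python
import numpy as np


class LoRALinear:
    """Linear layer with a frozen shared weight and a trainable low-rank adapter.

    Hypothetical sketch: W stands in for a weight pretrained on many cities,
    while A and B form a small per-city correction fine-tuned on one city.
    """

    def __init__(self, W: np.ndarray, rank: int, alpha: float = 1.0, seed: int = 0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = W.shape
        self.W = W                                         # frozen during fine-tuning
        self.A = 0.01 * rng.standard_normal((rank, in_dim))
        self.B = np.zeros((out_dim, rank))                 # zero init: adapter starts as a no-op
        self.scale = alpha / rank

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # Effective weight is W + scale * B @ A; only A and B would be updated.
        return x @ (self.W + self.scale * self.B @ self.A).T

    def num_trainable(self) -> int:
        return self.A.size + self.B.size


# Usage: a 64x64 shared layer adapted with rank 4.
W_shared = np.random.default_rng(1).standard_normal((64, 64))
layer = LoRALinear(W_shared, rank=4)
x = np.ones((2, 64))
print(np.allclose(layer(x), x @ W_shared.T))  # True: identical to the base layer at init
print(layer.num_trainable(), W_shared.size)   # 512 4096
```

Zero-initializing `B` means per-city fine-tuning starts exactly from the pretrained multi-city model. Only 512 adapter values are trained against 4,096 frozen weights in this toy layer.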

What carries the argument

Prefix-conditioned diffusion model that samples plausible future trajectories from immutable historical observations, which aligns training with the fixed-history constraint of online deployment and supplies forward-looking plans for the downstream inverse module and Lagrangian mapping.
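Below is a minimal NumPy sketch of one way to realize prefix conditioning with a standard DDPM sampler (Ho et al. [13]). It assumes replacement-style conditioning:

- During training, only the future segment is noised and the history prefix stays clean.
- During sampling, the observed history is written back at every denoising step.

The function names, the linear noise schedule, and the placeholder `eps_model` are hypothetical. The paper's network and exact conditioning mechanism are not specified here.

```python
import numpy as np


def make_schedule(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02):
    """Linear DDPM noise schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def training_pair(traj, prefix_len, k, alpha_bars, rng):
    """Noise only the future part of a (T, d) trajectory at diffusion step k.

    The history prefix is left clean, matching what the model sees online.
    Returns the model input and the future-step noise it should predict.
    """
    noise = rng.standard_normal(traj.shape)
    noisy = np.sqrt(alpha_bars[k]) * traj + np.sqrt(1.0 - alpha_bars[k]) * noise
    noisy[:prefix_len] = traj[:prefix_len]  # immutable history, never noised
    return noisy, noise[prefix_len:]


def sample_future(eps_model, history, horizon, num_steps, rng):
    """Ancestral DDPM sampling of `horizon` future steps given a fixed history."""
    betas, alphas, alpha_bars = make_schedule(num_steps)
    prefix_len = history.shape[0]
    x = rng.standard_normal((prefix_len + horizon, history.shape[1]))
    for k in reversed(range(num_steps)):
        x[:prefix_len] = history  # condition: the prefix is always the observed data
        eps = eps_model(x, k)
        mean = (x - betas[k] / np.sqrt(1.0 - alpha_bars[k]) * eps) / np.sqrt(alphas[k])
        x = mean + (np.sqrt(betas[k]) * rng.standard_normal(x.shape) if k > 0 else 0.0)
    return x[prefix_len:]


# Usage with a placeholder noise predictor (a trained network would go here).
rng = np.random.default_rng(0)
history = rng.standard_normal((24, 3))  # e.g. 24 past periods, 3 city-level features
plan = sample_future(lambda x, k: np.zeros_like(x), history, horizon=6, num_steps=50, rng=rng)
print(plan.shape)  # (6, 3)
```

Both `training_pair` and `sample_future` keep the prefix clean. As a result, the network is never trained on inputs it cannot see online. This is the train-inference alignment the passage describes.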

If this is right

Rides and GMV increase in offline evaluations while cap compliance improves.
Real-world A/B test shows significant uplift with budget-related violation metrics staying inside operational thresholds.
City-level plans convert to per-order incentives without iterative optimization, meeting low-latency requirements at scale.
Multi-city pretraining plus parameter-efficient fine-tuning supports transfer to new cities without full retraining.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same prefix-conditioning pattern could be tested on other online resource-allocation problems where only past observations are available at decision time.
If the diffusion trajectories prove robust across market regimes, the framework might reduce reliance on city-specific hand-tuned rules.
Measuring how much the Lagrangian mapping preserves optimality when demand shocks exceed the diffusion model's training distribution would clarify its limits.

Load-bearing premise

The prefix-conditioned diffusion model produces future trajectories that remain plausible and decision-relevant when the only information available at deployment time is immutable historical observations.

What would settle it

A live deployment in which the diffusion-generated trajectories diverge substantially from realized outcomes, causing the resulting subsidy schedule to produce lower rides or GMV than a simple historical-average baseline while still satisfying cap constraints.

Figures

Figures reproduced from arXiv: 2605.20036 by Haijiao Wang, Hongyang Zhang, Jintao Ke, Laoming Zhang, Rui Su, Siyuan Feng, Taijie Chen, Zhaofeng Ma.

**Figure 1.** Overview of the proposed D$^3$-Subsidy framework.

…where $\mathcal{E}_{t'}$ is the set of broadcasted order–driver pairs in period $t'$, $y_{ij,t'} \in \{0, 1\}$ indicates whether order $i$ is completed by driver $j$, and $g_{ij,t'}$ denotes the GMV of pair $(i, j)$ if completed. The augmented state is $x_t = (s_t, \rho_t)$, and the action is the scalar city-level control $\lambda_t$. From city-level control to pair-level subsidies. Given $\lambda_t$, the platform…

**Figure 2.** Comparison of standard trajectory diffusion and …

**Figure 5.** KPI-conditional policy steering.

**Figure 6.** Training loss comparison under different settings. Panels: (a) w/o MNDL and (b) w/ MNDL, each plotting diffusion loss and Inv loss against training epoch.

**Figure 7.** Score under different diffusion steps in City C.

**Figure 8.** Problem Formulation. Let $C \in (0, 1)$ be the global subsidy-rate cap. Consider the primal problem

$$\max_{b_{ij}} \sum_{i,j} r_{ij} a_{ij} b_{ij} \quad \text{s.t.} \quad \sum_{i,j} a_{ij} b_{ij}^2 - (C + \delta) \sum_{i,j} r_{ij} a_{ij} b_{ij} \le 0, \qquad 0 \le b_{ij} \le b_{\max}(i), \ \forall i, j.$$

Let $\lambda \ge 0$ be the Lagrange multiplier associated with the subsidy-rate constraint. Then the optimal subsidy for each $(i, j)$ under dual parameter $\lambda$ (with $\lambda > 0$) is

$$b^*_{ij}(\lambda) = \min\bigl\{\max\{0, \kappa r_{ij}\},\ b_{\max}(i)\bigr\}$$

[…

**Figure 9.** Cumulative Rides, GMV and DRV in City A.

**Figure 10.** Per-Window Rides, GMV and DRV in City A.

**Figure 11.** Daily Subsidy Rate in City A.

**Figure 12.** Cumulative Rides, GMV and DRV in City B.

**Figure 13.** Per-Window Rides, GMV and DRV in City B.

**Figure 14.** Daily Subsidy Rate in City B.
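The Figure 8 text is cut off before $\kappa$ is defined. One value for $\kappa$ can be derived from the primal problem as stated, assuming $a_{ij} > 0$ and $\lambda > 0$:

- The Lagrangian separates into per-pair terms $a_{ij}\,[(1 + \lambda(C+\delta))\, r_{ij} b_{ij} - \lambda b_{ij}^2]$.
- Each term is maximized at $b_{ij} = \kappa r_{ij}$ with $\kappa = (1 + \lambda(C+\delta))/(2\lambda)$.
- The result is then clipped to $[0, b_{\max}(i)]$.

This is a derivation under those assumptions, not the paper's printed expression. The NumPy sketch below applies the closed form to all pairs in one vectorized step, with no per-order solver. The function names are hypothetical.

```python
import numpy as np


def kappa(lam: float, cap: float, delta: float = 0.0) -> float:
    """Slope of the unclipped optimum b = kappa * r (derived from the stated primal)."""
    if lam <= 0:
        raise ValueError("the closed form assumes lambda > 0")
    return (1.0 + lam * (cap + delta)) / (2.0 * lam)


def pair_subsidies(lam, r, b_max, cap, delta=0.0):
    """b*_ij(lambda) = min{max{0, kappa * r_ij}, b_max(i)} for all pairs at once."""
    return np.clip(kappa(lam, cap, delta) * np.asarray(r, dtype=float), 0.0, b_max)


def rate_constraint(b, r, a, cap, delta=0.0):
    """Left-hand side of the subsidy-rate constraint; <= 0 means the cap holds."""
    return np.sum(a * b**2) - (cap + delta) * np.sum(r * a * b)


# Usage: one scalar lambda_t from the city-level planner sets every pair's subsidy.
r = np.array([1.0, 2.0, 4.0])
a = np.array([0.5, 0.3, 0.2])
b_max = np.array([3.0, 3.0, 3.0])
for lam in (0.5, 50.0):
    b = pair_subsidies(lam, r, b_max, cap=0.1)
    print(lam, b, rate_constraint(b, r, a, cap=0.1) <= 0)
# lambda = 0.5 gives kappa = 1.05 and overshoots the cap here; lambda = 50 satisfies it.
```

Because $\kappa$ decreases in $\lambda$, raising the city-level control shrinks every pair's subsidy at once. Assume $r_{ij}, a_{ij} \ge 0$ and $C + \delta > 0$. Then each pair contributes $a_{ij} b_{ij}(b_{ij} - (C+\delta) r_{ij}) \le 0$ whenever $\kappa \le C + \delta$. So any $\lambda \ge 1/(C+\delta)$ guarantees compliance, with or without clipping. In this toy case that means $\lambda \ge 10$.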

Original abstract

Ride-hailing platforms like DiDi Chuxing operate in highly dynamic environments where balancing driver supply and passenger demand is critical. Although driver-side subsidies serve as a primary lever to align these forces and improve key KPIs like completed rides (\texttt{Rides}) and gross merchandise value (\texttt{GMV}), optimizing them in production requires simultaneously meeting three constraints: (i) responsiveness to stochastic shocks, (ii) strict subsidy-rate caps, and (iii) low-latency execution at city scale. These requirements rule out expensive per-order optimization, calling for a forward-looking, constraint-aware city-level controller for online sequential decision making. To meet these requirements, we introduce D$^3$-Subsidy (Dynamic Driver-side Diffusion-based Subsidy), a hierarchical diffusion-based framework for deployable city-wide subsidy control. To bridge the train-inference gap, D$^3$-Subsidy employs a prefix-conditioned diffusion model that samples plausible future trajectories from immutable historical observations, ensuring the training protocol aligns with the fixed-history nature of online deployment. These generated plans are then decoded by a context-conditioned inverse module into low-dimensional city-level control signals. For scalable execution, we bridge the gap between city-level planning and fine-grained dispatch via a Lagrangian-dual-derived mapping, which embeds subsidy-rate caps directly into order-driver incentives without iterative optimization. Additionally, a multi-city pretraining strategy with parameter-efficient fine-tuning enables robust transfer across heterogeneous cities. Extensive offline evaluations demonstrate that D$^3$-Subsidy improves \texttt{Rides} and \texttt{GMV} while enhancing cap compliance, and a real-world A/B test confirms significant uplift while keeping budget-related violation metrics within operational thresholds.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this section is the friction.

Desk Editor's Note (private letter to a colleague)

The diffusion-based subsidy controller targets real deployment constraints but the abstract leaves the empirical claims hard to assess.


The main takeaway is a hierarchical setup that uses prefix-conditioned diffusion to sample future trajectories from fixed historical data. It then decodes them into city-level subsidy signals and applies a Lagrangian mapping to enforce rate caps in one step. This tries to solve the online sequential decision problem for ride-hailing without per-order optimization or iterative solvers, while adding multi-city pretraining for transfer. That combination of elements for low-latency, cap-aware control is the clearest new piece relative to prior subsidy work I know. It also shows awareness of the train-inference gap by conditioning on prefixes, which is a reasonable engineering move for production settings where only past observations are available at decision time.

The offline and A/B claims of better rides, GMV, and cap compliance are presented as evidence that the approach works at scale. If the full paper includes ablations against standard predictors and proper statistical reporting, those results could be practically useful for operators facing budget limits and stochastic demand. The soft spot is that the abstract gives almost no detail on baselines, controls, data splits, or significance tests, so the size of the reported uplift is difficult to judge. The stress-test concern, whether the diffusion trajectories stay decision-relevant under immutable history and typical distribution shifts, is fair to raise. Without evidence on that point, the downstream mapping cannot be credited for the gains.

This paper is aimed at applied researchers and platform teams working on constrained sequential control in matching markets. A reader who needs concrete methods for city-scale subsidy under hard caps could extract value from the architecture even if the numbers need more scrutiny. It deserves a serious referee who examines the experimental section and checks whether the claimed improvements survive standard validation checks.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces D³-Subsidy, a hierarchical diffusion-based framework for online sequential driver subsidy decision-making in large-scale ride-hailing markets. It employs a prefix-conditioned diffusion model to generate future trajectories from immutable historical observations (to bridge the train-inference gap), decodes these into city-level control signals via a context-conditioned inverse module, and uses a Lagrangian-dual-derived mapping to embed subsidy-rate caps into incentives without iterative optimization. A multi-city pretraining strategy with parameter-efficient fine-tuning supports transfer across cities. The central claims are improvements in completed rides (Rides) and GMV, plus enhanced cap compliance, demonstrated via extensive offline evaluations and a real-world A/B test that keeps budget-related violation metrics within thresholds.

Significance. If the prefix-conditioned diffusion trajectories prove decision-relevant and robust under immutable history, the work provides a scalable, constraint-aware controller suitable for production ride-hailing systems. The combination of generative modeling for forward-looking planning with Lagrangian embedding for hard constraints, together with the multi-city transfer approach, represents a practical advance in applying diffusion models to sequential operational decisions. The real-world A/B test component adds deployment relevance, though its evidential weight depends on the missing statistical details.

major comments (2)

[Abstract] Abstract: The reported improvements in Rides and GMV from offline evaluations and the A/B test are asserted without any baselines, statistical significance tests, data exclusion criteria, sample sizes, or error bars. This directly undermines verification of the central claim that D³-Subsidy delivers meaningful uplift while satisfying operational constraints.
[Abstract] Abstract (description of prefix-conditioned diffusion model): The framework's ability to produce plausible, decision-relevant future trajectories when only immutable historical observations are available at inference time is load-bearing for all downstream KPI gains. The manuscript provides no implementation details on the prefix conditioning, no ablation against simpler predictors (e.g., historical averages or autoregressive baselines), and no robustness checks under typical ride-hailing distribution shifts, leaving the train-inference gap bridge unverified.

minor comments (2)

Ensure that all KPI definitions (Rides, GMV, cap compliance, budget violation metrics) are explicitly defined with formulas or precise operational descriptions in the main text, not only in the abstract.
The Lagrangian-dual mapping is described at a high level; a brief pseudocode or equation sketch in the methods section would clarify how subsidy-rate caps are embedded without iterative optimization.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and indicate the revisions we will make to strengthen the presentation of results and methods.


Referee: [Abstract] Abstract: The reported improvements in Rides and GMV from offline evaluations and the A/B test are asserted without any baselines, statistical significance tests, data exclusion criteria, sample sizes, or error bars. This directly undermines verification of the central claim that D³-Subsidy delivers meaningful uplift while satisfying operational constraints.

Authors: We agree that the abstract, as a concise summary, would benefit from additional context to support immediate verification of the claims. The full manuscript details the baselines (rule-based, optimization, and learning-based methods), statistical tests with p-values, data exclusion criteria, sample sizes, and error bars in Sections 4 and 5. We will revise the abstract to briefly reference the primary baselines and note that reported uplifts are statistically significant (p < 0.05), with full details in the experimental sections.
Revision: yes.
Referee: [Abstract] Abstract (description of prefix-conditioned diffusion model): The framework's ability to produce plausible, decision-relevant future trajectories when only immutable historical observations are available at inference time is load-bearing for all downstream KPI gains. The manuscript provides no implementation details on the prefix conditioning, no ablation against simpler predictors (e.g., historical averages or autoregressive baselines), and no robustness checks under typical ride-hailing distribution shifts, leaving the train-inference gap bridge unverified.

Authors: The prefix conditioning mechanism is described in Section 3.1, where the diffusion model is trained to generate future trajectories conditioned solely on immutable historical prefixes to align with online inference. To directly address the request for verification, we will add explicit implementation details on the conditioning (e.g., prefix length and masking strategy) to the methods section. We will also include new ablations against historical averages and autoregressive predictors, plus robustness experiments under simulated demand shocks and distribution shifts, in the revised manuscript.
Revision: yes.

Circularity Check

0 steps flagged

No significant circularity detected; derivation remains self-contained

full rationale

The paper describes a hierarchical framework with a prefix-conditioned diffusion model that generates future trajectories from immutable historical observations to address the train-inference gap for online subsidy decisions. Central claims of KPI improvements (Rides, GMV, cap compliance) are supported by offline evaluations and a real-world A/B test, treating historical data as external input. No equations, fitted parameters renamed as predictions, or self-citation chains are exhibited that would reduce the reported outcomes or diffusion trajectories to the inputs by construction. The approach aligns training with deployment constraints without self-definitional loops or ansatz smuggling via prior work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; the method relies on standard diffusion models and Lagrangian duality whose concrete implementation details are absent.

pith-pipeline@v0.9.0 · 5864 in / 1267 out tokens · 42320 ms · 2026-05-21T07:22:03.541621+00:00 · methodology



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: prefix-conditioned diffusion model that samples plausible future trajectories from immutable historical observations... constraint-aware score that penalizes infeasible trajectories... context-conditioned inverse dynamics module... Lagrangian-dual-derived mapping

IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection · unclear

Paper passage:

Score(ξ) = (C / C_real(ξ))^β * Rides(ξ) if violation else Rides(ξ)
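This passage reads as a cap-aware scoring rule for candidate trajectories ξ. It fits the "constraint-aware score that penalizes infeasible trajectories" quoted in the first entry. The sketch below rests on two assumptions: "violation" means the realized subsidy rate C_real(ξ) exceeds the cap C, and β > 0. The function name `trajectory_score` is hypothetical.

```python
def trajectory_score(rides: float, realized_rate: float, cap: float, beta: float) -> float:
    """Score(xi) = (C / C_real(xi))**beta * Rides(xi) if the cap is violated, else Rides(xi)."""
    if realized_rate > cap:
        # cap / realized_rate < 1, so the penalty grows with the size of the overshoot.
        return (cap / realized_rate) ** beta * rides
    return rides


print(trajectory_score(100.0, 0.08, cap=0.10, beta=2.0))  # 100.0 (within cap)
print(trajectory_score(100.0, 0.20, cap=0.10, beta=2.0))  # 25.0 (double the cap)
```

The factor equals 1 at C_real = C, so the score is continuous at the cap and falls off as (C / C_real)^β beyond it. A larger β punishes overspending more sharply.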

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

27 extracted references · 27 canonical work pages · 2 internal anchors

[1] Anurag Ajay, Yilun Du, Abhi Gupta, Joshua B Tenenbaum, Tommi S Jaakkola, and Pulkit Agrawal. 2023. Is Conditional Generative Modeling all you need for Decision Making? In The Eleventh International Conference on Learning Representations.

[2] Jacob Austin, Daniel D Johnson, Jonathan Ho, Daniel Tarlow, and Rianne Van Den Berg. 2021. Structured denoising diffusion models in discrete state-spaces. Advances in neural information processing systems 34 (2021), 17981–17993.

[3] Huayu Chen, Cheng Lu, Chengyang Ying, Hang Su, and Jun Zhu. [n. d.]. Offline Reinforcement Learning via High-Fidelity Generative Behavior Modeling. In The Eleventh International Conference on Learning Representations.

[4] Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Misha Laskin, Pieter Abbeel, Aravind Srinivas, and Igor Mordatch. 2021. Decision transformer: Reinforcement learning via sequence modeling. Advances in neural information processing systems 34 (2021), 15084–15097.

[5] Minshuo Chen, Song Mei, Jianqing Fan, and Mengdi Wang. 2024. An overview of diffusion models: Applications, guided generation, statistical rates and optimization. arXiv preprint arXiv:2404.07771 (2024).

[6] Taijie Chen, Jian Liang, Ya Zhao, and Jintao Ke. 2025. To grab or not? Revealing determinants of drivers’ willingness to grab orders in on-demand ride services. Travel Behaviour and Society 41 (2025), 101093.

[7] Taijie Chen, Zijian Shen, Siyuan Feng, Linchuan Yang, and Jintao Ke. 2025. Dynamic matching radius decision model for on-demand ride services: A deep multi-task learning approach. Transportation Research Part E: Logistics and Transportation Review 193 (2025), 103822.

[8] Siyuan Feng, Taijie Chen, Yuhao Zhang, Jintao Ke, Zhengfei Zheng, and Hai Yang. 2024. A multi-functional simulation platform for on-demand ride service operations. Communications in Transportation Research 4 (2024), 100141.

[9] Scott Fujimoto and Shixiang Shane Gu. 2021. A minimalist approach to offline reinforcement learning. Advances in neural information processing systems 34 (2021), 20132–20145.

[10] Scott Fujimoto, David Meger, and Doina Precup. 2019. Off-policy deep reinforcement learning without exploration. In International Conference on Machine Learning. 2052–2062.

[11] Jiayan Guo, Yusen Huo, Zhilin Zhang, Tianyu Wang, Chuan Yu, Jian Xu, Bo Zheng, and Yan Zhang. 2024. Generative auto-bidding via conditional diffusion modeling. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 5038–5049.

[12] Philippe Hansen-Estruch, Ilya Kostrikov, Michael Janner, Jakub Grudzien Kuba, and Sergey Levine. 2023. Idql: Implicit q-learning as an actor-critic method with diffusion policies. arXiv preprint arXiv:2304.10573 (2023).

[13] Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising diffusion probabilistic models. Advances in neural information processing systems 33, 6840–6851.

[14] Jifeng Hu, Yanchao Sun, Sili Huang, SiYuan Guo, Hechang Chen, Li Shen, Lichao Sun, Yi Chang, and Dacheng Tao. 2023. Instructed diffuser with temporal condition guidance for offline reinforcement learning. arXiv preprint arXiv:2306.04875 (2023).

[15] Ilya Kostrikov, Ashvin Nair, and Sergey Levine. 2022. Offline Reinforcement Learning with Implicit Q-Learning. In International Conference on Learning Representations.

[16] Aviral Kumar, Aurick Zhou, George Tucker, and Sergey Levine. 2020. Conservative q-learning for offline reinforcement learning. Advances in neural information processing systems 33 (2020), 1179–1191.

[17] Yewen Li, Jingtong Gao, Nan Jiang, Shuai Mao, Ruyi An, Fei Pan, Xiangyu Zhao, Bo An, Qingpeng Cai, and Peng Jiang. 2025. Generative Auto-Bidding in Large-Scale Competitive Auctions via Diffusion Completer-Aligner. arXiv preprint arXiv:2509.03348 (2025).

[18] Alexander Quinn Nichol and Prafulla Dhariwal. 2021. Improved denoising diffusion probabilistic models. In International conference on machine learning. PMLR, 8162–8171.

[19] Zhiwei Tony Qin, Hongtu Zhu, and Jieping Ye. 2022. Reinforcement learning for ridesharing: An extended survey. Transportation Research Part C: Emerging Technologies 144 (2022), 103852.

[20] Tianyi Tan, Yinan Zheng, Ruiming Liang, Zexu Wang, Kexin Zheng, Jinliang Zheng, Jianxiong Li, Xianyuan Zhan, and Jingjing Liu. 2025. Flow matching-based autonomous driving planning with advanced interactive behavior modeling. arXiv preprint arXiv:2510.11083 (2025).

[21] Zhendong Wang, Jonathan J Hunt, and Mingyuan Zhou. 2022. Diffusion policies as an expressive policy class for offline reinforcement learning. arXiv preprint arXiv:2208.06193 (2022).

[22] Ningke Xie, Wei Tang, Jiangtao Zhu, Junyi Li, and Xiqun Michael Chen. 2023. Understanding causal effects of ride-sourcing subsidy: A novel generative adversarial networks approach. Transportation Research Part C: Emerging Technologies 157 (2023), 104371.

[23] Jiaqi Yang, Lexiao Chen, Zicheng Su, Wanjing Ma, Zhichao Zou, and Kun An. 2025. Decision-focused learning for optimal subsidy allocation in ride-hailing services. Transportation Research Part C: Emerging Technologies 180 (2025), 105301.

KDD ’26, August 9–13, 2026, Jeju, Republic of Korea · Chen et al.

[24] Enpeng Yuan and Pascal Van Hentenryck. 2021. Real-time pricing optimization for ride-hailing quality of service. In 30th International Joint Conference on Artificial Intelligence (IJCAI-21).

[25] Qi Zhang, Yang Liu, and Zhi-Ping Fan. 2023. Short-term subsidy strategy for new users of ride-hailing platform with user base. Computers & Industrial Engineering 179 (2023), 109177.

[26] Yinan Zheng, Ruiming Liang, Kexin Zheng, Jinliang Zheng, Liyuan Mao, Jianxiong Li, Weihao Gu, Rui Ai, Shengbo Eben Li, Xianyuan Zhan, et al. [n. d.]. Diffusion-Based Planning for Autonomous Driving with Flexible Guidance. In The Thirteenth International Conference on Learning Representations.

[27] Zheng Zhu, Jintao Ke, and Hai Wang. 2021. A mean-field Markov decision process model for spatial-temporal subsidies in ride-sourcing markets. Transportation Research Part B: Methodological 150 (2021), 540–565.

A Operations in Ride-hailing Platforms

In a ride-hailing platform, operational decisions arise from the continuous interactions among passengers, d...
