UCATSC: Uncertainty-Aware Constrained Traffic Signal Control Under Vision-Based Partial Observability

Balaji Bodagala; Jayawant Bodagala

arxiv: 2602.07784 · v3 · submitted 2026-02-08 · 💻 cs.CV

Pith reviewed 2026-05-16 06:51 UTC · model grok-4.3

keywords traffic signal control · partial observability · belief state · dilemma zone · service age · constrained decision making · vision-based sensing · SUMO simulation

The pith

UCATSC keeps a compact belief state over queues, arrivals, and service ages, then filters signal phases with short-horizon rollouts against dilemma-zone and starvation constraints.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Camera-based traffic control is partially observable because detections are missed and measurements are noisy, and once a yellow light starts the decision cannot be reversed. The paper builds UCATSC around a reduced movement-level belief state that tracks queue lengths, arrival rates, and how long each movement has already waited. Candidate phase actions are scored by finite-horizon counterfactual rollouts inside that belief space and then screened by explicit predictive checks for dilemma-zone safety and service-age limits. In SUMO experiments the constrained versions remove observed dilemma-zone violations, keep service ages bounded under starvation loads, and still deliver mobility performance comparable to classical and reinforcement-learning baselines while running at millisecond speeds.
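One way to picture the movement-level belief described above is a small per-movement record that is refreshed every control step. The sketch below is an illustration only: the class name `MovementBelief`, the detection probability `p_det`, the smoothing weight `alpha`, and the inverse-detection correction are all assumptions made here, not the paper's update rule.

```python
from dataclasses import dataclass


@dataclass
class MovementBelief:
    """Hypothetical per-movement belief (not the paper's exact state)."""
    queue: float = 0.0         # estimated vehicles waiting
    arrival_rate: float = 0.0  # estimated arrivals per second
    service_age: float = 0.0   # seconds since this movement last had green


def update_belief(b, detected_queue, detected_arrivals, is_green, dt,
                  p_det=0.8, alpha=0.3):
    """One illustrative update step under missed detections.

    Raw counts are divided by an assumed detection probability p_det to
    compensate for misses, then blended with the previous estimate.
    """
    corrected_queue = detected_queue / p_det
    corrected_rate = detected_arrivals / (p_det * dt)
    return MovementBelief(
        queue=(1 - alpha) * b.queue + alpha * corrected_queue,
        arrival_rate=(1 - alpha) * b.arrival_rate + alpha * corrected_rate,
        # Service age resets while the movement is served, else it grows.
        service_age=0.0 if is_green else b.service_age + dt,
    )


prior = MovementBelief(queue=4.0, arrival_rate=0.2, service_age=10.0)
posterior = update_belief(prior, detected_queue=4, detected_arrivals=1,
                          is_green=False, dt=1.0)
print(posterior)
```

The service-age field is the piece that later makes a starvation constraint checkable: it grows whenever a movement is not served, regardless of how noisy the queue estimate is.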

Core claim

UCATSC maintains a reduced movement-level belief state over queue, arrival, and service-age variables, evaluates admissible phase actions through finite-horizon counterfactual rollouts in belief space, and filters candidate actions using predictive dilemma-zone safety and service-age/starvation constraints before execution.
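For readers unfamiliar with the dilemma-zone constraint, the classical kinematic definition can be written in a few lines. This is a sketch of the textbook stop-or-clear test, not the paper's predictive filter; the parameter values, the `dist_margin_m` uncertainty margin, and both function names are assumptions chosen for illustration.

```python
def dilemma_zone_bounds(speed_mps, yellow_s=3.0, reaction_s=1.0,
                        decel_mps2=3.0, clearance_m=0.0):
    """Return (clear_dist, stop_dist) in metres from the stop line.

    stop_dist:  shortest distance from which the vehicle can still stop.
    clear_dist: longest distance from which it can clear at constant
                speed before yellow ends (clearance_m covers intersection
                width plus vehicle length; 0 here for simplicity).
    """
    stop_dist = speed_mps * reaction_s + speed_mps ** 2 / (2 * decel_mps2)
    clear_dist = speed_mps * yellow_s - clearance_m
    return clear_dist, stop_dist


def switch_is_safe(vehicles, dist_margin_m=2.0):
    """Reject a phase switch if any vehicle may sit in a dilemma zone.

    vehicles: iterable of (estimated_distance_m, estimated_speed_mps).
    dist_margin_m: assumed symmetric uncertainty on the distance estimate.
    """
    for distance_m, speed_mps in vehicles:
        clear_dist, stop_dist = dilemma_zone_bounds(speed_mps)
        zone_exists = clear_dist < stop_dist
        overlaps = (distance_m - dist_margin_m < stop_dist
                    and distance_m + dist_margin_m > clear_dist)
        if zone_exists and overlaps:
            return False
    return True


# At 15 m/s the zone spans 45 m to 52.5 m with these parameters.
print(switch_is_safe([(50.0, 15.0)]))  # vehicle inside the zone
print(switch_is_safe([(30.0, 15.0)]))  # vehicle can clear
```

Widening the distance estimate by a margin is one simple way to make the check conservative under noisy measurements; a belief-space version would replace the fixed margin with a predicted distribution.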

What carries the argument

The reduced movement-level belief state over queue, arrival, and service-age variables, evaluated by finite-horizon counterfactual rollouts that are then filtered by dilemma-zone and starvation constraints.
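The score-then-filter structure can be sketched as a toy controller. Everything below is a hypothetical simplification: each phase serves exactly one movement, the rollout uses expected values only, and the names `rollout_cost`, `choose_phase`, `sat_rate`, and `max_age_s` are introduced here for illustration rather than taken from the paper.

```python
def rollout_cost(belief, phase, horizon_s=10, dt=1.0, sat_rate=0.5):
    """Sum of expected queues over the horizon if `phase` holds green."""
    queues = {m: b["queue"] for m, b in belief.items()}
    cost = 0.0
    for _ in range(int(horizon_s / dt)):
        for m, b in belief.items():
            queues[m] += b["arrival_rate"] * dt
            if m == phase:  # served movement discharges at sat_rate
                queues[m] = max(0.0, queues[m] - sat_rate * dt)
        cost += sum(queues.values())
    return cost


def choose_phase(belief, current, switch_safe, horizon_s=10, max_age_s=60.0):
    """Pick the cheapest phase that passes both constraint filters."""
    admissible = []
    for phase in belief:
        if phase != current and not switch_safe:
            continue  # switching would start a yellow judged unsafe
        starved = any(
            m != phase and b["service_age"] + horizon_s > max_age_s
            for m, b in belief.items()
        )
        if starved:
            continue  # an unserved movement would exceed the age limit
        admissible.append(phase)
    if not admissible:
        return current  # assumed fallback: safety outranks liveness
    return min(admissible, key=lambda p: rollout_cost(belief, p, horizon_s))


belief = {
    "NS": {"queue": 8.0, "arrival_rate": 0.3, "service_age": 0.0},
    "EW": {"queue": 2.0, "arrival_rate": 0.1, "service_age": 55.0},
}
print(choose_phase(belief, current="NS", switch_safe=True))
```

In this example the cost alone favours holding NS, but EW would pass the assumed 60-second age limit within the horizon, so the filter removes NS from the candidate set and the controller switches.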

If this is right

Constrained UCATSC variants eliminate observed dilemma-zone violations in the tested SUMO scenarios.
Service age stays bounded during starvation stress tests.
Online runtime remains at millisecond scale.
Mobility metrics stay competitive with classical baselines and both plain and safety-masked DQN agents.
The same decision layer scales to a three-intersection corridor without loss of the safety and liveness properties.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same belief-state filtering could be wrapped around existing reinforcement-learning traffic controllers to add explicit safety and liveness guarantees without retraining the policy.
If the vision-to-belief interface proves reliable on real cameras, the method offers a way to certify constraint satisfaction without requiring full state reconstruction.
Extending the horizon or adding emission variables to the belief state would be a direct next step to address environmental objectives the current formulation leaves aside.

Load-bearing premise

The reduced belief state and short rollouts capture enough of real traffic uncertainty that the safety and starvation constraints remain valid outside the simulated scenarios.

What would settle it

A controlled field deployment in which the number of dilemma-zone violations or the maximum observed service ages exceed the simulation bounds would show that the belief approximation is insufficient.

Figures

Figures reproduced from arXiv: 2602.07784 by Balaji Bodagala, Jayawant Bodagala.

**Figure 2.** Three-intersection SUMO arterial-corridor benchmark

**Figure 3.** DQN-RL training trace over the 500-episode training

**Figure 5.** Delay–safety tradeoff in the asymmetric degraded-vision

**Figure 4.** Distribution of maximum east–west service age in the

**Figure 6.** Clean S1X liveness ablation. UCATSC-no-liveness re

**Figure 8.** Paired-seed comparison of UCATSC-det and UCATSC

**Figure 9.** Targeted delay–safety tradeoff for DQN-RL, safety

**Figure 12.** Constraint-focused stress-test metric matrix including

**Figure 15.** Intersection-wise mean queue heatmap in the three

**Figure 14.** Runtime comparison across controllers including DQN

**Figure 17.** Delay–spillback tradeoff in the corridor extension.

**Figure 19.** Mean detection probability 𝑝det over time in physical vision-degradation scenarios

**Figure 21.** Risk-proxy distribution across physical-testbed con

**Figure 22.** Total emission-linked proxy distribution across

**Figure 23.** Median cumulative total emission-linked proxy over

Original abstract

Camera-based adaptive traffic signal control is inherently partially observable: detections can be missed, vehicle speeds and distances can be noisy, and a phase-change decision becomes temporally irreversible once yellow onset is initiated. This paper presents UCATSC, an interpretable uncertainty-aware constrained decision layer for vision-based adaptive signal control. UCATSC maintains a reduced movement-level belief state over queue, arrival, and service-age variables; evaluates admissible phase actions through finite-horizon counterfactual rollouts in belief space; and filters candidate actions using predictive dilemma-zone safety and service-age/starvation constraints before execution. The method is evaluated primarily in SUMO using matched seeds, classical baselines, a trained DQN-RL baseline, a safety-masked DQN variant, targeted safety/liveness/uncertainty stress tests, and a three-intersection corridor extension. In the tested scenarios, UCATSC demonstrates competitive mobility performance, eliminates observed dilemma-zone violations for the constrained variants, bounds service age in starvation stress tests, and maintains millisecond-level online runtime. A controlled physical vision testbed is included as supplementary feasibility evidence for the vision-to-belief interface; it is not presented as field validation of safety, emissions, or deployment readiness.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

UCATSC adds a constrained belief-space rollout layer with dilemma-zone and starvation filters that keeps mobility competitive while removing observed violations in the SUMO tests, but the reduced per-movement state leaves spatial error correlations unmodeled.

The core contribution is a decision layer that keeps a compact belief over queue length, arrival rate, and service age per movement, runs short counterfactual rollouts to score actions, and then applies hard filters for predicted dilemma-zone entry and starvation before committing to a phase. In the reported SUMO runs this produces travel times close to the DQN baseline, zero recorded dilemma-zone violations on the constrained variants, and bounded service ages under the starvation stress cases, all at millisecond latency. The physical testbed shows the vision-to-belief pipeline can be wired up, which is useful even if it is only feasibility evidence.

Referee Report

1 major / 3 minor

Summary. The manuscript proposes UCATSC, an interpretable uncertainty-aware constrained decision layer for vision-based adaptive traffic signal control under partial observability. It maintains a reduced movement-level belief state over per-movement queue, arrival, and service-age variables; evaluates actions via finite-horizon counterfactual rollouts in belief space; and filters actions using predictive dilemma-zone safety and service-age/starvation constraints. SUMO evaluations with classical and DQN baselines, plus targeted stress tests and a corridor extension, report competitive mobility, zero observed dilemma-zone violations for constrained variants, bounded service age, and millisecond runtime; a supplementary physical testbed demonstrates the vision-to-belief interface.

Significance. If the central claims hold, the work provides a structured, interpretable alternative to black-box RL for safety-critical traffic control by explicitly propagating uncertainty through rollouts and enforcing hard constraints on dilemma zones and starvation. The use of matched-seed SUMO tests, stress scenarios, and a physical feasibility testbed strengthens reproducibility and offers falsifiable safety/liveness predictions that could inform deployment of vision-based TSC systems.

major comments (1)

[Method (belief state and safety filter)] The safety filter (described in the method section on constrained rollouts) computes dilemma-zone violation probabilities from finite-horizon predictions in the reduced per-movement belief state. Because this state aggregates only marginal queue/arrival/service-age statistics and does not model spatially correlated vision errors (e.g., occlusions across lanes), the predicted joint distribution of positions and speeds may be biased; this directly undermines the claim that constrained variants eliminate observed violations, as the rollout may under-estimate risk even when marginals match simulation ground truth.
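The referee's concern can be made concrete with a small calculation. The sketch below compares the probability that every vehicle on an approach is missed at once under two noise models with the same per-vehicle detection rate. The shared-occluder mixture model and the numbers used are assumptions chosen for illustration; they are not drawn from the paper.

```python
def p_all_missed_independent(p_det, n):
    """All n vehicles missed when each miss is independent."""
    return (1 - p_det) ** n


def p_all_missed_shared_occluder(p_det, n, p_occlude):
    """All n vehicles missed under an assumed shared-occluder model.

    With probability p_occlude one occlusion hides every vehicle; otherwise
    each vehicle is detected independently with probability q, where q is
    chosen so the per-vehicle marginal detection rate still equals p_det.
    """
    q = p_det / (1 - p_occlude)
    if q > 1:
        raise ValueError("p_occlude too large for the requested marginal")
    return p_occlude + (1 - p_occlude) * (1 - q) ** n


p_det, n = 0.9, 3
independent = p_all_missed_independent(p_det, n)
correlated = p_all_missed_shared_occluder(p_det, n, p_occlude=0.05)
print(f"independent: {independent:.4f}, shared occluder: {correlated:.4f}")
```

Both models report the same 90% per-vehicle detection rate, yet the joint miss probability is roughly fifty times larger with the shared occluder. A filter calibrated on marginals alone would not see that difference.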

minor comments (3)

[Abstract and §5 (Evaluation)] The abstract and results section report positive outcomes without error bars, confidence intervals, or exact counts of simulation runs per scenario; adding these would clarify statistical significance of the mobility and zero-violation claims.
[Supplementary material description] The physical testbed is presented only as supplementary feasibility evidence; the manuscript should explicitly state its scope (e.g., no safety or emissions validation) to avoid over-interpretation by readers.
[Method (belief update)] Notation for the belief update and transition model could be clarified with a single equation block or pseudocode to make the finite-horizon rollout procedure reproducible from the text alone.
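The first minor comment, on missing confidence intervals, matters most for the zero-violation result, because zero observed events does not imply a zero rate. As a sketch, the exact one-sided upper bound for a binomial proportion with no observed events is 1 - alpha^(1/n). The trial count below is hypothetical, since this page does not report run counts, and the bound assumes independent trials with a common violation probability.

```python
def zero_event_upper_bound(n_trials, alpha=0.05):
    """Exact one-sided (1 - alpha) upper bound on the event probability
    when zero events were observed in n_trials independent trials."""
    if n_trials <= 0:
        raise ValueError("n_trials must be positive")
    return 1 - alpha ** (1 / n_trials)


n = 30  # hypothetical number of independent trials, not from the paper
bound = zero_event_upper_bound(n)
print(f"95% upper bound with 0/{n} violations: {bound:.4f}")
print(f"rule-of-three approximation: {3 / n:.4f}")
```

The familiar rule of three (3/n) is a slightly conservative approximation of the same bound at the 95% level.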

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed and constructive review. We address the single major comment below with a point-by-point response.

Referee: The safety filter (described in the method section on constrained rollouts) computes dilemma-zone violation probabilities from finite-horizon predictions in the reduced per-movement belief state. Because this state aggregates only marginal queue/arrival/service-age statistics and does not model spatially correlated vision errors (e.g., occlusions across lanes), the predicted joint distribution of positions and speeds may be biased; this directly undermines the claim that constrained variants eliminate observed violations, as the rollout may under-estimate risk even when marginals match simulation ground truth.

Authors: We agree that the reduced per-movement belief state relies on marginal statistics and implicitly assumes independence across movements, which does not capture potential spatial correlations in real-world vision errors such as occlusions. This is a deliberate design choice to maintain computational tractability and interpretability for millisecond-scale online decisions. In the SUMO evaluations, vehicle-level detection errors are generated independently, so the marginal predictions match the simulator ground truth and the constrained variants indeed produce zero observed dilemma-zone violations under matched seeds. We acknowledge that the risk estimates could be optimistic under correlated real-world noise. We will therefore make a partial revision by adding a dedicated paragraph in the Discussion section that explicitly states this modeling assumption, qualifies the safety claims as holding in the tested simulation environments, and outlines future directions such as particle-filter extensions for correlation modeling.

revision: partial

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

The paper presents UCATSC as a decision layer that maintains a reduced belief state, performs finite-horizon rollouts, and applies safety/liveness filters before execution. All reported performance metrics (mobility, dilemma-zone elimination, service-age bounds, runtime) are obtained by direct evaluation against external baselines (SUMO, classical controllers, DQN variants) and stress tests rather than by re-deriving or fitting quantities from the authors' prior work. No equations, uniqueness theorems, or ansatzes are shown to reduce to self-defined inputs or self-citations; the method is therefore independent of any circular reduction.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that a reduced movement-level belief state plus short-horizon rollouts can stand in for full traffic dynamics under vision noise, plus design choices for horizon length and constraint thresholds that are not derived from first principles.

free parameters (2)

rollout horizon length: Finite-horizon length for counterfactual simulations is a tunable parameter whose value affects safety filtering and is not derived in the abstract.
dilemma-zone and starvation thresholds: Predictive safety and service-age bounds are constraint parameters whose specific values are chosen to achieve the reported elimination of violations.

axioms (1)

domain assumption: The reduced belief state over queue, arrival, and service-age variables adequately captures partial observability from vision detections. Invoked to justify that rollouts in belief space produce reliable safety predictions.
