Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.AI (2)
years: 2026 (2)
verdicts: UNVERDICTED (2)
roles: background (1)
polarities: unclear (1)
representative citing papers
BALAR is a task-agnostic Bayesian loop that maintains structured beliefs over latent states, selects questions via expected mutual information, and expands its state space when needed, delivering 14.6-38.5% accuracy gains over baselines on detective, puzzle, and clinical diagnosis benchmarks.
citing papers explorer
- OracleTSC: Oracle-Informed Reward Hurdle and Uncertainty Regularization for Traffic Signal Control
  OracleTSC introduces a reward hurdle and uncertainty regularization to stabilize LLM-based reinforcement learning for traffic signal control, delivering 75% lower travel time and 67% lower queue length on benchmarks plus cross-intersection generalization.
- BALAR: A Bayesian Agentic Loop for Active Reasoning
  BALAR is a task-agnostic Bayesian loop that maintains structured beliefs over latent states, selects questions via expected mutual information, and expands its state space when needed, delivering 14.6-38.5% accuracy gains over baselines on detective, puzzle, and clinical diagnosis benchmarks.