Traffic-R1: Reinforced LLMs bring human-like reasoning to traffic signal control systems
3 Pith papers cite this work. Polarity classification of these citations is still being indexed.
Citing papers by year: 2026 (3). Verdicts: 3 unverdicted.
Citing papers explorer
-
CuraLight: Debate-Guided Data Curation for LLM-Centered Traffic Signal Control
CuraLight uses RL-generated trajectories and multi-LLM debate to curate training data for an LLM traffic-signal controller, reducing travel time, queue length, and waiting time by 5-7% relative to baselines in SUMO simulations of real-world networks.
-
SignalClaw: LLM-Guided Evolutionary Synthesis of Interpretable Traffic Signal Control Skills
SignalClaw uses LLM-guided evolution to synthesize interpretable, composable traffic signal control skills. The resulting skills match top baselines on routine SUMO scenarios, outperform them on emergency and transit events, and remain editable by engineers.
-
ReRec: Reasoning-Augmented LLM-based Recommendation Assistant via Reinforcement Fine-tuning
ReRec uses reinforcement fine-tuning with dual-graph reward shaping, reasoning-aware advantage estimation, and online curriculum scheduling to improve LLM reasoning and performance in recommendation tasks.