DGLight uses a frozen CoLight DQN critic to score LLM-generated actions and optimize the policy via GRPO, yielding the strongest LLM-based traffic signal controller on Jinan and Hangzhou benchmarks while remaining competitive with RL baselines.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
DGLight: DQN-Guided GRPO Fine-Tuning of Large Language Models for Traffic Signal Control