arXiv preprint arXiv:1909.01387
2 Pith papers cite this work.
citing papers explorer
- LLM-Enhanced Multi-Agent Reinforcement Learning with Expert Workflow for Real-Time P2P Energy Trading
  An LLM-enhanced MARL system with a differential attention critic produces lower economic costs and fewer voltage violations than baselines in simulated real-time P2P electricity trading.
- Implicit Action Chunking for Smooth Continuous Control
  Dual-Window Smoothing uses an execution window for deterministic smoothness and a value window to correct critic bias, plus a first-order temporal regularizer, to achieve smoother RL control than explicit chunking or standard baselines.