arXiv preprint arXiv:1909.01387

Making efficient use of demonstrations to solve hard exploration problems · 1909 · arXiv 1909.01387

2 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

LLM-Enhanced Multi-Agent Reinforcement Learning with Expert Workflow for Real-Time P2P Energy Trading

cs.MA · 2025-07-20 · unverdicted · novelty 6.0

An LLM-enhanced MARL system with a differential attention critic achieves lower economic costs and fewer voltage violations than baselines in simulated real-time P2P electricity trading.

Implicit Action Chunking for Smooth Continuous Control

cs.RO · 2026-05-19 · unverdicted · novelty 5.0

Dual-Window Smoothing uses an execution window for deterministic smoothness and a value window to correct critic bias, plus a first-order temporal regularizer, to achieve smoother RL control than explicit chunking or standard baselines.

citing papers explorer

Showing 2 of 2 citing papers.

LLM-Enhanced Multi-Agent Reinforcement Learning with Expert Workflow for Real-Time P2P Energy Trading · cs.MA · 2025-07-20 · unverdicted · none · ref 6
Implicit Action Chunking for Smooth Continuous Control · cs.RO · 2026-05-19 · unverdicted · none · ref 33
