Human-level control through deep reinforcement learning
2 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 2
representative citing papers
NaviSlim uses a gated slimmable architecture to dynamically scale neural model complexity and onboard sensor power for context-aware navigation in micro-drones, reporting 57-92% average model reduction and 61-80% sensor utilization in AirSim simulations versus static full-complexity baselines.
citing papers explorer
- Distributed TD Tracking with Linear Function Approximation over Directed Communication Networks
PP-DTD achieves linear convergence to a neighborhood of the optimum under constant step-sizes and O(T^{-1}) under decaying step-sizes for distributed TD policy evaluation in MARL over directed graphs, claimed as the first with rates comparable to single-agent TD.
- NaviSlim: Adaptive Context-Aware Navigation and Sensing via Dynamic Slimmable Networks
NaviSlim uses a gated slimmable architecture to dynamically scale neural model complexity and onboard sensor power for context-aware navigation in micro-drones, reporting 57-92% average model reduction and 61-80% sensor utilization in AirSim simulations versus static full-complexity baselines.