Optimal Active Sensing with Process and Measurement Noise
3 Pith papers cite this work. Polarity classification is still indexing.
citation summary
years: 2026 (3)
verdicts: UNVERDICTED (3)
roles: background (1)
polarities: background (1)
representative citing papers
Introduces a stochastic DDP algorithm that optimizes nominal controls and feedback gains for belief-state trajectory problems under partial observability without relying on the separation principle.
The paper presents a threat model, taxonomy, and six-dimension measurement framework for AI sandboxes to clarify valid testing claims for safety, security, and regulatory assurance.
citing papers explorer
- MPC-Injection: Biasing Off-Policy Locomotion RL Toward Controller-Induced Behavior Basins
  MPC-Injection biases off-policy RL locomotion policies toward controller-induced behavior basins by injecting MPC transitions into the replay buffer.
- Stochastic Differential Dynamic Programming for Trajectory Optimization under Partial Observability
  Introduces a stochastic DDP algorithm that optimizes nominal controls and feedback gains for belief-state trajectory problems under partial observability without relying on the separation principle.
- AI Sandboxes: A Threat Model, Taxonomy, and Measurement Framework
  The paper presents a threat model, taxonomy, and six-dimension measurement framework for AI sandboxes to clarify valid testing claims for safety, security, and regulatory assurance.