Hybrid Energy-Aware Reward Shaping: A Unified Lightweight Physics-Guided Methodology for Policy Optimization
read the original abstract
Deep reinforcement learning for continuous control often suffers from high variance, low energy efficiency, and poor generalization under distribution shift, as purely data-driven exploration ignores available physical structure. This paper proposes Hybrid Energy-Aware Reward Shaping (H-EARS), which encodes dominant energy terms -- assumed known a priori -- directly as reward potentials at O(n) per-step computation. H-EARS decomposes the shaping potential into task-oriented and energy-based components, supplemented by an action regularization term that deliberately modifies the optimization objective to enforce energy-efficient control. A complete theoretical foundation is established: functional independence of shaping and regularization, energy-based gradient enrichment under positive-definite Hessian conditions, convergence guarantees under function approximation, and approximate potential error bounds. Across four continuous control benchmarks and four baseline algorithms, H-EARS achieves consistent gains in convergence speed, policy stability, and final performance. High-fidelity vehicle simulations validate applicability in safety-critical settings under extreme road conditions.
This paper has not been read by Pith yet.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.