pith. sign in

arxiv: 2603.11600 · v2 · pith:3FGTKV43new · submitted 2026-03-12 · 💻 cs.LG · cs.SY· eess.SY· math.OC

Hybrid Energy-Aware Reward Shaping: A Unified Lightweight Physics-Guided Methodology for Policy Optimization

classification 💻 cs.LG cs.SYeess.SYmath.OC
keywords shapingundercontrolh-earsrewardconditionscontinuousconvergence
0
0 comments X
read the original abstract

Deep reinforcement learning for continuous control often suffers from high variance, low energy efficiency, and poor generalization under distribution shift, as purely data-driven exploration ignores available physical structure. This paper proposes Hybrid Energy-Aware Reward Shaping (H-EARS), which encodes dominant energy terms -- assumed known a priori -- directly as reward potentials at O(n) per-step computation. H-EARS decomposes the shaping potential into task-oriented and energy-based components, supplemented by an action regularization term that deliberately modifies the optimization objective to enforce energy-efficient control. A complete theoretical foundation is established: functional independence of shaping and regularization, energy-based gradient enrichment under positive-definite Hessian conditions, convergence guarantees under function approximation, and approximate potential error bounds. Across four continuous control benchmarks and four baseline algorithms, H-EARS achieves consistent gains in convergence speed, policy stability, and final performance. High-fidelity vehicle simulations validate applicability in safety-critical settings under extreme road conditions.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.