Addressing function approximation error in actor-critic methods
3 Pith papers cite this work. Polarity classification is still indexing.
citing papers explorer
- Language Models as Efficient Reward Function Searchers for Custom-Environment Multi-Objective Reinforcement
  ERFSL uses LLMs to create per-requirement reward components, correct their code via a critic, and optimize weights with genetic-algorithm-style mutation and crossover driven by training logs, succeeding in a zero-shot data collection task.
- NaviSplit: Dynamic Multi-Branch Split DNNs for Efficient Distributed Autonomous Navigation
  NaviSplit introduces a dynamic multi-branch split DNN framework for UAV navigation that runs perception on-device and control on-edge, achieving 72-81% depth accuracy with 1.2-18 KB transmissions and 95% lower data rate than static alternatives.
- A Visual Reinforcement Learning-Based Separate Primitive Policy for Peg-in-Hole Tasks
  S2P learns separate location and insertion primitives simultaneously via visual RL for peg-in-hole tasks, improving sample efficiency and success rates across polygon benchmarks in simulation and real-world tests.