International conference on machine learning , pages=

Addressing function approximation error in actor-critic methods , author= · 2018

9 Pith papers cite this work. Polarity classification is still indexing.

9 Pith papers citing it

browse 9 citing papers

citation-role summary

method 1

citation-polarity summary

use method 1

representative citing papers

Goal-Conditioned Agents that Learn Everything All at Once

cs.LG · 2026-05-22 · unverdicted · novelty 6.0

LEO enables efficient all-goals learning in goal-conditioned RL by jointly predicting for all goals in one network pass, yielding >250x speedup over relabelling and better performance on Craftax.

Implicit Safety Alignment from Crowd Preferences

cs.AI · 2026-05-20 · unverdicted · novelty 6.0

A hierarchical framework extracts implicit safety criteria from crowd preferences and composes them via high-level policy to reduce safety violations in downstream RL tasks without explicit safety rewards.

ERFSL: An Efficient Reward Function Searcher via Language Models for Custom-Environment Multi-Objective Optimization (Student Abstract)

eess.SY · 2026-05-19 · unverdicted · novelty 6.0

ERFSL generates and optimizes LLM-based reward functions for custom multi-objective RL, correcting codes in one iteration and converging weights in 5.2 iterations on average even from 500x errors.

Discrete Flow Matching for Offline-to-Online Reinforcement Learning

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

DRIFT enables stable offline-to-online fine-tuning of CTMC policies in discrete RL via advantage-weighted discrete flow matching, path-space regularization, and candidate-set approximation.

OGPO: Sample Efficient Full-Finetuning of Generative Control Policies

cs.LG · 2026-05-04 · unverdicted · novelty 6.0

OGPO is a sample-efficient off-policy method for full finetuning of generative control policies that reaches SOTA on robotic manipulation tasks and can recover from poor behavior-cloning initializations without expert data.

QHyer: Q-conditioned Hybrid Attention-mamba Transformer for Offline Goal-conditioned RL

cs.LG · 2026-05-03 · unverdicted · novelty 6.0

QHyer replaces return-to-go with a state-conditioned Q-estimator and adds a gated hybrid attention-mamba backbone to achieve state-of-the-art performance in offline goal-conditioned RL on both Markovian and non-Markovian datasets.

EfficientTDMPC: Improved MPC Objectives for Sample-Efficient Continuous Control

cs.LG · 2026-05-15 · unverdicted · novelty 4.0

EfficientTDMPC extends the TD-MPC family with model ensembles, return averaging, and uncertainty penalties to reach SOTA sample efficiency on hard continuous control benchmarks in low-data regimes.

Transfer Learning for Customized Car Racing Environments

cs.RO · 2026-05-18 · unverdicted · novelty 2.0

The study applies transfer learning to deep RL in OpenAI car racing, observing that model-based approaches outperform model-free methods and that transfer boosts target domain performance.

ScenePilot: Controllable Boundary-Driven Critical Scenario Generation for Autonomous Driving

cs.AI · 2026-05-20

citing papers explorer

Showing 9 of 9 citing papers.

Goal-Conditioned Agents that Learn Everything All at Once cs.LG · 2026-05-22 · unverdicted · none · ref 50
LEO enables efficient all-goals learning in goal-conditioned RL by jointly predicting for all goals in one network pass, yielding >250x speedup over relabelling and better performance on Craftax.
Implicit Safety Alignment from Crowd Preferences cs.AI · 2026-05-20 · unverdicted · none · ref 30
A hierarchical framework extracts implicit safety criteria from crowd preferences and composes them via high-level policy to reduce safety violations in downstream RL tasks without explicit safety rewards.
ERFSL: An Efficient Reward Function Searcher via Language Models for Custom-Environment Multi-Objective Optimization (Student Abstract) eess.SY · 2026-05-19 · unverdicted · none · ref 18
ERFSL generates and optimizes LLM-based reward functions for custom multi-objective RL, correcting codes in one iteration and converging weights in 5.2 iterations on average even from 500x errors.
Discrete Flow Matching for Offline-to-Online Reinforcement Learning cs.LG · 2026-05-12 · unverdicted · none · ref 52
DRIFT enables stable offline-to-online fine-tuning of CTMC policies in discrete RL via advantage-weighted discrete flow matching, path-space regularization, and candidate-set approximation.
OGPO: Sample Efficient Full-Finetuning of Generative Control Policies cs.LG · 2026-05-04 · unverdicted · none · ref 91
OGPO is a sample-efficient off-policy method for full finetuning of generative control policies that reaches SOTA on robotic manipulation tasks and can recover from poor behavior-cloning initializations without expert data.
QHyer: Q-conditioned Hybrid Attention-mamba Transformer for Offline Goal-conditioned RL cs.LG · 2026-05-03 · unverdicted · none · ref 57
QHyer replaces return-to-go with a state-conditioned Q-estimator and adds a gated hybrid attention-mamba backbone to achieve state-of-the-art performance in offline goal-conditioned RL on both Markovian and non-Markovian datasets.
EfficientTDMPC: Improved MPC Objectives for Sample-Efficient Continuous Control cs.LG · 2026-05-15 · unverdicted · none · ref 21
EfficientTDMPC extends the TD-MPC family with model ensembles, return averaging, and uncertainty penalties to reach SOTA sample efficiency on hard continuous control benchmarks in low-data regimes.
Transfer Learning for Customized Car Racing Environments cs.RO · 2026-05-18 · unverdicted · none · ref 10
The study applies transfer learning to deep RL in OpenAI car racing, observing that model-based approaches outperform model-free methods and that transfer boosts target domain performance.
ScenePilot: Controllable Boundary-Driven Critical Scenario Generation for Autonomous Driving cs.AI · 2026-05-20 · unreviewed · ref 42

International conference on machine learning , pages=

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer