hub Baseline reference

Ogbench: Benchmarking offline goal-conditioned rl

Ogbench: Benchmarking offline goal-conditioned rl , author= · 2025 · arXiv 2410.20092

Baseline reference. 71% of citing Pith papers use this work as a benchmark or comparison.

17 Pith papers citing it

Baseline 71% of classified citations

read on arXiv browse 17 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

dataset 4 background 2 baseline 1

citation-polarity summary

use dataset 4 background 2 baseline 1

representative citing papers

Self-Supervised On-Policy Reinforcement Learning via Contrastive Proximal Policy Optimisation

cs.LG · 2026-05-13 · unverdicted · novelty 7.0

CPPO is an on-policy contrastive RL method that derives advantages from contrastive Q-values for PPO optimization, outperforming prior CRL baselines in 14/18 tasks and matching or exceeding reward-based PPO in 12/18 tasks.

Aligning Flow Map Policies with Optimal Q-Guidance

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

Flow map policies enable fast one-step inference for flow-based RL policies, and FMQ provides an optimal closed-form Q-guided target for offline-to-online adaptation under trust-region constraints, achieving SOTA performance.

Towards Efficient and Expressive Offline RL via Flow-Anchored Noise-conditioned Q-Learning

cs.LG · 2026-05-03 · unverdicted · novelty 7.0

FAN achieves state-of-the-art offline RL performance on robotic tasks by anchoring flow policies and using single-sample noise-conditioned Q-learning, with proven convergence and reduced runtimes.

SpecRLBench: A Benchmark for Generalization in Specification-Guided Reinforcement Learning

cs.LG · 2026-04-27 · unverdicted · novelty 7.0

SpecRLBench is a new benchmark evaluating generalization of LTL-guided RL methods across navigation and manipulation domains with static/dynamic environments and varied robot dynamics.

Steering Your Diffusion Policy with Latent Space Reinforcement Learning

cs.RO · 2025-06-18 · unverdicted · novelty 7.0

DSRL steers pretrained diffusion policies for robotics by applying RL to their latent noise inputs, achieving sample-efficient real-world adaptation with only black-box access.

Goal-Conditioned Agents that Learn Everything All at Once

cs.LG · 2026-05-22 · unverdicted · novelty 6.0

LEO enables efficient all-goals learning in goal-conditioned RL by jointly predicting for all goals in one network pass, yielding >250x speedup over relabelling and better performance on Craftax.

Multi-scale Predictive Representations for Goal-conditioned Reinforcement Learning

cs.LG · 2026-05-10 · unverdicted · novelty 6.0

Ms.PR applies multi-scale predictive supervision to enforce goal-directed alignment in latent spaces for offline GCRL, yielding improved representation quality and performance on vision and state-based tasks.

Predictive but Not Plannable: RC-aux for Latent World Models

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

RC-aux corrects spatiotemporal mismatch in reconstruction-free latent world models by adding multi-horizon prediction and reachability supervision, improving planning performance on goal-conditioned pixel-control tasks.

Long-Horizon Q-Learning: Accurate Value Learning via n-Step Inequalities

cs.AI · 2026-05-07 · unverdicted · novelty 6.0 · 2 refs

LQL turns n-step action-sequence lower bounds into a practical hinge-loss stabilizer for off-policy Q-learning without extra networks or forward passes.

Occupancy Reward Shaping: Improving Credit Assignment for Offline Goal-Conditioned Reinforcement Learning

cs.LG · 2026-04-22 · conditional · novelty 6.0

Occupancy Reward Shaping extracts goal-reaching rewards from world-model occupancy measures using optimal transport, improving offline goal-conditioned RL performance 2.2x on 13 tasks without changing the optimal policy.

Fisher Decorator: Refining Flow Policy via a Local Transport Map

cs.LG · 2026-04-20 · unverdicted · novelty 6.0

Fisher Decorator refines flow policies in offline RL via a local transport map and Fisher-matrix quadratic approximation of the KL constraint, yielding controllable error near the optimum and SOTA benchmark results.

Hierarchical Planning with Latent World Models

cs.LG · 2026-04-03 · unverdicted · novelty 6.0

Hierarchical planning over multi-scale latent world models enables 70% success on real robotic pick-and-place with goal-only input where flat models achieve 0%, while cutting planning compute up to 4x in simulations.

SERNF: Sample-Efficient Real-World Dexterous Policy Fine-Tuning via Action-Chunked Critics and Normalizing Flows

cs.RO · 2026-02-10 · unverdicted · novelty 6.0

SERNF achieves sample-efficient real-world fine-tuning of multimodal dexterous policies by pairing exact-likelihood normalizing flow policies with action-chunked value critics.

Reinforcement Learning with Action Chunking

cs.LG · 2025-07-10 · unverdicted · novelty 6.0

Q-chunking improves offline-to-online RL sample efficiency on long-horizon sparse-reward manipulation tasks by applying action chunking to TD learning.

Abstraction for Offline Goal-Conditioned Reinforcement Learning

cs.LG · 2026-05-21 · unverdicted · novelty 5.0

Introduces relativised options and hierarchical abstraction to reuse experience across similar contexts in offline GCRL, with two algorithms demonstrating performance gains.

Borrowed Geometry: Cross-Distribution Head-Importance Fingerprints of Frozen Pretrained Gemma 4 31B

cs.LG · 2026-05-01 · unverdicted · novelty 5.0 · 2 refs

Four heads (L26.28, L27.28, L27.2, L27.3) in frozen Gemma 4 31B exhibit joint high importance on text and non-text tasks with hypergeometric significance (P=0.0013) and causal validation on a cube task.

Efficient Hierarchical Implicit Flow Q-learning for Offline Goal-conditioned Reinforcement Learning

cs.LG · 2026-04-10 · unverdicted · novelty 5.0

Proposes mean flow policies and LeJEPA loss to overcome Gaussian policy limits and weak subgoal generation in hierarchical offline GCRL, reporting strong results on OGBench state and pixel tasks.

citing papers explorer

Showing 17 of 17 citing papers.

Self-Supervised On-Policy Reinforcement Learning via Contrastive Proximal Policy Optimisation cs.LG · 2026-05-13 · unverdicted · none · ref 18
CPPO is an on-policy contrastive RL method that derives advantages from contrastive Q-values for PPO optimization, outperforming prior CRL baselines in 14/18 tasks and matching or exceeding reward-based PPO in 12/18 tasks.
Aligning Flow Map Policies with Optimal Q-Guidance cs.LG · 2026-05-12 · unverdicted · none · ref 32
Flow map policies enable fast one-step inference for flow-based RL policies, and FMQ provides an optimal closed-form Q-guided target for offline-to-online adaptation under trust-region constraints, achieving SOTA performance.
Towards Efficient and Expressive Offline RL via Flow-Anchored Noise-conditioned Q-Learning cs.LG · 2026-05-03 · unverdicted · none · ref 57
FAN achieves state-of-the-art offline RL performance on robotic tasks by anchoring flow policies and using single-sample noise-conditioned Q-learning, with proven convergence and reduced runtimes.
SpecRLBench: A Benchmark for Generalization in Specification-Guided Reinforcement Learning cs.LG · 2026-04-27 · unverdicted · none · ref 12
SpecRLBench is a new benchmark evaluating generalization of LTL-guided RL methods across navigation and manipulation domains with static/dynamic environments and varied robot dynamics.
Steering Your Diffusion Policy with Latent Space Reinforcement Learning cs.RO · 2025-06-18 · unverdicted · none · ref 86
DSRL steers pretrained diffusion policies for robotics by applying RL to their latent noise inputs, achieving sample-efficient real-world adaptation with only black-box access.
Goal-Conditioned Agents that Learn Everything All at Once cs.LG · 2026-05-22 · unverdicted · none · ref 44
LEO enables efficient all-goals learning in goal-conditioned RL by jointly predicting for all goals in one network pass, yielding >250x speedup over relabelling and better performance on Craftax.
Multi-scale Predictive Representations for Goal-conditioned Reinforcement Learning cs.LG · 2026-05-10 · unverdicted · none · ref 30
Ms.PR applies multi-scale predictive supervision to enforce goal-directed alignment in latent spaces for offline GCRL, yielding improved representation quality and performance on vision and state-based tasks.
Predictive but Not Plannable: RC-aux for Latent World Models cs.LG · 2026-05-08 · unverdicted · none · ref 33
RC-aux corrects spatiotemporal mismatch in reconstruction-free latent world models by adding multi-horizon prediction and reachability supervision, improving planning performance on goal-conditioned pixel-control tasks.
Long-Horizon Q-Learning: Accurate Value Learning via n-Step Inequalities cs.AI · 2026-05-07 · unverdicted · none · ref 30 · 2 links
LQL turns n-step action-sequence lower bounds into a practical hinge-loss stabilizer for off-policy Q-learning without extra networks or forward passes.
Occupancy Reward Shaping: Improving Credit Assignment for Offline Goal-Conditioned Reinforcement Learning cs.LG · 2026-04-22 · conditional · none · ref 17
Occupancy Reward Shaping extracts goal-reaching rewards from world-model occupancy measures using optimal transport, improving offline goal-conditioned RL performance 2.2x on 13 tasks without changing the optimal policy.
Fisher Decorator: Refining Flow Policy via a Local Transport Map cs.LG · 2026-04-20 · unverdicted · none · ref 5
Fisher Decorator refines flow policies in offline RL via a local transport map and Fisher-matrix quadratic approximation of the KL constraint, yielding controllable error near the optimum and SOTA benchmark results.
Hierarchical Planning with Latent World Models cs.LG · 2026-04-03 · unverdicted · none · ref 35
Hierarchical planning over multi-scale latent world models enables 70% success on real robotic pick-and-place with goal-only input where flat models achieve 0%, while cutting planning compute up to 4x in simulations.
SERNF: Sample-Efficient Real-World Dexterous Policy Fine-Tuning via Action-Chunked Critics and Normalizing Flows cs.RO · 2026-02-10 · unverdicted · none · ref 43
SERNF achieves sample-efficient real-world fine-tuning of multimodal dexterous policies by pairing exact-likelihood normalizing flow policies with action-chunked value critics.
Reinforcement Learning with Action Chunking cs.LG · 2025-07-10 · unverdicted · none · ref 54
Q-chunking improves offline-to-online RL sample efficiency on long-horizon sparse-reward manipulation tasks by applying action chunking to TD learning.
Abstraction for Offline Goal-Conditioned Reinforcement Learning cs.LG · 2026-05-21 · unverdicted · none · ref 5
Introduces relativised options and hierarchical abstraction to reuse experience across similar contexts in offline GCRL, with two algorithms demonstrating performance gains.
Borrowed Geometry: Cross-Distribution Head-Importance Fingerprints of Frozen Pretrained Gemma 4 31B cs.LG · 2026-05-01 · unverdicted · none · ref 10 · 2 links
Four heads (L26.28, L27.28, L27.2, L27.3) in frozen Gemma 4 31B exhibit joint high importance on text and non-text tasks with hypergeometric significance (P=0.0013) and causal validation on a cube task.
Efficient Hierarchical Implicit Flow Q-learning for Offline Goal-conditioned Reinforcement Learning cs.LG · 2026-04-10 · unverdicted · none · ref 3
Proposes mean flow policies and LeJEPA loss to overcome Gaussian policy limits and weak subgoal generation in hierarchical offline GCRL, reporting strong results on OGBench state and pixel tasks.

Ogbench: Benchmarking offline goal-conditioned rl

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer