Scaling Offline RL via Efficient and Expressive Shortcut Models

Nicolas Espinosa-Dice, Yiyi Zhang, Yiding Chen, Bradley Guo, Owen Oertell, Gokul Swamy, Kiante Brantley, Wen Sun · 2025 · arXiv 2505.22866

5 Pith papers cite this work.



citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Preserve Support, Not Correspondence: Dynamic Routing for Offline Reinforcement Learning

cs.LG · 2026-04-24 · unverdicted · novelty 7.0

DROL trains one-step offline RL actors via top-1 dynamic routing of dataset actions to latent candidates, enabling local improvements while preserving data support and retaining cheap inference.

Drifting Field Policy: A One-Step Generative Policy via Wasserstein Gradient Flow

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

DFP is a one-step generative policy using Wasserstein gradient flow on a drifting model backbone, with a top-K behavior cloning surrogate, that reaches SOTA on Robomimic and OGBench manipulation tasks.

Model-Based Proactive Cost Generation for Learning Safe Policies Offline with Limited Violation Data

cs.LG · 2026-05-02 · unverdicted · novelty 6.0

PROCO generates synthetic unsafe samples via model-based rollouts and LLM-grounded costs to enable safer policy learning from offline datasets containing few or no violations.

What Does Flow Matching Bring To TD Learning?

cs.LG · 2026-03-04 · conditional · novelty 6.0

Flow matching critics outperform monolithic critics in RL, with 2x the performance and 5x the sample efficiency, via test-time error recovery through integration and multi-point velocity supervision that preserves feature plasticity.

Towards Efficient and Expressive Offline RL via Flow-Anchored Noise-conditioned Q-Learning

cs.LG · 2026-05-03

citing papers explorer


Preserve Support, Not Correspondence: Dynamic Routing for Offline Reinforcement Learning cs.LG · 2026-04-24 · unverdicted · none · ref 3
Drifting Field Policy: A One-Step Generative Policy via Wasserstein Gradient Flow cs.LG · 2026-05-08 · unverdicted · none · ref 13
Model-Based Proactive Cost Generation for Learning Safe Policies Offline with Limited Violation Data cs.LG · 2026-05-02 · unverdicted · none · ref 15
What Does Flow Matching Bring To TD Learning? cs.LG · 2026-03-04 · conditional · none · ref 20
Towards Efficient and Expressive Offline RL via Flow-Anchored Noise-conditioned Q-Learning cs.LG · 2026-05-03 · unreviewed · ref 71
