DeFlow: Decoupling manifold modeling and value maximization for offline policy extraction

Zhancun Mu et al. · 2026 · arXiv 2601.10471

2 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Preserve Support, Not Correspondence: Dynamic Routing for Offline Reinforcement Learning

cs.LG · 2026-04-24 · unverdicted · novelty 7.0

DROL trains one-step offline RL actors via top-1 dynamic routing of dataset actions to latent candidates, enabling local improvements while preserving data support and retaining cheap inference.

Fisher Decorator: Refining Flow Policy via a Local Transport Map

cs.LG · 2026-04-20 · unverdicted · novelty 6.0

Fisher Decorator refines flow policies in offline RL via a local transport map and Fisher-matrix quadratic approximation of the KL constraint, yielding controllable error near the optimum and SOTA benchmark results.

citing papers explorer

Showing 2 of 2 citing papers.

Preserve Support, Not Correspondence: Dynamic Routing for Offline Reinforcement Learning

cs.LG · 2026-04-24 · unverdicted · none · ref 13

Fisher Decorator: Refining Flow Policy via a Local Transport Map

cs.LG · 2026-04-20 · unverdicted · none · ref 23
