Fisher Decorator refines flow policies in offline RL via a local transport map and Fisher-matrix quadratic approximation of the KL constraint, yielding controllable error near the optimum and SOTA benchmark results.
Flow q-learning, 2025
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.LG 2years
2026 2verdicts
UNVERDICTED 2representative citing papers
ISEP expands action support in offline RL via value interpolation between data and policy samples, then uses stochastic policy optimization to avoid mode collapse in the resulting multimodal objective.
citing papers explorer
-
Fisher Decorator: Refining Flow Policy via a Local Transport Map
Fisher Decorator refines flow policies in offline RL via a local transport map and Fisher-matrix quadratic approximation of the KL constraint, yielding controllable error near the optimum and SOTA benchmark results.
-
ISEP: Implicit Support Expansion for Offline Reinforcement Learning via Stochastic Policy Optimization
ISEP expands action support in offline RL via value interpolation between data and policy samples, then uses stochastic policy optimization to avoid mode collapse in the resulting multimodal objective.