MA-BC partitions divergent expert data and pools non-conflicting pairs to achieve faster convergence to Pareto-optimal policies in MOMDPs, with a matching minimax lower bound.
Diffusion policy: Visuomotor policy learning via action diffusion
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
background 1
citation-polarity summary
fields
cs.LG 2years
2026 2verdicts
UNVERDICTED 2roles
background 1polarities
background 1representative citing papers
ISEP expands action support in offline RL via value interpolation between data and policy samples, then uses stochastic policy optimization to avoid mode collapse in the resulting multimodal objective.
citing papers explorer
-
Split the Differences, Pool the Rest: Provably Efficient Multi-Objective Imitation
MA-BC partitions divergent expert data and pools non-conflicting pairs to achieve faster convergence to Pareto-optimal policies in MOMDPs, with a matching minimax lower bound.
-
ISEP: Implicit Support Expansion for Offline Reinforcement Learning via Stochastic Policy Optimization
ISEP expands action support in offline RL via value interpolation between data and policy samples, then uses stochastic policy optimization to avoid mode collapse in the resulting multimodal objective.