PROWL introduces a KL-constrained adversarial curriculum and prioritized adversarial trajectory buffer to actively discover and correct rare failure modes in action-conditioned video world models.
Evolving curricula with regret-based environment design,
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.LG 2verdicts
UNVERDICTED 2representative citing papers
A curriculum sampling questions with high variance in success rate improves reinforcement learning performance for LLM reasoning tasks.
citing papers explorer
-
PROWL: Prioritized Regret-Driven Optimization for World Model Learning
PROWL introduces a KL-constrained adversarial curriculum and prioritized adversarial trajectory buffer to actively discover and correct rare failure modes in action-conditioned video world models.
-
Learning to Reason at the Frontier of Learnability
A curriculum sampling questions with high variance in success rate improves reinforcement learning performance for LLM reasoning tasks.