DISA decouples partition function estimation using offline importance sampling for distribution-matching LLM-RL, matching or exceeding online baselines like FlowRL on math and code benchmarks while retaining more strategy diversity.
Owen , year = 2013, title =
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
years
2026 2verdicts
UNVERDICTED 2representative citing papers
A new adaptive variance estimator for relative sparsity coefficients is introduced that fully utilizes the prior asymptotic normality theorem and incorporates variable selection effects.
citing papers explorer
-
DISA: Offline Importance Sampling for Distribution-Matching LLM-RL
DISA decouples partition function estimation using offline importance sampling for distribution-matching LLM-RL, matching or exceeding online baselines like FlowRL on math and code benchmarks while retaining more strategy diversity.
-
An adaptive variance estimator for relative sparsity
A new adaptive variance estimator for relative sparsity coefficients is introduced that fully utilizes the prior asymptotic normality theorem and incorporates variable selection effects.