Optimal additive baseline IPS asymptotically dominates SNIPS in off-policy evaluation mean squared error.
and Chen, Minmin , title =
3 Pith papers cite this work; polarity classification is still being indexed.
Years: 2026 (3)
Verdicts: unverdicted (3)
Citing papers

- Additive Control Variates Dominate Self-Normalisation in Off-Policy Evaluation
  Optimal additive baseline IPS asymptotically dominates SNIPS in off-policy evaluation mean squared error.
- A More Accurate Algorithm Comparison through A/B Testing using Offline Evaluation Methods
  Proposes an A/B testing estimator that introduces a hypothetical middle algorithm for stepwise estimation to induce positive correlation, reducing selection errors and halving required data volume.
- Training LLMs with Reinforcement Learning for Intent-Aware Personalized Question Answering
  IAP uses RL to train LLMs to explicitly infer and apply implicit user intent in single-turn personalized QA, achieving ~7.5% average macro-score gains over baselines on LaMP-QA.
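
The first summary above compares three off-policy estimators, and a small sketch may make the comparison concrete. It uses the textbook definitions of IPS, SNIPS, and IPS with an additive baseline, with the baseline set by the standard plug-in control variate coefficient Cov(w·r, w) / Var(w). This is not necessarily the exact estimator analysed in the cited paper. The three-action bandit, its probabilities, and the names `simulate_logs` and `estimate_beta` are assumptions made up for illustration.

```python
import random


def ips(w, r):
    """Plain inverse propensity scoring: mean of w_i * r_i."""
    return sum(wi * ri for wi, ri in zip(w, r)) / len(w)


def snips(w, r):
    """Self-normalised IPS: divide by the sum of weights instead of n."""
    return sum(wi * ri for wi, ri in zip(w, r)) / sum(w)


def beta_ips(w, r, beta):
    """IPS with an additive baseline: mean of w_i * (r_i - beta), plus beta."""
    return sum(wi * (ri - beta) for wi, ri in zip(w, r)) / len(w) + beta


def estimate_beta(w, r):
    """Plug-in control variate coefficient Cov(w*r, w) / Var(w) (hypothetical helper)."""
    n = len(w)
    wr = [wi * ri for wi, ri in zip(w, r)]
    mean_w = sum(w) / n
    mean_wr = sum(wr) / n
    cov = sum((x - mean_wr) * (wi - mean_w) for x, wi in zip(wr, w)) / n
    var = sum((wi - mean_w) ** 2 for wi in w) / n
    return cov / var if var > 0 else 0.0


# Hypothetical toy problem: all numbers below are illustrative assumptions.
LOGGING = [0.6, 0.3, 0.1]      # logging policy over three actions
TARGET = [0.1, 0.3, 0.6]       # target policy to evaluate
REWARD_MEAN = [0.2, 0.5, 0.8]  # Bernoulli reward mean per action
TRUE_VALUE = sum(p * m for p, m in zip(TARGET, REWARD_MEAN))


def simulate_logs(n, seed):
    """Draw n logged interactions; return importance weights and rewards."""
    rng = random.Random(seed)
    w, r = [], []
    for _ in range(n):
        a = rng.choices(range(3), weights=LOGGING)[0]
        w.append(TARGET[a] / LOGGING[a])
        r.append(1.0 if rng.random() < REWARD_MEAN[a] else 0.0)
    return w, r


if __name__ == "__main__":
    w, r = simulate_logs(20000, seed=0)
    beta = estimate_beta(w, r)
    print("true value:", TRUE_VALUE)
    print("IPS:       ", ips(w, r))
    print("SNIPS:     ", snips(w, r))
    print("beta-IPS:  ", beta_ips(w, r, beta))
```

SNIPS is itself the additive-baseline estimator with the baseline set to the SNIPS estimate, so the comparison is between two choices of baseline within one family. Comparing mean squared error would need repeated runs over many seeds; a single run shows only one draw of each estimator.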