and Chen, Minmin , title =

Olivier Jeunen, Aleksei Ustimenko · 2024 · arXiv 0457.368816

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Additive Control Variates Dominate Self-Normalisation in Off-Policy Evaluation

cs.LG · 2026-02-16 · unverdicted · novelty 7.0

Optimal additive baseline IPS asymptotically dominates SNIPS in off-policy evaluation mean squared error.

A More Accurate Algorithm Comparison through A/B Testing using Offline Evaluation Methods

cs.LG · 2026-07-02 · unverdicted · novelty 6.0

Proposes an A/B testing estimator that introduces a hypothetical middle algorithm for stepwise estimation to induce positive correlation, reducing selection errors and halving required data volume.

Training LLMs with Reinforcement Learning for Intent-Aware Personalized Question Answering

cs.CL · 2026-05-12 · unverdicted · novelty 5.0

IAP uses RL to train LLMs to explicitly infer and apply implicit user intent in single-turn personalized QA, achieving ~7.5% average macro-score gains over baselines on LaMP-QA.

citing papers explorer

Showing 3 of 3 citing papers after filters.

Additive Control Variates Dominate Self-Normalisation in Off-Policy Evaluation
cs.LG · 2026-02-16 · unverdicted · none · ref 15
Optimal additive baseline IPS asymptotically dominates SNIPS in off-policy evaluation mean squared error.

A More Accurate Algorithm Comparison through A/B Testing using Offline Evaluation Methods
cs.LG · 2026-07-02 · unverdicted · none · ref 11
Proposes an A/B testing estimator that introduces a hypothetical middle algorithm for stepwise estimation to induce positive correlation, reducing selection errors and halving required data volume.

Training LLMs with Reinforcement Learning for Intent-Aware Personalized Question Answering
cs.CL · 2026-05-12 · unverdicted · none · ref 15
IAP uses RL to train LLMs to explicitly infer and apply implicit user intent in single-turn personalized QA, achieving ~7.5% average macro-score gains over baselines on LaMP-QA.
