arXiv preprint arXiv:2206.11871 , year=

Charlie Snell, Ilya Kostrikov, Yi Su, Mengjiao Yang, Sergey Levine · 2022 · arXiv 2206.11871

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

read on arXiv browse 4 citing papers

citation-role summary

method 1

citation-polarity summary

use method 1

representative citing papers

Universal Reasoner: A Single, Composable Plug-and-Play Reasoner for Frozen LLMs

cs.AI · 2025-05-25 · unverdicted · novelty 7.0

UniR is a composable reasoning module trained with verifiable rewards and added to frozen LLMs via logit summation, enabling modular composition and weak-to-strong generalization across tasks and model sizes.

Conditional Attribute Estimation with Autoregressive Sequence Models

cs.AI · 2026-05-13 · unverdicted · novelty 6.0

Conditional Attribute Transformers jointly estimate next-token probabilities and conditional attribute values for autoregressive sequence models, enabling credit assignment, counterfactuals, and steerable generation in one pass.

Response Time Enhances Alignment with Heterogeneous Preferences

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

Response times modeled as drift-diffusion processes enable consistent estimation of population-average preferences from heterogeneous anonymous binary choices.

Training Language Models to Self-Correct via Reinforcement Learning

cs.LG · 2024-09-19 · unverdicted · novelty 6.0

SCoRe uses multi-turn online RL with regularization on self-generated traces to improve LLM self-correction, achieving 15.6% and 9.1% gains on MATH and HumanEval for Gemini models.

citing papers explorer

Showing 4 of 4 citing papers.

Universal Reasoner: A Single, Composable Plug-and-Play Reasoner for Frozen LLMs cs.AI · 2025-05-25 · unverdicted · none · ref 36
UniR is a composable reasoning module trained with verifiable rewards and added to frozen LLMs via logit summation, enabling modular composition and weak-to-strong generalization across tasks and model sizes.
Conditional Attribute Estimation with Autoregressive Sequence Models cs.AI · 2026-05-13 · unverdicted · none · ref 20
Conditional Attribute Transformers jointly estimate next-token probabilities and conditional attribute values for autoregressive sequence models, enabling credit assignment, counterfactuals, and steerable generation in one pass.
Response Time Enhances Alignment with Heterogeneous Preferences cs.LG · 2026-05-07 · unverdicted · none · ref 49
Response times modeled as drift-diffusion processes enable consistent estimation of population-average preferences from heterogeneous anonymous binary choices.
Training Language Models to Self-Correct via Reinforcement Learning cs.LG · 2024-09-19 · unverdicted · none · ref 176
SCoRe uses multi-turn online RL with regularization on self-generated traces to improve LLM self-correction, achieving 15.6% and 9.1% gains on MATH and HumanEval for Gemini models.

arXiv preprint arXiv:2206.11871 , year=

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer