Unsupervised elicitation of language models, 2025

Jiaxin Wen, Zachary Ankner, Arushi Somani, Peter Hase, Samuel Marks, Jacob Goldman-Wetzler, Linda Petrini, Henry Sleight, Collin Burns, He He, Shi Feng, Ethan Perez, Jan Leike · 2025 · arXiv 2506.10139

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Mitigating Label Bias with Interpretable Rubric Embeddings

cs.LG · 2026-05-20 · unverdicted · novelty 6.0

Rubric embeddings from expert criteria mitigate label bias in models trained on historical evaluations, reducing group disparities while improving cohort quality on a master's program dataset.

Compute as Teacher: Turning Inference Compute Into Reference-Free Supervision

cs.LG · 2025-09-17 · unverdicted · novelty 6.0

Parallel inference rollouts aggregated into pseudo-references enable reference-free RL supervision that matches expert-annotated performance on health tasks while using 9x less test-time compute.

citing papers explorer

Showing 2 of 2 citing papers.

Mitigating Label Bias with Interpretable Rubric Embeddings

cs.LG · 2026-05-20 · unverdicted · none · ref 42

Rubric embeddings from expert criteria mitigate label bias in models trained on historical evaluations, reducing group disparities while improving cohort quality on a master's program dataset.

Compute as Teacher: Turning Inference Compute Into Reference-Free Supervision

cs.LG · 2025-09-17 · unverdicted · none · ref 26

Parallel inference rollouts aggregated into pseudo-references enable reference-free RL supervision that matches expert-annotated performance on health tasks while using 9x less test-time compute.
