Title resolution pending

AI safety via debate , author= · 2018

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

browse 4 citing papers

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Language Models (Mostly) Know What They Know

cs.CL · 2022-07-11 · unverdicted · novelty 6.0

Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.

A General Language Assistant as a Laboratory for Alignment

cs.CL · 2021-12-01 · conditional · novelty 6.0

Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.

Toward Human-AI Complementarity Across Diverse Tasks

cs.HC · 2026-04-13 · unverdicted · novelty 5.0

Human-AI hybrids achieve only +0.4pp over AI alone on diverse tasks because confidence routing fails to identify the small set of cases where humans can correct AI errors.

Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety

cs.AI · 2025-07-15 · unverdicted · novelty 5.0

Chain-of-thought monitorability provides a promising but fragile method for AI safety oversight that developers should actively preserve.

citing papers explorer

Showing 1 of 1 citing paper after filters.

A General Language Assistant as a Laboratory for Alignment cs.CL · 2021-12-01 · conditional · none · ref 18
Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.

Title resolution pending

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer