Title resolution pending

Peiqi Sui, Eamon Duede, Sophie Wu, Richard So · 2024 · DOI 10.18653/v1/2024.acl-long.770

2 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

ELEPHANT: Measuring and understanding social sycophancy in LLMs

cs.CL · 2025-05-20 · unverdicted · novelty 8.0

LLMs preserve users' face 45 percentage points more than humans and affirm both sides of moral conflicts 48% of the time depending on user stance, per the new ELEPHANT benchmark.

ReFACT: A Benchmark for Scientific Confabulation Detection with Positional Error Annotations

cs.CL · 2025-09-30 · conditional · novelty 6.0

The ReFACT benchmark reveals a persistent salient-distractor failure mode in LLMs: 61% of incorrect error-span predictions are semantically unrelated to the actual errors. The failure persists across model sizes, and comparative judgment yields lower F1 than independent detection.

citing papers explorer

Showing 2 of 2 citing papers.

ELEPHANT: Measuring and understanding social sycophancy in LLMs · cs.CL · 2025-05-20 · unverdicted · none · ref 3
ReFACT: A Benchmark for Scientific Confabulation Detection with Positional Error Annotations · cs.CL · 2025-09-30 · conditional · none · ref 27
