pith. sign in

arXiv preprint arXiv:2505.11413 , year=

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

citation-role summary

background 1

citation-polarity summary

fields

cs.AI 3 cs.CL 2

years

2026 4 2025 1

verdicts

UNVERDICTED 5

roles

background 1

polarities

background 1

representative citing papers

DataDignity: Training Data Attribution for Large Language Models

cs.AI · 2026-05-07 · unverdicted · novelty 7.0

ScoringModel raises mean Recall@10 to 52.2 on the FakeWiki provenance benchmark from 35.0 for the best baseline, winning 41 of 45 model-by-condition comparisons and gaining 15.7 points on jailbreak-style queries.

Understanding Annotator Safety Policy with Interpretability

cs.AI · 2026-05-06 · unverdicted · novelty 6.0

Annotator Policy Models learn safety policies from labeling behavior alone, accurately predicting responses and revealing sources of disagreement like policy ambiguity and value pluralism.

citing papers explorer

Showing 5 of 5 citing papers.