Wenxuan Zhou, Sheng Zhang, Hoifung Poon, Muhao Chen · 2023 · DOI 10.18653/v1/2023.findings-emnlp.968

5 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

Stability vs. Manipulability: Evaluating Robustness Under Post-Decision Interaction in LLM Judges

cs.AI · 2026-06-03 · unverdicted · novelty 7.0

LLM judges exhibit high stability under neutral re-evaluation but substantial reversibility under targeted post-decision challenges, quantified via a new Evaluation Robustness Score (ERS).

Safety is Contextual, LLM-Judges Are Not: Navigating the Rigid Priors of Evaluators

cs.AI · 2026-06-05 · unverdicted · novelty 5.0

LLM safety judges resist adjusting evaluations when given contradictory context or new safety definitions, despite some ability to learn from new information.

CRAFT: A Unified Counterfactual Reasoning Framework for Tabular Question Answering and Fact Verification

cs.CL · 2026-06-05 · unverdicted · novelty 5.0

CRAFT is a unified bidirectional counterfactual reasoning framework that improves LLM performance on tabular QA and fact verification tasks over baselines on WikiTQ and TabFact.

Vibe Check: Understanding the Effects of LLM-Based Conversational Agents' Personality and Alignment on User Perceptions in Goal-Oriented Tasks

cs.HC · 2025-09-11 · unverdicted · novelty 5.0

Medium personality expression in LLM agents yields the most positive user perceptions in goal-oriented tasks, further improved by trait alignment.

Explicit Evidence Grounding via Structured Inline Citation Generation

cs.CL · 2026-06-05 · unverdicted · novelty 4.0

FullCite introduces three strategies for structured inline citation generation in QA and finds LLMs identify relevant documents well but struggle with precise evidence spans on ASQA, BioASQ, and ExpertQA.

citing papers explorer

Showing 2 of 2 citing papers after filters.

Stability vs. Manipulability: Evaluating Robustness Under Post-Decision Interaction in LLM Judges

cs.AI · 2026-06-03 · unverdicted · none · ref 80

LLM judges exhibit high stability under neutral re-evaluation but substantial reversibility under targeted post-decision challenges, quantified via a new Evaluation Robustness Score (ERS).

Safety is Contextual, LLM-Judges Are Not: Navigating the Rigid Priors of Evaluators

cs.AI · 2026-06-05 · unverdicted · none · ref 66

LLM safety judges resist adjusting evaluations when given contradictory context or new safety definitions, despite some ability to learn from new information.
