Wenxuan Zhou, Sheng Zhang, Hoifung Poon, Muhao Chen · 2023 · DOI 10.18653/v1/2023.findings-emnlp.968

6 Pith papers cite this work.
representative citing papers

Stability vs. Manipulability: Evaluating Robustness Under Post-Decision Interaction in LLM Judges

cs.AI · 2026-06-03 · unverdicted · novelty 7.0

LLM judges exhibit high stability under neutral re-evaluation but substantial reversibility under targeted post-decision challenges, quantified via a new Evaluation Robustness Score (ERS).

Safety is Contextual, LLM-Judges Are Not: Navigating the Rigid Priors of Evaluators

cs.AI · 2026-06-05 · unverdicted · novelty 5.0

LLM safety judges resist adjusting evaluations when given contradictory context or new safety definitions, despite some ability to learn from new information.

CRAFT: A Unified Counterfactual Reasoning Framework for Tabular Question Answering and Fact Verification

cs.CL · 2026-06-05 · unverdicted · novelty 5.0

CRAFT is a unified bidirectional counterfactual reasoning framework that improves LLM performance over baselines on tabular QA and fact verification, evaluated on WikiTQ and TabFact.

Vibe Check: Understanding the Effects of LLM-Based Conversational Agents' Personality and Alignment on User Perceptions in Goal-Oriented Tasks

cs.HC · 2025-09-11 · unverdicted · novelty 5.0

Medium personality expression in LLM agents yields the most positive user perceptions in goal-oriented tasks, further improved by trait alignment.

Explicit Evidence Grounding via Structured Inline Citation Generation

cs.CL · 2026-06-05 · unverdicted · novelty 4.0

FullCite introduces three strategies for structured inline citation generation in QA and finds LLMs identify relevant documents well but struggle with precise evidence spans on ASQA, BioASQ, and ExpertQA.

Plans for Evaluating Structured Generative Search Summaries

cs.IR · 2026-05-26 · unverdicted · novelty 3.0

The authors propose an evaluation framework for LLM-generated structured search summaries and describe plans for implementing and testing it.