In: Findings of the Association for Computational Linguistics: EMNLP 2024

Park, J · 2024 · DOI 10.18653/v1/2024.findings-emnlp.57

4 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Auditing Stealth Sycophancy in Mental-Health Dialogue: Structured Clinical-State Diagnostics and Clean Matched Benchmarks

cs.CL · 2026-05-05 · unverdicted · novelty 6.0

Introduces a clean matched benchmark and Dynamic Emotional Signature Graphs (DESG) framework that detects implicit sycophancy via clinical-state transitions and reports a 0.0488 macro-F1 gain over baselines on harmful-risk detection.

Debiasing Reward Models via Causally Motivated Inference-Time Intervention

cs.CL · 2026-04-30 · unverdicted · novelty 6.0

Neuron-level inference-time intervention reduces multiple biases in reward models, enabling 2B and 7B models to match 70B performance on LLM alignment benchmarks without trade-offs.

AURA: Adaptive Uncertainty-aware Refinement for LLM-as-a-Judge Auditing

stat.ML · 2026-06-18 · unverdicted · novelty 5.0

AURA is an adaptive uncertainty-aware refinement method for auditing LLM-as-a-judge pairwise decisions that learns human-consistency signals through selective human verification on uncertain cases.

Safety is Contextual, LLM-Judges Are Not: Navigating the Rigid Priors of Evaluators

cs.AI · 2026-06-05 · unverdicted · novelty 5.0

LLM safety judges resist adjusting evaluations when given contradictory context or new safety definitions, despite some ability to learn from new information.

citing papers explorer

Auditing Stealth Sycophancy in Mental-Health Dialogue: Structured Clinical-State Diagnostics and Clean Matched Benchmarks

cs.CL · 2026-05-05 · unverdicted · none · ref 25

Debiasing Reward Models via Causally Motivated Inference-Time Intervention

cs.CL · 2026-04-30 · unverdicted · none · ref 24

AURA: Adaptive Uncertainty-aware Refinement for LLM-as-a-Judge Auditing

stat.ML · 2026-06-18 · unverdicted · none · ref 34

Safety is Contextual, LLM-Judges Are Not: Navigating the Rigid Priors of Evaluators

cs.AI · 2026-06-05 · unverdicted · none · ref 97