EvalAssist: LLM-as-a-Judge Simplified

2025 · DOI 10.1609/aaai.v39i28.35351

3 Pith papers cite this work.



representative citing papers

Faithful or Fabricated? A Causal Framework for Rationalization Bias in LLM Judges

cs.CL · 2026-05-13 · unverdicted · novelty 7.0

Presents cue interventions and tie-aware metrics to detect rationalization bias in LLM judges and demonstrates that PROOF-BEFORE-PREFERENCE reduces cue anchoring compared to baselines.

Dialectics of Alignment: Harnessing Unsafe Knowledge for Dynamic Safety Routing

cs.LG · 2026-05-30 · unverdicted · novelty 5.0

SafeMoE isolates unsafe knowledge in domain-specific LoRA experts and routes them via a lightweight gate trained on safe responses to produce safer and more informative LLM outputs with zero-shot generalization.

BADGER: Bridging Agentic and Deterministic Evaluation for Generative Enterprise Reasoning

cs.AI · 2026-06-01 · unverdicted · novelty 4.0

BADGER is a new enterprise evaluation framework that adds LLM-assisted SQL component extraction and a Hybrid-EX metric validated on 150 human-annotated queries to existing text-to-SQL and agentic assessment methods.

citing papers explorer


Faithful or Fabricated? A Causal Framework for Rationalization Bias in LLM Judges cs.CL · 2026-05-13 · unverdicted · none · ref 5
Dialectics of Alignment: Harnessing Unsafe Knowledge for Dynamic Safety Routing cs.LG · 2026-05-30 · unverdicted · none · ref 21
BADGER: Bridging Agentic and Deterministic Evaluation for Generative Enterprise Reasoning cs.AI · 2026-06-01 · unverdicted · none · ref 4
