Multi-Agent-as-Judge: Aligning LLM-Agent-Based Automated Evaluation with Multi-Dimensional Human Evaluation

Multi-agent-as-judge: Aligning LLM-agent-based automated evaluation with multi-dimensional human evaluation , author= · 2025 · arXiv 2507.21028

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

read on arXiv browse 5 citing papers

citation-role summary

background 1 method 1

citation-polarity summary

background 1 use method 1

representative citing papers

CollabSim: A CSCW-Grounded Methodology for Investigating Collaborative Competence of LLM Agents through Controlled Multi-Agent Experiments

cs.CL · 2026-06-04 · unverdicted · novelty 7.0

CollabSim is a new CSCW-grounded simulation framework that enables controlled multi-agent experiments to measure collaborative competence in LLM agents.

LLM-as-a-Judge in Healthcare: A Scoping Analysis of Applications, Methods, and Human Alignment

cs.CY · 2026-05-24 · unverdicted · novelty 6.0

Scoping review of 134 studies on LLM-as-a-Judge in healthcare finds concentration in clinical decision support and NLP, frequent use of OpenAI models with prompt engineering, and moderate-to-strong human alignment where validated.

Self-Refining Topology Optimization via an LLM-Based Multi-Agent Framework

cs.MA · 2026-05-22 · unverdicted · novelty 6.0

TopOptAgents deploys six LLM agents in self-refining loops to automate the full topology optimization workflow and succeeds on problem classes where single LLMs fail.

SocialGrid: A Benchmark for Planning and Social Reasoning in Embodied Multi-Agent Systems

cs.AI · 2026-04-17 · unverdicted · novelty 6.0

SocialGrid benchmark shows even top LLMs achieve below 60% in embodied planning and task completion, with deception detection near random chance regardless of model scale.

ProEval: Proactive Failure Discovery and Efficient Performance Estimation for Generative AI Evaluation

cs.LG · 2026-04-25

citing papers explorer

Showing 1 of 1 citing paper after filters.

ProEval: Proactive Failure Discovery and Efficient Performance Estimation for Generative AI Evaluation cs.LG · 2026-04-25 · unreviewed · ref 13

Multi-Agent-as-Judge: Aligning LLM-Agent-Based Automated Evaluation with Multi-Dimensional Human Evaluation

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer