One-eval: An agentic system for automated and traceable LLM evaluation. arXiv preprint arXiv:2603.09821, 2026

Chengyu Shen, Yanheng Hou, Minghui Pan, Runming He, Zhen Hao Wong, Meiyi Qiang, Zhou Liu, Hao Liang, Peichao Lai, Zeang Sheng, et al. · 2026 · arXiv 2603.09821

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Benchmark Everything Everywhere All at Once

cs.AI · 2026-06-04 · unverdicted · novelty 6.0

Benchmark Agent is an autonomous agentic system that constructs benchmarks for LLMs and MLLMs via query analysis, subtask design, annotation and quality control, yielding 15 benchmarks with minimal human input.

citing papers explorer

Showing 1 of 1 citing paper.

Benchmark Everything Everywhere All at Once · cs.AI · 2026-06-04 · unverdicted · none · ref 35
