One-eval: An agentic system for automated and traceable LLM evaluation. arXiv preprint arXiv:2603.09821, 2026

1 Pith paper cites this work. Polarity classification is still being indexed.

fields

cs.AI 1

years

2026 1

verdicts

UNVERDICTED 1

representative citing papers

Benchmark Everything Everywhere All at Once

cs.AI · 2026-06-04 · unverdicted · novelty 6.0

Benchmark Agent is an autonomous agentic system that constructs benchmarks for LLMs and MLLMs via query analysis, subtask design, annotation and quality control, yielding 15 benchmarks with minimal human input.

citing papers explorer

Showing 1 of 1 citing paper.

  • Benchmark Everything Everywhere All at Once · cs.AI · 2026-06-04 · unverdicted · none · ref 35
