Title resolution pending

1 Pith paper cites this work. Polarity classification is still indexing.

1 Pith paper citing it

fields

cs.AI 1

years

2026 1

verdicts

UNVERDICTED 1

representative citing papers

The Evaluation Trap: Benchmark Design as Theoretical Commitment

cs.AI · 2026-05-13 · unverdicted · novelty 6.0

AI benchmarks trap progress by operationalizing assumptions that redefine capabilities around the benchmarks themselves, and Epistematics provides an audit procedure to detect when evaluations cannot discriminate claimed capabilities from proxy behaviors.

citing papers explorer

Showing 1 of 1 citing paper.

  • The Evaluation Trap: Benchmark Design as Theoretical Commitment cs.AI · 2026-05-13 · unverdicted · none · ref 27
