arxiv: 2605.10075 · 2 revisions
Active Testing of Large Language Models via Approximate Neyman Allocation