
Smith, and Yanai Elazar

2 Pith papers cite this work. Polarity classification is still being indexed.



representative citing papers

A Matter of TASTE: Improving Coverage and Difficulty of Agent Benchmarks

cs.AI · 2026-05-27 · unverdicted · novelty 7.0

TASTE automates the generation of high-coverage, difficult agent benchmarks via adaptive contrastive n-gram sampling of tool sequences. The result is τ^c-Bench, on which models that saturate τ²-Bench drop sharply and the number of unique tool combinations more than doubles.

Measuring Form and Function in Language Models

cs.CL · 2026-05-27 · unverdicted · novelty 5.0

Proposes CAC prompting to benchmark language models on the syntactic and discourse properties of determiners against child acquisition data, finding that large models approach, but do not match, human performance on both.

citing papers explorer


  • A Matter of TASTE: Improving Coverage and Difficulty of Agent Benchmarks cs.AI · 2026-05-27 · unverdicted · none · ref 41