William Merrill, Noah A. Smith, and Yanai Elazar · 2024 · DOI 10.18653/v1/2024.emnlp-main.800

2 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

A Matter of TASTE: Improving Coverage and Difficulty of Agent Benchmarks

cs.AI · 2026-05-27 · unverdicted · novelty 7.0

TASTE automates generation of high-coverage difficult agent benchmarks via adaptive contrastive n-gram sampling of tool sequences, yielding τ^c-Bench where models saturating τ²-Bench drop sharply and unique tool combinations more than double.

Measuring Form and Function in Language Models

cs.CL · 2026-05-27 · unverdicted · novelty 5.0

Proposes CAC prompting to benchmark language models on syntactic and discourse properties of determiners against child acquisition data, finding large models approach but do not match human performance on both.

