Brain Score stays similar whether language models are trained on diverse natural languages or on structured non-language data such as DNA and code, indicating that the metric tracks the extraction of shared structure and is not diagnostic of human-like language processing.
3 Pith papers cite this work.
Fields: cs.CL (3)
Verdicts: UNVERDICTED (3)
Representative citing papers
RExBench is a new benchmark showing that LLM coding agents fail to autonomously implement most realistic research extensions to prior AI papers.
Language models can support formal generative linguistic theories, expanding testable theories and potentially reconciling them with usage-based accounts.
Citing papers explorer
- RExBench: Can coding agents autonomously implement AI research extensions?