Venue: Transactions on Machine Learning Research
Two Pith papers cite this work. Polarity classification is still being indexed.
Citation summary:
- Roles: background (1)
- Polarities: background (1)
- Years: 2026 (2)
- Verdicts: UNVERDICTED (2)
- Representative citing paper: Language models fail at extended rule following
Citing papers:
- Targeted Tests for LLM Reasoning: An Audit-Constrained Protocol
  Introduces an audit-constrained protocol for targeted LLM reasoning tests built on a finite component grammar. It compares score-based CAPS sampling against uniform sampling under matched budgets and finds no improvement in audited yield.
- Language models fail at extended rule following
  LLMs fail at extended counting of repeated characters due to finite internal states. The abrupt errors persist across model scales and inference methods.