Why Do Large Language Models (
5 Pith papers cite this work. Polarity classification is still indexing.
citing papers explorer
- The Open-Box Fallacy: Why AI Deployment Needs a Calibrated Verification Regime
  AI deployment in high-stakes areas requires domain-scoped calibrated verification with monitoring and revocation, using a proposed six-component Verification Coverage standard instead of mechanistic interpretability.
- The Position Curse: LLMs Struggle to Locate the Last Few Items in a List
  LLMs exhibit the Position Curse, with backward position retrieval in lists lagging far behind forward retrieval, showing only partial gains from PosBench fine-tuning.
- Understanding Counting Mechanisms in Large Language and Vision-Language Models
  LLMs and LVLMs encode latent positional count information in individual tokens or visual features, with an internal counter mechanism that updates per item and emerges progressively across layers, relying on structural cues like separators.
- IF-CRITIC: Towards a Fine-Grained LLM Critic for Instruction-Following Evaluation
  IF-CRITIC is a fine-grained LLM critic using checklist generation and constraint-level preference optimization that outperforms strong baselines like o4-mini in instruction-following evaluation while enabling lower-cost model optimization.
- Language models fail at extended rule following
  LLMs fail at extended counting of repeated characters due to finite internal states, with abrupt errors persisting across model scales and inference methods.