Why Do Large Language Models (

Tairan Fu, Raquel Ferrando, Javier Conde, Carlos Arriaga, Pedro Reviriego · 2024 · arXiv 2412.18626

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 2 · other: 1

citation-polarity summary

background: 2 · unclear: 1

representative citing papers

The Open-Box Fallacy: Why AI Deployment Needs a Calibrated Verification Regime

cs.AI · 2026-05-11 · unverdicted · novelty 6.0

AI deployment in high-stakes areas requires domain-scoped calibrated verification with monitoring and revocation, using a proposed six-component Verification Coverage standard instead of mechanistic interpretability.

The Position Curse: LLMs Struggle to Locate the Last Few Items in a List

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

LLMs exhibit the Position Curse, with backward position retrieval in lists lagging far behind forward retrieval, showing only partial gains from PosBench fine-tuning.

Understanding Counting Mechanisms in Large Language and Vision-Language Models

cs.CV · 2025-11-21 · unverdicted · novelty 6.0

LLMs and LVLMs encode latent positional count information in individual tokens or visual features, with an internal counter mechanism that updates per item and emerges progressively across layers, relying on structural cues like separators.

IF-CRITIC: Towards a Fine-Grained LLM Critic for Instruction-Following Evaluation

cs.CL · 2025-11-02 · unverdicted · novelty 6.0

IF-CRITIC is a fine-grained LLM critic using checklist generation and constraint-level preference optimization that outperforms strong baselines like o4-mini in instruction-following evaluation while enabling lower-cost model optimization.

Language models fail at extended rule following

cs.CL · 2026-05-03 · unverdicted · novelty 5.0

LLMs fail at extended counting of repeated characters due to finite internal states, with abrupt errors persisting across model scales and inference methods.

citing papers explorer

Showing 5 of 5 citing papers.

The Open-Box Fallacy: Why AI Deployment Needs a Calibrated Verification Regime · cs.AI · 2026-05-11 · unverdicted · none · ref 15
The Position Curse: LLMs Struggle to Locate the Last Few Items in a List · cs.LG · 2026-05-08 · unverdicted · none · ref 6
Understanding Counting Mechanisms in Large Language and Vision-Language Models · cs.CV · 2025-11-21 · unverdicted · none · ref 8
IF-CRITIC: Towards a Fine-Grained LLM Critic for Instruction-Following Evaluation · cs.CL · 2025-11-02 · unverdicted · none · ref 1
Language models fail at extended rule following · cs.CL · 2026-05-03 · unverdicted · none · ref 49
