Position: Avoid Overstretching LLMs for Every Enterprise Task
Pith reviewed 2026-05-12 03:24 UTC · model grok-4.3
The pith
Language models should serve only as extraction interfaces in enterprise workflows, with knowledge and computation handled by dedicated knowledge bases and symbolic systems.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Finite-capacity models cannot fully capture the breadth of knowledge required for enterprise tasks, creating inherent limits to efficiency and interpretability. Therefore, language models should primarily be used for structured extraction in deterministic enterprise workflows, while computation and storage are delegated to knowledge bases and symbolic procedures, resulting in modular architectures that are more reliable and maintainable than monolithic frameworks.
What carries the argument
The modular architecture that treats language models as interfaces for structured extraction, externalizing knowledge to dedicated bases and computation to symbolic procedures.
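This division of labor can be made concrete in a few lines. The sketch below is an illustrative assumption, not the paper's implementation: the schema, the vendor table, and the `llm_extract` stub are hypothetical, the model's only job is to emit a typed record, and everything downstream is a deterministic lookup plus symbolic arithmetic.

```python
from dataclasses import dataclass

# Hypothetical schema for an invoice-processing task (illustrative only).
@dataclass(frozen=True)
class InvoiceRecord:
    vendor_id: str
    amount_cents: int
    currency: str

def llm_extract(text: str) -> InvoiceRecord:
    """Stand-in for the single LLM call in the pipeline: free text -> typed record.
    A real system would call a model here and validate its output against the schema."""
    # Hard-coded so the sketch stays self-contained and deterministic.
    return InvoiceRecord(vendor_id="V-001", amount_cents=12_500, currency="USD")

# Knowledge lives in an external store, not in model weights.
VENDOR_KB = {"V-001": {"name": "Acme GmbH", "discount_pct": 10}}

def net_amount_cents(record: InvoiceRecord) -> int:
    """Symbolic computation: deterministic arithmetic over KB facts."""
    discount = VENDOR_KB[record.vendor_id]["discount_pct"]
    return record.amount_cents * (100 - discount) // 100

record = llm_extract("Invoice from Acme GmbH for $125.00 ...")
print(net_amount_cents(record))  # 11250
```

Because the extraction boundary is a typed record, the KB and the arithmetic can be audited, versioned, and updated without retraining anything — the maintainability property the paper claims for modular designs.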
Load-bearing premise
Enterprise workloads are dominated by deterministic, structured, knowledge-dependent tasks under strict cost, latency, and reliability constraints that finite models cannot handle.
What would settle it
Demonstrating a real enterprise workflow where a single fine-tuned LLM matches or exceeds the reliability, cost, and latency of a modular extraction-plus-knowledge-base system while handling equivalent knowledge breadth.
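The decisive comparison described above could be operationalized with a small harness that scores any candidate system on the same three axes the claim names. Everything here is a placeholder: the `cases`, the stub systems, and the cost function would be supplied by the actual study.

```python
import time
from statistics import mean

def evaluate(system, cases):
    """Score a callable `system(input) -> output` on reliability, latency, and cost.
    `cases` is a list of (input, expected_output, cost_fn) tuples — all placeholders."""
    correct, latencies, costs = 0, [], []
    for inp, expected, cost_fn in cases:
        t0 = time.perf_counter()
        out = system(inp)
        latencies.append(time.perf_counter() - t0)
        costs.append(cost_fn(inp, out))
        correct += (out == expected)
    return {
        "reliability": correct / len(cases),
        "mean_latency_s": mean(latencies),
        "mean_cost": mean(costs),
    }

# Toy comparison with stub systems; a real study would plug in the
# monolithic fine-tuned LLM and the modular extraction-plus-KB pipeline.
cases = [(x, x * 2, lambda i, o: 0.001) for x in range(10)]
monolithic = lambda x: x * 2   # stands in for the fine-tuned LLM
modular = lambda x: x + x      # stands in for extraction + symbolic step
print(evaluate(monolithic, cases)["reliability"])  # 1.0
print(evaluate(modular, cases)["reliability"])     # 1.0
```

The point of the harness is that "matches or exceeds" becomes a concrete comparison of three numbers per system on an identical case set, rather than a qualitative judgment.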
Figures
Original abstract
Enterprise workloads are dominated by deterministic, structured, and knowledge-dependent tasks operating under strict cost, latency, and reliability constraints. While these are often addressed through large language model (LLM) deployment or distillation into smaller models, we argue this is inefficient, unreliable, and misaligned with enterprise task structures. Instead, AI systems should treat language models as interfaces rather than monolithic engines, externalizing knowledge and computation into dedicated components for greater reliability, scalability, and transparency. Our theoretical evidences show that finite-capacity models cannot fully capture the breadth of knowledge required for enterprise tasks, creating inherent limits to efficiency and interpretability. Building on this, we take the position that language models should primarily be used for structured extraction in deterministic enterprise workflows, while computation and storage are delegated to knowledge bases and symbolic procedures. We formally demonstrate that such modular architectures are more reliable and maintainable than monolithic frameworks, offering a sustainable foundation for enterprise tasks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that enterprise workloads consist primarily of deterministic, structured, and knowledge-dependent tasks subject to strict cost, latency, and reliability constraints. It argues that deploying LLMs (or distilling them) for these tasks is inefficient and unreliable, and instead advocates treating LLMs solely as interfaces for structured extraction while delegating computation and storage to knowledge bases and symbolic procedures. The authors assert that finite-capacity models cannot capture enterprise knowledge breadth and claim to provide theoretical evidence and a formal demonstration that modular architectures are more reliable and maintainable than monolithic LLM frameworks.
Significance. If the position is substantiated with the promised evidence, it could meaningfully shape enterprise AI deployment practices by encouraging hybrid modular designs that prioritize reliability, transparency, and scalability over end-to-end LLM usage. The argument addresses a timely practical concern in applied AI and could stimulate discussion on architectural choices in constrained environments. However, the current manuscript supplies no supporting formal content, limiting its immediate contribution to the literature.
Major comments (2)
- [Abstract] The manuscript asserts 'theoretical evidences' and a 'formal demonstration' that finite-capacity models cannot capture enterprise knowledge breadth and that modular architectures are more reliable, yet the text contains no equations, proofs, theorems, empirical data, or derivations to support these central claims. This is load-bearing because the position rests entirely on the unshown arguments rather than on general premises alone.
- [Abstract] The foundational premise that 'Enterprise workloads are dominated by deterministic, structured, and knowledge-dependent tasks' is stated without references, statistics, or case studies. Because it directly underpins the recommendation to restrict LLMs to extraction roles, it is load-bearing for the architectural position.
Minor comments (1)
- [Abstract] The phrasing 'theoretical evidences' is grammatically nonstandard and should be revised to 'theoretical evidence' or 'theoretical arguments'.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback and for recognizing the practical relevance of the position. We agree that the abstract's phrasing overpromises on formality and that the core premise requires better grounding. We will revise the manuscript to address these issues while preserving its nature as a position paper.
Point-by-point responses
-
Referee: [Abstract] The manuscript asserts 'theoretical evidences' and a 'formal demonstration' that finite-capacity models cannot capture enterprise knowledge breadth and that modular architectures are more reliable, yet the text contains no equations, proofs, theorems, empirical data, or derivations to support these central claims. This is load-bearing because the position rests entirely on the unshown arguments rather than on general premises alone.
Authors: We accept this criticism. The manuscript is a position paper whose arguments rest on conceptual reasoning about model capacity limits and the mismatch between monolithic LLMs and structured enterprise tasks, rather than on new theorems or experiments. We will revise the abstract to replace 'theoretical evidences' and 'formal demonstration' with 'conceptual arguments' and 'reasoned analysis'. The main text will be expanded with additional elaboration on these points and citations to existing literature on neural network capacity and hybrid symbolic-neural systems. We do not believe formal proofs are necessary or appropriate for this format, but we will make the supporting logic more explicit. revision: partial
-
Referee: [Abstract] The foundational premise that 'Enterprise workloads are dominated by deterministic, structured, and knowledge-dependent tasks' is stated without references, statistics, or case studies. Because it directly underpins the recommendation to restrict LLMs to extraction roles, it is load-bearing for the architectural position.
Authors: This observation is correct. The premise is based on patterns from enterprise deployments and industry practice, but the draft provides no supporting citations. In revision we will add references to relevant surveys, reports on robotic process automation adoption, and studies of knowledge-intensive workflows to substantiate the claim. We will also qualify the language if needed to reflect that the dominance holds for many, though not all, enterprise tasks. revision: yes
Circularity Check
No significant circularity; position paper rests on explicit premises
Full rationale
The manuscript is a position paper whose central recommendation (LLMs as extraction interfaces with externalized knowledge and symbolic components) is advanced from stated premises about deterministic enterprise tasks, strict constraints, and finite model capacity. The abstract's references to 'theoretical evidences' and 'formal demonstration' are argumentative summaries of the position rather than mathematical derivations, equations, or fitted quantities. No self-citations, ansatzes, uniqueness theorems, or renamings appear in the provided text that reduce any claim to its own inputs by construction. The argument is self-contained against external benchmarks of task structure and model limits, with no load-bearing internal reductions.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: Finite-capacity models cannot fully capture the breadth of knowledge required for enterprise tasks.
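The axiom can be illustrated with back-of-envelope arithmetic. Both figures below are assumptions chosen for illustration, not results from the paper: a storage density on the order of 2 bits of factual knowledge per parameter (a rough order of magnitude reported in some capacity studies) and a hypothetical enterprise corpus size.

```python
# Illustrative capacity arithmetic — all constants are assumptions.
BITS_PER_PARAM = 2                  # assumed factual-storage density per weight
params = 7_000_000_000              # a 7B-parameter model
model_budget_bits = params * BITS_PER_PARAM          # 1.4e10 bits

docs, chars_per_doc = 50_000_000, 4_000              # hypothetical corpus
BITS_PER_CHAR = 1                   # assumed entropy of text after compression
corpus_bits = docs * chars_per_doc * BITS_PER_CHAR   # 2.0e11 bits

print(corpus_bits / model_budget_bits)  # ≈ 14: corpus exceeds the model budget
```

Under these (deliberately crude) assumptions the corpus carries an order of magnitude more information than the model can store, which is the finite-capacity gap the axiom points to; externalizing the corpus into a knowledge base sidesteps the gap rather than compressing through it.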
Reference graph
Works this paper leans on
- [1] Patrice Béchard and Orlando Marquez Ayala. Reducing hallucination in structured outputs via retrieval-augmented generation. In NAACL (Industry Track), 2024.
- [2] Satwik Bhattamishra, Michael Hahn, Phil Blunsom, and Varun Kanade. Separations in the representational capabilities of transformers and recurrent architectures. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024.
- [3] BizzDesign. Enterprise AI adoption: Balancing innovation and ROI in 2026. BizzDesign blog, published Jan 27, 2026.
- [5] Deloitte. AI trends 2025: Adoption barriers and updated predictions. Deloitte US AI Pulse Check Series, 2025.
- [6] Deloitte. State of AI in the enterprise 2026. Deloitte Global, 2026. Survey of 3,235 senior leaders across 24 countries, conducted Aug–Sep 2025.
- [7] Aladin Djuhera, Swanand Ravindra Kadhe, Syed Zawad, Farhan Ahmed, Heiko Ludwig, and Holger Boche. Fixing it in post: A comparative study of LLM post-training data quality and model performance. In The Thirty-ninth Annual Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2025.
- [8] Ernst & Young. AI adoption outpaces governance: Responsible AI pulse survey. EY Global Survey, 2025.
- [9] Gartner. Gartner predicts over 40% of agentic AI projects will be canceled by end of 2027. Gartner Press Release, 2025.
- [10] Š. Hána and B. Lameijer. AI-based systems adoption in business operations: Barriers and performance effects. Operations Management Research, 2025.
- [11] Harvard Business Review. Overcoming the organizational barriers to AI adoption. Harvard Business Review, published Nov 11, 2025.
- [12] Information Services Group (ISG). State of enterprise AI adoption report 2025. ISG report, 2025. Analysis of 1,200 generative, agentic, and traditional AI use cases.
- [14] E. Karakurt and A. Akbulut. Retrieval-augmented generation and large language models for enterprise knowledge management: A systematic literature review. Applied Sciences, 2025.
- [15] Tassilo Klein and Johannes Hoffart. Foundation models for tabular data within systemic contexts need grounding, 2025.
- [16] Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. Retrieval-augmented generation for knowledge-intensive NLP tasks, 2021.
- [17] Andy Lo, Albert Q. Jiang, Wenda Li, and Mateja Jamnik. End-to-end ontology learning with large language models. Advances in Neural Information Processing Systems, 37:87184–87225, 2024.
- [18] Zhongyuan Lyu, Shuoyu Hu, Lujie Liu, Hongxia Yang, and Ming Li. Canonical intermediate representation for LLM-based optimization problem formulation and code generation. arXiv preprint arXiv:2602.02029, 2026.
- [19] McKinsey & Company. The state of AI: Global survey 2025, 2025.
- [20] Menlo Ventures. 2025: The state of AI in healthcare. Menlo Ventures report, 2025. Survey of 700+ healthcare executives, conducted Aug–Sep 2025.
- [21] OpenAI. The state of enterprise AI. OpenAI report, 2025. Based on usage telemetry from 9,000 workers across nearly 100 enterprises.
- [22] OpenText, Capgemini, and Sogeti. Merging worlds. Industry survey report, OpenText Corporation, 2025. Survey of enterprise leaders on AI adoption in quality engineering practices; reports that while nearly 90% of organizations pursue generative AI in quality engineering, only 15% have achieved enterprise-scale deployment. Top barriers include data privac...
- [23] Pepper Foster. The artificial intelligence (AI) ROI report. Pepper Foster report, 2025. Cites the MIT study "The GenAI Divide: State of AI in Business 2025".
- [24] E. Romeo and J. Lacko. Adoption and integration of AI in organizations: A systematic review of challenges and drivers. Kybernetes, 2025.
- [25] S&P Global Sustainable1. AI adoption is soaring, but few companies are measuring its impact. S&P Global Insights, 2025.
- [26] Lena Strobl, William Merrill, Gail Weiss, David Chiang, and Dana Angluin. What formal languages can transformers express? A survey. Transactions of the Association for Computational Linguistics, 2024.
- [27] The Guardian. Claude AI agent deletes firm database in seconds. The Guardian, April 2026. Accessed 2026-05-02.
- [28] Vistage and Wharton Human-AI Research / GBK Collective. Enterprise AI adoption and ROI: Three-year executive study. Wharton/Vistage report, 2025. Survey of 800 US executives, June 26–July 11, 2025.
- [29] WalkMe. The state of digital adoption 2025 (special AI edition). WalkMe Research Report, 2025.
- [30] Samuel Yeh, Sharon Li, and Tanwi Mallick. Lumina: Detecting hallucinations in RAG systems with context–knowledge signals. In Socially Responsible and Trustworthy Foundation Models at NeurIPS 2025, 2025.
- [31] Yedi Zhang, Yufan Cai, Xinyue Zuo, Xiaokun Luan, Kailong Wang, Zhe Hou, Yifan Zhang, Zhiyuan Wei, Meng Sun, Jun Sun, et al. Position: Trustworthy AI agents require the integration of large language models and formal methods. In Forty-second International Conference on Machine Learning Position Paper Track, 2025.