Small Language Models are the Future of Agentic AI
Pith reviewed 2026-05-16 11:50 UTC · model grok-4.3
The pith
Small language models will replace large ones in most agentic AI applications due to better suitability and economy for specialized tasks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Agentic AI systems perform a small number of specialized tasks repetitively and with little variation. For these systems, small language models are sufficiently powerful, inherently more suitable, and necessarily more economical than large language models, establishing them as the future of agentic AI. In cases where general-purpose conversational abilities remain essential, heterogeneous agentic systems that invoke multiple different models offer the natural solution.
What carries the argument
The position that small language models are sufficiently powerful, inherently more suitable, and necessarily more economical for many invocations in agentic systems, grounded in current SLM capabilities, common agent architectures, and deployment economics, and accompanied by an outline of an LLM-to-SLM agent conversion algorithm.
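The "necessarily more economical" leg of the argument is, at bottom, arithmetic over per-token prices and call volumes. A minimal sketch of that calculation, with illustrative prices and workload numbers that are assumptions for this example, not figures from the paper:

```python
# Illustrative comparison of monthly inference cost for one agent workload.
# All prices and volumes below are hypothetical, chosen only to show the
# shape of the calculation.

def monthly_cost(calls_per_day: int, tokens_per_call: int,
                 usd_per_million_tokens: float, days: int = 30) -> float:
    """Total inference spend for one agent workload over a month."""
    tokens = calls_per_day * tokens_per_call * days
    return tokens / 1_000_000 * usd_per_million_tokens

# Hypothetical workload: 50k specialized invocations/day, ~1.5k tokens each.
llm = monthly_cost(50_000, 1_500, usd_per_million_tokens=10.0)
slm = monthly_cost(50_000, 1_500, usd_per_million_tokens=0.4)

print(f"LLM: ${llm:,.0f}/mo  SLM: ${slm:,.0f}/mo  ratio: {llm / slm:.0f}x")
# → LLM: $22,500/mo  SLM: $900/mo  ratio: 25x
```

Whether the ratio is 5x or 25x depends entirely on the assumed prices; the point of the claim is that the ratio multiplies across every routine invocation in the fleet.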
If this is right
- Agentic systems can reach comparable performance at a fraction of current inference costs.
- Development efforts will shift toward fine-tuning small models for specific agent roles rather than scaling model size.
- Heterogeneous designs will become standard, routing routine tasks to small models and reserving larger models for complex reasoning.
- Industry-wide operational expenses for running AI agents will drop even as the number of deployed agents grows.
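The heterogeneous design in the bullets above is, mechanically, a router in front of two model tiers. A minimal sketch, in which the model names, the routine-task heuristic, and the `generate` stubs are hypothetical placeholders rather than anything specified in the paper:

```python
# Sketch of heterogeneous routing: routine tool-calls go to a small model,
# open-ended requests escalate to a large one. All names are stand-ins.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Route:
    name: str
    generate: Callable[[str], str]

def make_router(slm: Route, llm: Route,
                is_routine: Callable[[str], bool]) -> Callable[[str], str]:
    """Send routine requests to the SLM, everything else to the LLM."""
    def route(prompt: str) -> str:
        target = slm if is_routine(prompt) else llm
        return target.generate(prompt)
    return route

# Toy heuristic: structured tool-call prompts count as 'routine'.
routine = lambda p: p.startswith(("TOOL:", "EXTRACT:", "FORMAT:"))

router = make_router(
    slm=Route("small-9b", lambda p: f"[slm] {p}"),
    llm=Route("large-frontier", lambda p: f"[llm] {p}"),
    is_routine=routine,
)

print(router("TOOL: get_weather(city='Oslo')"))   # handled by the SLM
print(router("Draft a negotiation strategy"))     # escalated to the LLM
```

In a real system the heuristic would itself be learned or schema-driven; the prefix check here only marks where that decision sits in the call path.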
Where Pith is reading between the lines
- Local execution of agents on smaller models could reduce latency and improve data privacy by limiting cloud dependence.
- Specialized small models may accelerate creation of domain-specific agents for common subtasks such as tool use or planning.
- Overall compute requirements for large-scale agent deployments may stabilize despite continued growth in agent numbers.
Load-bearing premise
The specialized, low-variation tasks in current and near-future agentic systems do not require the full general capabilities that only large models currently provide.
What would settle it
A controlled study showing that replacing the language-model component in representative agentic workflows with small models produces substantially lower task-completion rates or requires frequent human intervention to correct errors.
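Such a study reduces to running identical agentic tasks against two interchangeable backends and comparing completion rates. A skeleton of that harness, where the backends and task list are stand-ins for real agent loops:

```python
# Skeleton of the controlled comparison described above: same tasks, same
# trial count, only the model backend swapped. Backends here are toy
# stand-ins with fixed success probabilities.

import random
from typing import Callable

def completion_rate(backend: Callable[[str], bool],
                    tasks: list[str], trials: int = 20) -> float:
    """Fraction of (task, trial) runs the backend completes successfully."""
    runs = [(t, i) for t in tasks for i in range(trials)]
    return sum(backend(t) for t, _ in runs) / len(runs)

rng = random.Random(0)  # seeded so the toy comparison is repeatable
llm_backend = lambda task: rng.random() < 0.95
slm_backend = lambda task: rng.random() < 0.90

tasks = ["fill_form", "summarize_ticket", "call_api"]
gap = completion_rate(llm_backend, tasks) - completion_rate(slm_backend, tasks)
print(f"completion-rate gap: {gap:+.2%}")
```

The substantive work the referee asks for lives in what this sketch abstracts away: representative tasks, realistic trial counts, and counting human interventions as failures.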
Original abstract
Large language models (LLMs) are often praised for exhibiting near-human performance on a wide range of tasks and valued for their ability to hold a general conversation. The rise of agentic AI systems is, however, ushering in a mass of applications in which language models perform a small number of specialized tasks repetitively and with little variation. Here we lay out the position that small language models (SLMs) are sufficiently powerful, inherently more suitable, and necessarily more economical for many invocations in agentic systems, and are therefore the future of agentic AI. Our argumentation is grounded in the current level of capabilities exhibited by SLMs, the common architectures of agentic systems, and the economy of LM deployment. We further argue that in situations where general-purpose conversational abilities are essential, heterogeneous agentic systems (i.e., agents invoking multiple different models) are the natural choice. We discuss the potential barriers for the adoption of SLMs in agentic systems and outline a general LLM-to-SLM agent conversion algorithm. Our position, formulated as a value statement, highlights the significance of the operational and economic impact even a partial shift from LLMs to SLMs is to have on the AI agent industry. We aim to stimulate the discussion on the effective use of AI resources and hope to advance the efforts to lower the costs of AI of the present day. Calling for both contributions to and critique of our position, we commit to publishing all such correspondence at https://research.nvidia.com/labs/lpr/slm-agents.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper argues that small language models (SLMs) are sufficiently powerful, inherently more suitable, and necessarily more economical than large language models (LLMs) for the specialized, repetitive, low-variation tasks typical in agentic AI systems, positioning SLMs as the future of agentic AI. It grounds the position in current SLM capabilities, common agent architectures, and deployment economics; recommends heterogeneous systems (invoking multiple model sizes) when general conversational abilities are required; discusses adoption barriers; and outlines a high-level LLM-to-SLM agent conversion algorithm. The work is framed as a value statement intended to stimulate discussion on efficient AI resource use.
Significance. If the central position holds, the paper could have meaningful operational and economic impact by encouraging a shift toward lower-cost SLM deployments in agentic systems, reducing overall AI inference expenses across the industry. The explicit commitment to publishing all correspondence on the position at a public URL is a constructive element that supports open scientific dialogue.
Major comments (3)
- [Abstract and sections on current capabilities and agent architectures] The core claim that SLMs are 'sufficiently powerful' for many agentic invocations rests entirely on a qualitative assessment of the 'current level of capabilities': there are no quantitative benchmarks, controlled head-to-head comparisons, task decompositions, or failure-mode analyses showing where SLM performance remains adequate inside real agent loops.
- [Sections grounding the position in capabilities and architectures] The assertion that specialized low-variation tasks 'do not require the full general capabilities that only large models currently provide' is presented as an observational premise but receives no empirical support via metrics on distributional shift, ambiguity handling, or multi-step error accumulation in agentic settings.
- [Section outlining the LLM-to-SLM conversion algorithm] The outlined LLM-to-SLM agent conversion algorithm is described only at a conceptual level, with no pseudocode, concrete steps, implementation details, or validation examples, rendering it non-actionable for the practical adoption the paper advocates.
Minor comments (2)
- [Abstract and introduction] The phrase 'inherently more suitable' is used repeatedly without an explicit definition or list of suitability criteria (e.g., latency, memory footprint, fine-tuning ease) that would allow readers to evaluate the claim.
- [Section on potential barriers] Barriers to SLM adoption are enumerated but not ranked by severity or illustrated with concrete deployment scenarios, which would strengthen the practical discussion.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and for recognizing the potential operational impact of our position. Our manuscript is explicitly framed as a value statement to stimulate discussion on efficient AI resource use, rather than an empirical study. We address each major comment below and indicate where revisions will be made.
Point-by-point responses
-
Referee: [Abstract and sections on current capabilities and agent architectures] The core claim that SLMs are 'sufficiently powerful' for many agentic invocations rests entirely on a qualitative assessment of the 'current level of capabilities': there are no quantitative benchmarks, controlled head-to-head comparisons, task decompositions, or failure-mode analyses showing where SLM performance remains adequate inside real agent loops.
Authors: We acknowledge that the claims rely on qualitative assessment of existing SLM capabilities rather than new quantitative benchmarks or controlled experiments. This aligns with the paper's purpose as a position statement grounded in observed capabilities, common agent architectures, and deployment economics, not a benchmark paper. We will revise the abstract and opening sections to explicitly cite relevant existing literature on SLM performance in specialized and repetitive tasks, while clarifying that the position is intended to highlight trends and economics rather than prove sufficiency through new data. revision: partial
-
Referee: [Sections grounding the position in capabilities and architectures] The assertion that specialized low-variation tasks 'do not require the full general capabilities that only large models currently provide' is presented as an observational premise but receives no empirical support via metrics on distributional shift, ambiguity handling, or multi-step error accumulation in agentic settings.
Authors: The assertion is presented as an observational premise based on the repetitive, low-variation nature of tasks that dominate agentic systems. We do not provide new empirical metrics on distributional shift or error accumulation, as the work is not an empirical evaluation. In revision we will expand the relevant sections with additional examples from current agent architectures and citations to studies showing effective handling of such tasks by smaller models, to better support the premise without altering the position-paper framing. revision: partial
-
Referee: [Section outlining the LLM-to-SLM conversion algorithm] The outlined LLM-to-SLM agent conversion algorithm is described only at a conceptual level, with no pseudocode, concrete steps, implementation details, or validation examples, rendering it non-actionable for the practical adoption the paper advocates.
Authors: We agree that the high-level description of the conversion algorithm would benefit from greater concreteness to support the practical adoption we advocate. We will revise the section to include pseudocode, a list of concrete steps, and a brief illustrative example based on a standard agent task to make the algorithm more actionable. revision: yes
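One plausible skeleton of such a conversion loop, of the kind the rebuttal promises to spell out: log agent calls, cluster them into recurring task types, fine-tune a small model per cluster, and route those task types to it. Every helper below is a toy stand-in invented for illustration, not the paper's algorithm:

```python
# Hypothetical LLM-to-SLM conversion skeleton. `curate`, `cluster_by_task`
# and `finetune` are toy stand-ins for data filtering, task clustering and
# SLM specialization respectively.

from collections import defaultdict

def curate(logs):
    """Drop malformed records (stand-in for filtering/sanitizing usage data)."""
    return [r for r in logs if r.get("prompt")]

def cluster_by_task(logs):
    """Group calls by a coarse task label (stand-in for real clustering)."""
    clusters = defaultdict(list)
    for r in logs:
        clusters[r["task"]].append(r["prompt"])
    return clusters

def finetune(task, examples):
    """Pretend to specialize a small model; returns a model name."""
    return f"slm-for-{task}({len(examples)} examples)"

def convert_agent(logs, min_examples=2):
    """Map each sufficiently common task type to a specialized SLM."""
    routing = {}
    for task, examples in cluster_by_task(curate(logs)).items():
        if len(examples) >= min_examples:   # enough data to specialize on
            routing[task] = finetune(task, examples)
    return routing

logs = [
    {"task": "extract", "prompt": "pull fields from invoice"},
    {"task": "extract", "prompt": "pull fields from receipt"},
    {"task": "plan",    "prompt": "multi-step itinerary"},
]
print(convert_agent(logs))
# → {'extract': 'slm-for-extract(2 examples)'}; 'plan' stays on the LLM
```

Task types that never accumulate enough examples simply stay on the large model, which is the heterogeneous-system fallback the paper recommends.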
Circularity Check
No circularity: observational position paper with no derivations or self-referential steps
full rationale
The manuscript is a position paper that advances an argumentative claim about SLMs in agentic systems. It supplies no equations, no fitted parameters, no uniqueness theorems, and no derivations that could reduce to their own inputs. All grounding is stated as observational (current SLM capabilities, common agent architectures, deployment economics), with no load-bearing self-citation behind the central thesis and no renaming of known results as new predictions. The argument rests on external observations rather than its own outputs and therefore receives the default non-circularity finding.
Forward citations
Cited by 21 Pith papers
-
Breaking MCP with Function Hijacking Attacks: Novel Threats for Function Calling and Agentic Models
A novel function hijacking attack achieves 70-100% success rates in forcing specific function calls across five LLMs on the BFCL benchmark and is robust to context semantics.
-
Rethinking Scale: Deployment Trade-offs of Small Language Models under Agent Paradigms
Single-agent systems with tools provide the optimal performance-efficiency trade-off for small language models, outperforming base models and multi-agent setups.
-
Is a Picture Worth a Thousand Words? Adaptive Multimodal Fact-Checking with Visual Evidence Necessity
AMuFC improves multimodal fact-checking accuracy by adaptively determining visual evidence necessity via a dedicated Analyzer before verification rather than always incorporating images.
-
Is a Picture Worth a Thousand Words? Adaptive Multimodal Fact-Checking with Visual Evidence Necessity
An adaptive multimodal fact-checking system improves accuracy by having an Analyzer determine when visual evidence is necessary before the Verifier assesses claim veracity.
-
Beyond Case Law: Evaluating Structure-Aware Retrieval and Safety in Statute-Centric Legal QA
SearchFireSafety benchmark shows graph-guided retrieval improves statute-centric legal QA but domain-adapted models hallucinate more when statutory evidence is missing.
-
SANet: A Semantic-aware Agentic AI Networking Framework for Cross-layer Optimization in 6G
SANet uses semantic-aware AI agents for cross-layer 6G optimization, achieving up to 14.61% performance gains with 44.37% of the FLOPs of prior methods via model partitioning and decentralized multi-objective algorithms.
-
Not Just RLHF: Why Alignment Alone Won't Fix Multi-Agent Sycophancy
Pretrained base models exhibit higher yield to peer disagreement than RLHF instruct variants, with the effect localized to mid-layer attention and mitigated by structured dissent rather than prompt defenses.
-
GRAIL: A Deep-Granularity Hybrid Resonance Framework for Real-Time Agent Discovery via SLM-Enhanced Indexing
GRAIL achieves over 79 times lower latency than LLM-parsing baselines and higher Recall@10 than vector search by combining SLM-enhanced prediction, pseudo-document expansion, and MaxSim resonance on the new AgentTaxo-...
-
AgentFloor: How Far Up the tool use Ladder Can Small Open-Weight Models Go?
Small open-weight models match GPT-5 on routine agent tool-use tasks but lag on long-horizon planning, supporting tiered routing to reduce costs in agentic systems.
-
SAW-INT4: System-Aware 4-Bit KV-Cache Quantization for Real-World LLM Serving
Token-wise INT4 KV-cache quantization plus block-diagonal Hadamard rotation recovers nearly all accuracy lost by naive INT4 while adding zero end-to-end overhead under paged serving constraints.
-
Search, Do not Guess: Teaching Small Language Models to Be Effective Search Agents
A fine-tuning policy trains small language models to search reliably and use evidence, improving multi-hop QA performance by 15-17 points to reach large-model levels.
-
Language Markers of Emotion Flexibility Predict Depression and Anxiety Treatment Outcomes
Emotion dynamics from therapy transcripts, extracted via transformers and clustered with state-space models, distinguish improving patients from non-responders who show higher odds of symptom worsening.
-
EmbeddingGemma: Powerful and Lightweight Text Representations
A 300M-parameter open embedding model sets new SOTA on MTEB for its size class and matches models twice as large while staying effective when compressed.
-
Cognitive Agent Compilation for Explicit Problem Solver Modeling
Cognitive Agent Compilation uses a teacher LLM to create explicit, inspectable problem-solving agents by separating knowledge, policy, and verification components for educational applications.
-
SHIELD: A Diverse Clinical Note Dataset and Distilled Small Language Models for Enterprise-Scale De-identification
SHIELD dataset and distilled DeBERTa v3 model achieve 0.88 micro precision and 0.86 recall on PHI de-identification while matching teacher performance on structured categories.
-
Automatic Ontology Construction Using LLMs as an External Layer of Memory, Verification, and Planning for Hybrid Intelligent Systems
A hybrid system augments LLMs with an automated external RDF/OWL ontology layer for long-term memory, SHACL/OWL validation, and improved multi-step reasoning on tasks like Tower of Hanoi.
-
A pragmatic approach to regulating AI agents
AI agents require distinct regulation as AI systems under the EU AI Act with orchestration-layer oversight and a risk-based traffic light authorization system in contract law to preserve human accountability.
-
AgentOpt v0.1 Technical Report: Client-Side Optimization for LLM-Based Agent
AgentOpt introduces a framework-agnostic package that uses algorithms like UCB-E to find cost-effective model assignments in multi-step LLM agent pipelines, cutting evaluation budgets by 62-76% while maintaining near-...
-
Security Threat Modeling for Emerging AI-Agent Protocols: A Comparative Analysis of MCP, A2A, Agora, and ANP
The paper identifies twelve protocol-level security risks across MCP, A2A, Agora, and ANP and quantifies wrong-provider tool execution risk in MCP via a measurement-driven case study on multi-server composition.
-
TRACE: A Metrologically-Grounded Engineering Framework for Trustworthy Agentic AI Systems in Operationally Critical Domains
TRACE is a metrologically-grounded four-layer engineering framework for trustworthy agentic AI that enforces an ML-LLM split, stateful policies, human supervision, and a parsimony metric across critical domains.
-
Terminus-4B: Can a Smaller Model Replace Frontier LLMs at Agentic Execution Tasks?
A fine-tuned 4B model matches or exceeds frontier LLMs in terminal execution subagent tasks for coding agents, reducing main agent token usage by 30% with no performance loss.
Reference graph
Works this paper leans on
-
[1]
Aashima. Small language models vs. llms: Finding the right fit for your needs, October 2024. Accessed: 2025-05-09
-
[2]
ABBYY. Small language models vs. large language models, November 2024. Accessed: 2025-05-09
-
[3]
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Marah Abdin, Jyoti Aneja, Hany Awadalla, Ahmed Awadallah, Ammar Ahmad Awan, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, et al. Phi-3 technical report: A highly capable language model locally on your phone. arXiv preprint arXiv:2404.14219, 2024
-
[4]
Adyog. The economics of ai training and inference: How deepseek broke the cost curve, February 2025. Accessed: 2025-05-09
-
[5]
Ishika Agarwal, Krishnateja Killamsetty, Lucian Popa, and Marina Danilevsky. Delift: Data efficient language model instruction fine tuning. arXiv preprint arXiv:2411.04425, 2024
-
[6]
Smollm2: When smol goes big – data-centric training of a small language model, 2025
Loubna Ben Allal, Anton Lozhkov, Elie Bakouch, Gabriel Martín Blázquez, Guilherme Penedo, Lewis Tunstall, Andrés Marafioti, Hynek Kydlíček, Agustín Piqueres Lajarín, Vaibhav Srivastav, Joshua Lochner, Caleb Fahlgren, Xuan-Son Nguyen, Clémentine Fourrier, Ben Burtenshaw, Hugo Larcher, Haojun Zhao, Cyril Zakka, Mathieu Morlon, Colin Raffel, Leandro von Wer...
-
[7]
Peter Belcak, Greg Heinrich, Jan Kautz, and Pavlo Molchanov. Minifinetuning: Low-data generation domain adaptation through corrective self-distillation. arXiv preprint arXiv:2506.15702, 2025
-
[8]
Peter Belcak and Roger Wattenhofer. Tiny transformers excel at sentence compression. arXiv preprint arXiv:2410.23510, 2024
-
[9]
Aaron Blakeman, Aarti Basant, Abhinav Khattar, Adithya Renduchintala, Akhiad Bercovich, Aleksander Ficek, Alexis Bjorlin, Ali Taghibakhshi, Amala Sanjay Deshmukh, Ameya Sunil Mahabaleshwarkar, et al. Nemotron-h: A family of accurate and efficient hybrid mamba-transformer models. arXiv preprint arXiv:2504.03624, 2025
-
[10]
Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George van den Driessche, Bogdan Damoc, Aidan Clark, Jan Kramár, et al. Improving language models by retrieving from trillions of tokens. arXiv preprint arXiv:2112.04426, 2022
-
[11]
Michael Brennan, Sadia Afroz, and Rachel Greenstadt. Adversarial stylometry: Circumventing authorship recognition to preserve privacy and anonymity.ACM Transactions on Information and System Security (TISSEC), 15(3):1–22, 2012
-
[12]
Flextron: Many-in-one flexible large language model
Ruisi Cai, Saurav Muralidharan, Greg Heinrich, Hongxu Yin, Zhangyang Wang, Jan Kautz, and Pavlo Molchanov. Flextron: Many-in-one flexible large language model. In Proceedings of the 41st International Conference on Machine Learning (ICML 2024), 2024
-
[13]
Michael Chui, Bryce Hall, Helen Mayhew, Alex Singla, and Alexander Sukharevsky. The state of ai in 2022—and a half decade in review, December 2022. Accessed: 2025-05-09
-
[14]
Cloudera, Inc. 96% of enterprises are expanding use of ai agents, according to latest data from cloudera, April 2025. Accessed: 2025-05-08
-
[15]
Planck Collaboration et al. Planck 2018 results. VI. Cosmological parameters. Astronomy & Astrophysics, 641:A6, 2020
-
[16]
Colliers. 2025 data center marketplace: Balancing unprecedented opportunity with strategic risk. U.S. research report, Colliers, 2025
- [17]
-
[18]
Badhan Chandra Das, M Hadi Amini, and Yanzhao Wu. Security and privacy challenges of large language models: A survey. ACM Computing Surveys, 57(6):1–39, 2025
-
[19]
DeepSeek-AI. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning, 2025
-
[20]
Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer. Qlora: Efficient finetuning of quantized llms. Advances in Neural Information Processing Systems, 36:10088–10115, 2023
-
[21]
Shizhe Diao, Yu Yang, Yonggan Fu, Xin Dong, Dan Su, Markus Kliegl, Zijia Chen, Peter Belcak, Yoshi Suhara, Hongxu Yin, et al. Climb: Clustering-based iterative data mixture bootstrapping for language model pre-training. arXiv preprint arXiv:2504.13161, 2025
-
[22]
Ning Ding, Yujia Qin, Guang Yang, Fuchao Wei, Zonghan Yang, Yusheng Su, Shengding Hu, Yulin Chen, Chi-Min Chan, Weize Chen, et al. Parameter-efficient fine-tuning of large-scale pre-trained language models. Nature Machine Intelligence, 5(3):220–235, 2023
-
[23]
Xin Dong, Yonggan Fu, Shizhe Diao, Wonmin Byeon, Zijia Chen, Ameya Sunil Mahabaleshwarkar, Shih-Yang Liu, Matthijs Van Keirsbilck, Min-Hung Chen, Yoshi Suhara, et al. Hymba: A hybrid-head architecture for small language models. arXiv preprint arXiv:2411.13676, 2024
-
[24]
Amr Elmeleegy et al. Introducing nvidia dynamo, a low-latency distributed inference framework for scaling reasoning ai models, March 2025. NVIDIA Technical Blog
-
[25]
Henry Evans. Llms vs. slms: Balancing comprehensiveness and smart resource-saving, April
-
[27]
Barbara A Ferguson, Timothy A Dreisbach, Catherine G Parks, Gregory M Filip, and Craig L Schmitt. Coarse-scale population structure of pathogenic armillaria species in a mixed-conifer forest in the blue mountains of northeast oregon. Canadian Journal of Forest Research, 33(4):612–623, 2003
-
[28]
Amoeballm: Constructing any-shape large language models for efficient and instant deployment
Yonggan Fu, Zhongzhi Yu, Junwei Li, Jiayi Qian, Yongan Zhang, Xiangchi Yuan, Dachuan Shi, Roman Yakunin, and Yingyan Celine Lin. Amoeballm: Constructing any-shape large language models for efficient and instant deployment. In Proceedings of the 38th Annual Conference on Neural Information Processing Systems (NeurIPS 2024), 2024
-
[29]
google. GitHub - google/A2A: An open protocol enabling communication and interoperability between opaque agentic applications
-
[30]
David Gu, Peter Belcak, and Roger Wattenhofer. Text compression for efficient language generation. arXiv preprint arXiv:2503.11426, 2025
-
[31]
Harrison Clarke. Large language models vs. small language models, March 2024. Accessed: 2025-05-09
-
[32]
Danny Hernandez, Jared Kaplan, Tom Henighan, and Sam McCandlish. Scaling laws for transfer. arXiv preprint arXiv:2102.01293, 2021
-
[33]
Training Compute-Optimal Large Language Models
Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, et al. Training compute-optimal large language models. arXiv preprint arXiv:2203.15556, 2022
-
[34]
LoRA: Low-Rank Adaptation of Large Language Models
Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. Lora: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685, 2021
-
[35]
Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. Lora: Low-rank adaptation of large language models. ICLR, 1(2):3, 2022
-
[36]
Unsupervised fine-tuning for text clustering
Shaohan Huang, Furu Wei, Lei Cui, Xingxing Zhang, and Ming Zhou. Unsupervised fine-tuning for text clustering. In Proceedings of the 28th International Conference on Computational Linguistics, pages 5530–5534, 2020
-
[37]
Invisible Technologies. How small language models can outperform llms, March 2025. Accessed: 2025-05-21
-
[38]
Mojan Javaheripi and Sébastien Bubeck. Phi-2: The surprising power of small language models,
-
[39]
Microsoft Research Blog
-
[40]
Andreas Jungherr. Artificial intelligence and democracy: A conceptual framework. Social Media + Society, 9(3):20563051231186353, 2023
-
[41]
Understanding the total cost of inferencing large language models
Aviv Kaufmann. Understanding the total cost of inferencing large language models. Technical report, Enterprise Strategy Group, April 2024. Commissioned by Dell Technologies. Accessed: 2025-05-09
-
[42]
Sneha Kudugunta, Aditya Kusupati, Tim Dettmers, Kaifeng Chen, Inderjit Dhillon, Yulia Tsvetkov, Hannaneh Hajishirzi, Sham Kakade, Ali Farhadi, Prateek Jain, et al. Matformer: Nested transformer for elastic inference. arXiv preprint arXiv:2310.07707, 2023
-
[43]
Akshi Kumar. From large to small: The rise of small language models (slms) in text analytics. 2025
-
[44]
A comparative study on unsupervised feature selection methods for text clustering
Luying Liu, Jianchu Kang, Jing Yu, and Zhongliang Wang. A comparative study on unsupervised feature selection methods for text clustering. In 2005 International Conference on Natural Language Processing and Knowledge Engineering, pages 597–601. IEEE, 2005
-
[45]
DoRA: Weight-Decomposed Low-Rank Adaptation
Shih-Yang Liu, Chien-Yi Wang, Hongxu Yin, Pavlo Molchanov, Yu-Chiang Frank Wang, Kwang-Ting Cheng, and Min-Hung Chen. Dora: Weight-decomposed low-rank adaptation. arXiv preprint arXiv:2402.09353, 2024
-
[46]
Deja vu: Contextual sparsity for efficient llms at inference time
Zichang Liu, Jue Wang, Tri Dao, Tianyi Zhou, Binhang Yuan, Zhao Song, Anshumali Shrivastava, Ce Zhang, Yuandong Tian, Christopher Re, et al. Deja vu: Contextual sparsity for efficient llms at inference time. In International Conference on Machine Learning, pages 22137–22176. PMLR, 2023
-
[47]
Nunzio Lore, Sepehr Ilami, and Babak Heydari. Large model strategic thinking, small model efficiency: transferring theory of mind in large language models. arXiv preprint arXiv:2408.05241, 2024
-
[48]
Jeff Loucks, Gillian Crossan, Baris Sarer, China Widener, and Ariane Bucaille. Autonomous generative ai agents: Under development. Deloitte Insights, November 2024. Accessed: 2025-05-08
-
[49]
Zhenyan Lu, Xiang Li, Dongqi Cai, Rongjie Yi, Fangming Liu, Xiwen Zhang, Nicholas D Lane, and Mengwei Xu. Small language models: Survey, measurements, and insights. arXiv preprint arXiv:2409.15790, 2024
-
[50]
Large Language Model Agent: A Survey on Methodology, Applications and Challenges
Junyu Luo, Weizhi Zhang, Ye Yuan, Yusheng Zhao, Junwei Yang, Yiyang Gu, Bohan Wu, Binqi Chen, Ziyue Qiao, Qingqing Long, et al. Large language model agent: A survey on methodology, applications and challenges. arXiv preprint arXiv:2503.21460, 2025
-
[51]
Georgina M Mace, Paul H Harvey, and Timothy H Clutton-Brock. Brain size and ecology in small mammals. Journal of Zoology, 193(3):333–354, 1981
-
[52]
Tobias Mann. A closer look at dynamo, nvidia’s ’operating system’ for ai inference, March
-
[53]
Accessed: 2025-05-09
-
[54]
Market.us. Global agentic ai market size, share analysis by product type, agent role, agent system, end user, region and companies – industry segment outlook, market assessment, competition scenario, trends and forecast 2025–2034, March 2025. Accessed: 2025-05-08
-
[55]
Tula Masterman, Sandi Besen, Mason Sawtell, and Alex Chao. The landscape of emerging ai agent architectures for reasoning, planning, and tool calling: A survey. arXiv preprint arXiv:2404.11584, 2024
-
[56]
Sourabh Mehta. How much energy do llms consume? unveiling the power behind ai, July 2024. Accessed: 2025-05-21
-
[57]
Meta Platforms, Inc. Model cards and prompt formats: Llama 3.3, April 2025. Accessed: 2025-05-08
-
[58]
Metomic. Understanding ai agents & data security, 2025. Accessed: 2025-05-13
-
[59]
Erik Miehling, Karthikeyan Natesan Ramamurthy, Kush R Varshney, Matthew Riemer, Djallel Bouneffouf, John T Richards, Amit Dhurandhar, Elizabeth M Daly, Michael Hind, Prasanna Sattigeri, et al. Agentic ai needs a systems theory. arXiv preprint arXiv:2503.00237, 2025
-
[60]
Morgan Stanley. Genai revenue growth and profitability, April 2025. Accessed: 2025-05-08
-
[61]
Humza Naveed, Asad Ullah Khan, Shi Qiu, Muhammad Saqib, Saeed Anwar, Muhammad Usman, Naveed Akhtar, Nick Barnes, and Ajmal Mian. A comprehensive overview of large language models. arXiv preprint arXiv:2307.06435, 2023
- [62]
-
[63]
Nvidia dynamo: A datacenter scale distributed inference serving framework
NVIDIA. Nvidia dynamo: A datacenter scale distributed inference serving framework. https://github.com/ai-dynamo/dynamo, 2025. Accessed: 2025-05-09
-
[64]
Felipe Maia Polo, Lucas Weber, Leshem Choshen, Yuekai Sun, Gongjun Xu, and Mikhail Yurochkin. tinybenchmarks: evaluating llms with fewer examples. arXiv preprint arXiv:2402.14992, 2024
-
[65]
Lakshmi Radhakrishnan, Gundolf Schenk, Kathleen Muenzen, Boris Oskotsky, Habibeh Ashouri Choshali, Thomas Plunkett, Sharat Israni, and Atul J Butte. A certified de-identification system for all clinical text documents for information extraction at scale. JAMIA Open, 6(3):ooad045, 2023
-
[66]
Martin J Rees. Before the Beginning: Our Universe and Others. Addison-Wesley, 1997
-
[67]
Judith Sáinz-Pardo Díaz and Álvaro López García. An open source python library for anonymizing sensitive data. Scientific Data, 11(1):1289, 2024
-
[68]
Toolformer: Language models can teach themselves to use tools
Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. Toolformer: Language models can teach themselves to use tools. In Advances in Neural Information Processing Systems (NeurIPS), 2023
-
[69]
J William Schopf. Microfossils of the early archean apex chert: New evidence of the antiquity of life. Science, 260(5108):640–646, 1993
-
[70]
Tanya Seda. Cloud llm cost model: Breakdown for mid-market businesses, 2024. Accessed: 2025-05-09
-
[71]
Olivia Shone. Explore ai models: Key differences between small language models and large language models, November 2024. Accessed: 2025-05-21
-
[72]
Powerinfer: Fast large language model serving with a consumer-grade gpu
Yixin Song, Zeyu Mi, Haotong Xie, and Haibo Chen. Powerinfer: Fast large language model serving with a consumer-grade gpu. In Proceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles, pages 590–606, 2024
-
[73]
Small Language Models (SLMs) Can Still Pack a Punch: A survey (updated 2026)
Shreyas Subramanian, Vikram Elango, and Mecit Gungor. Small language models (slms) can still pack a punch: A survey. arXiv preprint arXiv:2501.05465, 2025
-
[74]
Synergy Technical. Small language models vs. large language models, 2025. Accessed: 2025-05-09
-
[75]
Brian G. Thamm. Trustworthy and secure ai: How small language models strengthen data security. Service Contractor Magazine, October 2024. Accessed: 2025-05-08
-
[76]
Fali Wang, Zhiwei Zhang, Xianren Zhang, Zongyu Wu, Tzuhao Mo, Qiuhao Lu, Wanjing Wang, Rui Li, Junjie Xu, Xianfeng Tang, et al. A comprehensive survey of small language models in the era of large language models: Techniques, enhancements, applications, collaboration with llms, and trustworthiness. arXiv preprint arXiv:2411.03350, 2024
-
[77]
WorkOS. Build secure ai agents, 2025. Accessed: 2025-05-13
-
[78]
Zhenliang Xue, Yixin Song, Zeyu Mi, Xinrui Zheng, Yubin Xia, and Haibo Chen. Powerinfer-2: Fast large language model inference on a smartphone. arXiv preprint arXiv:2406.06282, 2024
-
[79]
Biwei Yan, Kun Li, Minghui Xu, Yueyan Dong, Yue Zhang, Zhaochun Ren, and Xiuzhen Cheng. On protecting the data privacy of large language models (llms): A survey. arXiv preprint arXiv:2403.05156, 2024
-
[80]
Fanjia Yan, Huanzhi Mao, Charlie Cheng-Jie Ji, Tianjun Zhang, Shishir G. Patil, Ion Stoica, and Joseph E. Gonzalez. Berkeley function calling leaderboard. https://gorilla.cs.berkeley.edu/blogs/8_berkeley_function_calling_leaderboard.html, 2024
-
[81]
τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains
Shunyu Yao, Noah Shinn, Pedram Razavi, and Karthik Narasimhan. Tau-bench: A benchmark for tool-agent-user interaction in real-world domains. arXiv preprint arXiv:2406.12045, 2024