pith. machine review for the scientific record.

arxiv: 2506.02153 · v2 · submitted 2025-06-02 · 💻 cs.AI

Recognition: 2 theorem links

· Lean Theorem

Small Language Models are the Future of Agentic AI

Authors on Pith: no claims yet

Pith reviewed 2026-05-16 11:50 UTC · model grok-4.3

classification 💻 cs.AI
keywords: small language models · agentic AI · large language models · AI agents · model efficiency · agent architectures · deployment costs · heterogeneous systems

The pith

Small language models will replace large ones in most agentic AI applications due to better suitability and economy for specialized tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that agentic AI systems use language models mainly for a small number of specialized tasks performed repetitively with little variation, rather than for open-ended general conversation. In this setting, small language models already deliver enough capability while matching the task structure more closely and costing far less to invoke repeatedly. The authors therefore position small models as the future of agentic AI and recommend heterogeneous systems that combine different model sizes only when broad conversational abilities are required. They back the claim with an analysis of current capabilities, common agent architectures, and deployment economics, plus an algorithm for converting existing large-model agents to small-model versions.

Core claim

Agentic AI systems perform a small number of specialized tasks repetitively and with little variation. For these systems, small language models are sufficiently powerful, inherently more suitable, and necessarily more economical than large language models, establishing them as the future of agentic AI. In cases where general-purpose conversational abilities remain essential, heterogeneous agentic systems that invoke multiple different models offer the natural solution.

What carries the argument

The position that small language models are sufficiently powerful, inherently more suitable, and necessarily more economical for many invocations in agentic systems, supported by an LLM-to-SLM agent conversion algorithm.
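The paper presents the conversion algorithm only at a conceptual level. As an editorial illustration (function names, the keyword-based clustering heuristic, and the step labels are our assumptions, not the paper's), such a pipeline might collect usage logs from the deployed LLM agent, cluster invocations into recurring task types, and plan a fine-tune of a small model for each sufficiently large cluster:

```python
from collections import defaultdict

# Hypothetical sketch of an LLM-to-SLM conversion pipeline.
# Step names and heuristics are illustrative, not taken from the paper.

def collect_usage_logs(agent_calls):
    """Step 1: record (prompt, response) pairs from the deployed LLM agent."""
    return [(c["prompt"], c["response"]) for c in agent_calls]

def cluster_by_task(logs):
    """Step 2: group invocations into recurring task types.
    A trivial keyword heuristic stands in for real clustering."""
    clusters = defaultdict(list)
    for prompt, response in logs:
        key = "tool_call" if "CALL:" in response else "summarize"
        clusters[key].append((prompt, response))
    return dict(clusters)

def select_and_finetune(clusters, min_examples=2):
    """Step 3: for each sufficiently large cluster, pick an SLM and
    plan a fine-tune on the cluster's data."""
    plans = {}
    for task, examples in clusters.items():
        if len(examples) >= min_examples:
            plans[task] = {"model": "slm-base", "n_train": len(examples)}
    return plans

calls = [
    {"prompt": "look up weather", "response": "CALL: get_weather(city='Oslo')"},
    {"prompt": "look up stock", "response": "CALL: get_price(ticker='NVDA')"},
    {"prompt": "condense this report", "response": "The report says..."},
]
plans = select_and_finetune(cluster_by_task(collect_usage_logs(calls)))
print(plans)  # {'tool_call': {'model': 'slm-base', 'n_train': 2}}
```

The sketch makes the paper's structural point concrete: the conversion hinges on usage data revealing a small set of repetitive task types, each narrow enough for a specialized small model.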

If this is right

  • Agentic systems can reach comparable performance at a fraction of current inference costs.
  • Development efforts will shift toward fine-tuning small models for specific agent roles rather than scaling model size.
  • Heterogeneous designs will become standard, routing routine tasks to small models and reserving larger models for complex reasoning.
  • Industry-wide operational expenses for running AI agents will drop even as the number of deployed agents grows.
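The heterogeneous routing in the third point can be sketched as a minimal dispatcher; the task taxonomy and model labels below are illustrative assumptions, not taken from the paper:

```python
# Illustrative router for a heterogeneous agentic system: routine,
# well-characterized task types go to a specialized SLM; anything
# else escalates to a generalist LLM. Task names and model labels
# are hypothetical.

ROUTINE_TASKS = {"extract_fields", "format_output", "call_tool"}

def route(task_type):
    """Return the model tier to invoke for a given task type."""
    return "slm-specialist" if task_type in ROUTINE_TASKS else "llm-generalist"

def dispatch(task_types):
    """Route a batch of invocations and tally tier usage."""
    tally = {"slm-specialist": 0, "llm-generalist": 0}
    for t in task_types:
        tally[route(t)] += 1
    return tally

print(dispatch(["call_tool", "extract_fields", "open_dialogue", "call_tool"]))
# {'slm-specialist': 3, 'llm-generalist': 1}
```

If the paper's premise holds, most real traffic resembles the three routine invocations here, so the expensive tier is touched only for the occasional open-ended request.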

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Local execution of agents on smaller models could reduce latency and improve data privacy by limiting cloud dependence.
  • Specialized small models may accelerate creation of domain-specific agents for common subtasks such as tool use or planning.
  • Overall compute requirements for large-scale agent deployments may stabilize despite continued growth in agent numbers.

Load-bearing premise

The specialized, low-variation tasks in current and near-future agentic systems do not require the full general capabilities that only large models currently provide.

What would settle it

A controlled study showing that replacing the language-model component in representative agentic workflows with small models produces substantially lower task-completion rates or requires frequent human intervention to correct errors.
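One minimal shape for such a study is a paired comparison over shared workflows; the stub agents and workflow names below are hypothetical stand-ins for real model backends:

```python
# Sketch of a paired evaluation harness: run two agent backends on
# the same workflows and compare task-completion rates. Agents are
# stand-in callables returning True on successful completion.

def run_study(workflows, llm_agent, slm_agent):
    n = len(workflows)
    done_llm = sum(llm_agent(w) for w in workflows)
    done_slm = sum(slm_agent(w) for w in workflows)
    return {"llm_rate": done_llm / n, "slm_rate": done_slm / n}

# Stubs: the LLM completes everything; the SLM fails one
# hypothetical long-horizon planning workflow.
workflows = ["extract", "call_tool", "format", "long_horizon_plan"]
llm_agent = lambda w: True
slm_agent = lambda w: w != "long_horizon_plan"
print(run_study(workflows, llm_agent, slm_agent))
# {'llm_rate': 1.0, 'slm_rate': 0.75}
```

A substantial, reproducible gap in completion rates on representative workflows would count against the position; near-parity would support it.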

read the original abstract

Large language models (LLMs) are often praised for exhibiting near-human performance on a wide range of tasks and valued for their ability to hold a general conversation. The rise of agentic AI systems is, however, ushering in a mass of applications in which language models perform a small number of specialized tasks repetitively and with little variation. Here we lay out the position that small language models (SLMs) are sufficiently powerful, inherently more suitable, and necessarily more economical for many invocations in agentic systems, and are therefore the future of agentic AI. Our argumentation is grounded in the current level of capabilities exhibited by SLMs, the common architectures of agentic systems, and the economy of LM deployment. We further argue that in situations where general-purpose conversational abilities are essential, heterogeneous agentic systems (i.e., agents invoking multiple different models) are the natural choice. We discuss the potential barriers for the adoption of SLMs in agentic systems and outline a general LLM-to-SLM agent conversion algorithm. Our position, formulated as a value statement, highlights the significance of the operational and economic impact even a partial shift from LLMs to SLMs is to have on the AI agent industry. We aim to stimulate the discussion on the effective use of AI resources and hope to advance the efforts to lower the costs of AI of the present day. Calling for both contributions to and critique of our position, we commit to publishing all such correspondence at https://research.nvidia.com/labs/lpr/slm-agents.

Editorial analysis

A structured set of objections, weighed in public.

Referee report, simulated author's rebuttal, circularity audit, and an axiom-and-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, and the analysis below is the friction.

Referee Report

3 major / 2 minor

Summary. The paper argues that small language models (SLMs) are sufficiently powerful, inherently more suitable, and necessarily more economical than large language models (LLMs) for the specialized, repetitive, low-variation tasks typical in agentic AI systems, positioning SLMs as the future of agentic AI. It grounds the position in current SLM capabilities, common agent architectures, and deployment economics; recommends heterogeneous systems (invoking multiple model sizes) when general conversational abilities are required; discusses adoption barriers; and outlines a high-level LLM-to-SLM agent conversion algorithm. The work is framed as a value statement intended to stimulate discussion on efficient AI resource use.

Significance. If the central position holds, the paper could have meaningful operational and economic impact by encouraging a shift toward lower-cost SLM deployments in agentic systems, reducing overall AI inference expenses across the industry. The explicit commitment to publishing all correspondence on the position at a public URL is a constructive element that supports open scientific dialogue.

major comments (3)
  1. [Abstract and sections on current capabilities and agent architectures] The core claim that SLMs are 'sufficiently powerful' for many agentic invocations (abstract and opening sections) rests entirely on qualitative assessment of 'current level of capabilities' without any quantitative benchmarks, controlled head-to-head comparisons, task decompositions, or failure-mode analyses showing where SLM performance remains adequate inside real agent loops.
  2. [Sections grounding the position in capabilities and architectures] The assertion that specialized low-variation tasks 'do not require the full general capabilities that only large models currently provide' (abstract and argumentation sections) is presented as an observational premise but receives no empirical support via metrics on distributional shift, ambiguity handling, or multi-step error accumulation in agentic settings.
  3. [Section outlining the LLM-to-SLM conversion algorithm] The outlined LLM-to-SLM agent conversion algorithm (section on conversion) is described only at a conceptual level with no pseudocode, concrete steps, implementation details, or validation examples, rendering it non-actionable for the practical adoption the paper advocates.
minor comments (2)
  1. [Abstract and introduction] The phrase 'inherently more suitable' is used repeatedly without an explicit definition or list of suitability criteria (e.g., latency, memory footprint, fine-tuning ease) that would allow readers to evaluate the claim.
  2. [Section on potential barriers] Barriers to SLM adoption are enumerated but not ranked by severity or illustrated with concrete deployment scenarios, which would strengthen the practical discussion.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the potential operational impact of our position. Our manuscript is explicitly framed as a value statement to stimulate discussion on efficient AI resource use, rather than an empirical study. We address each major comment below and indicate where revisions will be made.

read point-by-point responses
  1. Referee: [Abstract and sections on current capabilities and agent architectures] The core claim that SLMs are 'sufficiently powerful' for many agentic invocations (abstract and opening sections) rests entirely on qualitative assessment of 'current level of capabilities' without any quantitative benchmarks, controlled head-to-head comparisons, task decompositions, or failure-mode analyses showing where SLM performance remains adequate inside real agent loops.

    Authors: We acknowledge that the claims rely on qualitative assessment of existing SLM capabilities rather than new quantitative benchmarks or controlled experiments. This aligns with the paper's purpose as a position statement grounded in observed capabilities, common agent architectures, and deployment economics, not a benchmark paper. We will revise the abstract and opening sections to explicitly cite relevant existing literature on SLM performance in specialized and repetitive tasks, while clarifying that the position is intended to highlight trends and economics rather than prove sufficiency through new data. revision: partial

  2. Referee: [Sections grounding the position in capabilities and architectures] The assertion that specialized low-variation tasks 'do not require the full general capabilities that only large models currently provide' (abstract and argumentation sections) is presented as an observational premise but receives no empirical support via metrics on distributional shift, ambiguity handling, or multi-step error accumulation in agentic settings.

    Authors: The assertion is presented as an observational premise based on the repetitive, low-variation nature of tasks that dominate agentic systems. We do not provide new empirical metrics on distributional shift or error accumulation, as the work is not an empirical evaluation. In revision we will expand the relevant sections with additional examples from current agent architectures and citations to studies showing effective handling of such tasks by smaller models, to better support the premise without altering the position-paper framing. revision: partial

  3. Referee: [Section outlining the LLM-to-SLM conversion algorithm] The outlined LLM-to-SLM agent conversion algorithm (section on conversion) is described only at a conceptual level with no pseudocode, concrete steps, implementation details, or validation examples, rendering it non-actionable for the practical adoption the paper advocates.

    Authors: We agree that the high-level description of the conversion algorithm would benefit from greater concreteness to support the practical adoption we advocate. We will revise the section to include pseudocode, a list of concrete steps, and a brief illustrative example based on a standard agent task to make the algorithm more actionable. revision: yes

Circularity Check

0 steps flagged

No circularity: observational position paper with no derivations or self-referential steps

full rationale

The manuscript is a position paper that advances an argumentative claim about SLMs in agentic systems. It supplies no equations, no fitted parameters, no uniqueness theorems, and no derivations that could reduce to their own inputs. All grounding is stated as observational (current SLM capabilities, common agent architectures, deployment economics) without any self-citation load-bearing on the central thesis or any renaming of known results as new predictions. The argument is therefore self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Position paper containing no formal parameters, axioms, or invented entities; the central claim rests on informal assessment of current model capabilities and deployment costs.

pith-pipeline@v0.9.0 · 5602 in / 938 out tokens · 35950 ms · 2026-05-16T11:50:04.174682+00:00 · methodology

discussion (0)


Forward citations

Cited by 21 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Breaking MCP with Function Hijacking Attacks: Novel Threats for Function Calling and Agentic Models

    cs.CR 2026-04 unverdicted novelty 7.0

    A novel function hijacking attack achieves 70-100% success rates in forcing specific function calls across five LLMs on the BFCL benchmark and is robust to context semantics.

  2. Rethinking Scale: Deployment Trade-offs of Small Language Models under Agent Paradigms

    cs.CL 2026-04 unverdicted novelty 7.0

    Single-agent systems with tools provide the optimal performance-efficiency trade-off for small language models, outperforming base models and multi-agent setups.

  3. Is a Picture Worth a Thousand Words? Adaptive Multimodal Fact-Checking with Visual Evidence Necessity

    cs.CL 2026-04 unverdicted novelty 7.0

    AMuFC improves multimodal fact-checking accuracy by adaptively determining visual evidence necessity via a dedicated Analyzer before verification rather than always incorporating images.

  4. Is a Picture Worth a Thousand Words? Adaptive Multimodal Fact-Checking with Visual Evidence Necessity

    cs.CL 2026-04 unverdicted novelty 7.0

    An adaptive multimodal fact-checking system improves accuracy by having an Analyzer determine when visual evidence is necessary before the Verifier assesses claim veracity.

  5. Beyond Case Law: Evaluating Structure-Aware Retrieval and Safety in Statute-Centric Legal QA

    cs.IR 2026-01 unverdicted novelty 7.0

    SearchFireSafety benchmark shows graph-guided retrieval improves statute-centric legal QA but domain-adapted models hallucinate more when statutory evidence is missing.

  6. SANet: A Semantic-aware Agentic AI Networking Framework for Cross-layer Optimization in 6G

    cs.AI 2025-12 unverdicted novelty 7.0

    SANet uses semantic-aware AI agents for cross-layer 6G optimization, achieving up to 14.61% performance gains with 44.37% of the FLOPs of prior methods via model partitioning and decentralized multi-objective algorithms.

  7. Not Just RLHF: Why Alignment Alone Won't Fix Multi-Agent Sycophancy

    cs.LG 2026-05 unverdicted novelty 6.0

    Pretrained base models exhibit higher yield to peer disagreement than RLHF instruct variants, with the effect localized to mid-layer attention and mitigated by structured dissent rather than prompt defenses.

  8. GRAIL: A Deep-Granularity Hybrid Resonance Framework for Real-Time Agent Discovery via SLM-Enhanced Indexing

    cs.AI 2026-05 unverdicted novelty 6.0

    GRAIL achieves over 79 times lower latency than LLM-parsing baselines and higher Recall@10 than vector search by combining SLM-enhanced prediction, pseudo-document expansion, and MaxSim resonance on the new AgentTaxo-...

  9. AgentFloor: How Far Up the tool use Ladder Can Small Open-Weight Models Go?

    cs.AI 2026-05 unverdicted novelty 6.0

    Small open-weight models match GPT-5 on routine agent tool-use tasks but lag on long-horizon planning, supporting tiered routing to reduce costs in agentic systems.

  10. SAW-INT4: System-Aware 4-Bit KV-Cache Quantization for Real-World LLM Serving

    cs.LG 2026-04 unverdicted novelty 6.0

    Token-wise INT4 KV-cache quantization plus block-diagonal Hadamard rotation recovers nearly all accuracy lost by naive INT4 while adding zero end-to-end overhead under paged serving constraints.

  11. Search, Do not Guess: Teaching Small Language Models to Be Effective Search Agents

    cs.AI 2026-04 unverdicted novelty 6.0

    A fine-tuning policy trains small language models to search reliably and use evidence, improving multi-hop QA performance by 15-17 points to reach large-model levels.

  12. Language Markers of Emotion Flexibility Predict Depression and Anxiety Treatment Outcomes

    cs.CL 2026-01 unverdicted novelty 6.0

    Emotion dynamics from therapy transcripts, extracted via transformers and clustered with state-space models, distinguish improving patients from non-responders who show higher odds of symptom worsening.

  13. EmbeddingGemma: Powerful and Lightweight Text Representations

    cs.CL 2025-09 unverdicted novelty 6.0

    A 300M-parameter open embedding model sets new SOTA on MTEB for its size class and matches models twice as large while staying effective when compressed.

  14. Cognitive Agent Compilation for Explicit Problem Solver Modeling

    cs.CL 2026-05 unverdicted novelty 5.0

    Cognitive Agent Compilation uses a teacher LLM to create explicit, inspectable problem-solving agents by separating knowledge, policy, and verification components for educational applications.

  15. SHIELD: A Diverse Clinical Note Dataset and Distilled Small Language Models for Enterprise-Scale De-identification

    cs.CL 2026-05 conditional novelty 5.0

    SHIELD dataset and distilled DeBERTa v3 model achieve 0.88 micro precision and 0.86 recall on PHI de-identification while matching teacher performance on structured categories.

  16. Automatic Ontology Construction Using LLMs as an External Layer of Memory, Verification, and Planning for Hybrid Intelligent Systems

    cs.AI 2026-04 unverdicted novelty 5.0

    A hybrid system augments LLMs with an automated external RDF/OWL ontology layer for long-term memory, SHACL/OWL validation, and improved multi-step reasoning on tasks like Tower of Hanoi.

  17. A pragmatic approach to regulating AI agents

    cs.CY 2026-04 unverdicted novelty 5.0

    AI agents require distinct regulation as AI systems under the EU AI Act with orchestration-layer oversight and a risk-based traffic light authorization system in contract law to preserve human accountability.

  18. AgentOpt v0.1 Technical Report: Client-Side Optimization for LLM-Based Agent

    cs.LG 2026-04 unverdicted novelty 5.0

    AgentOpt introduces a framework-agnostic package that uses algorithms like UCB-E to find cost-effective model assignments in multi-step LLM agent pipelines, cutting evaluation budgets by 62-76% while maintaining near-...

  19. Security Threat Modeling for Emerging AI-Agent Protocols: A Comparative Analysis of MCP, A2A, Agora, and ANP

    cs.CR 2026-02 unverdicted novelty 5.0

    The paper identifies twelve protocol-level security risks across MCP, A2A, Agora, and ANP and quantifies wrong-provider tool execution risk in MCP via a measurement-driven case study on multi-server composition.

  20. TRACE: A Metrologically-Grounded Engineering Framework for Trustworthy Agentic AI Systems in Operationally Critical Domains

    cs.CL 2026-05 unverdicted novelty 4.0

    TRACE is a metrologically-grounded four-layer engineering framework for trustworthy agentic AI that enforces an ML-LLM split, stateful policies, human supervision, and a parsimony metric across critical domains.

  21. Terminus-4B: Can a Smaller Model Replace Frontier LLMs at Agentic Execution Tasks?

    cs.AI 2026-05 unverdicted novelty 4.0

    A fine-tuned 4B model matches or exceeds frontier LLMs in terminal execution subagent tasks for coding agents, reducing main agent token usage by 30% with no performance loss.

Reference graph

Works this paper leans on

87 extracted references · 87 canonical work pages · cited by 20 Pith papers · 9 internal anchors

  1. [1]

    Small language models vs

    Aashima. Small language models vs. llms: Finding the right fit for your needs, October 2024. Accessed: 2025-05-09

  2. [2]

    Small language models vs

    ABBYY. Small language models vs. large language models, November 2024. Accessed: 2025-05-09

  3. [3]

    Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

    Marah Abdin, Jyoti Aneja, Hany Awadalla, Ahmed Awadallah, Ammar Ahmad Awan, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, et al. Phi-3 technical report: A highly capable language model locally on your phone.arXiv preprint arXiv:2404.14219, 2024

  4. [4]

    The economics of ai training and inference: How deepseek broke the cost curve, February 2025

    Adyog. The economics of ai training and inference: How deepseek broke the cost curve, February 2025. Accessed: 2025-05-09

  5. [5]

    Delift: Data efficient language model instruction fine tuning.arXiv preprint arXiv:2411.04425, 2024

    Ishika Agarwal, Krishnateja Killamsetty, Lucian Popa, and Marina Danilevksy. Delift: Data efficient language model instruction fine tuning.arXiv preprint arXiv:2411.04425, 2024

  6. [6]

    Smollm2: When smol goes big – data-centric training of a small language model, 2025

    Loubna Ben Allal, Anton Lozhkov, Elie Bakouch, Gabriel Martín Blázquez, Guilherme Penedo, Lewis Tunstall, Andrés Marafioti, Hynek Kydlíˇcek, Agustín Piqueres Lajarín, Vaibhav Srivastav, Joshua Lochner, Caleb Fahlgren, Xuan-Son Nguyen, Clémentine Fourrier, Ben Burtenshaw, Hugo Larcher, Haojun Zhao, Cyril Zakka, Mathieu Morlon, Colin Raffel, Leandro von Wer...

  7. [7]

    Minifinetuning: Low-data gen- eration domain adaptation through corrective self-distillation.arXiv preprint arXiv:2506.15702, 2025

    Peter Belcak, Greg Heinrich, Jan Kautz, and Pavlo Molchanov. Minifinetuning: Low-data gen- eration domain adaptation through corrective self-distillation.arXiv preprint arXiv:2506.15702, 2025

  8. [8]

    Tiny transformers excel at sentence compression.arXiv preprint arXiv:2410.23510, 2024

    Peter Belcak and Roger Wattenhofer. Tiny transformers excel at sentence compression.arXiv preprint arXiv:2410.23510, 2024

  9. [9]

    Nemotron-h: A family of accurate and efficient hybrid mamba-transformer models.arXiv preprint arXiv:2504.03624, 2025a

    Aaron Blakeman, Aarti Basant, Abhinav Khattar, Adithya Renduchintala, Akhiad Bercovich, Aleksander Ficek, Alexis Bjorlin, Ali Taghibakhshi, Amala Sanjay Deshmukh, Ameya Sunil Mahabaleshwarkar, et al. Nemotron-h: A family of accurate and efficient hybrid mamba- transformer models.arXiv preprint arXiv:2504.03624, 2025

  10. [10]

    Rae, Erich Elsen, and Laurent Sifre

    Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George van den Driessche, Bogdan Damoc, Aidan Clark, Jan Kramár, et al. Improving language models by retrieving from trillions of tokens.arXiv preprint arXiv:2112.04426, 2022

  11. [11]

    Adversarial stylometry: Circumventing authorship recognition to preserve privacy and anonymity.ACM Transactions on Information and System Security (TISSEC), 15(3):1–22, 2012

    Michael Brennan, Sadia Afroz, and Rachel Greenstadt. Adversarial stylometry: Circumventing authorship recognition to preserve privacy and anonymity.ACM Transactions on Information and System Security (TISSEC), 15(3):1–22, 2012

  12. [12]

    Flextron: Many-in-one flexible large language model

    Ruisi Cai, Saurav Muralidharan, Greg Heinrich, Hongxu Yin, Zhangyang Wang, Jan Kautz, and Pavlo Molchanov. Flextron: Many-in-one flexible large language model. InProceedings of the 41st International Conference on Machine Learning (ICML 2024), 2024

  13. [13]

    The state of ai in 2022—and a half decade in review, December 2022

    Michael Chui, Bryce Hall, Helen Mayhew, Alex Singla, and Alexander Sukharevsky. The state of ai in 2022—and a half decade in review, December 2022. Accessed: 2025-05-09

  14. [14]

    96% of enterprises are expanding use of ai agents, according to latest data from cloudera, April 2025

    Cloudera, Inc. 96% of enterprises are expanding use of ai agents, according to latest data from cloudera, April 2025. Accessed: 2025-05-08

  15. [15]

    Planck 2018 results

    Planck Collaboration et al. Planck 2018 results. vi. cosmological parameters.Astronomy & Astrophysics, 641:A6, 2020

  16. [16]

    2025 data center marketplace: Balancing unprecedented opportunity with strategic risk

    Colliers. 2025 data center marketplace: Balancing unprecedented opportunity with strategic risk. U.s. research report, Colliers, 2025

  17. [17]

    Llm agents, April 2024

    DAIR.AI. Llm agents, April 2024. Accessed: 2025-05-08. 10

  18. [18]

    Security and privacy challenges of large language models: A survey.ACM Computing Surveys, 57(6):1–39, 2025

    Badhan Chandra Das, M Hadi Amini, and Yanzhao Wu. Security and privacy challenges of large language models: A survey.ACM Computing Surveys, 57(6):1–39, 2025

  19. [19]

    Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning, 2025

    DeepSeek-AI. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning, 2025

  20. [20]

    Qlora: Efficient finetuning of quantized llms.Advances in neural information processing systems, 36:10088– 10115, 2023

    Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer. Qlora: Efficient finetuning of quantized llms.Advances in neural information processing systems, 36:10088– 10115, 2023

  21. [21]

    Climb: Clustering-based iterative data mixture bootstrapping for language model pre-training.arXiv preprint arXiv:2504.13161, 2025

    Shizhe Diao, Yu Yang, Yonggan Fu, Xin Dong, Dan Su, Markus Kliegl, Zijia Chen, Peter Belcak, Yoshi Suhara, Hongxu Yin, et al. Climb: Clustering-based iterative data mixture bootstrapping for language model pre-training.arXiv preprint arXiv:2504.13161, 2025

  22. [22]

    Parameter-efficient fine-tuning of large-scale pre-trained language models.Nature Machine Intelligence, 5(3):220–235, 2023

    Ning Ding, Yujia Qin, Guang Yang, Fuchao Wei, Zonghan Yang, Yusheng Su, Shengding Hu, Yulin Chen, Chi-Min Chan, Weize Chen, et al. Parameter-efficient fine-tuning of large-scale pre-trained language models.Nature Machine Intelligence, 5(3):220–235, 2023

  23. [23]

    press/v235/dao24a.html

    Xin Dong, Yonggan Fu, Shizhe Diao, Wonmin Byeon, Zijia Chen, Ameya Sunil Mahabalesh- warkar, Shih-Yang Liu, Matthijs Van Keirsbilck, Min-Hung Chen, Yoshi Suhara, et al. Hymba: A hybrid-head architecture for small language models.arXiv preprint arXiv:2411.13676, 2024

  24. [24]

    Introducing nvidia dynamo, a low-latency distributed inference framework for scaling reasoning ai models, March 2025

    Amr Elmeleegy et al. Introducing nvidia dynamo, a low-latency distributed inference framework for scaling reasoning ai models, March 2025. NVIDIA Technical Blog

  25. [25]

    Henry Evans. Llms vs. slms: Balancing comprehensiveness and smart resource-saving, April

  26. [27]

    Barbara A Ferguson, Timothy A Dreisbach, Catherine G Parks, Gregory M Filip, and Craig L Schmitt. Coarse-scale population structure of pathogenic armillaria species in a mixed-conifer forest in the blue mountains of northeast oregon.Canadian Journal of Forest Research, 33(4):612–623, 2003

  27. [28]

    Amoeballm: Constructing any-shape large language models for efficient and instant deployment

    Yonggan Fu, Zhongzhi Yu, Junwei Li, Jiayi Qian, Yongan Zhang, Xiangchi Yuan, Dachuan Shi, Roman Yakunin, and Yingyan Celine Lin. Amoeballm: Constructing any-shape large language models for efficient and instant deployment. InProceedings of the 38th Annual Conference on Neural Information Processing Systems (NeurIPS 2024), 2024

  28. [29]

    GitHub - google/A2A: An open protocol enabling communication and interoperability between opaque agentic applications

    google. GitHub - google/A2A: An open protocol enabling communication and interoperability between opaque agentic applications

  29. [30]

    Text compression for efficient language generation.arXiv preprint arXiv:2503.11426, 2025

    David Gu, Peter Belcak, and Roger Wattenhofer. Text compression for efficient language generation.arXiv preprint arXiv:2503.11426, 2025

  30. [31]

    Large language models vs

    Harrison Clarke. Large language models vs. small language models, March 2024. Accessed: 2025-05-09

  31. [32]

    arXiv preprint arXiv:2102.01293 , url=

    Danny Hernandez, Jared Kaplan, Tom Henighan, and Sam McCandlish. Scaling laws for transfer.arXiv preprint arXiv:2102.01293, 2021

  32. [33]

    Training Compute-Optimal Large Language Models

    Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, et al. Training compute-optimal large language models.arXiv preprint arXiv:2203.15556, 2022

  33. [34]

    LoRA: Low-Rank Adaptation of Large Language Models

    Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. Lora: Low-rank adaptation of large language models. arxiv 2021. arXiv preprint arXiv:2106.09685, 2021

  34. [35]

    Lora: Low-rank adaptation of large language models.ICLR, 1(2):3, 2022

    Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. Lora: Low-rank adaptation of large language models.ICLR, 1(2):3, 2022

  35. [36]

    Unsupervised fine-tuning for text clustering

    Shaohan Huang, Furu Wei, Lei Cui, Xingxing Zhang, and Ming Zhou. Unsupervised fine-tuning for text clustering. InProceedings of the 28th international conference on computational linguistics, pages 5530–5534, 2020. 11

  36. [37]

    How small language models can outperform llms, March 2025

    Invisible Technologies. How small language models can outperform llms, March 2025. Ac- cessed: 2025-05-21

  37. [38]

    Phi-2: The surprising power of small language models,

    Mojan Javaheripi and Sébastien Bubeck. Phi-2: The surprising power of small language models,

  38. [39]

    Microsoft Research Blog

  39. [40]

    Artificial intelligence and democracy: A conceptual framework.Social media+ society, 9(3):20563051231186353, 2023

    Andreas Jungherr. Artificial intelligence and democracy: A conceptual framework.Social media+ society, 9(3):20563051231186353, 2023

  40. [41]

    Understanding the total cost of inferencing large language models

    Aviv Kaufmann. Understanding the total cost of inferencing large language models. Technical report, Enterprise Strategy Group, April 2024. Commissioned by Dell Technologies. Accessed: 2025-05-09

  41. [42]

    Matformer: Nested transformer for elastic inference.arXiv preprint arXiv:2310.07707, 2023

    Sneha Kudugunta, Aditya Kusupati, Tim Dettmers, Kaifeng Chen, Inderjit Dhillon, Yulia Tsvetkov, Hannaneh Hajishirzi, Sham Kakade, Ali Farhadi, Prateek Jain, et al. Matformer: Nested transformer for elastic inference.arXiv preprint arXiv:2310.07707, 2023

  42. [43]

    From large to small: The rise of small language models (slms) in text analytics

    Akshi Kumar. From large to small: The rise of small language models (slms) in text analytics. 2025

  43. [44]

    A comparative study on unsupervised feature selection methods for text clustering

    Luying Liu, Jianchu Kang, Jing Yu, and Zhongliang Wang. A comparative study on unsupervised feature selection methods for text clustering. In2005 International Conference on Natural Language Processing and Knowledge Engineering, pages 597–601. IEEE, 2005

  44. [45]

    Shih-Yang Liu, Chien-Yi Wang, Hongxu Yin, Pavlo Molchanov, Yu-Chiang Frank Wang, Kwang-Ting Cheng, and Min-Hung Chen. DoRA: Weight-decomposed low-rank adaptation. arXiv preprint arXiv:2402.09353, 2024.

  45. [46]

    Zichang Liu, Jue Wang, Tri Dao, Tianyi Zhou, Binhang Yuan, Zhao Song, Anshumali Shrivastava, Ce Zhang, Yuandong Tian, Christopher Re, et al. Deja vu: Contextual sparsity for efficient llms at inference time. In International Conference on Machine Learning, pages 22137–22176. PMLR, 2023.

  46. [47]

    Nunzio Lore, Sepehr Ilami, and Babak Heydari. Large model strategic thinking, small model efficiency: transferring theory of mind in large language models. arXiv preprint arXiv:2408.05241, 2024.

  47. [48]

    Jeff Loucks, Gillian Crossan, Baris Sarer, China Widener, and Ariane Bucaille. Autonomous generative ai agents: Under development. Deloitte Insights, November 2024. Accessed: 2025-05-08.

  48. [49]

    Zhenyan Lu, Xiang Li, Dongqi Cai, Rongjie Yi, Fangming Liu, Xiwen Zhang, Nicholas D Lane, and Mengwei Xu. Small language models: Survey, measurements, and insights. arXiv preprint arXiv:2409.15790, 2024.

  49. [50]

    Junyu Luo, Weizhi Zhang, Ye Yuan, Yusheng Zhao, Junwei Yang, Yiyang Gu, Bohan Wu, Binqi Chen, Ziyue Qiao, Qingqing Long, et al. Large language model agent: A survey on methodology, applications and challenges. arXiv preprint arXiv:2503.21460, 2025.

  50. [51]

    Georgina M Mace, Paul H Harvey, and Timothy H Clutton-Brock. Brain size and ecology in small mammals. Journal of Zoology, 193(3):333–354, 1981.

  51. [52]

    Tobias Mann. A closer look at dynamo, nvidia’s ’operating system’ for ai inference, March. Accessed: 2025-05-09.

  53. [54]

    Market.us. Global agentic ai market size, share analysis by product type, agent role, agent system, end user, region and companies – industry segment outlook, market assessment, competition scenario, trends and forecast 2025–2034, March 2025. Accessed: 2025-05-08.

  54. [55]

    Tula Masterman, Sandi Besen, Mason Sawtell, and Alex Chao. The landscape of emerging ai agent architectures for reasoning, planning, and tool calling: A survey. arXiv preprint arXiv:2404.11584, 2024.

  55. [56]

    Sourabh Mehta. How much energy do llms consume? unveiling the power behind ai, July 2024. Accessed: 2025-05-21.

  56. [57]

    Meta Platforms, Inc. Model cards and prompt formats: Llama 3.3, April 2025. Accessed: 2025-05-08.

  57. [58]

    Metomic. Understanding ai agents & data security, 2025. Accessed: 2025-05-13.

  58. [59]

    Erik Miehling, Karthikeyan Natesan Ramamurthy, Kush R Varshney, Matthew Riemer, Djallel Bouneffouf, John T Richards, Amit Dhurandhar, Elizabeth M Daly, Michael Hind, Prasanna Sattigeri, et al. Agentic ai needs a systems theory. arXiv preprint arXiv:2503.00237, 2025.

  59. [60]

    Morgan Stanley. Genai revenue growth and profitability, April 2025. Accessed: 2025-05-08.

  60. [61]

    Humza Naveed, Asad Ullah Khan, Shi Qiu, Muhammad Saqib, Saeed Anwar, Muhammad Usman, Naveed Akhtar, Nick Barnes, and Ajmal Mian. A comprehensive overview of large language models. arXiv preprint arXiv:2307.06435, 2023.

  61. [62]

    NVIDIA. Chatrtx, 2024. NVIDIA AI Product.

  62. [63]

    NVIDIA. Nvidia dynamo: A datacenter scale distributed inference serving framework. https://github.com/ai-dynamo/dynamo, 2025. Accessed: 2025-05-09.

  63. [64]

    Felipe Maia Polo, Lucas Weber, Leshem Choshen, Yuekai Sun, Gongjun Xu, and Mikhail Yurochkin. tinybenchmarks: evaluating llms with fewer examples. arXiv preprint arXiv:2402.14992, 2024.

  64. [65]

    Lakshmi Radhakrishnan, Gundolf Schenk, Kathleen Muenzen, Boris Oskotsky, Habibeh Ashouri Choshali, Thomas Plunkett, Sharat Israni, and Atul J Butte. A certified de-identification system for all clinical text documents for information extraction at scale. JAMIA Open, 6(3):ooad045, 2023.

  65. [66]

    Martin J Rees. Before the Beginning: Our Universe and Others. Addison-Wesley, 1997.

  66. [67]

    Judith Sáinz-Pardo Díaz and Álvaro López García. An open source python library for anonymizing sensitive data. Scientific Data, 11(1):1289, 2024.

  67. [68]

    Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. Toolformer: Language models can teach themselves to use tools. In Advances in Neural Information Processing Systems (NeurIPS), 2023.

  68. [69]

    J William Schopf. Microfossils of the early archean apex chert: New evidence of the antiquity of life. Science, 260(5108):640–646, 1993.

  69. [70]

    Tanya Seda. Cloud llm cost model: Breakdown for mid-market businesses, 2024. Accessed: 2025-05-09.

  70. [71]

    Olivia Shone. Explore ai models: Key differences between small language models and large language models, November 2024. Accessed: 2025-05-21.

  71. [72]

    Yixin Song, Zeyu Mi, Haotong Xie, and Haibo Chen. Powerinfer: Fast large language model serving with a consumer-grade gpu. In Proceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles, pages 590–606, 2024.

  72. [73]

    Shreyas Subramanian, Vikram Elango, and Mecit Gungor. Small language models (slms) can still pack a punch: A survey. arXiv preprint arXiv:2501.05465, 2025.

  73. [74]

    Synergy Technical. Small language models vs. large language models, 2025. Accessed: 2025-05-09.

  74. [75]

    Brian G. Thamm. Trustworthy and secure ai: How small language models strengthen data security. Service Contractor Magazine, October 2024. Accessed: 2025-05-08.

  75. [76]

    Fali Wang, Zhiwei Zhang, Xianren Zhang, Zongyu Wu, Tzuhao Mo, Qiuhao Lu, Wanjing Wang, Rui Li, Junjie Xu, Xianfeng Tang, et al. A comprehensive survey of small language models in the era of large language models: Techniques, enhancements, applications, collaboration with llms, and trustworthiness. arXiv preprint arXiv:2411.03350, 2024.

  76. [77]

    WorkOS. Build secure ai agents, 2025. Accessed: 2025-05-13.

  77. [78]

    Zhenliang Xue, Yixin Song, Zeyu Mi, Xinrui Zheng, Yubin Xia, and Haibo Chen. Powerinfer-2: Fast large language model inference on a smartphone. arXiv preprint arXiv:2406.06282, 2024.

  78. [79]

    Biwei Yan, Kun Li, Minghui Xu, Yueyan Dong, Yue Zhang, Zhaochun Ren, and Xiuzhen Cheng. On protecting the data privacy of large language models (llms): A survey. arXiv preprint arXiv:2403.05156, 2024.

  79. [80]

    Fanjia Yan, Huanzhi Mao, Charlie Cheng-Jie Ji, Tianjun Zhang, Shishir G. Patil, Ion Stoica, and Joseph E. Gonzalez. Berkeley function calling leaderboard. https://gorilla.cs.berkeley.edu/blogs/8_berkeley_function_calling_leaderboard.html, 2024.

  80. [81]

    Shunyu Yao, Noah Shinn, Pedram Razavi, and Karthik Narasimhan. Tau-bench: A benchmark for tool-agent-user interaction in real-world domains. arXiv preprint arXiv:2406.12045, 2024.

Showing first 80 references.