Small Language Models (SLMs) Can Still Pack a Punch: A survey (updated 2026)
Pith reviewed 2026-05-23 05:45 UTC · model grok-4.3
The pith
Small language models with 1 to 8 billion parameters can match or outperform much larger models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The survey claims that a family of SLMs with 1 to 8 billion parameters can perform as well as, or even outperform, large models. It also defines and characterizes an effective size for these SLMs, intended to express their increased capability relative to LLMs.
What carries the argument
The survey of approximately 160 works on SLMs together with the definition of effective sizes that quantify capability beyond raw parameter count.
If this is right
- Task-specific SLMs can deliver targeted performance without the overhead of full-scale models.
- Techniques for creating SLMs allow balancing of accuracy, speed, and resource use.
- Effective size metrics provide a way to compare models that accounts for real capability rather than parameter count alone.
- General-purpose SLMs in this range become viable alternatives for many applications.
Where Pith is reading between the lines
- Development effort may shift toward architecture and data choices that maximize performance per parameter rather than raw scale.
- Smaller models could enable wider deployment on consumer hardware or in low-resource settings.
- Hybrid systems that combine multiple SLMs might achieve results previously thought to require single large models.
Load-bearing premise
The selected papers are representative of the field and their reported performance numbers are directly comparable across different model scales and evaluation setups.
What would settle it
A controlled head-to-head benchmark that trains and evaluates multiple SLMs and LLMs on exactly the same tasks, with identical data, metrics, and hardware. If the larger models proved consistently superior under those conditions, the claim would be refuted.
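The outcome of such a head-to-head comparison could be tallied along the following lines. This is only a minimal sketch: the function names `tally_wins` and `llm_consistently_superior`, the dictionary layout, and the strict reading of "consistently superior" as "wins every task" are all assumptions made for illustration, not anything specified by the paper or this review.

```python
def tally_wins(slm_scores, llm_scores):
    """Count per-task wins; both dicts map task name -> score (higher is better)."""
    if slm_scores.keys() != llm_scores.keys():
        raise ValueError("both models must be scored on the same tasks")
    tally = {"llm": 0, "slm": 0, "tie": 0}
    for task, slm in slm_scores.items():
        llm = llm_scores[task]
        if llm > slm:
            tally["llm"] += 1
        elif slm > llm:
            tally["slm"] += 1
        else:
            tally["tie"] += 1
    return tally


def llm_consistently_superior(slm_scores, llm_scores):
    """Assumed strict reading of 'consistently superior': the LLM wins every task."""
    tally = tally_wins(slm_scores, llm_scores)
    return tally["llm"] > 0 and tally["slm"] == 0 and tally["tie"] == 0
```

A looser criterion (for example, a win-rate threshold or a significance test) would be equally defensible; the strict version is used here only because it is the simplest to state.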
Original abstract
As foundation AI models continue to increase in size, an important question arises - is massive scale the only path forward? This survey of about 160 papers presents a family of Small Language Models (SLMs) in the 1 to 8 billion parameter range that demonstrate smaller models can perform as well, or even outperform large models. We explore task agnostic, general purpose SLMs, task-specific SLMs and techniques to create SLMs that can guide the community to build models while balancing performance, efficiency, scalability and cost. Furthermore we define and characterize SLMs' effective sizes, representing increased capability with respect to LLMs.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This survey reviews ~160 papers on Small Language Models (SLMs) in the 1-8B parameter range. It claims that such SLMs can match or outperform larger models on tasks, examines task-agnostic and task-specific SLMs plus creation techniques, and introduces the notion of 'effective sizes' that characterize SLM capability relative to LLMs.
Significance. If the surveyed results prove representative and comparable, the work would supply concrete counter-evidence to strict scaling hypotheses, supporting research into efficient, lower-cost models and broadening access to capable language technology.
major comments (3)
- [Introduction and Survey Scope] The central claim that SLMs can match or exceed LLMs rests on the representativeness of the ~160 selected papers, yet the manuscript provides no explicit inclusion/exclusion criteria, search protocol, or discussion of publication bias (Introduction and Survey Scope sections).
- [Performance Comparison and Results] Performance numbers drawn from the cited works are treated as directly comparable, but the text contains no normalization for differences in training data volume, compute budget, benchmark versions, or evaluation protocols, which directly affects the validity of cross-scale claims (Performance Comparison and Results sections).
- [Discussion and Limitations] No systematic treatment of counterexamples or negative results is presented; without this, selective highlighting of positive SLM outcomes cannot be ruled out as the driver of the headline conclusion (Discussion and Limitations sections).
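The comparability concern in the second major comment can be made concrete with a small sketch. It pairs an SLM score with an LLM score only when both were reported under the same evaluation setup. The `ReportedScore` schema, the `comparable_pairs` function, and the matching rule (same benchmark, version, and shot count) are hypothetical choices for illustration; the survey does not describe such a procedure, and training data volume and compute budget are not handled here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedScore:
    """One benchmark number as reported in a surveyed paper (hypothetical schema)."""
    model: str
    params_b: float          # parameter count in billions
    benchmark: str
    benchmark_version: str
    n_shot: int
    score: float


def comparable_pairs(scores, slm_max_b=8.0):
    """Pair each SLM score with LLM scores reported under the same setup.

    Two scores count as comparable only if benchmark, version, and shot
    count all match. This rule is an illustrative assumption.
    """
    by_setup = defaultdict(list)
    for s in scores:
        by_setup[(s.benchmark, s.benchmark_version, s.n_shot)].append(s)

    pairs = []
    for group in by_setup.values():
        slms = [s for s in group if s.params_b <= slm_max_b]
        llms = [s for s in group if s.params_b > slm_max_b]
        pairs.extend((slm, llm) for slm in slms for llm in llms)
    return pairs
```

Any cross-scale claim would then be restricted to the returned pairs, and scores that find no partner under the same setup would be reported separately rather than compared.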
minor comments (2)
- [Title] The parenthetical '(updated 2026)' in the title is unclear and should be explained or corrected.
- [Figures and Tables] Figure captions and table headers would benefit from explicit statements of the exact metrics and model scales being compared.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback on our survey. We address each major comment below and commit to revisions that strengthen the manuscript's transparency and balance without altering its core contributions.
- Referee: The central claim that SLMs can match or exceed LLMs rests on the representativeness of the ~160 selected papers, yet the manuscript provides no explicit inclusion/exclusion criteria, search protocol, or discussion of publication bias (Introduction and Survey Scope sections).
  Authors: We agree that these methodological details were insufficiently explicit. In the revised manuscript, we will add a dedicated subsection under Survey Scope that specifies the search protocol (keywords, databases, date range 2020-2025), inclusion criteria (models strictly 1-8B parameters with reported LLM comparisons on standard benchmarks), exclusion criteria (non-comparative studies, non-English papers, duplicate reports), and a short discussion of publication bias acknowledging that positive results may be over-represented in the literature.
  Revision: yes
- Referee: Performance numbers drawn from the cited works are treated as directly comparable, but the text contains no normalization for differences in training data volume, compute budget, benchmark versions, or evaluation protocols, which directly affects the validity of cross-scale claims (Performance Comparison and Results sections).
  Authors: The referee is correct that unnormalized comparisons introduce uncertainty. We will revise the Performance Comparison section to add an explicit limitations paragraph that (a) enumerates the sources of incomparability, (b) reports available training details (data volume, compute) for the most-cited examples, and (c) cautions readers that headline claims are indicative rather than definitive. Full statistical normalization is not feasible within a survey format, so this will be framed as a methodological limitation.
  Revision: partial
- Referee: No systematic treatment of counterexamples or negative results is presented; without this, selective highlighting of positive SLM outcomes cannot be ruled out as the driver of the headline conclusion (Discussion and Limitations sections).
  Authors: We accept this criticism. The revised Discussion and Limitations section will contain a new subsection titled 'Counterexamples and Negative Results' that reviews documented cases where SLMs underperform (e.g., long-context reasoning, certain multilingual tasks) and cites papers showing continued benefits from scale. This addition will make the survey more balanced and reduce the risk of perceived selection bias.
  Revision: yes
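The inclusion and exclusion criteria promised in the first response could be applied mechanically, roughly as sketched below. The `Paper` record, its field names, the `screen` function, and the use of a normalized title to detect duplicate reports are hypothetical; only the criteria themselves (2020-2025 date range, 1-8B parameters, an LLM comparison, English-language, no duplicates) come from the rebuttal text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    """Minimal record for one candidate paper (hypothetical schema)."""
    title: str
    year: int
    language: str            # language code such as "en" (assumed convention)
    params_b: float          # size of the studied model, in billions
    compares_to_llm: bool    # reports a comparison against a larger model


def screen(papers):
    """Split candidates into included papers and (paper, reason) exclusions."""
    included, excluded, seen_titles = [], [], set()
    for p in papers:
        # Assumption: duplicates are detected by case-insensitive title match.
        key = p.title.strip().lower()
        if key in seen_titles:
            excluded.append((p, "duplicate report"))
        elif not 2020 <= p.year <= 2025:
            excluded.append((p, "outside date range"))
        elif p.language != "en":
            excluded.append((p, "non-English"))
        elif not 1.0 <= p.params_b <= 8.0:
            excluded.append((p, "outside 1-8B parameters"))
        elif not p.compares_to_llm:
            excluded.append((p, "non-comparative"))
        else:
            included.append(p)
        seen_titles.add(key)
    return included, excluded
```

Recording a reason for every exclusion is a deliberate choice: it would let the authors publish exclusion counts per criterion, which speaks directly to the representativeness concern.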
Circularity Check
No circularity: literature survey with no derivations or fitted predictions
This is a survey paper summarizing ~160 existing works on small language models (1-8B parameters). It contains no original mathematical derivations, equations, parameter fittings, uniqueness theorems, or ansatzes that could reduce to self-referential inputs. The central claim is an aggregation of reported results from the surveyed literature rather than a constructed prediction or self-defined quantity. No load-bearing self-citations or renamings of known results are present in a way that matches the enumerated circularity patterns. The paper is self-contained as a review and receives the default non-finding.
Forward citations
Cited by 3 Pith papers
- ShadowNPU: System and Algorithm Co-design for NPU-Centric On-Device LLM Inference
  ShadowNPU presents shadowAttn, a co-designed sparse attention system that uses NPU pilot compute and techniques like graph bucketing and per-head sparsity to minimize CPU/GPU fallback during on-device LLM inference wh...
- Small Language Models are the Future of Agentic AI
  Small language models are sufficiently capable, more suitable, and far more economical than large models for the repetitive tasks that dominate agentic AI systems.
- SLM Finetuning for Natural Language to Domain Specific Code Generation in Production
  Fine-tuned small language models outperform larger models in natural language to domain-specific code generation with improved performance, latency, and the ability to adapt to customer-specific scenarios without losi...