Less (Data) Is More: Why Small Data Holds the Key to the Future of Artificial Intelligence
Pith reviewed 2026-05-24 17:55 UTC · model grok-4.3
The pith
Cognitively inspired AI will succeed by relying on less data and more human collaboration rather than big data and automation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors contend that claims about big data are oversold for two reasons: deep learning has a limited track record in natural language processing, and systems that learn from few data points carry regulatory and business importance. They sketch an 'AI with humans and for humans' that is privacy-oriented and collaborative, and conclude that cognitively inspired AI means less data, not more, and more humans, not fewer.
What carries the argument
The 'AI with humans and for humans' paradigm, which builds privacy-oriented systems focused on human-machine collaboration rather than replacement.
Load-bearing premise
The limited track record of deep learning in natural language processing stems primarily from reliance on large data volumes rather than other architectural or representational limits.
What would settle it
A deep learning system that reaches strong natural language processing performance purely by scaling up its training data, with no reduction in its dependence on data volume, would undermine the central argument.
Original abstract
The claims that big data holds the key to enterprise successes and that Artificial Intelligence is going to replace humanity have become increasingly more popular over the past few years, both in academia and in the industry. However, while these claims may indeed capture some truth, they have also been massively oversold, or so we contend here. The goal of this paper is two-fold. First, we provide a qualified defence of the value of less data within the context of AI. This is done by carefully reviewing two distinct problems for big data driven AI, namely a) the limited track record of Deep Learning in key areas such as Natural Language Processing, b) the regulatory and business significance of being able to learn from few data points. Second, we briefly sketch what we refer to as a case of AI with humans and for humans, namely an AI paradigm whereby the systems we build are privacy-oriented and focused on human-machine collaboration, not competition. Combining our claims above, we conclude that when seen through the lens of cognitively inspired AI, the bright future of the discipline is about less data, not more, and more humans, not fewer.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper contends that big data has been oversold for AI successes and that deep learning has a limited track record in areas such as NLP. It identifies two problems with big-data AI—(a) DL's NLP limitations and (b) the regulatory/business value of few-shot learning—and sketches a human-centric 'AI with humans and for humans' paradigm focused on privacy and collaboration rather than replacement. The conclusion is that cognitively inspired AI's future lies in less data and more humans.
Significance. If the causal attribution holds, the position could usefully redirect attention toward small-data, privacy-preserving, collaborative systems. However, the manuscript supplies no citations, data, or analysis to support its core premises, so its potential impact remains that of an untested opinion piece rather than an evidence-based argument.
major comments (2)
- [Abstract] Abstract (paragraph on two distinct problems): the claim that DL's limited NLP track record stems primarily from reliance on large data volumes is asserted without any citation, empirical isolation, or argument separating data volume from other factors such as architectural or representational limitations (e.g., compositionality, long-range dependencies).
- [Abstract] Abstract (final sentence): the recommendation that 'the bright future of the discipline is about less data, not more' is load-bearing on the unexamined premise that data volume is the dominant cause of problem (a); no analysis is supplied showing that a shift to small-data paradigms would address the root issue rather than merely restating the preference for cognitively inspired AI.
minor comments (1)
- The phrase 'carefully reviewing' appears in the abstract but the provided text contains no actual review, citations, or structured analysis of the two problems; explicit sectioning or a dedicated review subsection would improve clarity.
Simulated Author's Rebuttal
We thank the referee for their detailed review and constructive feedback on our position paper. We address each major comment below, clarifying the scope of our arguments as a perspective piece rather than an empirical study, while noting where revisions can strengthen the manuscript.
Point-by-point responses
- Referee: [Abstract] Abstract (paragraph on two distinct problems): the claim that DL's limited NLP track record stems primarily from reliance on large data volumes is asserted without any citation, empirical isolation, or argument separating data volume from other factors such as architectural or representational limitations (e.g., compositionality, long-range dependencies).
  Authors: We agree that the abstract, due to its brevity, presents the claim concisely without fully isolating data volume from other contributing factors such as model architecture or representational limitations. As a position paper, the core intent is to argue that big-data approaches have been oversold in AI discourse, with the limited NLP track record serving as one illustrative problem rather than a fully dissected causal analysis. The full manuscript expands on this by contrasting big-data DL with cognitively inspired alternatives, but we acknowledge the abstract could better qualify the claim. We will revise the abstract to explicitly note that data volume is one significant factor among others and add supporting citations to literature on data efficiency and NLP limitations.
  revision: partial
- Referee: [Abstract] Abstract (final sentence): the recommendation that 'the bright future of the discipline is about less data, not more' is load-bearing on the unexamined premise that data volume is the dominant cause of problem (a); no analysis is supplied showing that a shift to small-data paradigms would address the root issue rather than merely restating the preference for cognitively inspired AI.
  Authors: The final sentence synthesizes the two problems outlined: (a) DL limitations in areas like NLP under big-data regimes and (b) the regulatory/business value of few-shot learning. Our argument is not that data volume is the sole or dominant root cause of (a), but that a shift toward small-data, human-centric paradigms offers a complementary direction that addresses both issues while prioritizing privacy and collaboration. This is presented as a sketched perspective aligned with cognitively inspired AI, not a proven causal solution. We will revise the abstract and conclusion to more explicitly frame the recommendation as a directional proposal rather than a direct fix for all underlying limitations.
  revision: yes
Circularity Check
No circularity: argumentative premises do not reduce to self-defined inputs
Full rationale
The paper advances a qualitative argument identifying two problems with big-data AI and sketching a human-centric alternative. No equations, fitted parameters, or derivations exist. The central attribution of DL's NLP limitations to data volume is stated as a premise without reduction to any self-citation chain or definitional loop, satisfying the self-contained criterion.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Deep Learning has a limited track record in key areas such as Natural Language Processing
- domain assumption: Being able to learn from few data points has regulatory and business significance
invented entities (1)
- AI with humans and for humans (no independent evidence)