Less (Data) Is More: Why Small Data Holds the Key to the Future of Artificial Intelligence

Andrea Polonioli; Ciro Greco; Jacopo Tagliabue

arXiv: 1907.10424 · v1 · pith:GQORLKC7 · submitted 2019-07-22 · 💻 cs.CY · cs.AI


Pith reviewed 2026-05-24 17:55 UTC · model grok-4.3

classification 💻 cs.CY cs.AI

keywords: artificial intelligence · small data · deep learning · natural language processing · human-machine collaboration · privacy · few-shot learning · cognitively inspired AI


The pith

Cognitively inspired AI will succeed by relying on less data and more human collaboration rather than big data and automation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper defends the value of smaller datasets in AI by examining deep learning's limited results in natural language processing and the regulatory and business advantages of learning from few examples. It outlines an alternative in which AI systems are built for privacy and for collaboration with humans instead of competing with or replacing them. A sympathetic reader would care because this challenges the assumption that scaling data always improves outcomes and points to a more efficient path aligned with human capabilities. If the view holds, development would shift from data volume toward cognitive principles and joint human-AI work.

Core claim

The authors contend that claims about big data are oversold: deep learning has a limited track record in natural language processing, while systems that learn from few data points carry regulatory and business importance. They sketch an AI "with humans and for humans" that is privacy-oriented and collaborative, concluding that cognitively inspired AI means less data, not more, and more humans, not fewer.

What carries the argument

The 'AI with humans and for humans' paradigm that builds privacy-oriented systems focused on collaboration rather than replacement.

Load-bearing premise

The limited track record of deep learning in natural language processing stems primarily from reliance on large data volumes rather than other architectural or representational limits.

What would settle it

A deep learning system that reaches strong natural language processing performance purely through further increases in data scale, while remaining just as dependent on data volume, would undermine the central argument.

Figures

Figures reproduced from arXiv: 1907.10424 by Andrea Polonioli, Ciro Greco, Jacopo Tagliabue.

**Figure 2.** Error rate for the best-performing deep learning models of the year in two standard challenges: ImageNet (vision-based challenge, 2011-2014) vs. Winograd (language-based challenge, 2014-2017). While both trends have shown decreasing marginal gains since their inception, the error rate for the vision-based challenge (Deng et al. 2009) reached human performance (~0.05) in four years; in the same timefr…

**Figure 3.** Bot and user interaction. Bot can send messages to User (as in (1)), and User may reply by typing in the input field (2); what is typed in (2) can be sent to Bot and become part of the shared conversation between the parties. In the use case at hand, the interface is used by employees inside Company A to ask internal questions about payroll and Human Resources (HR) management, for example “when are taxes d…

Original abstract

The claims that big data holds the key to enterprise successes and that Artificial Intelligence is going to replace humanity have become increasingly more popular over the past few years, both in academia and in the industry. However, while these claims may indeed capture some truth, they have also been massively oversold, or so we contend here. The goal of this paper is two-fold. First, we provide a qualified defence of the value of less data within the context of AI. This is done by carefully reviewing two distinct problems for big data driven AI, namely a) the limited track record of Deep Learning in key areas such as Natural Language Processing, b) the regulatory and business significance of being able to learn from few data points. Second, we briefly sketch what we refer to as a case of AI with humans and for humans, namely an AI paradigm whereby the systems we build are privacy-oriented and focused on human-machine collaboration, not competition. Combining our claims above, we conclude that when seen through the lens of cognitively inspired AI, the bright future of the discipline is about less data, not more, and more humans, not fewer.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Desk Editor's Note (private letter to a colleague)

This 2019 position paper defends small-data, human-centric AI but asserts without evidence that data volume is the main cause of deep learning's NLP limits.


The paper argues that big-data AI has been oversold and that the field should shift toward fewer data points plus human collaboration, especially for privacy and regulatory reasons. It flags two problems with current approaches (deep learning's limited NLP track record and the business value of few-shot learning), then sketches a collaborative alternative called 'AI with humans and for humans'.

Nothing here is new. The points recycle discussions of few-shot methods and data privacy that were already common by 2019, and the text adds no experiments, derivations, or concrete mechanisms. It does connect regulatory incentives to small-data methods in a straightforward way, which might be useful for readers outside core technical work.

The soft spot is exactly the one the stress-test flags. The claim that data volume drives the NLP shortfall is presented as one of two distinct problems but receives no citations, comparisons, or argument showing that it outweighs architectural issues such as compositionality or structure. Without that isolation, the recommendation to move away from scale rests on an untested premise. The piece stays at the level of qualified opinion rather than evidence.

It is aimed at people thinking about AI policy, ethics, or enterprise deployment who want a counter to scale narratives. Technical readers looking for results or formal grounding will find little to work with. It does not have the novelty or evidential weight to justify peer review.

Referee Report

2 major / 1 minor

Summary. The paper contends that big data has been oversold for AI successes and that deep learning has a limited track record in areas such as NLP. It identifies two problems with big-data AI—(a) DL's NLP limitations and (b) the regulatory/business value of few-shot learning—and sketches a human-centric 'AI with humans and for humans' paradigm focused on privacy and collaboration rather than replacement. The conclusion is that cognitively inspired AI's future lies in less data and more humans.

Significance. If the causal attribution holds, the position could usefully redirect attention toward small-data, privacy-preserving, collaborative systems. The manuscript supplies no citations, data, or analysis to support its core premises, however, so its potential impact remains that of an untested opinion piece rather than an evidence-based argument.

major comments (2)

[Abstract] Abstract (paragraph on two distinct problems): the claim that DL's limited NLP track record stems primarily from reliance on large data volumes is asserted without any citation, empirical isolation, or argument separating data volume from other factors such as architectural or representational limitations (e.g., compositionality, long-range dependencies).
[Abstract] Abstract (final sentence): the recommendation that 'the bright future of the discipline is about less data, not more' is load-bearing on the unexamined premise that data volume is the dominant cause of problem (a); no analysis is supplied showing that a shift to small-data paradigms would address the root issue rather than merely restating the preference for cognitively inspired AI.

minor comments (1)

The phrase 'carefully reviewing' appears in the abstract but the provided text contains no actual review, citations, or structured analysis of the two problems; explicit sectioning or a dedicated review subsection would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed review and constructive feedback on our position paper. We address each major comment below, clarifying the scope of our arguments as a perspective piece rather than an empirical study, while noting where revisions can strengthen the manuscript.


Referee: [Abstract] Abstract (paragraph on two distinct problems): the claim that DL's limited NLP track record stems primarily from reliance on large data volumes is asserted without any citation, empirical isolation, or argument separating data volume from other factors such as architectural or representational limitations (e.g., compositionality, long-range dependencies).

Authors: We agree that the abstract, due to its brevity, presents the claim concisely without fully isolating data volume from other contributing factors such as model architecture or representational limitations. As a position paper, the core intent is to argue that big-data approaches have been oversold in AI discourse, with the limited NLP track record serving as one illustrative problem rather than a fully dissected causal analysis. The full manuscript expands on this by contrasting big-data DL with cognitively inspired alternatives, but we acknowledge the abstract could better qualify the claim. We will revise the abstract to explicitly note that data volume is one significant factor among others and add supporting citations to literature on data efficiency and NLP limitations. revision: partial
Referee: [Abstract] Abstract (final sentence): the recommendation that 'the bright future of the discipline is about less data, not more' is load-bearing on the unexamined premise that data volume is the dominant cause of problem (a); no analysis is supplied showing that a shift to small-data paradigms would address the root issue rather than merely restating the preference for cognitively inspired AI.

Authors: The final sentence synthesizes the two problems outlined: (a) DL limitations in areas like NLP under big-data regimes and (b) the regulatory/business value of few-shot learning. Our argument is not that data volume is the sole or dominant root cause of (a), but that a shift toward small-data, human-centric paradigms offers a complementary direction that addresses both issues while prioritizing privacy and collaboration. This is presented as a sketched perspective aligned with cognitively inspired AI, not a proven causal solution. We will revise the abstract and conclusion to more explicitly frame the recommendation as a directional proposal rather than a direct fix for all underlying limitations. revision: yes

Circularity Check

0 steps flagged

No circularity: argumentative premises do not reduce to self-defined inputs


The paper advances a qualitative argument identifying two problems with big-data AI and sketching a human-centric alternative. No equations, fitted parameters, or derivations exist. The central attribution of DL's NLP limitations to data volume is stated as a premise without reduction to any self-citation chain or definitional loop, satisfying the self-contained criterion.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The paper is a position piece whose claims rest on domain assumptions about current AI performance and regulatory needs rather than on new parameters or entities.

axioms (2)

Domain assumption: Deep Learning has a limited track record in key areas such as Natural Language Processing.
Invoked in the first problem statement of the abstract as a premise for preferring small data.
Domain assumption: Being able to learn from few data points has regulatory and business significance.
Second problem listed in the abstract; treated as self-evident justification for the small-data emphasis.

invented entities (1)

AI with humans and for humans (no independent evidence)
purpose: Proposed paradigm that is privacy-oriented and focused on human-machine collaboration rather than competition
Introduced in the abstract as the alternative to big-data AI; no independent evidence or falsifiable prediction supplied.

pith-pipeline@v0.9.0 · 5738 in / 1340 out tokens · 25878 ms · 2026-05-24T17:55:09.679346+00:00 · methodology


Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages · 2 internal anchors

[1] Abadi, 2015, TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems, White Paper TensorFlow. ACM,

[2] Goodbye pareto principle, hello long tail: The effect of search costs on the concentration of product sales. Management Science, 57(8), pp.1373-1386. Chollet F., 2017, retrieved from https://twitter.com/fchollet/status/942733414788190209?lang=en Chui, M., James Manyika, Mehdi Miremadi, Nicolaus Henke, Rita Chung, Pieter Nel, and Sankalp Malhotra,

[3] The rise of the robots: Technology and the threat of mass unemployment. Oneworld publications. Ghemawat, S., Gobioff H., Leung S., 2003, Proceedings of the 19th ACM Symposium on Operating Systems Principles, 20--4, Bolton Landing, NY. Goodman, N. D., Tenenbaum, J. B. and The ProbMods Contributors

[4] Probabilistic Models of Cognition (2nd ed.). Retrieved 2019-4-15 from https://probmods.org/ Goodman, N. D., and Stuhlmüller, A

[5] The unreasonable effectiveness of data, IEEE. Vancouver. Hartnett, 2018, To Build Truly Intelligent Machines, Teach Them Cause and Effect, Quanta Magazine. Hashem, I. A. T., Yaqoob, I., Anuar, N. B., Mokhtar, S., Gani, A., & Ullah Khan, S.,

[6] The rise of “big data” on cloud computing: Review and open research issues. Information Systems, 47, 98–115. doi:10.1016/j.is.2014.07.006 Henke, N., Bughin, J., Chui, M., Manyika, J., Saleh, T., Wiseman, B. and Sethupathy, G.,

[7] Deep learning scaling is predictable, empirically. arXiv preprint arXiv:1712.00409. Hinton, G. E., Krizhevsky A., Sutskever I., Srivastva I., 2013, System and method for addressing overfitting in a neural network, US Patent US9406017B2. Hochreiter S., Schmidhuber S., 1997, Long short-term memory, Neural Computation, 9(8): 1735–1780. doi:10.1162/neco.1997.9.8.1735

[8] Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems (pp. 1097-1105). Joy, B., 2001, Why the Future Doesn't Need Us, Wired. Lake, B.M., Ullman, T.D., Tenenbaum, J.B. and Gershman, S.J.,

[9] Landgrebe and Smith 2019, Making AI meaningful again, Synthese, 1-21. Levesque, H. J

[10] The Winograd Schema Challenge. In Logical Formalizations of Commonsense Reasoning, 2011 AAAI Spring Symposium, TR SS-11-06. Marblestone, A. H., Wayne G., Kording K. P., 2016, Frontiers in Computational Neuroscience, doi:10.3389/fncom.2016.00094 Marcus, G.,

[11] Deep learning: A critical appraisal. arXiv preprint arXiv:1801.00631. Marcus, G., Vijayan S., Bandi Rao S., Vishton PM., 1999, Rule learning by seven-month-old infants, Science, 283(5398):77-80. Markman, E. M., 1989, Categorization and Naming in Children, MIT Press, Cambridge, MA. Meylan, S.C. and Griffiths, T.L., 2015,

[12] A Bayesian framework for learning words from multiword utterances. In Proceedings of the 37th Annual Meeting of the Cognitive Science Society. Minsky, M., Papert, S., 1969, Perceptrons: An Introduction to Computational Geometry, The MIT Press, Cambridge MA, ISBN 0-262-63022-2. Musk, E., 2017, retrieved from https://twitter.com/elonmusk/status/934888089058...

[13] Language models are unsupervised multitask learners. OpenAI Blog, 1, p.8. Rosenblatt, F., 1957, The Perceptron: a perceiving and recognizing automaton. Report 85-460-1, Cornell Aeronautical Laboratory. Rumelhart, D. E., Hinton G. E., Williams R. J., 1986, Learning representations by back-propagating errors, Nature, volume 323, pages 533–536. Shvachko, ...

[14] Hype Cycle for Artificial Intelligence, Gartner. Silver, D., 2016, Mastering the game of Go with deep neural networks and tree search, Google AI. Spacy,

[15] Venture Capital Funding For Artificial Intelligence Startups Hit Record High, Forbes. Tensorflow, 2018, Retrieved from https://www.tensorflow.org/alpha/tutorials/sequences/nmt_with_attention#next_steps Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł. and Polosukhin, I.,

[16] Attention is all you need. In Advances in neural information processing systems (pp. 5998-6008). Weiss, K., Khoshgoftaar, T.M., Wang, D., 2016. A survey of transfer learning. Journal of Big Data 3(1),

[17] Xu, F., Tenenbaum, JB., 2007, Word learning as Bayesian inference, Psychol Rev. 2007 Apr;114(2):245-72. Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S. and Stoica, I.,

[18] Spark: Cluster computing with working sets. HotCloud, 10(10-10), p.95. Zarsky, 2017, Incompatible: The GDPR in the Age of Big Data. Wikipedia, 2019a, Form W-2. Retrieved from: https://en.wikipedia.org/wiki/Form_W-2 Wikipedia, 2019b, Form
