A solution to generalized learning from small training sets found in infant repeated visual experiences of individual objects

David J. Crandall; Elizabeth Clerkin; Frangil Ramirez; Linda B. Smith

arxiv: 2510.15060 · v3 · pith:P5RNXJK4new · submitted 2025-10-16 · 💻 cs.CV

Pith reviewed 2026-05-25 07:33 UTC · model grok-4.3

classification 💻 cs.CV

keywords: infant visual experience, object categories, lumpy distribution, generalization, head-camera images, similarity structure, small training sets, category learning

The pith

Lumpy clusters of repeated similar views in infants' daily visual input enable category generalization after very few examples.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper analyzes head-camera images from 14 one-year-olds across 87 mealtimes to characterize visual experiences of 8 common object categories. For each infant and category the instances follow a highly skewed distribution with many images of a few objects and fewer of others, forming a graph of similarities that is lumpy: multiple interconnected clusters of high-similarity images mixed with high variability. Computational experiments then create artificial training sets that reproduce this lumpy structure and test whether models can generalize to novel instances. The results show that such lumpy sets succeed at generalization after minimal training, whereas the paper implies uniform distributions do not. A sympathetic reader cares because this statistical pattern in real infant experience offers a concrete mechanism that solves the small-sample generalization problem for both developing humans and machines.
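To make the "highly skewed" description concrete, here is a minimal sketch of how instance skew could be summarized for one infant and one category. The labels, the `instance_skew` helper, and the top-k share statistic are illustrative assumptions; the paper's own frequency measure is not described on this page.

```python
from collections import Counter

def instance_skew(instance_labels, top_k=3):
    """Return (counts sorted high to low, share of images from the top_k instances)."""
    counts = sorted(Counter(instance_labels).values(), reverse=True)
    total = sum(counts)
    top_share = sum(counts[:top_k]) / total
    return counts, top_share

# Hypothetical data: one label per image of a single category for one infant.
labels = ["cup_A"] * 40 + ["cup_B"] * 12 + ["cup_C"] * 5 + ["cup_D"] * 2 + ["cup_E"]
counts, share = instance_skew(labels, top_k=2)
print(counts)           # [40, 12, 5, 2, 1]
print(round(share, 3))  # 0.867
```

In this toy case two objects account for roughly 87% of the images, which is the kind of pattern the summary calls "many images of a few objects and fewer of others."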

Core claim

The distribution of instances for each infant and category is highly skewed, containing many images of the same few objects along with fewer images of other instances; graph-theoretic measures reveal a lumpy mix of high similarity and high variability organized into multiple but interconnected clusters; artificially-created training sets that reproduce this lumpy distribution of similarities support generalization to novel instances after very few training experiences.

What carries the argument

The lumpy distribution of similarities revealed by graph-theoretic measures on the head-camera images, organized as multiple interconnected clusters of high-similarity views.
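One way to picture "multiple interconnected clusters" is a thresholded similarity graph. The sketch below is an assumption-laden illustration, not the authors' pipeline: the 2-D feature vectors, the cosine similarity, the thresholds, and every function name are hypothetical, since the page does not state which representation or metric the paper used.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def similarity_graph(features, threshold):
    """Adjacency sets: images i and j are linked when cosine similarity >= threshold."""
    adj = {i: set() for i in range(len(features))}
    for i, j in combinations(range(len(features)), 2):
        if cosine(features[i], features[j]) >= threshold:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def connected_components(adj):
    """Connected components via depth-first search, each returned as a sorted list."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        comps.append(sorted(comp))
    return comps

def density(adj):
    """Fraction of possible edges that are present."""
    n = len(adj)
    edges = sum(len(nb) for nb in adj.values()) / 2
    return edges / (n * (n - 1) / 2)

# Hypothetical features: two tight clumps of views plus one intermediate view.
features = [
    (1.0, 0.0), (0.98, 0.05), (0.95, 0.10),   # clump 1
    (0.0, 1.0), (0.05, 0.98), (0.10, 0.95),   # clump 2
    (0.7, 0.7),                               # intermediate view
]

strict = similarity_graph(features, threshold=0.9)
loose = similarity_graph(features, threshold=0.75)
print(connected_components(strict))  # [[0, 1, 2], [3, 4, 5], [6]]
print(connected_components(loose))   # [[0, 1, 2, 3, 4, 5, 6]]
print(round(density(loose), 3))      # 0.381
```

At the strict threshold the toy graph splits into separate high-similarity clumps; at the looser one the intermediate view joins them into a single sparse component. That combination of tight clumps with a few bridging links is what "lumpy" is taken to mean in this sketch.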

If this is right

Models trained on sets built from lumpy similarity clusters generalize to new instances after far fewer examples than models trained on uniform distributions.
Infant visual experience statistics supply a natural training regime that solves the small-sample learning problem.
The same lumpy structure can be engineered into machine-training data to improve few-shot object recognition.
General learning systems, biological or artificial, benefit when input statistics contain repeated high-similarity clusters rather than uniform coverage.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same lumpy input structure may explain rapid learning in other domains such as early word acquisition.
Artificial curricula that deliberately repeat a few instances in clustered views could reduce the data hunger of current vision models.
If the lumpy pattern is disrupted in atypical visual experience, category learning delays might be expected.

Load-bearing premise

The graph-theoretic similarity measures on the infant images capture the perceptual dimensions that actually drive generalization in both infants and models.

What would settle it

A direct test in which models trained on lumpy sets generalize to novel instances after few examples, while models trained on identically sized uniform or random sets do not.
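The shape of such a matched-size comparison can be sketched on synthetic data. Everything here is hypothetical: the 2-D Gaussian "instances", the jittered "views", the 80% heavy share, and the 1-nearest-neighbour learner are stand-ins chosen for brevity, not the paper's stimuli or models.

```python
import random

def make_instance(rng, center, spread=1.0):
    """A hypothetical object instance: a point scattered around its category center."""
    return [c + rng.gauss(0, spread) for c in center]

def views(rng, instance, n, jitter=0.1):
    """n similar views of one instance (small jitter around it)."""
    return [[x + rng.gauss(0, jitter) for x in instance] for _ in range(n)]

def lumpy_set(rng, center, n_images, n_heavy=2, heavy_share=0.8):
    """Most images are repeated views of a few instances; the rest are one-off instances."""
    per_heavy = int(n_images * heavy_share) // n_heavy
    data = []
    for _ in range(n_heavy):
        data += views(rng, make_instance(rng, center), per_heavy)
    while len(data) < n_images:
        data += views(rng, make_instance(rng, center), 1)
    return data

def uniform_set(rng, center, n_images):
    """Every image comes from a different instance."""
    return [views(rng, make_instance(rng, center), 1)[0] for _ in range(n_images)]

def nearest_label(x, train):
    """Label of the nearest training image under squared Euclidean distance."""
    return min(train, key=lambda item: sum((a - b) ** 2 for a, b in zip(x, item[0])))[1]

def accuracy(train, test):
    return sum(nearest_label(x, train) == y for x, y in test) / len(test)

def run(seed=0, n_images=10, n_test=200):
    rng = random.Random(seed)
    centers = {"A": [0.0, 0.0], "B": [3.0, 0.0]}  # two hypothetical categories
    lumpy, uniform, test = [], [], []
    for label, center in centers.items():
        lumpy += [(x, label) for x in lumpy_set(rng, center, n_images)]
        uniform += [(x, label) for x in uniform_set(rng, center, n_images)]
        # Test items are novel instances never seen in either training set.
        test += [(make_instance(rng, center), label) for _ in range(n_test)]
    return {"lumpy": accuracy(lumpy, test),
            "uniform": accuracy(uniform, test),
            "sizes": (len(lumpy), len(uniform))}

result = run()
print(result)
```

The numbers this toy prints carry no evidential weight for or against the paper's claim, and which arm wins will depend on the learner and feature space. The point is only the design: both training sets have the same size, and the test items are instances that neither set contains.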

Original abstract

One-year-old infants rapidly form and generalize categories of the everyday objects they encounter. Here we provide evidence on infants' daily-life visual experiences for 8 early-learned object categories. Using a corpus of infant head-camera images recorded at mealtimes (87 mealtimes captured by 14 infants), we measure the frequency of the unique instances of each category and the variability of the visual experiences of each instance. The distribution of instances is highly skewed, containing, for each infant and category, many images of the same few objects along with fewer images of other instances. Graph-theoretic measures of the similarity structure for individual categories reveal a lumpy mix of high similarity and high variability, organized into multiple but interconnected clusters of high-similarity images. In computational experiments, we show that artificially-created training sets characterized by a lumpy distribution of similarities support generalization to novel instances after very few training experiences. We discuss implications for visual object recognition, and for learning more generally, by both humans and machines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper measures skewed instance frequencies and lumpy similarity clusters in real infant head-camera images, then tests whether artificial training sets with matching statistics improve few-shot generalization in models.

The main takeaway is that they quantified real infant visual input for eight common object categories from head-cam recordings, found highly skewed instance frequencies plus a graph-based lumpy structure of high-similarity clusters, and then showed in computational tests that training sets built with similar properties support generalization to new instances after few examples. This is new because it supplies fresh quantitative measurements from 14 infants across 87 sessions and directly links those statistics to a generalization outcome rather than stopping at description.

The separation between the empirical corpus analysis and the later computational experiments is a strength; it keeps the claim from being purely circular. The work is grounded in actual everyday infant experience instead of abstract assumptions about data distributions.

The soft spots sit in the missing procedural details. The abstract gives no information on how images were labeled, what representation or distance metric fed the graph measures, how clusters were identified, exactly how the artificial training sets were constructed to match the lumpy properties, what model architectures were used, or what statistical controls were applied. Without those steps it is difficult to confirm that the reported generalization advantage truly traces to the claimed similarity structure rather than to uncontrolled differences in variance or feature alignment. The stress-test point about fidelity of the artificial sets therefore lands.

This paper is aimed at people studying few-shot visual learning, data curation for machine vision, or the statistics of natural infant experience. A reader working on either side of the human-machine comparison would find the measurements and the basic hypothesis useful. It deserves peer review because the new corpus data and the computational test are substantive enough to warrant closer examination even if the methods section needs expansion to make the results reproducible.

Referee Report

2 major / 2 minor

Summary. The manuscript analyzes head-camera images from 14 infants across 87 mealtimes for 8 object categories, reporting highly skewed instance frequencies (many images of few objects) and, via graph-theoretic similarity measures, a lumpy structure of high-similarity clusters. Computational experiments then demonstrate that artificially constructed training sets with analogous lumpy similarity distributions enable generalization to novel instances after very few training examples.

Significance. The naturalistic infant data collection provides a valuable empirical window into real-world visual experience distributions that differ markedly from standard ML training regimes. If the computational results are shown to be driven specifically by the measured statistical properties rather than uncontrolled factors, the work could offer a mechanistic account of few-shot category generalization with implications for both developmental science and machine learning architectures.

major comments (2)

[Computational experiments] Computational experiments section: the procedure used to construct the artificial training sets is not described with sufficient specificity (e.g., exact sampling rules for instance frequencies, choice of distance metric or embedding for the graph, definition of clusters, and how the 'lumpy' structure is quantitatively reproduced). Without these details it is impossible to verify that the reported generalization performance is attributable to the claimed properties of the real head-camera data rather than other differences in variance or feature alignment.
[Graph theoretic measures] Methods for graph-theoretic measures: the similarity metric, image representation, and cluster definition used to identify the 'lumpy' structure on the real data are not specified. These choices are load-bearing because the central claim requires that the artificial sets faithfully replicate the measured properties.
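As an illustration of what an "exact sampling rule for instance frequencies" could look like, the sketch below allocates a fixed image budget across instances with Zipf-like weights. The rule, the exponent, and the `zipf_allocation` name are assumptions made for this example; the manuscript's actual rule is precisely what the comment above asks the authors to specify.

```python
def zipf_allocation(n_images, n_instances, exponent=1.0):
    """Split n_images across instances with weights proportional to 1 / rank**exponent."""
    weights = [1 / (rank ** exponent) for rank in range(1, n_instances + 1)]
    total = sum(weights)
    quotas = [n_images * w / total for w in weights]
    counts = [int(q) for q in quotas]
    # Largest-remainder rounding so the counts sum exactly to n_images.
    by_remainder = sorted(range(n_instances),
                          key=lambda i: quotas[i] - counts[i], reverse=True)
    for i in by_remainder[: n_images - sum(counts)]:
        counts[i] += 1
    return counts

allocation = zipf_allocation(100, 5)
print(allocation)  # [44, 22, 14, 11, 9]
```

Stating a rule at this level of detail (weights, exponent, rounding) would let a reader regenerate the training sets and check that any generalization advantage comes from the frequency and similarity structure rather than from incidental choices.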

minor comments (2)

[Abstract and Methods] The abstract and methods should explicitly state the criteria used for image labeling, session selection, and any statistical controls for inter-infant variability.
[Figures] Figure captions and legends should clarify the axes, color coding, and sample sizes for any plots of instance distributions or similarity graphs.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on methodological clarity. We agree that additional specificity is needed in the computational experiments and graph-theoretic measures sections to allow verification of the claims. We will revise the manuscript to address both points.

Referee: [Computational experiments] Computational experiments section: the procedure used to construct the artificial training sets is not described with sufficient specificity (e.g., exact sampling rules for instance frequencies, choice of distance metric or embedding for the graph, definition of clusters, and how the 'lumpy' structure is quantitatively reproduced). Without these details it is impossible to verify that the reported generalization performance is attributable to the claimed properties of the real head-camera data rather than other differences in variance or feature alignment.

Authors: We agree that the construction procedure for the artificial training sets must be specified in greater detail. In the revised manuscript we will add the exact sampling rules used to match instance frequencies, the embedding and distance metric employed for the graph, the quantitative definition of clusters, and the precise procedure for reproducing the lumpy similarity distribution. These additions will make it possible to confirm that generalization performance arises from the measured statistical properties.
Revision: yes

Referee: [Graph theoretic measures] Methods for graph-theoretic measures: the similarity metric, image representation, and cluster definition used to identify the 'lumpy' structure on the real data are not specified. These choices are load-bearing because the central claim requires that the artificial sets faithfully replicate the measured properties.

Authors: We acknowledge that the similarity metric, image representation, and cluster definition were not stated with sufficient precision. The revised manuscript will explicitly report these choices (including the embedding used, the similarity function, and the criteria for identifying clusters) so that readers can evaluate how faithfully the artificial sets reproduce the empirical structure.
Revision: yes

Circularity Check

0 steps flagged

No circularity: empirical measurement of real data followed by independent computational tests on constructed sets.

The paper measures instance frequencies and similarity structure from real infant head-camera images using graph-theoretic methods, then separately constructs artificial training sets that exhibit the observed lumpy similarity distributions and tests generalization performance on novel instances. No equations, fitted parameters, or self-citations reduce the reported generalization results to the input measurements by construction. The computational experiments are presented as independent verification rather than a renaming or self-referential prediction of the measured statistics.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work rests on standard domain assumptions in vision science about what image similarity means for categorization; no new free parameters, invented entities, or ad-hoc axioms are introduced in the abstract.

axioms (1)

Domain assumption: Graph-theoretic measures of image similarity capture the perceptual features relevant to object category learning.
Invoked when the authors use these measures to characterize the visual experiences and when they construct matching artificial training sets.

[1] [1]

Gentner, Why Nouns Are Learned before Verbs: Linguistic Relativity Versus Natural Partitioning

D. Gentner, Why Nouns Are Learned before Verbs: Linguistic Relativity Versus Natural Partitioning. BBN report ; no. 4854. Center for the Study of Reading Technical Report ; no

work page

[2] [2]

Rosch, C

E. Rosch, C. B. Mervis, W. D. Gray, D. M. Johnson, P. Boyesbraem, Basic Objects in Natural Categories. Cognitive Psychol 8, 382–439 (1976)

work page 1976

[3] [3]

Ayzenberg, M

V. Ayzenberg, M. Behrmann, The Dorsal Visual Pathway Represents Object -Centered Spatial Relations for Object Recognition. J Neurosci 42, 4693–4710 (2022)

work page 2022

[4] [4]

Ayzenberg, M

V. Ayzenberg, M. Behrmann, Development of visual object recognition. Nat Rev Psychol 3, 123– 137 (2024)

work page 2024

[5] [5]

Pinto, D

N. Pinto, D. D. Cox, J. J. DiCarlo, Why is real-world visual object recognition hard? PLoS Comput Biol 4, e27 (2008)

work page 2008

[6] [6]

Child -basic object categories and early lexical development

C. B. Mervis, "Child -basic object categories and early lexical development" in Concepts and Conceptual Development: Ecological and Intellectual Factors in Categorization , U. Neisser, Ed. (Cambridge University Press, Cambridge, 1987), chap. 201-233

work page 1987

[7] [7]

Bergelson, D

E. Bergelson, D. Swingley, At 6 –9 months, human infants know the meanings of many common nouns. Proceedings of the National Academy of Sciences 109, 3253–3258 (2012)

work page 2012

[8] [8]

Campbell, D

J. Campbell, D. G. Hall, The scope of infants? early object word extensions. Cognition 228 (2022)

work page 2022

[9] [9]

Garrison, G

H. Garrison, G. Baudet, E. Breitfeld, A. Aberman, E. Bergelson, Familiarity plays a small role in noun comprehension at 12-18 months. Infancy 25, 458–477 (2020)

work page 2020

[10] [10]

R. M. Nosofsky, Attention, Similarity, and the Identification -Categorization Relationship. J Exp Psychol Gen 115, 39–57 (1986)

work page 1986

[11] [11]

R. N. Shepard, Stimulus and Response Generalization: A Stochastic Model Relating Generalization to Distance in Psychological Space. Psychometrika 22, 325–345 (1957)

work page 1957

[12] [12]

R. N. Shepard, Toward a Universal Law of Generalization for Psychological Science. Science 237, 1317–1323 (1987)

work page 1987

[13] [13]

Edelman, Representation is representation of similarities

S. Edelman, Representation is representation of similarities. Behav Brain Sci 21, 449–+ (1998)

work page 1998

[14] [14]

Hadsell, S

R. Hadsell, S. Chopra, Y. LeCun (2006) Dimensionality Reduction by Learning an Invariant Mapping. in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , pp 1735–1742

work page 2006

[15] [15]

Khosla et al., Supervised Contrastive Learning

P. Khosla et al., Supervised Contrastive Learning. Advances in Neural Information Processing Systems 33, NeurIPS 2020 33 (2020). Preprint. 14

work page 2020

[16] [16]

Krizhevsky, I

A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet Classification with Deep Convolutional Neural Networks. Commun Acm 60, 84–90 (2017)

work page 2017

[17] [17]

Bahri, E

Y. Bahri, E. Dyer, J. Kaplan, J. H. Lee, U. Sharma, Explaining neural scaling laws. P Natl Acad Sci USA 121 (2024)

work page 2024

[18] [18]

Scaling Laws for Neural Language Models

J. Kaplan et al. , Scaling Laws for Neural Language Models. http://dx.doi.org/https://doi.org/10.48550/arXiv.2001.08361

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2001.08361 2001

[19] [19]

Raviv, G

L. Raviv, G. Lupyan, S. C. Green, How variability shapes learning and generalization. Trends in Cognitive Sciences 26, 462–483 (2022)

work page 2022

[20] [20]

C. Sun, A. Shrivastava, S. Singh, A. Gupta (2017) Revisiting Unreasonable Effectiveness of Data in Deep Learning Era. in 2017 IEEE/CVF International Conference on Computer Vision (ICCV) , pp 843–852

work page 2017

[21] [21]

Taori et al

R. Taori et al. , Measuring Robustness to Natural Distribution Shifts in Image Classification. Advances in Neural Information Processing Systems 33, NeurIPS 2020 33 (2020)

work page 2020

[22] [22]

E. M. Clerkin, E. Hart, J. M. Rehg, C. Yu, L. B. Smith, Real -world visual statistics and infants' first-learned object names. Philosophical Transactions of the Royal Society B: Biological Sciences 372 (2017)

work page 2017

[23] [23]

E. M. Clerkin, L. B. Smith, Real -world statistics at two timescales and a mechanism for infant learning of object names. Proceedings of the National Academy of Sciences 119 (2022)

work page 2022

[24] [24]

M. C. Frank, M. Braginsky, D. Yurovsky, V. A. Marchman, Wordbank: an open repository for developmental vocabulary data. Journal of Child Language 44, 677–694 (2016)

work page 2016

[25] [25]

Principles of categorization

E. Rosch, "Principles of categorization" in Cognition and categorization , E. Rosch, B. B. Lloyd, Eds. (Lawrence Erlbaum Associates, 1978), pp. 27–48

[26]

L. Fenson et al., Variability in Early Communicative Development. Monogr Soc Res Child 59, R5– + (1994)

[27]

S. T. Piantadosi, Zipf’s word frequency law in natural language: A critical review and future directions. Psychonomic Bulletin & Review 21, 1112–1130 (2014)

[28]

L. B. Smith, S. Jayaraman, E. Clerkin, C. Yu, The Developing Infant Creates a Curriculum for Statistical Learning. Trends in Cognitive Sciences 22, 325–336 (2018)

[29]

G. K. Zipf, Human behavior and the principle of least effort (Addison-Wesley Press, 1949)

[30]

P. F. Carvalho, R. L. Goldstone, Putting category learning in order: Category structure and temporal arrangement affect the benefit of interleaved over blocked study. Mem Cognition 42, 481–495 (2014)

[31]

S. C. Y. Chan et al., Data Distributional Properties Drive Emergent In-Context Learning in Transformers. Adv Neur In 35 (2022)

[32]

Y. J. Lee, K. Grauman (2011) Learning the Easy Things First: Self-Paced Visual Category Discovery. in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 1721–1728

[33]

R. Salakhutdinov, A. Torralba, J. Tenenbaum (2011) Learning to share visual appearance for multiclass object detection. in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 1481–1488

[34]

J. Domke, Y. Aloimonos (2006) Deformation and Viewpoint Invariant Color Histograms. in Proceedings of the British Machine Vision Conference 2006, pp 53.51–53.10

[35]

M. J. Swain, D. H. Ballard, Color Indexing. International Journal of Computer Vision 7, 11–32 (1991)

[36]

J. B. Luo, D. Crandall, Color object detection using spatial-color joint probability functions. IEEE T Image Process 15, 1443–1453 (2006)

[37]

M. Penrose, Random Geometric Graphs (Oxford University Press, ed. 1st, 2003)

[38]

R. Diestel, Graph Theory, Graduate Texts in Mathematics (Springer Berlin Heidelberg, ed. 6, 2025), 10.1007/978-3-662-70107-2

[39]

L. W. Beineke, O. R. Oellermann, R. E. Pippert, The average connectivity of a graph. Discrete Math 252, 31–45 (2002)

[40]

C. R. Bowman, T. Iwashita, D. Zeithamova, Tracking prototype and exemplar representations in the brain across learning. Elife 9 (2020)

[41]

M. L. Schlichting, A. R. Preston, Memory integration: neural mechanisms and implications for behavior. Curr Opin Behav Sci 1, 1–8 (2015)

[42]

M. T. R. van Kesteren, D. J. Ruiter, G. Fernández, R. N. Henson, How schema and novelty augment memory formation. Trends Neurosci 35, 211–219 (2012)

[43]

A. X. Chang et al., ShapeNet: An Information-Rich 3D Model Repository. http://dx.doi.org/10.48550/arXiv.1512.03012

[44]

A. Oliva, A. Torralba, Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope. International Journal of Computer Vision 42, 145–175 (2001)

[45]

K. M. He, X. Y. Zhang, S. Q. Ren, J. Sun (2016) Deep Residual Learning for Image Recognition. in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 770–778

[46]

J. J. DiCarlo, D. D. Cox, Untangling invariant object recognition. Trends in Cognitive Sciences 11, 333–341 (2007)

[47]

D. Marr, H. K. Nishihara, Representation and recognition of the spatial organization of three-dimensional shapes. Proceedings of the Royal Society of London. Series B. Biological Sciences 200 (1978)

[48]

T. Poggio, S. Edelman, A network that learns to recognize three-dimensional objects. Nature 343, 263–266 (1990)

[49]

J. J. DiCarlo, D. Zoccolan, N. C. Rust, How does the brain solve visual object recognition? Neuron 73, 415–434 (2012)

[50]

J. J. Gibson, The ecological approach to visual perception (Houghton, Mifflin and Company, 1979)

[51]

M. Graf, Coordinate transformations in object recognition. Psychol Bull 132, 920–945 (2006)

[52]

J. T. Todd, The visual perception of 3D shape. Trends in Cognitive Sciences 8, 115–121 (2004)

[53]

L. K. Slone, L. B. Smith, C. Yu, Self-generated variability in object images predicts vocabulary growth. Developmental Sci 22 (2019)

[54]

K. H. James, S. S. Jones, L. B. Smith, S. N. Swain, Young Children's Self-Generated Object Views and Object Recognition. J Cogn Dev 15, 393–401 (2014)

[55]

O. S. Kingo, P. Krojgaard, Object manipulation facilitates kind-based object individuation of shape-similar objects. Cognitive Dev 26, 87–103 (2011)

[56]

S. Stojanov et al., Incremental Object Learning from Contiguous Views. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 8769–8778 (2019), 10.1109/CVPR.2019.00898

[57]

L. B. Smith, S. S. Jones, B. Landau, L. Gershkoff-Stowe, L. Samuelson, Object name learning provides on-the-job training for attention. Psychol Sci 13, 13–19 (2002)

[58]

M. Xu, S. Yoon, A. Fuentes, D. S. Park, A Comprehensive Survey of Image Augmentation Techniques for Deep Learning. Pattern Recognition 137 (2023)

[59]

T. Chen, S. Kornblith, M. Norouzi, G. Hinton, A Simple Framework for Contrastive Learning of Visual Representations. Pr Mach Learn Res 119 (2020)

[60]

R. Balestriero, L. Bottou, Y. LeCun, The Effects of Regularization and Data Augmentation are Class Dependent. Adv Neur In 35 (2022)

[61]

T. Devries, G. W. Taylor, Improved Regularization of Convolutional Neural Networks with Cutout

[62]

C. F. G. Dos Santos, J. P. Papa, Avoiding Overfitting: A Survey on Regularization Methods for Convolutional Neural Networks. ACM Computing Surveys, Article 123 (2022), https://doi.org/10.1145/3510413

[63]

G. Zhang, M. Cisse, Y. Dauphin, D. Lopez-Paz (2018) mixup: Beyond Empirical Risk Minimization. in International Conference on Learning Representations (ICLR)

[64]

B. M. Lake, R. Salakhutdinov, J. B. Tenenbaum, Human-level concept learning through probabilistic program induction. Science 350, 1332–1338 (2015)

[65]

F. H. Sinz, X. Pitkow, J. Reimer, M. Bethge, A. S. Tolias, Engineering a Less Artificial Intelligence. Neuron 103, 967–979 (2019)

[66]

C. M. Fausey, S. Jayaraman, L. B. Smith, From faces to hands: Changing visual input in the first two years. Cognition 152, 101–107 (2016)

[67]

G. Bradski, The OpenCV library. Dr Dobbs J 25, 120–+ (2000)

[68]

A. A. Hagberg, D. A. Schult, P. J. Swart (2008) Exploring network structure, dynamics, and function using NetworkX. in Python in Science, eds G. Varoquaux, T. Vaught, J. Millman (Pasadena, CA USA), pp 11–15

[69]

A. F. Pereira, K. H. James, S. S. Jones, L. B. Smith, Early biases and developmental changes in self-generated object views. Journal of Vision 10, 22–22 (2010)

[70]

A. Paszke et al., PyTorch: An Imperative Style, High-Performance Deep Learning Library. Advances in Neural Information Processing Systems (NeurIPS) 32 (2019)

Supplementary Information

Infant Visual Experiences of 8 Object Categories

We present in Fig. S1 a visualization of results mentioned in the Main Text.

Figure S1. Several character...
