
arxiv: 2604.05777 · v2 · submitted 2026-04-07 · 💻 cs.AI

Emergent social transmission of model-based representations without inference

Pith reviewed 2026-05-11 01:53 UTC · model grok-4.3

classification 💻 cs.AI
keywords social learning · reinforcement learning · model-based representations · cultural transmission · heuristic learning · expert observation · mentalizing · simulation

The pith

Simple observation of expert actions allows naive agents to acquire model-based representations without inferring mental states.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates through reinforcement learning simulations that a naive agent searching for rewards can develop rich internal models of its environment by observing an expert's actions alone, without any inference about the expert's beliefs. These observations are used through basic heuristics that either guide which actions the learner tries or increase the value assigned to them, which in turn shapes the learner's experience and pulls its representations closer to the expert's. Model-based learners, who maintain an internal model of how the world works, gain the largest advantage from this exposure, reaching expert-like knowledge faster than agents that learn in isolation. If this holds, it indicates that flexible knowledge can spread culturally through minimal social mechanisms that piggyback on ordinary individual learning rather than requiring costly mentalizing.

Core claim

In simulations of a reconfigurable environment where agents search for rewards, a naive learner that either copies observed actions or boosts their values based on an expert's behavior develops internal representations that converge toward those of the expert. Model-based agents, which learn transition and reward models of the environment, show faster convergence and more expert-like models than model-free agents or solo learners, all without any mechanism for inferring the expert's mental states.
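To make "learn transition and reward models" concrete, here is a minimal sketch of a tabular model-based learner. It is an illustration under stated assumptions, not the paper's implementation: the class name `TabularModelBasedLearner`, the discount factor `gamma`, and the number of planning sweeps `n_sweeps` are all hypothetical choices.

```python
from collections import defaultdict


class TabularModelBasedLearner:
    """Toy model-based learner: counts transitions, averages rewards,
    and plans by value iteration over the learned model (illustrative only)."""

    def __init__(self, n_states, n_actions, gamma=0.9):
        self.n_states = n_states
        self.n_actions = n_actions
        self.gamma = gamma  # discount factor (assumed value)
        # counts[(s, a)][s_next] = how often s_next followed (s, a)
        self.counts = defaultdict(lambda: defaultdict(int))
        self.reward_sum = defaultdict(float)  # summed reward per (s, a)
        self.visits = defaultdict(int)        # visit count per (s, a)

    def update_model(self, s, a, r, s_next):
        """Record one experienced transition (s, a) -> (r, s_next)."""
        self.counts[(s, a)][s_next] += 1
        self.reward_sum[(s, a)] += r
        self.visits[(s, a)] += 1

    def q_values(self, n_sweeps=50):
        """Value iteration on the learned model; unvisited pairs keep Q = 0."""
        v = [0.0] * self.n_states
        q = [[0.0] * self.n_actions for _ in range(self.n_states)]
        for _ in range(n_sweeps):
            for s in range(self.n_states):
                for a in range(self.n_actions):
                    n = self.visits[(s, a)]
                    if n == 0:
                        continue
                    r_hat = self.reward_sum[(s, a)] / n
                    expected_v = sum(
                        c / n * v[s2] for s2, c in self.counts[(s, a)].items()
                    )
                    q[s][a] = r_hat + self.gamma * expected_v
            v = [max(row) for row in q]
        return q
```

Because the model is built only from transitions the learner actually experiences, anything that steers which transitions it samples also shapes the model it ends up with.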

What carries the argument

Heuristic action selection or value boosting based on observed expert actions, which biases the learner's experience sampling to indirectly transmit model-based representations.
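The two cues can be sketched roughly as follows. This is a hedged illustration, not the authors' code: the function names and the parameters `copy_prob`, `epsilon`, and `bonus` are assumptions introduced here. The only social input in either function is the action the expert was seen to take.

```python
import random


def choose_action(q_row, expert_action=None, copy_prob=0.5, epsilon=0.1, rng=random):
    """Policy-level cue (assumed form): with probability copy_prob, repeat the
    action the expert was observed taking in this state; otherwise act
    epsilon-greedily on the learner's own values."""
    if expert_action is not None and rng.random() < copy_prob:
        return expert_action
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    # Break ties between equally valued actions at random.
    return rng.choice([a for a, q in enumerate(q_row) if q == best])


def boost_value(q_table, state, expert_action, bonus=0.5):
    """Value-level cue (assumed form): add a fixed bonus to the learner's own
    estimate for the observed (state, action) pair. The expert's values and
    model are never read."""
    q_table[state][expert_action] += bonus
    return q_table
```

In both cases the expert's knowledge reaches the learner only indirectly: the cue changes which actions the learner takes, and the learner's ordinary updates do the rest.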

If this is right

  • Model-based learners acquire expert-like internal models of the environment faster when exposed to social cues than when learning alone.
  • Higher-level representations can be transmitted culturally through simple behavioral observation that exploits standard reinforcement learning updates.
  • Mentalizing or belief inference is not required for the spread of flexible, model-based knowledge in these simulated settings.
  • This form of transmission works across different environment configurations where rewards must be located.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same bias mechanism could allow groups of agents to align their internal models over repeated interactions without explicit communication.
  • Artificial agents designed with model-based reinforcement learning may naturally support cultural-like knowledge sharing when given access to observed behavior.
  • Human social learning studies could test whether brief exposure to expert actions produces similar shifts in participants' causal models of a task.
  • Extending the environment to include noisy observations or multiple experts might reveal how robust the convergence remains under more realistic conditions.

Load-bearing premise

The specific rules for how the learner selects actions or boosts values from observed expert behavior are enough to capture the key social learning processes without extra mechanisms.

What would settle it

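Run the same simulations with the heuristic action selection and value boosting rules disabled. If the learner's representations still converge toward the expert's just as quickly, the heuristics are not what carries the transmission; if the convergence disappears, the heuristics are confirmed as the mechanism.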
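Such a comparison needs a number for "how close is the learner to the expert". The Figure 3 caption names the correlation between the learner's and the expert's value function; the sketch below shows one simple way that could be computed. The function names and the choice of V(s) = max over actions of Q(s, a) are assumptions made here, and the grouping by distance to reward mentioned in the caption is left out.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when either sequence is constant
    return cov / sqrt(var_x * var_y)


def value_similarity(learner_q, expert_q):
    """Correlate state values V(s) = max_a Q(s, a) of learner and expert
    (an assumed summary of each agent's value function)."""
    learner_v = [max(row) for row in learner_q]
    expert_v = [max(row) for row in expert_q]
    return pearson(learner_v, expert_v)
```

Running the same measure on learners with and without the social heuristics, at matched points in training, would give the comparison described above.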

Figures

Figures reproduced from arXiv: 2604.05777 by Charley M. Wu, Claudio Tennie, Miriam Bautista-Salinero, Silja Keßler.

Figure 1
Figure 1: Simulations. a) Grid-world environment composed of four quadrants with fixed walls and designated reward states. b) For each simulation, quadrants were randomly rotated and arranged, while rewards were randomly assigned to designated reward states. The simulations comprised a training phase with an observable expert agent (over-trained model-based RL), followed by a test phase without the expert. A differe…
Figure 2
Figure 2: Learning Strategies. Six learning conditions were tested, combining model-free (MF) and model-based (MB) RL with either asocial learning (AS) or social learning from an expert. Social learning was implemented at two levels: policy-based (DB) and value-based (VS). … 1-10) and a test phase (episodes 11-20; Fig. 1b). During the training phase, learners were accompanied by an expert, an asocial model-based RL ag…
Figure 3
Figure 3: Experiment 1. a) Performance: mean cumulative reward for different learning strategies combining (a-)social learning (SL) and reinforcement learning (RL) strategies. The vertical dotted line represents the transition between training and test phase. b) Value transfer: mean correlation between learner’s and expert’s value function, grouped by distance to reward states. c) Belief transfer: mean correlation b…
Figure 4
Figure 4: Experiments 2 and 3. a) Exp. 2 performance. b) Exp. 2 value accuracy: mean correlation between the learner’s and the optimal value function for the modified reward structure (Eq. 7), grouped by distance to reward. c) Exp. 3 performance. d) Exp. 3 belief transfer comparison (MB only) between the baseline (x-axis) and new start location in the test phase (y-axis). Each point is a single simulation, averag…
Original abstract

How do people acquire rich, flexible knowledge about their environment from others despite limited cognitive capacity? Humans are often thought to rely on computationally costly mentalizing, such as inferring others' beliefs. In contrast, cultural evolution emphasizes that behavioral transmission can be supported by simple social cues. Using reinforcement learning simulations, we show how minimal social learning can indirectly transmit higher-level representations. We simulate a na\"ive agent searching for rewards in a reconfigurable environment, learning either alone or by observing an expert - crucially, without inferring mental states. Instead, the learner heuristically selects actions or boosts value representations based on observed actions. Our results demonstrate that these cues bias the learner's experience, causing its representation to converge toward the expert's. Model-based learners benefit most from social exposure, showing faster learning and more expert-like representations. These findings show how cultural transmission can arise from simple, non-mentalizing processes exploiting asocial learning mechanisms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript uses reinforcement learning simulations to show that a naive agent can acquire model-based representations converging toward an expert's through minimal social cues—specifically, heuristic action selection or value boosting based solely on observed expert actions—without mental state inference. These cues bias the learner's experience in a reconfigurable reward environment, with model-based learners exhibiting faster learning and more expert-like representations than model-free ones.

Significance. If the implementation details confirm that the heuristics operate from observable actions alone, the work is significant as a computational demonstration that cultural transmission of higher-level, flexible knowledge can emerge from simple, non-mentalizing processes that exploit standard asocial RL mechanisms. It provides a concrete alternative to inference-heavy accounts of social learning and could inform both cultural evolution theory and the design of observational learning in AI agents.

major comments (2)
  1. [Methods] Methods section (value boosting and action selection rules): The paper must explicitly demonstrate that value boosting is computed exclusively from observable expert actions without supplying the learner with the expert's internal Q-values, transition model, or shared state representation. If the simulation provides any of these (as is common in standard model-based RL implementations), the reported convergence would be an artifact of the setup rather than emergent from raw behavioral cues, directly undermining the central claim of transmission 'without inference'.
  2. [Results] Results section (model-based learner advantages): The qualitative claim that model-based learners benefit most (faster learning, more expert-like representations) lacks reported statistical tests, effect sizes, or ablation controls varying the heuristic parameters. Without these, it is unclear whether the differential benefit is robust or specific to the chosen action-selection and boosting rules, weakening support for the broader conclusion about model-based representations.
minor comments (2)
  1. [Abstract] Abstract: The word 'naïve' is rendered with an encoding artifact ('na\"ive'); correct to the standard spelling for clarity.
  2. [Methods] Methods: All simulation parameters (learning rates, environment reconfiguration schedule, reward magnitudes, number of trials) should be listed explicitly in the main text or a table to enable full reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which have prompted us to strengthen the clarity and rigor of our manuscript. We address each major comment below and outline the revisions we will make.

Point-by-point responses
  1. Referee: [Methods] Methods section (value boosting and action selection rules): The paper must explicitly demonstrate that value boosting is computed exclusively from observable expert actions without supplying the learner with the expert's internal Q-values, transition model, or shared state representation. If the simulation provides any of these (as is common in standard model-based RL implementations), the reported convergence would be an artifact of the setup rather than emergent from raw behavioral cues, directly undermining the central claim of transmission 'without inference'.

    Authors: We agree that this distinction is crucial for the validity of our central claim. Our simulations are designed such that the learner only observes the expert's actions in each state, without any access to the expert's internal Q-values, transition model, or state representations. The value boosting mechanism simply increments the value estimate for the observed action by a fixed heuristic amount, and the action selection heuristic increases the probability of selecting the expert's action, both based purely on behavioral observation. No mental state inference or direct knowledge transfer occurs. To address the referee's concern, we will revise the Methods section to include explicit pseudocode and a statement confirming that all social cues derive exclusively from observable actions. This will eliminate any ambiguity regarding the implementation. revision: yes

  2. Referee: [Results] Results section (model-based learner advantages): The qualitative claim that model-based learners benefit most (faster learning, more expert-like representations) lacks reported statistical tests, effect sizes, or ablation controls varying the heuristic parameters. Without these, it is unclear whether the differential benefit is robust or specific to the chosen action-selection and boosting rules, weakening support for the broader conclusion about model-based representations.

    Authors: We acknowledge the value of quantitative support for our claims. While the current results show consistent patterns across multiple runs, we will enhance the Results section by adding appropriate statistical tests (such as independent t-tests comparing learning curves and representation similarity scores between conditions), reporting effect sizes, and including ablation analyses that vary the heuristic parameters (e.g., different boosting strengths and selection biases). These additions will demonstrate the robustness of the model-based advantage and will be presented in the main text and supplementary information. revision: yes
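The statistics promised in the second response could look roughly like the sketch below. It is a hedged example, not the authors' analysis: it uses Welch's unequal-variance form of the independent t-test and a pooled-variance Cohen's d, both of which are choices made here, and the function names are hypothetical. Inputs would be per-run scores (for example, representation similarity) from two conditions.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom
    for two independent samples with possibly unequal variances."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def cohens_d(a, b):
    """Cohen's d effect size using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)
```

Turning the t statistic into a p-value requires the t distribution, which the standard library does not provide; a statistics package would be needed for that step.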

Circularity Check

0 steps flagged

No circularity: results emerge from forward simulation dynamics

full rationale

The paper reports outcomes from explicit reinforcement learning simulations of naive agents interacting with an environment and an expert's observable actions. Heuristic rules for action selection and value boosting are applied directly to generate experience; the convergence of model-based representations is an observed consequence of running these dynamics, not a quantity defined in terms of the target result or fitted to it. No equations, self-citations, or ansatzes are invoked that reduce the central claim to its own inputs by construction. The derivation chain is therefore self-contained against the simulation benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that the chosen heuristics for action copying and value boosting are representative of minimal social learning and that the simulated environments capture the relevant structure of reconfigurable real-world tasks. No explicit free parameters or invented entities are named in the abstract.

axioms (1)
  • domain assumption: Heuristic action selection and value boosting based on observed expert actions are sufficient to bias experience toward expert-like representations.
    Invoked in the description of the learner's update rules without further justification in the abstract.



