arxiv: 2604.05777 · v2 · submitted 2026-04-07 · 💻 cs.AI

Emergent social transmission of model-based representations without inference

Silja Keßler, Miriam Bautista-Salinero, Claudio Tennie, Charley M. Wu

Pith reviewed 2026-05-11 01:53 UTC · model grok-4.3

classification 💻 cs.AI

keywords: social learning, reinforcement learning, model-based representations, cultural transmission, heuristic learning, expert observation, mentalizing, simulation


The pith

Simple observation of expert actions allows naive agents to acquire model-based representations without inferring mental states.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates through reinforcement learning simulations that a naive agent searching for rewards can develop rich internal models of its environment by observing an expert's actions alone, without any inference about the expert's beliefs. The learner uses these observations through simple heuristics that either bias which actions it tries or increase the values it assigns to them; this shapes the learner's experience and pulls its representations toward the expert's. Model-based learners, which maintain an internal model of how the world works, gain the largest advantage from this exposure, reaching expert-like knowledge faster than agents that learn in isolation. If this holds, it indicates that flexible knowledge can spread culturally through minimal social mechanisms that piggyback on ordinary individual learning rather than requiring costly mentalizing.

Core claim

In simulations of a reconfigurable environment where agents search for rewards, a naive learner that either copies observed actions or boosts their values based on an expert's behavior develops internal representations that converge toward those of the expert. Model-based agents, which learn transition and reward models of the environment, show faster convergence and more expert-like models than model-free agents or solo learners, all without any mechanism for inferring the expert's mental states.
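A minimal tabular sketch of the model-free/model-based contrast is below. The Lean-canon section of this page quotes the paper as implementing model-based learning with Dyna-Q; everything else here (the class name `TabularAgent`, a deterministic dictionary world model, and the values of `alpha`, `gamma`, `n_planning`) is an illustrative assumption, not the authors' code.

```python
import random
from collections import defaultdict


class TabularAgent:
    """Hypothetical tabular learner: Q-learning (model-free) with optional
    Dyna-Q planning (model-based) on the same value table."""

    def __init__(self, n_actions, model_based=False, alpha=0.1, gamma=0.9,
                 n_planning=10, seed=0):
        self.n_actions = n_actions
        self.model_based = model_based
        self.alpha, self.gamma, self.n_planning = alpha, gamma, n_planning
        self.q = defaultdict(float)  # (state, action) -> value estimate
        self.model = {}              # learned model: (state, action) -> (reward, next_state, done)
        self.rng = random.Random(seed)

    def _td_update(self, s, a, r, s_next, done):
        # One-step Q-learning target; terminal transitions do not bootstrap.
        best_next = 0.0 if done else max(self.q[(s_next, b)] for b in range(self.n_actions))
        self.q[(s, a)] += self.alpha * (r + self.gamma * best_next - self.q[(s, a)])

    def learn(self, s, a, r, s_next, done):
        self._td_update(s, a, r, s_next, done)      # learn from the real transition
        if self.model_based:
            self.model[(s, a)] = (r, s_next, done)  # assumes a deterministic world
            transitions = list(self.model.items())
            for _ in range(self.n_planning):        # replay remembered transitions
                (ps, pa), (pr, pn, pd) = self.rng.choice(transitions)
                self._td_update(ps, pa, pr, pn, pd)
```

Given the same real experience, the model-free agent only updates the transitions it just took, while the Dyna-Q agent keeps replaying remembered transitions, so value can propagate backward from a reward without further real steps. This is one reason biased experience could leave a stronger trace in a model-based learner.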

What carries the argument

Heuristic action selection or value boosting based on observed expert actions, which biases the learner's experience sampling to indirectly transmit model-based representations.

If this is right

Model-based learners acquire expert-like internal models of the environment faster when exposed to social cues than when learning alone.
Higher-level representations can be transmitted culturally through simple behavioral observation that exploits standard reinforcement learning updates.
Mentalizing or belief inference is not required for the spread of flexible, model-based knowledge in these simulated settings.
This form of transmission works across different environment configurations where rewards must be located.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same bias mechanism could allow groups of agents to align their internal models over repeated interactions without explicit communication.
Artificial agents designed with model-based reinforcement learning may naturally support cultural-like knowledge sharing when given access to observed behavior.
Human social learning studies could test whether brief exposure to expert actions produces similar shifts in participants' causal models of a task.
Extending the environment to include noisy observations or multiple experts might reveal how robust the convergence remains under more realistic conditions.

Load-bearing premise

The specific rules for how the learner selects actions or boosts values from observed expert behavior are enough to capture the key social learning processes without extra mechanisms.

What would settle it

Run the same simulations but disable the heuristic action selection and value boosting rules; if the learner's representations no longer converge toward the expert's, the transmission mechanism fails.
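A toy version of that ablation is sketched below under heavy simplifying assumptions: a 1-D corridor instead of the four-quadrant grid, a model-free Q-learner, and a learner that sees the expert's action in its own current state. Setting `bonus=0.0` disables the value-boosting cue and gives the asocial control (in the paper, the asocial AS conditions of Fig. 2 play this role). All names and parameter values are hypothetical, and the function only reports the learner-expert value correlation; no particular outcome is claimed here.

```python
import random


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return 0.0 if sxx == 0 or syy == 0 else sxy / (sxx * syy) ** 0.5


def run_corridor(bonus, n_states=8, episodes=10, alpha=0.2, gamma=0.9,
                 epsilon=0.2, max_steps=50, seed=0):
    """Hypothetical ablation: Q-learning in a corridor (action 1 = right,
    action 0 = left) with an optional fixed value bonus for the expert's
    observed action. bonus=0.0 is the asocial control."""
    rng = random.Random(seed)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]
    expert_action = 1  # the optimal expert always moves right in this toy task
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            q[s][expert_action] += bonus  # social cue, applied before choosing
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:  # greedy with random tie-breaking
                a = max((0, 1), key=lambda b: (q[s][b], rng.random()))
            s_next = min(s + 1, goal) if a == 1 else max(s - 1, 0)
            done = s_next == goal
            r = 1.0 if done else 0.0
            target = r if done else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
            if done:
                break
    # Expert's values: the reward arrives on entering the goal state.
    v_expert = [gamma ** (goal - 1 - s) for s in range(goal)]
    v_learner = [max(q[s]) for s in range(goal)]
    return pearson(v_learner, v_expert)
```

Comparing `run_corridor(bonus=0.5)` with `run_corridor(bonus=0.0)` across many seeds would be the toy analogue of the test described above.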

Figures

Figures reproduced from arXiv: 2604.05777 by Charley M. Wu, Claudio Tennie, Miriam Bautista-Salinero, Silja Keßler.

**Figure 1.** Simulations. a) Grid-world environment composed of four quadrants with fixed walls and designated reward states. b) For each simulation, quadrants were randomly rotated and arranged, while rewards were randomly assigned to designated reward states. The simulations comprised a training phase with an observable expert agent (over-trained model-based RL), followed by a test phase without the expert. A differe…
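The reconfiguration in panel b can be sketched as follows. Assumptions, not taken from the paper: each quadrant is a square character grid ('#' wall, 'R' designated reward state, '.' free cell), quadrants are tiled 2 by 2 after shuffling, and exactly one reward value is supplied per designated reward state. The function names are hypothetical.

```python
import random


def rotate90(grid):
    """Rotate a square grid (list of lists) 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def reconfigure(quadrants, reward_values, rng):
    """Hypothetical reconfiguration: rotate each quadrant by a random multiple
    of 90 degrees, shuffle quadrant positions, tile them 2 by 2, then assign
    shuffled reward values to the cells marked 'R'."""
    tiles = []
    for quad in quadrants:
        for _ in range(rng.randrange(4)):
            quad = rotate90(quad)
        tiles.append(quad)
    rng.shuffle(tiles)
    k = len(tiles[0])
    top = [tiles[0][i] + tiles[1][i] for i in range(k)]
    bottom = [tiles[2][i] + tiles[3][i] for i in range(k)]
    grid = top + bottom
    reward_cells = [(r, c) for r, row in enumerate(grid)
                    for c, cell in enumerate(row) if cell == "R"]
    values = list(reward_values)
    rng.shuffle(values)
    return grid, dict(zip(reward_cells, values))  # (row, col) -> reward
```

Wall layouts inside each quadrant stay fixed; only their orientation and position change, which matches the caption's "fixed walls" with randomly rotated and arranged quadrants.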

**Figure 2.** Learning Strategies. Six learning conditions were tested, combining model-free (MF) and model-based (MB) RL with either asocial learning (AS) or social learning from an expert. Social learning was implemented at two levels: policy-based (DB) and value-based (VS). … a training phase (episodes 1-10) and a test phase (episodes 11-20; Fig. 1b). During the training phase, learners were accompanied by an expert, an asocial model-based RL ag…

**Figure 3.** Experiment 1. a) Performance: mean cumulative reward for different learning strategies combining (a-)social learning (SL) and reinforcement learning (RL) strategies. The vertical dotted line represents the transition between training and test phase. b) Value transfer: mean correlation between learner’s and expert’s value function, grouped by distance to reward states. c) Belief transfer: mean correlation b…
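One plausible reading of the "grouped by distance to reward states" value-transfer metric is sketched below. Both choices are assumptions the page does not specify: distance is the shortest-path step count to the nearest reward state (multi-source breadth-first search around walls), and the learner-expert correlation is computed within each distance group. The function names are hypothetical.

```python
from collections import defaultdict, deque


def distances_to_reward(grid, reward_cells):
    """Multi-source BFS: steps from each reachable cell to the nearest reward
    state. '#' marks a wall; moves are up/down/left/right."""
    dist = {cell: 0 for cell in reward_cells}
    queue = deque(reward_cells)
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < len(grid) and 0 <= nc < len(grid[0])
                    and grid[nr][nc] != "#" and (nr, nc) not in dist):
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return dist


def pearson(xs, ys):
    """Pearson correlation, or None when undefined (a constant input)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / (sxx * syy) ** 0.5


def value_transfer_by_distance(v_learner, v_expert, dist):
    """Correlate learner and expert state values within each distance group;
    groups with fewer than two states are skipped."""
    groups = defaultdict(list)
    for state, d in dist.items():
        groups[d].append(state)
    return {d: pearson([v_learner[s] for s in states], [v_expert[s] for s in states])
            for d, states in sorted(groups.items()) if len(states) >= 2}
```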

**Figure 4.** Experiments 2 and 3. a) Exp. 2 performance. b) Exp. 2 value accuracy: mean correlation between the learner’s and the optimal value function for the modified reward structure (Eq. 7), grouped by distance to reward. c) Exp. 3 performance. d) Exp. 3 belief transfer comparison (MB only) between the baseline (x-axis) and new start location in the test phase (y-axis). Each point is a single simulation, averag…
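Eq. 7 is not reproduced on this page, so the following is only a generic sketch of how an optimal value function for a given reward layout can be computed: value iteration on a deterministic grid with in-place updates. Assumptions: entering a reward state pays its reward and ends the episode, moves into walls or off the grid leave the agent in place, and `gamma` is illustrative.

```python
def value_iteration(grid, rewards, gamma=0.9, tol=1e-8):
    """Compute V*(s) = max_a [R(s') + gamma * V*(s')] on a deterministic grid.
    `rewards` maps (row, col) -> reward; reward states are terminal (value 0).
    '#' marks a wall."""
    states = [(r, c) for r in range(len(grid)) for c in range(len(grid[0]))
              if grid[r][c] != "#"]
    v = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            if s in rewards:
                continue  # terminal: episode ends on entering a reward state
            best = float("-inf")
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                cand = (s[0] + dr, s[1] + dc)
                nxt = cand if cand in v else s  # blocked moves stay in place
                best = max(best, rewards.get(nxt, 0.0) + gamma * v[nxt])
            delta = max(delta, abs(best - v[s]))
            v[s] = best
        if delta < tol:
            return v
```

Because `gamma < 1`, the Bellman update is a contraction and the loop terminates; for a four-cell corridor with reward 1 at the right end, values fall off as 1, 0.9, 0.81 moving left.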

Original abstract

How do people acquire rich, flexible knowledge about their environment from others despite limited cognitive capacity? Humans are often thought to rely on computationally costly mentalizing, such as inferring others' beliefs. In contrast, cultural evolution emphasizes that behavioral transmission can be supported by simple social cues. Using reinforcement learning simulations, we show how minimal social learning can indirectly transmit higher-level representations. We simulate a naïve agent searching for rewards in a reconfigurable environment, learning either alone or by observing an expert (crucially, without inferring mental states). Instead, the learner heuristically selects actions or boosts value representations based on observed actions. Our results demonstrate that these cues bias the learner's experience, causing its representation to converge toward the expert's. Model-based learners benefit most from social exposure, showing faster learning and more expert-like representations. These findings show how cultural transmission can arise from simple, non-mentalizing processes exploiting asocial learning mechanisms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Simulations show model-based agents can converge on expert-like representations via simple action-based heuristics, but the no-inference claim hinges on whether those heuristics truly run on raw observations alone.


The main thing to know is that this paper runs RL simulations in reconfigurable environments where a naive learner either explores alone or is biased by an expert's actions through one of two minimal rules: preferring the observed action or boosting its value. The learner's internal representation ends up closer to the expert's, with model-based agents showing clearer and faster gains than model-free ones. No mentalizing or belief inference is involved on the learner's side. That distinction between model-based and model-free benefit is the clearest new piece here, and it lines up with the abstract's claim that asocial mechanisms can indirectly support cultural transmission. The setup is straightforward, and the qualitative pattern is easy to follow from the description.

The soft spot is the one flagged above: whether the heuristics run on raw observations alone. Value boosting from observed actions is not automatic in standard model-based RL; it usually needs the expert's values, a shared transition model, or some mapping that assumes more than raw behavior. If the simulation supplies any of that behind the scenes, the convergence does not really emerge from pure observation. The abstract does not spell out the precise implementation or controls, so it is difficult to tell whether the result holds or is partly built into the code. The abstract also presents the outcomes qualitatively, with no mention of statistical tests or sensitivity checks.

This is the kind of work that fits a computational social learning or cultural evolution reading group. Someone already thinking about RL models of transmission would get a concrete example to discuss, even if they end up disagreeing about how minimal the cues really are. It is worth sending to peer review because the question is live and simulation is a direct way to test it, but the referees will need to see the exact rules and any robustness checks before the central claim can be taken as settled.

Referee Report

2 major / 2 minor

Summary. The manuscript uses reinforcement learning simulations to show that a naive agent can acquire model-based representations converging toward an expert's through minimal social cues—specifically, heuristic action selection or value boosting based solely on observed expert actions—without mental state inference. These cues bias the learner's experience in a reconfigurable reward environment, with model-based learners exhibiting faster learning and more expert-like representations than model-free ones.

Significance. If the implementation details confirm that the heuristics operate from observable actions alone, the work is significant as a computational demonstration that cultural transmission of higher-level, flexible knowledge can emerge from simple, non-mentalizing processes that exploit standard asocial RL mechanisms. It provides a concrete alternative to inference-heavy accounts of social learning and could inform both cultural evolution theory and the design of observational learning in AI agents.

major comments (2)

[Methods] Methods section (value boosting and action selection rules): The paper must explicitly demonstrate that value boosting is computed exclusively from observable expert actions without supplying the learner with the expert's internal Q-values, transition model, or shared state representation. If the simulation provides any of these (as is common in standard model-based RL implementations), the reported convergence would be an artifact of the setup rather than emergent from raw behavioral cues, directly undermining the central claim of transmission 'without inference'.
[Results] Results section (model-based learner advantages): The qualitative claim that model-based learners benefit most (faster learning, more expert-like representations) lacks reported statistical tests, effect sizes, or ablation controls varying the heuristic parameters. Without these, it is unclear whether the differential benefit is robust or specific to the chosen action-selection and boosting rules, weakening support for the broader conclusion about model-based representations.

minor comments (2)

[Abstract] Abstract: The word 'naïve' is rendered with a LaTeX encoding artifact ('na\"ive'); correct it to the standard spelling for clarity.
[Methods] Methods: All simulation parameters (learning rates, environment reconfiguration schedule, reward magnitudes, number of trials) should be listed explicitly in the main text or a table to enable full reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which have prompted us to strengthen the clarity and rigor of our manuscript. We address each major comment below and outline the revisions we will make.


Referee: [Methods] Methods section (value boosting and action selection rules): The paper must explicitly demonstrate that value boosting is computed exclusively from observable expert actions without supplying the learner with the expert's internal Q-values, transition model, or shared state representation. If the simulation provides any of these (as is common in standard model-based RL implementations), the reported convergence would be an artifact of the setup rather than emergent from raw behavioral cues, directly undermining the central claim of transmission 'without inference'.

Authors: We agree that this distinction is crucial for the validity of our central claim. Our simulations are designed such that the learner only observes the expert's actions in each state, without any access to the expert's internal Q-values, transition model, or state representations. The value boosting mechanism simply increments the value estimate for the observed action by a fixed heuristic amount, and the action selection heuristic increases the probability of selecting the expert's action, both based purely on behavioral observation. No mental state inference or direct knowledge transfer occurs. To address the referee's concern, we will revise the Methods section to include explicit pseudocode and a statement confirming that all social cues derive exclusively from observable actions. This will eliminate any ambiguity regarding the implementation. revision: yes
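The authors' description maps onto two small update rules, mirroring the policy-based (DB) and value-based (VS) conditions in the Fig. 2 caption. This is a sketch of that description, not the pseudocode the authors promise: the softmax policy, the mixing weight `w`, the inverse temperature `beta`, the `bonus` size, and all names are assumptions. The `ObservedAction` record makes the information boundary explicit, since the social rules receive only a state and an action.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ObservedAction:
    """All the learner receives from the expert: where it was and what it did.
    No Q-values, transition model, or beliefs are part of the record."""
    state: int
    action: int


def value_boost(q, obs, bonus=0.5):
    """Value-based cue: add a fixed bonus to Q(s, a_observed)."""
    q[obs.state][obs.action] += bonus


def softmax(values, beta):
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(beta * (v - m)) for v in values]
    z = sum(exps)
    return [e / z for e in exps]


def policy_bias(q, state, obs, beta=2.0, w=0.5):
    """Policy-based cue: mix the learner's softmax policy with a one-hot
    policy on the observed action. The learner's Q-values are not changed."""
    pi = softmax(q[state], beta)
    if obs is not None and obs.state == state:
        pi = [(1 - w) * p + (w if a == obs.action else 0.0) for a, p in enumerate(pi)]
    return pi


def sample_action(pi, rng):
    """Draw an action index according to the policy probabilities."""
    return rng.choices(range(len(pi)), weights=pi, k=1)[0]
```

In a learning loop, a VS-style learner would call `value_boost` whenever it observes the expert, and a DB-style learner would draw its action from `policy_bias(...)`; either way, its ordinary RL update then runs on whatever experience the cue steered it toward, which is the indirect route the paper describes.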
Referee: [Results] Results section (model-based learner advantages): The qualitative claim that model-based learners benefit most (faster learning, more expert-like representations) lacks reported statistical tests, effect sizes, or ablation controls varying the heuristic parameters. Without these, it is unclear whether the differential benefit is robust or specific to the chosen action-selection and boosting rules, weakening support for the broader conclusion about model-based representations.

Authors: We acknowledge the value of quantitative support for our claims. While the current results show consistent patterns across multiple runs, we will enhance the Results section by adding appropriate statistical tests (such as independent t-tests comparing learning curves and representation similarity scores between conditions), reporting effect sizes, and including ablation analyses that vary the heuristic parameters (e.g., different boosting strengths and selection biases). These additions will demonstrate the robustness of the model-based advantage and will be presented in the main text and supplementary information. revision: yes
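A standard-library sketch of the promised statistics is below. It swaps in Welch's variant of the independent t-test (no equal-variance assumption) and reports Cohen's d with a pooled standard deviation; applying both to per-simulation summaries such as cumulative reward is an assumption about how the comparison would be set up. For a p-value, `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same Welch statistic.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va_n = statistics.variance(a) / len(a)
    vb_n = statistics.variance(b) / len(b)
    se2 = va_n + vb_n
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2)
    df = se2 ** 2 / (va_n ** 2 / (len(a) - 1) + vb_n ** 2 / (len(b) - 1))
    return t, df


def cohens_d(a, b):
    """Cohen's d: mean difference over the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled)
```

Each list would hold one value per simulation run (for example, MB-social versus MB-asocial cumulative reward), so the test compares conditions across independent runs rather than across time steps of a single learning curve.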

Circularity Check

0 steps flagged

No circularity: results emerge from forward simulation dynamics


The paper reports outcomes from explicit reinforcement learning simulations of naive agents interacting with an environment and an expert's observable actions. Heuristic rules for action selection and value boosting are applied directly to generate experience; the convergence of model-based representations is an observed consequence of running these dynamics, not a quantity defined in terms of the target result or fitted to it. No equations, self-citations, or ansatzes are invoked that reduce the central claim to its own inputs by construction. The derivation chain is therefore self-contained against the simulation benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that the chosen heuristics for action copying and value boosting are representative of minimal social learning and that the simulated environments capture the relevant structure of reconfigurable real-world tasks. No explicit free parameters or invented entities are named in the abstract.

axioms (1)

domain assumption: Heuristic action selection and value boosting based on observed expert actions are sufficient to bias experience toward expert-like representations.
Invoked in the description of the learner's update rules without further justification in the abstract.

pith-pipeline@v0.9.0 · 5465 in / 1291 out tokens · 29433 ms · 2026-05-11T01:53:45.586539+00:00 · methodology

Review history (2 revisions)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Each cited theorem is in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

Relation between the paper passage and the cited Recognition theorem.

We simulate a naïve agent searching for rewards in a reconfigurable environment, learning either alone or by observing an expert—crucially, without inferring mental states. Instead, the learner heuristically selects actions or boosts value representations based on observed actions.
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

Relation between the paper passage and the cited Recognition theorem.

Model-based (MB) learning is implemented using Dyna-Q ... maintains an internal belief B about the environment—equivalent to a world model

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

56 extracted references · 56 canonical work pages

[1]

\ Dayan, P

antonov2025exploring APACrefauthors Antonov, G. \ Dayan, P. APACrefauthors \ 2025 . Exploring replay Exploring replay . Nature Communications 16 1 1657

work page 2025
[2]

Baker, Rebecca Saxe, and Joshua B

Baker_Saxe_Tenenbaum_2009 APACrefauthors Baker, C L. , Saxe, R. \ Tenenbaum, J B. APACrefauthors \ 2009 . Action understanding as inverse planning Action understanding as inverse planning . Cognition 113 3 329–349 . APACrefDOI doi:10.1016/j.cognition.2009.07.005 APACrefDOI

work page doi:10.1016/j.cognition.2009.07.005 2009
[3]

ve, adult, captive chimpanzees do not socially learn how to make and use sharp stone tools Na \

bandini2023naive APACrefauthors Bandini, E. \ Tennie, C. APACrefauthors \ 2023 . Na \" ve, adult, captive chimpanzees do not socially learn how to make and use sharp stone tools Na \" ve, adult, captive chimpanzees do not socially learn how to make and use sharp stone tools . Scientific Reports 13 1 22733

work page 2023
[4]

APACrefauthors \ 1957

Bellman1957 APACrefauthors Bellman, R. APACrefauthors \ 1957 . Dynamic Programming Dynamic programming \ ( 1 \ ). Princeton, NJ, USA Princeton University Press

work page 1957
[5]

\ Richerson, P J

boyd1988culture APACrefauthors Boyd, R. \ Richerson, P J. APACrefauthors \ 1988 . Culture and the evolutionary process Culture and the evolutionary process . University of Chicago press

work page 1988
[6]

\ Russon, A E

byrne_learning_1998 APACrefauthors Byrne, R W. \ Russon, A E. APACrefauthors \ 1998 10 . Learning by imitation: A hierarchical approach Learning by imitation: A hierarchical approach . Behavioral and Brain Sciences 21 5 667--684 . APACrefDOI doi:10.1017/S0140525X98001745 APACrefDOI

work page doi:10.1017/s0140525x98001745 1998
[7]

, Carpenter, M

call2005copying APACrefauthors Call, J. , Carpenter, M. \ Tomasello, M. APACrefauthors \ 2005 . Copying results and copying actions in the process of social learning: chimpanzees (Pan troglodytes) and human children (Homo sapiens) Copying results and copying actions in the process of social learning: chimpanzees (pan troglodytes) and human children (homo ...

work page 2005
[8]

, Pauli, W M

collette2017neural APACrefauthors Collette, S. , Pauli, W M. , Bossaerts, P. \ O'Doherty, J. APACrefauthors \ 2017 . Neural computations underlying inverse reinforcement learning in the human brain Neural computations underlying inverse reinforcement learning in the human brain . Elife 6 e29718

work page 2017
[9]

, Gershman, S J

daw2011model APACrefauthors Daw, N D. , Gershman, S J. , Seymour, B. , Dayan, P. \ Dolan, R J. APACrefauthors \ 2011 . Model-based influences on humans' choices and striatal prediction errors Model-based influences on humans' choices and striatal prediction errors . Neuron 69 6 1204--1215

work page 2011
[10]

\ Niv, Y

drummond2020model APACrefauthors Drummond, N. \ Niv, Y. APACrefauthors \ 2020 . Model-based decision making and model-free learning Model-based decision making and model-free learning . Current Biology 30 15 R860--R865

work page 2020
[11]

APACrefauthors \ 1988

Galef1988-bc APACrefauthors Galef, B G., Jr. APACrefauthors \ 1988 . Imitation in animals: History, definition and interpretation of the data from the psychological laboratory Imitation in animals: History, definition and interpretation of the data from the psychological laboratory . T R. Zentall\ B G. Galef Jr\ ( ), Social learning: Psychological and Bio...

work page 1988
[12]

, Kello, C T

garg2022individual APACrefauthors Garg, K. , Kello, C T. \ Smaldino, P E. APACrefauthors \ 2022 . Individual exploration and selective social learning: balancing exploration--exploitation trade-offs in collective foraging Individual exploration and selective social learning: balancing exploration--exploitation trade-offs in collective foraging . Journal o...

work page 2022
[13]

\ Schmidhuber, J

ha2018recurrent APACrefauthors Ha, D. \ Schmidhuber, J. APACrefauthors \ 2018 . Recurrent world models facilitate policy evolution Recurrent world models facilitate policy evolution . Advances in neural information processing systems 31

work page 2018
[14]

, Berg, J J

hackel2019model APACrefauthors Hackel, L M. , Berg, J J. , Lindstr \"o m, B R. \ Amodio, D M. APACrefauthors \ 2019 . Model-based and model-free social cognition: investigating the role of habit in social attitude formation and choice Model-based and model-free social cognition: investigating the role of habit in social attitude formation and choice . Fro...

work page 2019
[15]

, Mende-Siedlecki, P

hackel2020reinforcement APACrefauthors Hackel, L M. , Mende-Siedlecki, P. \ Amodio, D M. APACrefauthors \ 2020 . Reinforcement learning in social interaction: The distinguishing role of trait inference Reinforcement learning in social interaction: The distinguishing role of trait inference . Journal of Experimental Social Psychology 88 103948

work page 2020
[16]

, Pasukonis, J

hafner2025mastering APACrefauthors Hafner, D. , Pasukonis, J. , Ba, J. \ Lillicrap, T. APACrefauthors \ 2025 . Mastering diverse control tasks through world models Mastering diverse control tasks through world models . Nature 1--7

work page 2025
[17]

, Berdahl, A M

hawkins2023flexible APACrefauthors Hawkins, R D. , Berdahl, A M. , Pentland, A S. , Tenenbaum, J B. , Goodman, N D. \ Krafft, P. APACrefauthors \ 2023 . Flexible social inference facilitates targeted social learning when rewards are not observable Flexible social inference facilitates targeted social learning when rewards are not observable . Nature Human...

work page 2023
[18]

APACrefauthors \ 2018

heyes2018cognitive APACrefauthors Heyes, C. APACrefauthors \ 2018 . Cognitive gadgets: The cultural evolution of thinking Cognitive gadgets: The cultural evolution of thinking . Harvard University Press

work page 2018
[19]

\ Whiten, A

horner2005causal APACrefauthors Horner, V. \ Whiten, A. APACrefauthors \ 2005 . Causal knowledge and imitation/emulation switching in chimpanzees (Pan troglodytes) and children (Homo sapiens) Causal knowledge and imitation/emulation switching in chimpanzees (pan troglodytes) and children (homo sapiens) . Animal cognition 8 3 164--181

work page 2005
[20]

APACrefauthors \ 2019

Jara-Ettinger_2019 APACrefauthors Jara-Ettinger, J. APACrefauthors \ 2019 . Theory of mind as inverse reinforcement learning Theory of mind as inverse reinforcement learning . Current Opinion in Behavioral Sciences 29 105–110

work page 2019
[21]

, Lucas, C G

jern2017people APACrefauthors Jern, A. , Lucas, C G. \ Kemp, C. APACrefauthors \ 2017 . People learn other people’s preferences through inverse decision-making People learn other people’s preferences through inverse decision-making . Cognition 168 46--64

work page 2017
[22]

\ Mesoudi, A

jimenez2019prestige APACrefauthors Jim \'e nez, \'A V. \ Mesoudi, A. APACrefauthors \ 2019 . Prestige-biased social learning: Current evidence and outstanding questions Prestige-biased social learning: Current evidence and outstanding questions . Palgrave Communications 5 1

work page 2019
[23]

APACrefauthors \ 2004 02

Laland2004-pq APACrefauthors Laland, K N. APACrefauthors \ 2004 02 . Social learning strategies Social learning strategies . Learning & Behavior 32 1 4--14 . APACrefDOI doi:10.3758/bf03196002 APACrefDOI

work page doi:10.3758/bf03196002 2004
[24]

\ Nielsen, M

legare2015imitation APACrefauthors Legare, C H. \ Nielsen, M. APACrefauthors \ 2015 . Imitation and innovation: The dual engines of cultural learning Imitation and innovation: The dual engines of cultural learning . Trends in cognitive sciences 19 11 688--699

work page 2015
[25]

, Littman, M L

lehnert2020reward APACrefauthors Lehnert, L. , Littman, M L. \ Frank, M J. APACrefauthors \ 2020 . Reward-predictive representations generalize across tasks in reinforcement learning Reward-predictive representations generalize across tasks in reinforcement learning . PLoS computational biology 16 10 e1008317

work page 2020
[26]

, Young, A G

lyons2007hidden APACrefauthors Lyons, D E. , Young, A G. \ Keil, F C. APACrefauthors \ 2007 . The hidden structure of overimitation The hidden structure of overimitation . Proceedings of the National Academy of Sciences 104 50 19751--19756

work page 2007
[27]

, Zhou, H

mantiuk2025curiosity APACrefauthors Mantiuk, F. , Zhou, H. \ Wu, C M. APACrefauthors \ 2025 . From Curiosity to Competence: How World Models Interact with the Dynamics of Exploration From curiosity to competence: How world models interact with the dynamics of exploration . A. Ruggeri, D. Barner, C. Walker \ N. Bramley\ ( ), Proceedings of the 47th Annual ...

work page doi:10.48550/arxiv.2507.08210 2025
[28]

, Bell, A V

mcelreath2008beyond APACrefauthors McElreath, R. , Bell, A V. , Efferson, C. , Lubell, M. , Richerson, P J. \ Waring, T. APACrefauthors \ 2008 . Beyond existence and aiming outside the laboratory: estimating frequency-dependent and pay-off-biased social learning strategies Beyond existence and aiming outside the laboratory: estimating frequency-dependent ...

work page 2008
[29]

APACrefauthors \ 2016

mesoudi2016cultural APACrefauthors Mesoudi, A. APACrefauthors \ 2016 . Cultural evolution: a review of theory, findings and controversies Cultural evolution: a review of theory, findings and controversies . Evolutionary biology 43 4 481--497

work page 2016
[30]

, Botvinick, M M

miller2017dorsal APACrefauthors Miller, K J. , Botvinick, M M. \ Brody, C D. APACrefauthors \ 2017 . Dorsal hippocampus contributes to model-based planning Dorsal hippocampus contributes to model-based planning . Nature neuroscience 20 9 1269--1276

work page 2017
[31]

, Bonnet, E

najar2020actions APACrefauthors Najar, A. , Bonnet, E. , Bahrami, B. \ Palminteri, S. APACrefauthors \ 2020 . The actions of others act as a pseudo-reward to drive imitation in the context of social reinforcement learning The actions of others act as a pseudo-reward to drive imitation in the context of social reinforcement learning . PLoS biology 18 12 e3001028

work page 2020
[32]

, Bush, D

olafsdottir2018role APACrefauthors \'O lafsd \'o ttir, H F. , Bush, D. \ Barry, C. APACrefauthors \ 2018 . The role of hippocampal replay in memory and planning The role of hippocampal replay in memory and planning . Current Biology 28 1 R37--R50

work page 2018
[33]

, Knapska, E

olsson2020neural APACrefauthors Olsson, A. , Knapska, E. \ Lindstr \"o m, B. APACrefauthors \ 2020 . The neural and computational systems of social learning The neural and computational systems of social learning . Nature Reviews Neuroscience 21 4 197--212

work page 2020
[34]

, Lee, S W

o2015structure APACrefauthors O’Doherty, J P. , Lee, S W. \ McNamee, D. APACrefauthors \ 2015 . The structure of reinforcement-learning mechanisms in the human brain The structure of reinforcement-learning mechanisms in the human brain . Current Opinion in Behavioral Sciences 1 94--100

work page 2015
[35]

, Go \" ame, S

park2017integration APACrefauthors Park, S A. , Go \" ame, S. , O'Connor, D A. \ Dreher, J C. APACrefauthors \ 2017 . Integration of individual and social information for decision-making in groups of different sizes Integration of individual and social information for decision-making in groups of different sizes . PLoS Biology 15 6 e2001958

work page 2017
[36]

\ Schaal, S

peters2008reinforcement APACrefauthors Peters, J. \ Schaal, S. APACrefauthors \ 2008 . Reinforcement learning of motor skills with policy gradients Reinforcement learning of motor skills with policy gradients . Neural networks 21 4 682--697

work page 2008
[37]

, Boyd, R

rendell2010copy APACrefauthors Rendell, L. , Boyd, R. , Cownden, D. , Enquist, M. , Eriksson, K. , Feldman, M W. Laland, K N. APACrefauthors \ 2010 . Why copy others? Insights from the social learning strategies tournament Why copy others? insights from the social learning strategies tournament . Science 328 5975 208--213

work page 2010
[38]

, Fogarty, L

rendell2010rogers APACrefauthors Rendell, L. , Fogarty, L. \ Laland, K N. APACrefauthors \ 2010 . ROGERS’PARADOX RECAST AND RESOLVED: POPULATION STRUCTURE AND THE EVOLUTION OF SOCIAL LEARNING STRATEGIES Rogers’paradox recast and resolved: Population structure and the evolution of social learning strategies . Evolution 64 2 534--548

work page 2010
[39]

, Bolic, M

roberts2025environmental APACrefauthors Roberts-Gaal, X. , Bolic, M. \ Cushman, F A. APACrefauthors \ 2025 . Environmental variability shapes the representational format of cultural learning Environmental variability shapes the representational format of cultural learning . Proceedings of the National Academy of Sciences 122 28 e2505283122

work page 2025
[40]

APACrefauthors \ 1998 07

Russell_1998 APACrefauthors Russell, S. APACrefauthors \ 1998 07 . Learning agents for uncertain environments Learning agents for uncertain environments . ( 101–103). Madison Wisconsin USA ACM . APACrefDOI doi:10.1145/279943.279964 APACrefDOI

work page doi:10.1145/279943.279964 1998
[41]

, Sakai, Y

sato2023state APACrefauthors Sato, Y. , Sakai, Y. \ Hirata, S. APACrefauthors \ 2023 . State-transition-free reinforcement learning in chimpanzees (Pan troglodytes) State-transition-free reinforcement learning in chimpanzees (pan troglodytes) . Learning & behavior 51 4 413--427

work page 2023
[42]

\ Loewenstein, Y

shteingart2014reinforcement APACrefauthors Shteingart, H. \ Loewenstein, Y. APACrefauthors \ 2014 . Reinforcement learning and human behavior Reinforcement learning and human behavior . Current opinion in neurobiology 25 93--98

work page 2014
[43] Spence, K. W. (1937, December). Experimental studies of learning and the higher mental processes in infra-human primates. Psychological Bulletin, 34(10), 806–850. doi:10.1037/h0061498
[44] Storn, R., & Price, K. (1997). Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization, 11(4), 341–359. doi:10.1023/a:1008202821328
[45] Sutton, R. S. (1990). Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In Proceedings of the Seventh International Conference on Machine Learning …
[46] Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction. MIT Press.
[47] Tennie, C., Call, J., & Tomasello, M. (2010). Evidence for emulation in chimpanzees in social settings using the floating peanut task. PLoS One, 5(5), e10544.
[48] Tomasello, M. (1999). The cultural origins of human cognition. Cambridge, MA: Harvard University Press.
[49] Toyokawa, W., Whalen, A., & Laland, K. N. (2019, January). Social learning strategies regulate the wisdom and madness of interactive crowds. Nature Human Behaviour, 3(2), 183–193. doi:10.1038/s41562-018-0518-x
[50] Uchiyama, R., Tennie, C., & Wu, C. M. (2023). Model-based assimilation transmits and recombines world models. In L. Hunt, C. Summerfield, T. Konkle, E. Fedorenko, & T. Naselaris (Eds.), Proceedings of the 2023 Conference on Cognitive Computational Neu… doi:10.31234/osf.io/v69jy
[51] Urain, J., Mandlekar, A., Du, Y., Muhammad, N., Xu, D., Fragkiadaki, K., et al. (2025). A survey on deep generative models for robot learning from multimodal demonstrations. IEEE Transactions on Robotics, 42, 60–79.
[52] Vélez, N., Chen, A. M., Burke, T., Cushman, F. A., & Gershman, S. J. (2023). Teachers recruit mentalizing regions to represent learners' beliefs. Proceedings of the National Academy of Sciences, 120(22), e2215015120.
[53] Vikbladh, O. M., Meager, M. R., King, J., Blackmon, K., Devinsky, O., Shohamy, D., … Daw, N. D. (2019). Hippocampal contributions to model-based planning and spatial memory. Neuron, 102(3), 683–693.
[54] Witt, A., Toyokawa, W., Lala, K. N., Gaissmaier, W., & Wu, C. M. (2024). Humans flexibly integrate social information despite interindividual differences in reward. Proceedings of the National Academy of Sciences, 121 … doi:10.1073/pnas.2404928121
[55] Wu, C. M., Deffner, D., Kahl, B., Meder, B., Ho, M. K., & Kurvers, R. H. (2025). Adaptive mechanisms of social and asocial learning in immersive foraging environments. Nature Communications, 16, 3539. doi:10.1038/s41467-025-58365-6
[56] Wu, C. M., Vélez, N., & Cushman, F. A. (2022). Representational exchange in human social learning: Balancing efficiency and flexibility. In I. C. Dezza, E. Schulz, & C. M. Wu (Eds.), The Drive for Knowledge: The Science…