Reasoning and Generalization in RL: A Tool Use Perspective

Dan Saunders; Jim Fleming; Mike Qiu; Sam Wenke

arxiv: 1907.02050 · v1 · pith:ZIIJVPOJnew · submitted 2019-07-03 · 💻 cs.NE · cs.AI · cs.LG

Pith reviewed 2026-05-25 09:17 UTC · model grok-4.3

keywords reinforcement learning · generalization · tool use · trap-tube task · transfer learning · benchmark evaluation · agent testing

The pith

Reinforcement learning generalization is measured using multiple test sets created by transfers inspired by the trap-tube task.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that current RL evaluation often relies on a single test set of environments, which fails to isolate specific forms of generalization. Instead, it proposes transfers drawn from animal and human tool-use studies, such as the trap-tube task, that generate several distinct test sets. Each set targets a particular generalization ability demonstrated by biological tool users. A reader would care because this setup could reveal whether agents learn reusable mechanisms for novel situations rather than memorizing patterns from training.

Core claim

We study tool use in the context of reinforcement learning and propose a framework for analyzing generalization inspired by a classic study of tool using behavior, the trap-tube task. Recently, it has become common in reinforcement learning to measure generalization performance on a single test set of environments. We instead propose transfers that produce multiple test sets that are used to measure specified types of generalization, inspired by abilities demonstrated by animal and human tool users.

What carries the argument

Transfers inspired by the trap-tube task that generate multiple test sets for isolating distinct generalization types in RL agents.
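As a rough illustration of the idea, the sketch below treats a transfer as a function that modifies one aspect of a training configuration, so that each transfer yields its own labeled test set. Everything here is hypothetical: `TubeConfig`, its fields, and the three example transfers are invented for illustration and are not taken from the paper or from the `gym_tool_use` repository.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List


@dataclass(frozen=True)
class TubeConfig:
    """Hypothetical description of one trap-tube layout (not the repository's API)."""
    tube_color: str   # surface appearance only
    trap_side: str    # "left" or "right": where the trap sits
    tool_glyph: str   # symbol used to draw the tool


# A transfer maps a training configuration to a modified test configuration.
Transfer = Callable[[TubeConfig], TubeConfig]


def recolor(cfg: TubeConfig) -> TubeConfig:
    """Change appearance while leaving the layout untouched."""
    return replace(cfg, tube_color="blue")


def mirror_trap(cfg: TubeConfig) -> TubeConfig:
    """Move the trap to the opposite side, changing the layout."""
    return replace(cfg, trap_side="left" if cfg.trap_side == "right" else "right")


def relabel_tool(cfg: TubeConfig) -> TubeConfig:
    """Swap the glyph that stands for the tool."""
    return replace(cfg, tool_glyph="T")


def build_test_sets(
    train: List[TubeConfig], transfers: Dict[str, Transfer]
) -> Dict[str, List[TubeConfig]]:
    """Apply each transfer to every training config: one test set per transfer."""
    return {name: [fn(cfg) for cfg in train] for name, fn in transfers.items()}


train_set = [TubeConfig("red", "left", "I"), TubeConfig("red", "right", "I")]
test_sets = build_test_sets(
    train_set,
    {"appearance": recolor, "layout": mirror_trap, "symbol": relabel_tool},
)
```

Because each hypothetical transfer changes exactly one field, a drop in performance on one test set can be attributed to that change rather than to a mixture of changes.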

If this is right

RL agents can be tested for whether they acquire the underlying mechanisms of tool use rather than task-specific solutions.
Different forms of generalization become separable and measurable instead of collapsed into one aggregate score.
Evaluation protocols can be extended to other domains by designing analogous transfers that produce targeted test sets.
The source environments and transfer code enable direct reproduction and extension of the test sets.
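The second point can be made concrete with a small sketch that reports success per test set next to a single pooled score. The test-set labels `FSt` and `FSy` follow the notation in the Figure 5 caption; the outcome lists are made-up placeholders, not results from the paper.

```python
from typing import Dict, List, Tuple


def success_rates(outcomes: Dict[str, List[bool]]) -> Tuple[Dict[str, float], float]:
    """Return (per-test-set success rate, pooled success rate over all episodes)."""
    per_set = {name: sum(runs) / len(runs) for name, runs in outcomes.items()}
    pooled_runs = [r for runs in outcomes.values() for r in runs]
    pooled = sum(pooled_runs) / len(pooled_runs)
    return per_set, pooled


# Placeholder episode outcomes (True = solved); values are illustrative only.
outcomes = {
    "FSt": [False] * 10,
    "FSy": [True] * 4 + [False] * 6,
}
per_set, pooled = success_rates(outcomes)
```

With these placeholder values the pooled score sits between the two per-set rates and hides the fact that one test set is never solved, which is the kind of information an aggregate score discards.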

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same transfer approach could be adapted to create benchmarks that test causal reasoning or planning in non-tool domains.
If the multiple test sets prove more diagnostic, standard RL leaderboards might shift from single-holdout evaluation to families of related test sets.
Robotic implementations of the trap-tube transfers could provide a bridge between simulated RL agents and physical tool-use experiments.

Load-bearing premise

Generalization patterns observed in animal and human tool-use studies provide a valid model for creating and interpreting test sets for RL agents.

What would settle it

A comparison experiment in which agents trained under the proposed transfers show no measurable difference in performance patterns across the multiple test sets compared with agents evaluated on a single combined test set.
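One possible way to operationalize "no measurable difference" is a permutation test on the success rates of two test sets: if shuffling the test-set labels reproduces the observed gap about as often as not, the labels carry little information. This is an assumption about how such a comparison could be run, not a procedure described in the paper, and the episode outcomes below are placeholders.

```python
import random
from typing import List


def permutation_p_value(
    a: List[int], b: List[int], n_perm: int = 2000, seed: int = 0
) -> float:
    """Two-sided permutation test on the difference in mean success (1 = solved)."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign episodes to the two test sets at random
        left, right = pooled[: len(a)], pooled[len(a):]
        gap = abs(sum(left) / len(left) - sum(right) / len(right))
        if gap >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Placeholder outcomes: one pair with a large gap, one pair with none.
p_distinct = permutation_p_value([1] * 40 + [0] * 60, [0] * 100)
p_same = permutation_p_value([1] * 20 + [0] * 80, [1] * 20 + [0] * 80)
```

A small p-value for a pair of test sets would suggest that separating them reveals something a combined test set would blur; uniformly large p-values across all pairs would point the other way.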

Figures

Figures reproduced from arXiv: 1907.02050 by Dan Saunders, Jim Fleming, Mike Qiu, Sam Wenke.

**Figure 2.** States of a structural trap-tube environment.

**Figure 3.** States of a symbolic trap-tube environment.

**Figure 4.** Example training (left) and evaluation (right) reward curves averaged over the batch

**Figure 5.** Mean ± 1 standard deviation and maximum training and evaluation performance curves from training 5 PPO + ICM agents to solve tasks from {FP, FSt, FSy}.

| Algorithm | {FP} | {FSt} | {FSy} | {FP, FSt} | {FP, FSy} | {FSt, FSy} | {FP, FSt, FSy} |
| --- | --- | --- | --- | --- | --- | --- | --- |
| PPO | 0% | 0% | 35.5% ± 10.9% | 0% | 7% ± 3.6% | 19.4% ± 3.2% | 4.4% ± 3.2% |
| PPO + ICM | 2% ± 3% | 0% | 40.2% ± 16.9% | 0% | 29.7% ± 8.1% | 33.1% ± 19.7% | 24.4% ± 9% |
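For readers who want to produce entries in the same "mean ± standard deviation" form, the following is a minimal sketch that summarizes per-agent success rates. The five rates are invented placeholders, and the use of the population standard deviation (`pstdev`) is an assumption; the source does not say which estimator was used.

```python
from statistics import fmean, pstdev
from typing import List, Tuple


def summarize(rates: List[float]) -> Tuple[float, float]:
    """Mean and population standard deviation of per-agent success rates."""
    return fmean(rates), pstdev(rates)


def as_percent(mean: float, sd: float) -> str:
    """Format a (mean, sd) pair of fractions as 'x% ± y%' with one decimal."""
    return f"{mean * 100:.1f}% ± {sd * 100:.1f}%"


# Placeholder success rates for five independently trained agents.
rates = [0.2, 0.3, 0.4, 0.3, 0.3]
mean, sd = summarize(rates)
label = as_percent(mean, sd)
```

Swapping `pstdev` for `statistics.stdev` gives the sample standard deviation instead, which is noticeably larger when only five agents are trained.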

Original abstract

Learning to use tools to solve a variety of tasks is an innate ability of humans and has been observed of animals in the wild. However, the underlying mechanisms that are required to learn to use tools are abstract and widely contested in the literature. In this paper, we study tool use in the context of reinforcement learning and propose a framework for analyzing generalization inspired by a classic study of tool using behavior, the trap-tube task. Recently, it has become common in reinforcement learning to measure generalization performance on a single test set of environments. We instead propose transfers that produce multiple test sets that are used to measure specified types of generalization, inspired by abilities demonstrated by animal and human tool users. The source code to reproduce our experiments is publicly available at https://github.com/fomorians/gym_tool_use.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Proposal for multi-test-set generalization testing in RL tool use lacks results but has a clear methodological idea.

The paper's main contribution is a proposal for a framework that uses transfers inspired by the trap-tube task to generate multiple test sets in reinforcement learning environments. These test sets are meant to measure specified types of generalization in agents learning tool use. What is actually new here is the mapping of that classic task to create distinct test sets for multi-set generalization testing in RL. Prior work has looked at transfer and multi-environment testing, but this specific framing around tool use and defined generalization axes appears fresh.

The paper does well in highlighting the issue with relying on a single test set and in making the source code available for others to try. It keeps the focus on practical construction of test environments rather than abstract theory.

There are no load-bearing flaws in the stated approach. The central claim is methodological, and as the stress-test notes, it does not require the animal tool-use studies to serve as a direct model for RL agents. It only needs that the transfers can be defined and the test sets labeled by generalization type. No internal contradictions or unsupported derivations are apparent.

The soft spot, and it is significant given the current state, is that the abstract presents this as a proposal without any data, derivations, or experimental results. We cannot yet see if the framework actually measures the intended generalization types or how the transfers are implemented in code. This makes it difficult to assess the practical value.

This work is for people in the RL community who are building or evaluating generalization benchmarks, especially around structured tasks like tool use. It could provide value to someone designing new test suites. I would recommend engaging with it in peer review. The idea has enough structure to be worth referee feedback, particularly on how to validate the test sets, even though it is early stage.

Referee Report

1 major / 1 minor

Summary. The manuscript proposes a framework for evaluating generalization in RL agents on tool-use tasks. Drawing inspiration from the trap-tube task in animal cognition studies, it advocates defining transfer operators that generate multiple distinct test sets, each intended to isolate a specified generalization type, rather than relying on performance on a single test set. Publicly available code is provided to support the environments.

Significance. If the transfers can be shown to isolate the claimed generalization axes, the framework would offer a methodological advance over single-test-set evaluation practices common in RL. The public release of the code is a clear strength for reproducibility.

major comments (1)

[Abstract and framework description] The central claim that the proposed transfers produce test sets measuring specified generalization types is not supported by any derivation, construction details, or empirical results in the manuscript; without this, it is impossible to verify that the test sets achieve the intended isolation.

minor comments (1)

[Introduction] The relationship between the trap-tube task and the RL environments could be stated more precisely to avoid any implication of direct equivalence.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback. We address the major comment below and plan to revise the manuscript to strengthen the presentation of the framework.

Referee: [Abstract and framework description] The central claim that the proposed transfers produce test sets measuring specified generalization types is not supported by any derivation, construction details, or empirical results in the manuscript; without this, it is impossible to verify that the test sets achieve the intended isolation.

Authors: We agree that the manuscript would be improved by providing more explicit details on the construction of the transfers. The current version describes the high-level inspiration from the trap-tube task and defines the transfers at a conceptual level, but does not include formal derivations or step-by-step construction procedures for each test set. In the revised manuscript we will add a new subsection under the framework description that formally defines each transfer operator, specifies the exact modifications made to generate the test environments, and explains the intended isolation of each generalization axis. We will also include a small set of illustrative examples and, where feasible, empirical checks confirming that performance differences align with the claimed axes.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity; conceptual proposal with no derivations or load-bearing self-citations

The paper presents a methodological framework for constructing multiple test sets via transfer operators to isolate generalization types in RL, drawing inspirational source material from the trap-tube task in animal studies. No equations, fitted parameters, derivations, or uniqueness theorems appear anywhere in the manuscript. The central construction (defining transfers that generate labeled test sets) is self-contained and does not reduce to any input by definition, self-citation chain, or renaming of prior results; the animal studies serve only as motivation rather than a required equivalence or load-bearing premise. No self-citations are invoked to justify core claims, and the work is externally falsifiable by whether the proposed test sets can be implemented and labeled as described.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract, the paper introduces no free parameters, axioms, or invented entities; it is a high-level proposal for an evaluation framework.

Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages · 10 internal anchors

[1] Sabine Tebbich, Kim Sterelny, and Irmgard Teschke. The tale of the finch: adaptive radiation and behavioural flexibility. Philosophical Transactions of the Royal Society B: Biological Sciences, 365(1543):1099–1109, April 2010.

[2] M. Krutzen, J. Mann, M. R. Heithaus, R. C. Connor, L. Bejder, and W. B. Sherwin. Cultural transmission of tool use in bottlenose dolphins. Proceedings of the National Academy of Sciences, 102(25):8939–8943, June 2005.

[3] Thomas Breuer, Mireille Ndoundou-Hockemba, and Vicki Fishlock. First observation of tool use in wild gorillas. PLoS Biology, 3(11):e380, October 2005.

[4] I. Teschke, C. A. F. Wascher, M. F. Scriba, A. M. P. von Bayern, V. Huml, B. Siemers, and S. Tebbich. Did tool-use evolve with enhanced physical cognitive abilities? Philosophical Transactions of the Royal Society B: Biological Sciences, 368(1630):20120418–20120418, October 2013.

[5] Elisabetta Visalberghi and Luca Limongelli. Lack of comprehension of cause-effect relations in tool-using capuchin monkeys (Cebus apella). Journal of Comparative Psychology, 108(1):15–22, 1994.

[6] Teresa McCormack, Christoph Hoerl, and Stephen Butterfill, editors. Tool Use and Causal Cognition. Oxford University Press, August 2011.

[7] James E. Reaux and Daniel J. Povinelli. The trap-tube problem. In Folk Physics for Apes, pages 108–131. Oxford University Press, May 2003.

[8] Daniel J. Povinelli and Derek C. Penn. Through a floppy tool darkly. In Tool Use and Causal Cognition, pages 69–88. Oxford University Press, August 2011.

[9] Derek C. Penn, Keith J. Holyoak, and Daniel J. Povinelli. Darwin’s mistake: Explaining the discontinuity between human and nonhuman minds. Behavioral and Brain Sciences, 31(2):109–130, April 2008.

[10] Judy S. DeLoache, Kevin F. Miller, and Karl S. Rosengren. The credible shrinking room: Very young children’s performance with symbolic and nonsymbolic relations. Psychological Science, 8(4):308–313, July 1997.

[11] Amanda Seed and Richard Byrne. Animal tool-use. Current Biology, 20(23):R1032–R1039, December 2010.

[12] Z. Li and S.S. Sastry. Task-oriented optimal grasping by multifingered robot hands. IEEE Journal on Robotics and Automation, 4(1):32–44, 1988.

[13] K.B. Shimoga. Robot grasp synthesis algorithms: A survey. The International Journal of Robotics Research, 15(3):230–266, June 1996.

[14] Atsushi Yamashita, Jun Sasaki, Jun Ota, and Tamio Arai. Cooperative manipulation of objects by multiple mobile robots with tools. 1998.

[15] S.K. Gupta, C.J.J. Paredis, and P.F. Brown. Micro planning for mechanical assembly operations. In Proceedings. 1998 IEEE ICRA (Cat. No.98CH36146). IEEE.

[16] D. Halperin, J.-C. Latombe, and R. H. Wilson. A general framework for assembly planning: The motion space approach. Algorithmica, 26(3-4):577–601, March 2000.

[17] A. Stoytchev. Behavior-grounded representation of tool affordances. In Proceedings of the 2005 IEEE ICRA. IEEE.

[18] Solly Brown and Claude Sammut. Tool use and learning in robots. In Encyclopedia of the Sciences of Learning, pages 3327–3330. Springer US, 2012.

[19] Handy Wicaksono and Claude Sammut. Relational tool use learning by a robot in a real and simulated world. 2016.

[20] Handy Wicaksono. Towards a relational approach for tool creation by robots. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence. International Joint Conferences on Artificial Intelligence Organization, August 2017.

[21] Ian Lenz, Ross A. Knepper, and Ashutosh Saxena. DeepMPC: Learning deep latent features for model predictive control. In Robotics: Science and Systems, 2015.

[22] Kuan Fang, Yuke Zhu, Animesh Garg, Andrey Kurenkov, Viraj Mehta, Li Fei-Fei, and Silvio Savarese. Learning task-oriented grasping for tool manipulation from simulated self-supervision. CoRR, abs/1806.09266, 2018.

[23] Aravind Rajeswaran, Vikash Kumar, Abhishek Gupta, John Schulman, Emanuel Todorov, and Sergey Levine. Learning complex dexterous manipulation with deep reinforcement learning and demonstrations. CoRR, abs/1709.10087, 2017.

[24] Frederik Ebert, Chelsea Finn, Alex X. Lee, and Sergey Levine. Self-supervised visual planning with temporal skip connections. CoRR, abs/1710.05268, 2017.

[25] Frederik Ebert, Chelsea Finn, Sudeep Dasari, Annie Xie, Alex X. Lee, and Sergey Levine. Visual foresight: Model-based deep reinforcement learning for vision-based robotic control. CoRR, abs/1812.00568, 2018.

[26] Annie Xie, Frederik Ebert, Sergey Levine, and Chelsea Finn. Improvisation through physical understanding: Using novel objects as tools with visual foresight. CoRR, abs/1904.05538, 2019.

[27] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. The MIT Press, second edition, 2018.

[28] Jesse Farebrother, Marlos C. Machado, and Michael Bowling. Generalization and regularization in DQN. CoRR, abs/1810.00123, 2018.

[29] Charles Packer, Katelyn Gao, Jernej Kos, Philipp Krähenbühl, Vladlen Koltun, and Dawn Song. Assessing generalization in deep reinforcement learning. CoRR, abs/1810.12282, 2018.

[30] Karl Cobbe, Oleg Klimov, Christopher Hesse, Taehoon Kim, and John Schulman. Quantifying generalization in reinforcement learning. CoRR, abs/1812.02341, 2018.

[31] Robert St Amant and Thomas E. Horton. Revisiting the definition of animal tool use. Animal Behaviour, 75(4):1199–1208, April 2008.

[32] Alex Kacelnik, Jackie Chappell, Ben Kenward, and Alex A. S. Weir. Cognitive adaptations for tool-related behavior in New Caledonian crows. In Comparative Cognition: Experimental Explorations of Animal Intelligence, pages 515–528. Oxford University Press, April 2009.

[33] Amanda M. Seed, Josep Call, Nathan J. Emery, and Nicola S. Clayton. Chimpanzees solve the trap problem when the confound of tool-use is removed. Journal of Experimental Psychology: Animal Behavior Processes, 35(1):23–34, 2009.

[34] Amanda Seed, Daniel Hanus, and Josep Call. Causal knowledge in corvids, primates, and children. In Tool Use and Causal Cognition, pages 89–110. Oxford University Press, August 2011.

[35] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. CoRR, abs/1707.06347, 2017.

[36] Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, and Trevor Darrell. Curiosity-driven exploration by self-supervised prediction. CoRR, abs/1705.05363, 2017.

[37] KyungHyun Cho, Bart van Merrienboer, Dzmitry Bahdanau, and Yoshua Bengio. On the properties of neural machine translation: Encoder-decoder approaches. CoRR, abs/1409.1259, 2014.

[38] Angelo Maravita and Atsushi Iriki. Tools for the body (schema). Trends in Cognitive Sciences, 8(2):79–86, 2004.

[39] Hugo Van Lawick and Jane Goodall. Innocent Killers. Houghton Mifflin, 1971.

[40] John Alcock. The evolution of the use of tools by feeding animals. Evolution, 26(3):464–473, 1972.

[41] Benjamin B. Beck. Animal Tool Behavior: The Use and Manufacture of Tools by Animals. Garland STPM Press, 1980.

[42] Alex H. Taylor, Gavin R. Hunt, Jennifer C. Holzhaider, and Russell D. Gray. Spontaneous metatool use by New Caledonian crows. Current Biology, 17(17):1504–1507, September 2007.

A Appendix: Definitions

A.1 Tool Use

Although there are many proposed tool use definitions, in this paper we have decided that the Amant and Horton [31] definition is most represen...
