Robot Learning of Shifting Objects for Grasping in Cluttered Environments

Lars Berscheid; Pascal Meißner; Torsten Kröger

arxiv: 1907.11035 · v1 · pith:62O65KLVnew · submitted 2019-07-25 · 💻 cs.RO

Pith reviewed 2026-05-24 16:13 UTC · model grok-4.3

classification 💻 cs.RO

keywords robotic grasping · shifting objects · pre-grasping manipulation · self-supervised learning · bin picking · cluttered environments · data-efficient learning

The pith

By tying shift learning directly to grasp success, a robot masters both skills through self-supervised data for cluttered bin picking.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a method for a robot to learn to shift objects in clutter so that their grasp probability rises. By making the shifting skill depend on the grasping objective, the system avoids sparse rewards and gathers training data more efficiently from its own interactions. After collecting around 25000 grasp attempts and 2500 shift attempts, the robot empties bins completely, grasps and files objects at a rate of 274 picks per hour, and handles objects it has never seen before.

Core claim

We present an algorithm that learns the optimal pose for manipulation primitives and trains non-prehensile shift actions to raise grasp probability. Linking shifting to grasping removes the need for sparse rewards and enables data-efficient self-supervised learning. Applied to the industrial bin-picking task, the system empties bins after training on approximately 25000 grasp and 2500 shift actions, reaches 274 picks per hour, and generalizes to novel objects.

What carries the argument

The explicit dependence of learned shift actions on increasing grasp probability, which guides selection of manipulation poses without separate reward engineering.
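
The dependence described above can be illustrated with a small sketch. It assumes the grasp model outputs a 2D heat map of grasp probabilities for a depth image, and it scores a shift by how much the best available grasp probability changes between the image taken before and the image taken after the shift. The function names, the rescaling to [0, 1], and the example values are illustrative assumptions, not the paper's exact formulation.

```python
from typing import Sequence

Heatmap = Sequence[Sequence[float]]


def max_grasp_probability(heatmap: Heatmap) -> float:
    """Return the highest grasp probability found in a 2D heat map."""
    return max(max(row) for row in heatmap)


def shift_reward(before: Heatmap, after: Heatmap) -> float:
    """Hypothetical dense reward for a shift (an assumption, not the paper's formula).

    The gain in the best grasp probability lies in [-1, 1]; it is rescaled
    so that "no change" maps to 0.5 and the reward stays in [0, 1].
    """
    gain = max_grasp_probability(after) - max_grasp_probability(before)
    return 0.5 * (gain + 1.0)


# Made-up heat maps: the shift frees an object, raising the best grasp estimate.
before = [[0.10, 0.20], [0.15, 0.30]]
after = [[0.10, 0.80], [0.25, 0.30]]
reward = shift_reward(before, after)
```

Because every shift changes the grasp estimate by some amount, each attempt yields a graded training signal rather than a rare success flag, which is the sense in which sparse rewards are avoided.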

If this is right

Complete emptying of cluttered bins becomes possible by sequencing learned shifts before grasps.
Training proceeds from self-supervised interactions alone, without hand-designed reward functions.
The robot sustains 274 picks per hour while filing objects from bins.
The same learned behaviors transfer to objects not present during data collection.
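
The sequencing of shifts before grasps can be sketched as a threshold rule over the two model outputs. This is a hedged illustration only: the threshold values, the `Action` enum, and the rule that low estimates for both skills mean the bin is empty are assumptions made for this example, not details confirmed by the text above.

```python
from enum import Enum
from typing import Iterable, List, Tuple


class Action(Enum):
    GRASP = "grasp"
    SHIFT = "shift"
    STOP = "stop"


def select_action(max_grasp_prob: float, max_shift_prob: float,
                  grasp_threshold: float = 0.8,
                  shift_threshold: float = 0.5) -> Action:
    """Pick the next primitive; both thresholds are hypothetical values."""
    if max_grasp_prob >= grasp_threshold:
        return Action.GRASP   # a promising grasp exists, take it
    if max_shift_prob >= shift_threshold:
        return Action.SHIFT   # no good grasp yet, but a shift should help
    return Action.STOP        # neither skill is promising: treat the bin as empty


def plan(estimates: Iterable[Tuple[float, float]]) -> List[Action]:
    """Apply the rule to a stream of (grasp, shift) estimates until STOP."""
    actions = []
    for grasp_prob, shift_prob in estimates:
        action = select_action(grasp_prob, shift_prob)
        actions.append(action)
        if action is Action.STOP:
            break
    return actions


# Made-up estimates for four consecutive depth images.
sequence = plan([(0.9, 0.1), (0.4, 0.7), (0.85, 0.2), (0.1, 0.1)])
```

With these invented numbers the rule grasps, shifts once when no grasp looks reliable, grasps again, and then stops.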

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same dependence structure could guide learning of other preparatory actions such as pushing if each can be scored by its effect on a primary skill.
Industrial deployments might require fewer task-specific reward designs when preparatory moves are anchored to an end goal like grasping.
Scaling the self-supervised loop to longer action sequences could address more layered clutter without redesigning the training process.

Load-bearing premise

Shifting actions learned to increase grasp probability will continue to raise success rates on new objects and scenes without further tuning.

What would settle it

A test bin containing only objects absent from the original 25000 grasp trials in which applying the learned shifts produces no measurable rise in grasp success rate.
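
One way to make "no measurable rise" concrete is a one-sided two-proportion z-test on grasp outcomes with and without the learned shifts. The sketch below is an illustrative assumption about how such a comparison could be run; the function names and the counts are hypothetical and do not come from the paper.

```python
import math


def two_proportion_z(success_a: int, trials_a: int,
                     success_b: int, trials_b: int) -> float:
    """z statistic for the difference in success rates (condition A minus B)."""
    p_a = success_a / trials_a
    p_b = success_b / trials_b
    pooled = (success_a + success_b) / (trials_a + trials_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / trials_a + 1.0 / trials_b))
    if se == 0.0:
        return 0.0  # all attempts succeeded or all failed in both conditions
    return (p_a - p_b) / se


def one_sided_p_value(z: float) -> float:
    """P(Z >= z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Hypothetical counts: grasps on novel objects with shifts (A) and without (B).
z = two_proportion_z(80, 100, 60, 100)
p = one_sided_p_value(z)
```

A large p-value on a bin of unseen objects would support the failure scenario described above; a small one would count against it.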

Figures

Figures reproduced from arXiv: 1907.11035 by Lars Berscheid, Pascal Meißner, Torsten Kröger.

**Figure 2.** Our fully-convolutional neural network (NN) archi…

**Figure 3.** Examples of depth images before (left) and after (right) an applied motion primitive. The maximal grasp probability…

**Figure 4.** State diagram of the combined grasping and shifting…

**Figure 5.** Examples of heat maps for shifting. The NN predicts…

**Figure 6.** Grasp rate and picks per hour (PPH) depending on…

**Figure 7.** The object set for testing the system’s ability to…

Original abstract

Robotic grasping in cluttered environments is often infeasible due to obstacles preventing possible grasps. Then, pre-grasping manipulation like shifting or pushing an object becomes necessary. We developed an algorithm that can learn, in addition to grasping, to shift objects in such a way that their grasp probability increases. Our research contribution is threefold: First, we present an algorithm for learning the optimal pose of manipulation primitives like clamping or shifting. Second, we learn non-prehensile actions that explicitly increase the grasping probability. Making one skill (shifting) directly dependent on another (grasping) removes the need for sparse rewards, leading to more data-efficient learning. Third, we apply a real-world solution to the industrial task of bin picking, resulting in the ability to empty bins completely. The system is trained in a self-supervised manner with around 25000 grasp and 2500 shift actions. Our robot is able to grasp and file objects with 274 picks per hour. Furthermore, we demonstrate the system's ability to generalize to novel objects.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper presents a self-supervised learning method for robotic bin picking in clutter that jointly learns grasping and shifting actions. Shifting is trained to explicitly increase measured grasp success probability, which the authors argue removes the need for sparse rewards and yields data-efficient learning (approximately 25,000 grasp and 2,500 shift actions). The system is deployed on an industrial bin-picking task, achieving 274 picks per hour while emptying bins completely, and is claimed to generalize to novel objects.

Significance. If the central claims hold, the work would be significant for practical robotic manipulation: it offers a concrete mechanism for learning non-prehensile pre-grasping actions without manual reward design and demonstrates real-world throughput suitable for industrial use. The self-supervised data collection and explicit linkage between skills are strengths that could influence subsequent work on multi-skill robotic learning.

major comments (3)

[Abstract] Abstract: the performance figure of 274 picks per hour is reported without error bars, trial counts, or variance estimates; this is load-bearing for the data-efficiency and real-world applicability claims.
[Abstract] Abstract: no ablation studies, baseline comparisons, or quantitative metrics are described to substantiate that tying shift reward to grasp success produces measurably more data-efficient learning than alternatives.
[Abstract] Abstract: the generalization claim to novel objects is asserted without any held-out object set, coverage statistics, or success rates on unseen items, which directly affects whether the learned shift policy transfers as required by the method.

minor comments (1)

[Abstract] Abstract: the phrase 'around 25000 grasp and 2500 shift actions' is imprecise; exact counts, collection protocol, and how grasp probability is estimated should be stated explicitly.
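
The statistical context requested in the first major comment could be reported along the following lines. This is a minimal sketch under stated assumptions: it treats each grasp attempt as an independent trial and uses a Wilson score interval for the grasp rate. The function names and the numbers in the usage lines are placeholders, not measurements from the paper.

```python
import math
from typing import Tuple


def picks_per_hour(successful_picks: int, elapsed_seconds: float) -> float:
    """Throughput in picks per hour from a pick count and a wall-clock duration."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return successful_picks * 3600.0 / elapsed_seconds


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a success rate (z = 1.96 gives roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1.0 + z * z / trials
    center = (p + z * z / (2.0 * trials)) / denom
    half = z * math.sqrt(p * (1.0 - p) / trials + z * z / (4.0 * trials * trials)) / denom
    return center - half, center + half


# Placeholder numbers for illustration only.
pph = picks_per_hour(50, 600.0)
low, high = wilson_interval(90, 100)
```

Reporting the interval next to the point estimate, together with the number of bins emptied, would show how much the headline throughput depends on a single run.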

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed review and constructive comments on the abstract. We address each point below and indicate where revisions to the manuscript will be made to improve clarity and substantiation of the claims.

Point-by-point responses

Referee: [Abstract] Abstract: the performance figure of 274 picks per hour is reported without error bars, trial counts, or variance estimates; this is load-bearing for the data-efficiency and real-world applicability claims.

Authors: We agree that the abstract would benefit from additional statistical context for the 274 picks per hour figure. This value was obtained from a single extended deployment in which the bin was fully emptied. In the revised manuscript we will update the abstract to report the number of trials performed and any observed variance or success statistics from the real-world experiments.

revision: yes

Referee: [Abstract] Abstract: no ablation studies, baseline comparisons, or quantitative metrics are described to substantiate that tying shift reward to grasp success produces measurably more data-efficient learning than alternatives.

Authors: The manuscript's central argument is that making the shift policy's reward an explicit function of measured grasp success removes the need for manually designed sparse rewards, thereby enabling data-efficient learning with only 2,500 shift actions. While the current version does not contain explicit ablation studies or baseline comparisons, the reported data volumes and successful industrial deployment provide supporting evidence for the efficiency claim. We will expand the discussion section to more explicitly articulate this design choice and its implications; however, performing new comparative experiments would require substantial additional data collection that is outside the scope of a revision.

revision: partial

Referee: [Abstract] Abstract: the generalization claim to novel objects is asserted without any held-out object set, coverage statistics, or success rates on unseen items, which directly affects whether the learned shift policy transfers as required by the method.

Authors: The full manuscript contains experiments demonstrating transfer to novel objects. To make this claim more precise in the abstract, we will revise the abstract to reference the held-out object set and include the corresponding success rates reported in the results section.

revision: yes

Circularity Check

0 steps flagged

No circularity; empirical self-supervised learning with no derivations reducing to inputs by construction

The paper describes a self-supervised robotic learning system where shifting actions are trained to increase measured grasp success probability. This is a design choice for reward shaping, not a mathematical derivation. No equations, uniqueness theorems, fitted parameters renamed as predictions, or self-citation chains appear in the provided text. Generalization to novel objects is asserted empirically from 2500 shift actions; the claim does not reduce to a tautology or self-referential fit. The derivation chain is absent, so no load-bearing step collapses by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; all details on learning formulation and data collection are absent.

pith-pipeline@v0.9.0 · 5715 in / 1053 out tokens · 21770 ms · 2026-05-24T16:13:47.346136+00:00 · methodology

Reference graph

Works this paper leans on

16 extracted references · 16 canonical work pages

[1] J. Bohg, A. Morales, T. Asfour, and D. Kragic, “Data-driven grasp synthesis—a survey,” IEEE Transactions on Robotics, vol. 30, no. 2, pp. 289–309, 2014.
[2] C. Ferrari and J. Canny, “Planning optimal grasps,” in Proceedings 1992 IEEE International Conference on Robotics and Automation. IEEE, 1992, pp. 2290–2295.
[3] A. T. Miller and P. K. Allen, “Graspit! A Versatile Simulator for Robotic Grasping,” IEEE Robotics & Automation Magazine, vol. 11, no. 4, pp. 110–122, 2004.
[4] M. R. Dogar and S. S. Srinivasa, “Push-grasping with dexterous hands: Mechanics and a method,” in Intelligent Robots and Systems (IROS), 2010 IEEE/RSJ International Conference on. IEEE, 2010, pp. 2123–2130.
[5] L. Y. Chang, S. S. Srinivasa, and N. S. Pollard, “Planning pre-grasp manipulation for transport tasks,” in Robotics and Automation (ICRA), 2010 IEEE International Conference on. IEEE, 2010, pp. 2697–2704.
[6] C. Finn, X. Y. Tan, Y. Duan, T. Darrell, S. Levine, and P. Abbeel, “Deep Spatial Autoencoders for Visuomotor Learning,” in International Conference on Robotics and Automation (ICRA). IEEE, 2016, pp. 512–519.
[7] J. Mahler, F. T. Pokorny, B. Hou, M. Roderick, M. Laskey, M. Aubry, K. Kohlhoff, T. Kroger, J. Kuffner, and K. Goldberg, “Dex-Net 1.0: A cloud-based network of 3d objects for robust grasp planning using a Multi-Armed Bandit model with correlated rewards.” IEEE, May 2016, pp. 1957–1964.
[8] K. Bousmalis, A. Irpan, P. Wohlhart, Y. Bai, M. Kelcey, M. Kalakrishnan, L. Downs, J. Ibarz, P. Pastor, K. Konolige et al., “Using Simulation and Domain Adaptation to Improve Efficiency of Deep Robotic Grasping,” in International Conference on Robotics and Automation (ICRA). IEEE, 2018, pp. 4243–4250.
[9] P. Pastor, L. Righetti, M. Kalakrishnan, and S. Schaal, “Online movement adaptation based on previous sensor experiences,” in 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2011, pp. 365–371.
[10] L. Pinto and A. Gupta, “Supersizing Self-supervision: Learning to Grasp from 50k Tries and 700 Robot Hours,” in International Conference on Robotics and Automation (ICRA). IEEE, 2016, pp. 3406–3413.
[11] S. Levine, P. Pastor, A. Krizhevsky, and D. Quillen, “Learning Hand-Eye Coordination for Robotic Grasping with Large-Scale Data Collection,” in International Symposium on Experimental Robotics. Springer, 2016, pp. 173–184.
[12] L. Berscheid, T. Rühr, and T. Kröger, “Improving Data Efficiency of Self-supervised Learning for Robotic Grasping,” in International Conference on Robotics and Automation (ICRA). IEEE, 2019.
[13] A. Zeng, S. Song, S. Welker, J. Lee, A. Rodriguez, and T. Funkhouser, “Learning synergies between pushing and grasping with self-supervised deep reinforcement learning,” in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2018, pp. 4238–4245.
[14] D. Kalashnikov, A. Irpan, P. Pastor, J. Ibarz, A. Herzog, E. Jang, D. Quillen, E. Holly, M. Kalakrishnan, V. Vanhoucke et al., “Scalable deep reinforcement learning for vision-based robotic manipulation,” in Conference on Robot Learning, 2018, pp. 651–673.
[15] D. Quillen, E. Jang, O. Nachum, C. Finn, J. Ibarz, and S. Levine, “Deep Reinforcement Learning for Vision-Based Robotic Grasping: A Simulated Comparative Evaluation of Off-Policy Methods,” in International Conference on Robotics and Automation (ICRA). IEEE, 2018, pp. 6284–6291.
[16] Y. Gal and Z. Ghahramani, “Dropout as a bayesian approximation: Insights and applications,” in Deep Learning Workshop, ICML, vol. 1, 2015, p. 2.
