A Visual Reinforcement Learning-Based Separate Primitive Policy for Peg-in-Hole Tasks

Guocai Yang; Jingdong Zhao; Lei Zhuang; Yuntao Li; Zhaomin Wang; Zhiyuan Zhao; Zichun Xu

arxiv: 2504.14820 · v2 · pith:LTHXE24Vnew · submitted 2025-04-21 · 💻 cs.RO


Zichun Xu, Zhaomin Wang, Yuntao Li, Lei Zhuang, Zhiyuan Zhao, Guocai Yang, Jingdong Zhao

Pith reviewed 2026-05-22 18:40 UTC · model grok-4.3

classification 💻 cs.RO

keywords: peg-in-hole · reinforcement learning · visual RL · assembly tasks · primitive policy · sample efficiency · robot manipulation


The pith

A separate primitive policy for visual RL lets agents master peg-in-hole tasks with fewer samples and higher success rates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper draws on human binocular vision to split peg-in-hole assembly into a location primitive that positions the peg above the hole and an insertion primitive that completes the mating. It encodes this split as a Separate Primitive Policy (S2P) that is compatible with model-free reinforcement learning algorithms. Ten polygon benchmarks in simulation show that the split improves sample efficiency and success rate even when force constraints are active. Real-robot experiments verify the feasibility of the approach, and ablations examine the generalizability of S2P and the factors that affect its performance.

Core claim

The central claim is that explicitly separating the policy into location and insertion primitives, whose actions are derived simultaneously, lets visual reinforcement learning agents learn each phase more effectively than a single joint policy does, producing measurable gains in sample efficiency and success rate across ten distinct polygon insertion tasks under force constraints.

What carries the argument

The Separate Primitive Policy (S2P), which decomposes the action space into a location primitive and an insertion primitive: both are trained simultaneously, and their actions are executed sequentially by the agent.
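A minimal sketch of one S2P control step follows, assuming both primitives read the same observation and their actions are then executed one after the other (location first, then insertion). This is an illustration, not the authors' implementation: `s2p_step`, the three-element peg state, and the toy `location`/`insertion` rules are hypothetical stand-ins for learned visual policies, and the split into planar and vertical motion is an assumption.

```python
from typing import Callable, List, Sequence, Tuple

Action = List[float]
Primitive = Callable[[Sequence[float]], Action]


def s2p_step(obs: Sequence[float],
             location: Primitive,
             insertion: Primitive,
             execute: Callable[[Action], List[float]]) -> Tuple[Action, Action, List[float]]:
    """One hypothetical S2P control step.

    Both primitives are queried on the same observation, so their actions
    are derived simultaneously; the location action is then executed
    before the insertion action.
    """
    a_loc = location(obs)        # assumed: planar motion over the hole
    a_ins = insertion(obs)       # assumed: vertical motion into the hole
    execute(a_loc)               # execute the location action first
    next_obs = execute(a_ins)    # then the insertion action
    return a_loc, a_ins, next_obs


# Toy usage (all hypothetical): state = [x, y, z] of the peg tip, hole at origin.
state = [0.04, -0.02, 0.10]


def execute(action: Action) -> List[float]:
    for i, a in enumerate(action):
        state[i] += a
    return list(state)


def location(obs: Sequence[float]) -> Action:
    return [-0.5 * obs[0], -0.5 * obs[1], 0.0]  # halve the planar error


def insertion(obs: Sequence[float]) -> Action:
    aligned = abs(obs[0]) < 0.01 and abs(obs[1]) < 0.01
    depth_step = min(0.05, obs[2]) if aligned else 0.0
    return [0.0, 0.0, -depth_step]  # descend only once roughly aligned


obs = list(state)
for _ in range(6):
    _, _, obs = s2p_step(obs, location, insertion, execute)
```

In this toy run the insertion primitive outputs zero motion until the location primitive has reduced the planar error, which mirrors the "locate, then insert" intuition without any explicit phase switch.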

Load-bearing premise

The assumption that splitting the policy into separate location and insertion primitives improves learning dynamics over a single joint policy.

What would settle it

If a single joint policy, run on the exact same ten polygon benchmarks under the same force constraints, matched or exceeded S2P's sample efficiency and success rate, the claimed benefit of the separation would be falsified.
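The test above amounts to comparing seed-averaged learning curves. The sketch below is a hedged illustration, not the paper's evaluation protocol, and it covers only the sample-efficiency half: it averages per-seed success-rate curves (mean and sample standard deviation, as in shaded-band plots) and reports the first environment step at which the mean reaches a threshold. The 0.8 threshold and all function names are hypothetical choices.

```python
from statistics import mean, stdev
from typing import List, Optional, Sequence, Tuple


def aggregate(curves: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-checkpoint mean and sample std of success rate across seeds."""
    per_step = list(zip(*curves))  # assumes all seeds share the same checkpoints
    means = [mean(v) for v in per_step]
    stds = [stdev(v) if len(v) > 1 else 0.0 for v in per_step]
    return means, stds


def steps_to_threshold(steps: Sequence[int],
                       curves: Sequence[Sequence[float]],
                       threshold: float = 0.8) -> Optional[int]:
    """First env step at which the seed-averaged success rate reaches threshold."""
    means, _ = aggregate(curves)
    for step, m in zip(steps, means):
        if m >= threshold:
            return step
    return None  # threshold never reached within the training budget


def separation_helps(steps: Sequence[int],
                     s2p_curves: Sequence[Sequence[float]],
                     joint_curves: Sequence[Sequence[float]],
                     threshold: float = 0.8) -> bool:
    """True only if S2P reaches the threshold strictly earlier than the joint policy."""
    s2p = steps_to_threshold(steps, s2p_curves, threshold)
    joint = steps_to_threshold(steps, joint_curves, threshold)
    if s2p is None:
        return False
    return joint is None or s2p < joint
```

A tie returns `False`, matching the criterion that a joint policy which merely matches S2P already undercuts the claimed benefit.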

Figures

Figures reproduced from arXiv: 2504.14820 by Guocai Yang, Jingdong Zhao, Lei Zhuang, Yuntao Li, Zhaomin Wang, Zhiyuan Zhao, Zichun Xu.

**Figure 2.** Network architectures for the actor and critic of S2P-DrQ-v2.

**Figure 3.** Simulation setup and peg-in-hole suites with different shapes, where each peg is initialized already grasped by the gripper.

**Figure 4.** Training performance of S2P against the plain policy, where the solid line and the shaded area represent the mean and standard deviation across …

**Figure 5.** Benchmark results of S2P-DrQ-v2 and DrQ-v2 with force penalty.

**Figure 6.** Training procedure and communication network on the real …

**Figure 7.** Real-world platform setup and a completed insertion process with the trained model of S2P-DrQ-v2.

**Figure 8.** Ablation analysis on the effect of action repeat on S2P-DrQ-v2.
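Action repeat, the factor ablated in Figure 8, holds each action chosen by the policy for k consecutive simulator steps and accumulates the reward, so the agent makes fewer decisions per episode. A generic sketch follows; `repeat_action` and its `(obs, reward, done)` step interface are hypothetical, and the k values studied in the paper are not reproduced here.

```python
from typing import Any, Callable, Tuple

# Hypothetical environment step: action -> (observation, reward, done)
StepFn = Callable[[Any], Tuple[Any, float, bool]]


def repeat_action(step: StepFn, action: Any, k: int) -> Tuple[Any, float, bool]:
    """Apply `action` up to k times, summing rewards; stop early if the episode ends."""
    if k < 1:
        raise ValueError("action repeat must be >= 1")
    total = 0.0
    obs, done = None, False
    for _ in range(k):
        obs, reward, done = step(action)
        total += reward
        if done:
            break
    return obs, total, done
```

Larger k shortens the effective decision horizon, but it also coarsens control, which may matter during contact-rich insertion.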

Original abstract

For peg-in-hole tasks, humans rely on binocular visual perception to locate the peg above the hole surface and then proceed with insertion. This paper draws insights from this behavior to enable agents to learn efficient assembly strategies through visual reinforcement learning. Hence, we propose a Separate Primitive Policy (S2P) to learn how to derive location and insertion actions simultaneously. S2P is compatible with model-free reinforcement learning algorithms. Ten insertion tasks featuring different polygons are developed as benchmarks for evaluations. Simulation experiments show that S2P can boost the sample efficiency and success rate even with force constraints. Real-world experiments are also performed to verify the feasibility of S2P. Ablations are finally given to discuss the generalizability of S2P and some factors that affect its performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note private letter to a colleague

S2P splits location and insertion into simultaneous primitives for visual RL on peg-in-hole and reports gains in sim and real, but without a matched single-policy baseline the separation's causal role stays unproven.


The main point is that this work splits the policy into location and insertion primitives that are trained at the same time under visual RL, and claims that this raises sample efficiency and success rates on peg-in-hole tasks even when force limits are active. The authors test it on ten polygon shapes in simulation and then verify it on a real robot. The method stays compatible with ordinary model-free RL, which keeps the implementation straightforward.

The benchmark collection across shapes and the real-robot experiments are the parts that add the most value; they give a concrete way to compare assembly strategies under contact constraints. Ablations on generalizability and performance factors are a step in the right direction. The soft spot is the one flagged under "What would settle it": there is no direct head-to-head comparison against an otherwise identical joint policy trained on the same tasks. Without that, it is difficult to tell whether the primitive split itself improves learning dynamics or whether other choices in reward, network, or visual encoder are responsible. The abstract also gives no numbers, error bars, or details on how force constraints are enforced inside the RL loop, so the size of the reported gains is hard to judge.

This paper is aimed at people working on visual RL for contact-rich robotic assembly. A reader who needs practical benchmarks or ideas for splitting manipulation into primitives would find usable material in the experiments. I would send it to peer review because the core setup is grounded and the real-world check adds weight, even though stronger controls in the ablations would make the central claim tighter.

Referee Report

2 major / 2 minor

Summary. The paper proposes a Separate Primitive Policy (S2P) for visual reinforcement learning in peg-in-hole tasks. Drawing from human binocular vision, S2P decomposes the policy into independent location and insertion primitives that are learned simultaneously. It is compatible with model-free RL algorithms and is evaluated on ten polygon insertion benchmarks in simulation (showing gains in sample efficiency and success rate under force constraints) plus real-world verification. Ablations discuss generalizability and performance factors.

Significance. If the separation mechanism is shown to be causal, the work could support modular policy designs for contact-rich assembly tasks and improve sample efficiency in constrained visual RL settings. The multi-benchmark simulation suite and real-world transfer provide a reasonable empirical foundation, though the absence of matched baselines limits the strength of the causal claim.

major comments (2)

[Experiments / Ablations] Experiments and ablations sections: The central claim that primitive separation itself boosts sample efficiency and success rate (even with force constraints) is not supported by a direct head-to-head comparison against an otherwise identical monolithic joint policy. Ablations appear to vary secondary factors (network size, reward shaping, visual encoder) but do not report training a single-policy baseline on the same ten benchmarks with matched hyperparameters, architecture, and force handling. This leaves the operative mechanism unproven.
[Methods] Methods or implementation details: The abstract and results claim improved performance 'even with force constraints,' yet the manuscript provides insufficient detail on how force limits are enforced during training (e.g., via reward penalties, action clipping, or external controllers) and whether the same constraints are applied identically to any baselines. This detail is load-bearing for the robustness claim.
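One way to meet the first major comment is to generate each S2P run and its joint-policy counterpart from a single configuration that differs in exactly one field. The sketch below is illustrative only: `RunConfig`, its field names, and its default values are hypothetical placeholders, not settings reported in the paper. `dataclasses.replace` copies every other field, so hyperparameters, force handling, and seeds stay matched by construction.

```python
from dataclasses import asdict, dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class RunConfig:
    # All field names and default values are hypothetical placeholders.
    task: str
    seed: int
    algorithm: str = "DrQ-v2"
    separate_primitives: bool = True   # True -> S2P, False -> joint policy
    force_penalty: bool = True
    action_repeat: int = 2
    learning_rate: float = 1e-4


def matched_pair(task: str, seed: int) -> Tuple[RunConfig, RunConfig]:
    """Build an S2P run and a joint-policy run that differ in exactly one field."""
    s2p = RunConfig(task=task, seed=seed)
    joint = replace(s2p, separate_primitives=False)
    return s2p, joint


def differing_fields(a: RunConfig, b: RunConfig) -> List[str]:
    """Names of the config fields whose values differ between two runs."""
    da, db = asdict(a), asdict(b)
    return sorted(k for k in da if da[k] != db[k])
```

`differing_fields` gives a cheap check that can be asserted before launching a sweep over the ten benchmarks and several seeds.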

minor comments (2)

[Abstract] Abstract: No quantitative numbers, error bars, or baseline comparisons are reported, which weakens the ability to assess the magnitude of the claimed gains.
[Figures / Notation] Notation and figures: Clarify whether the two primitives share any parameters or visual features, and ensure all figures include clear legends distinguishing S2P from any comparison methods.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the presentation of our results. We address each major comment below and indicate the revisions made to the manuscript.


Referee: [Experiments / Ablations] Experiments and ablations sections: The central claim that primitive separation itself boosts sample efficiency and success rate (even with force constraints) is not supported by a direct head-to-head comparison against an otherwise identical monolithic joint policy. Ablations appear to vary secondary factors (network size, reward shaping, visual encoder) but do not report training a single-policy baseline on the same ten benchmarks with matched hyperparameters, architecture, and force handling. This leaves the operative mechanism unproven.

Authors: We agree that a direct head-to-head comparison against a monolithic joint policy with matched hyperparameters, architecture, and force handling on the same ten benchmarks would provide stronger evidence for the causal benefit of primitive separation. The current ablations examine factors internal to S2P (such as network size and reward shaping) and compare against methods from the literature, but do not include this specific baseline. In the revised manuscript we will add this experiment and report the corresponding sample-efficiency and success-rate results. revision: yes
Referee: [Methods] Methods or implementation details: The abstract and results claim improved performance 'even with force constraints,' yet the manuscript provides insufficient detail on how force limits are enforced during training (e.g., via reward penalties, action clipping, or external controllers) and whether the same constraints are applied identically to any baselines. This detail is load-bearing for the robustness claim.

Authors: We acknowledge that the manuscript currently lacks explicit implementation details on force-limit enforcement. In the revised version we will expand the Methods section to describe that force limits are enforced via a combination of reward penalties for exceeding predefined force thresholds and action clipping inside the simulator. The same enforcement mechanism is applied uniformly to S2P and all baselines to maintain comparability. revision: yes
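The simulated rebuttal states that force limits are enforced through a reward penalty for exceeding a force threshold plus action clipping. A minimal sketch of that combination follows, assuming a linear penalty on the amount by which the contact-force magnitude exceeds the limit. The function names, the wrench layout, and the linear form of the penalty are all assumptions, since the enforcement details are not given on this page.

```python
import math
from typing import List, Sequence, Tuple


def clip_action(action: Sequence[float], max_step: float) -> List[float]:
    """Clip each action component to [-max_step, max_step]."""
    return [max(-max_step, min(max_step, a)) for a in action]


def force_penalized_reward(task_reward: float,
                           wrench: Sequence[float],
                           force_limit: float,
                           penalty_scale: float) -> Tuple[float, bool]:
    """Subtract a penalty proportional to how far the force magnitude exceeds the limit.

    Returns the shaped reward and whether the limit was violated.
    """
    fx, fy, fz = wrench[:3]  # assumed layout: first three entries are forces (N)
    magnitude = math.sqrt(fx * fx + fy * fy + fz * fz)
    excess = max(0.0, magnitude - force_limit)
    return task_reward - penalty_scale * excess, excess > 0.0
```

Applying the same two functions to every method, S2P and baselines alike, is what keeps the force-constrained comparison fair.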

Circularity Check

0 steps flagged

Empirical RL method proposal with no derivation chain


The paper proposes S2P as an algorithmic design choice (separate location and insertion primitives) inspired by human behavior, then validates it via simulation benchmarks on ten polygon tasks and real-world tests. No equations, fitted parameters, or self-citations are used to derive the core claim; results are reported directly from training runs. This is self-contained empirical work with independent experimental evidence, so no circularity is present.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper relies on standard RL assumptions such as Markov decision process formulation and reward design for insertion success, but no explicit free parameters, axioms, or invented entities are detailed in the abstract.
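For reference, the standard formulation the ledger alludes to can be written as follows. This is the textbook discounted MDP objective, not an equation taken from the paper; in visual RL the state is typically approximated by a stack of recent camera frames. The sparse insertion-success reward shown is only an illustrative assumption about what "reward design for insertion success" could look like.

```latex
\mathcal{M} = (\mathcal{S}, \mathcal{A}, P, r, \gamma), \qquad \gamma \in [0, 1)

\pi^{*} = \arg\max_{\pi} \; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_t\right]

% Illustrative (hypothetical) sparse success reward:
r_t = \mathbb{1}\left[\,\text{peg fully inserted after executing } a_t\,\right]
```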

pith-pipeline@v0.9.0 · 5676 in / 1106 out tokens · 25861 ms · 2026-05-22T18:40:19.780481+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem:

Two separate policies are trained simultaneously to derive location and insertion actions, respectively, which are executed sequentially by the agent... Eqs. 4-7 reformulate the critic and actor losses for the two primitives.
IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear

Relation between the paper passage and the cited Recognition theorem:

Ten insertion tasks featuring different polygons... success rate... force constraints.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

36 extracted references · 36 canonical work pages

[1] Z. Yuan, Z. Xue, B. Yuan, X. Wang, Y. Wu, Y. Gao, and H. Xu, “Pre-Trained Image Encoder for Generalizable Visual Reinforcement Learning,” in Conference on Neural Information Processing Systems (NeurIPS), 2022.

[2] G. Lee, J. Lee, S. Noh, M. Ko, K. Kim, and K. Lee, “Polyfit: A Peg-in-hole Assembly Framework for Unseen Polygon Shapes via Sim-to-real Adaptation,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024, pp. 533–540 (arXiv:2312.02531).

[3] C. Sferrazza, Y. Seo, H. Liu, Y. Lee, and P. Abbeel, “The Power of the Senses: Generalizable Manipulation from Vision and Touch through Masked Multimodal Learning,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024, pp. 9698–9705 (arXiv:2311.00924).

[4] N. Hansen, Z. Yuan, Y. Ze, T. Mu, A. Rajeswaran, H. Su, H. Xu, and X. Wang, “On Pre-Training for Visuo-Motor Control: Revisiting a Learning-from-Scratch Baseline,” in International Conference on Machine Learning (ICML), 2023, pp. 12511–12526.

[5] Z. Yuan, T. Wei, S. Cheng, G. Zhang, Y. Chen, and H. Xu, “Learning to Manipulate Anywhere: A Visual Generalizable Framework For Reinforcement Learning,” in 8th Annual Conference on Robot Learning, 2024.

[6] N. Hansen, H. Su, and X. Wang, “Stabilizing Deep Q-Learning with ConvNets and Vision Transformers under Data Augmentation,” in Conference on Neural Information Processing Systems (NeurIPS), 2021, pp. 3680–3693.

[7] S. Nasiriany, H. Liu, and Y. Zhu, “Augmenting Reinforcement Learning with Behavior Primitives for Diverse Manipulation Tasks,” in 2022 International Conference on Robotics and Automation (ICRA). IEEE, 2022, pp. 7477–7484.

[8] N. Vuong, H. Pham, and Q.-C. Pham, “Learning Sequences of Manipulation Primitives for Robotic Assembly,” in 2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2021.

[9] X. Chen, C. Wang, Z. Zhou, and K. W. Ross, “Randomized Ensembled Double Q-Learning: Learning Fast Without a Model,” in International Conference on Learning Representations (ICLR), 2021.

[10] G. Ma, L. Li, S. Zhang, Z. Liu, Z. Wang, Y. Chen, L. Shen, X. Wang, and D. Tao, “Revisiting Plasticity in Visual Reinforcement Learning: Data, Modules and Training Stages,” in International Conference on Learning Representations (ICLR), 2024.

[11] S. Kozlovsky, E. Newman, and M. Zacksenhouse, “Reinforcement Learning of Impedance Policies for Peg-in-Hole Tasks: Role of Asymmetric Matrices,” IEEE Robotics and Automation Letters, vol. 7, no. 4, pp. 10898–10905, 2022.

[12] K. Kimble, K. Van Wyk, J. Falco, E. Messina, Y. Sun, M. Shibata, W. Uemura, and Y. Yokokohji, “Benchmarking Protocols for Evaluating Small Parts Robotic Assembly Systems,” IEEE Robotics and Automation Letters, 2020.

[13] W. Chen, C. Zeng, H. Liang, F. Sun, and J. Zhang, “Multimodality Driven Impedance-Based Sim2Real Transfer Learning for Robotic Multiple Peg-in-Hole Assembly,” IEEE Transactions on Cybernetics, pp. 1–14, 2024.

[14] P. Jin, B. Huang, W. W. Lee, T. Li, and W. Yang, “Visual-Force-Tactile Fusion for Gentle Intricate Insertion Tasks,” IEEE Robotics and Automation Letters, pp. 1–8, 2024.

[15] S. Dong, D. K. Jha, D. Romeres, S. Kim, D. Nikovski, and A. Rodriguez, “Tactile-RL for Insertion: Generalization to Objects of Unknown Geometry,” in 2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2021, pp. 6437–6443.

[16] J. Luo, E. Solowjow, C. Wen, J. A. Ojea, A. M. Agogino, A. Tamar, and P. Abbeel, “Reinforcement Learning on Variable Impedance Controller for High-Precision Robotic Assembly,” in 2019 International Conference on Robotics and Automation (ICRA). IEEE, 2019, pp. 3080–3087.

[17] I. Akinola, J. Xu, J. Carius, D. Fox, and Y. S. Narang, “Tacsl: A Library for Visuotactile Sensor Simulation and Learning,” IEEE Transactions on Robotics, 2024 (arXiv:2408.06506).

[18] A. Y. Yasutomi, H. Ichiwara, H. Ito, H. Mori, and T. Ogata, “Visual Spatial Attention and Proprioceptive Data-Driven Reinforcement Learning for Robust Peg-in-Hole Task Under Variable Conditions,” IEEE Robotics and Automation Letters, vol. 8, no. 3, pp. 1834–1841, 2023.

[19] Y. Shi, Z. Chen, H. Liu, S. Riedel, C. Gao, Q. Feng, J. Deng, and J. Zhang, “Proactive Action Visual Residual Reinforcement Learning for Contact-Rich Tasks Using a Torque-Controlled Robot,” in IEEE International Conference on Robotics and Automation (ICRA), 2021, pp. 765–771.

[20] B. Tang, I. Akinola, J. Xu, B. Wen, A. Handa, K. Van Wyk, D. Fox, G. S. Sukhatme, F. Ramos, and Y. S. Narang, “Automate: Specialist and Generalist Assembly Policies over Diverse Geometries,” in Robotics: Science and Systems Conference (RSS), 2024.

[21] X. Zhang, S. Jin, C. Wang, X. Zhu, and M. Tomizuka, “Learning Insertion Primitives with Discrete-Continuous Hybrid Action Space for Robotic Assembly Tasks,” in 2022 International Conference on Robotics and Automation (ICRA). IEEE, 2022, pp. 9881–9887.

[22] Y. Ze, Y. Liu, R. Shi, J. Qin, Z. Yuan, J. Wang, and H. Xu, “H-InDex: Visual Reinforcement Learning with Hand-Informed Representations for Dexterous Manipulation,” in Conference on Neural Information Processing Systems (NeurIPS), 2023.

[23] D. Bertoin, A. Zouitine, M. Zouitine, and E. Rachelson, “Look where you look! Saliency-guided Q-networks for generalization in visual Reinforcement Learning,” in Conference on Neural Information Processing Systems (NeurIPS), 2022.

[24] M. Laskin, K. Lee, A. Stooke, L. Pinto, P. Abbeel, and A. Srinivas, “Reinforcement Learning with Augmented Data,” in Conference on Neural Information Processing Systems (NeurIPS), 2020.

[25] D. Yarats, I. Kostrikov, and R. Fergus, “Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels,” in International Conference on Learning Representations (ICLR), 2021.

[26] D. Yarats, R. Fergus, A. Lazaric, and L. Pinto, “Mastering Visual Continuous Control: Improved Data-Augmented Reinforcement Learning,” in International Conference on Learning Representations (ICLR), 2022.

[27] A. Almuzairee, N. Hansen, and H. I. Christensen, “A Recipe for Unbounded Data Augmentation in Visual Reinforcement Learning,” Reinforcement Learning Conference (RLC), vol. 1, pp. 130–157, 2024.

[28] R. Zheng, X. Wang, Y. Sun, S. Ma, J. Zhao, H. Xu, H. Daumé III, and F. Huang, “Taco: Temporal Latent Action-Driven Contrastive Loss for Visual Reinforcement Learning,” in Conference on Neural Information Processing Systems (NeurIPS), 2023.

[29] S. Nair, A. Rajeswaran, V. Kumar, C. Finn, and A. Gupta, “R3m: A Universal Visual Representation for Robot Manipulation,” in Conference on Robot Learning (CoRL), 2022, pp. 892–909.

[30] G. Jiang, Y. Sun, T. Huang, H. Li, Y. Liang, and H. Xu, “Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Dataset,” in The Thirteenth International Conference on Learning Representations, 2025 (arXiv:2410.22325).

[31] R. Bellman, “A Markovian decision process,” Journal of Mathematics and Mechanics, pp. 679–684, 1957.

[32] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, “Continuous control with deep reinforcement learning,” arXiv, 2015.

[33] S. Fujimoto, H. Hoof, and D. Meger, “Addressing function approximation error in actor-critic methods,” in International Conference on Machine Learning. PMLR, 2018, pp. 1587–1596.

[34] W. Huang, I. Mordatch, and D. Pathak, “One policy to control them all: Shared modular policies for agent-agnostic control,” in International Conference on Machine Learning. PMLR, 2020, pp. 4455–4464.

[35] J. Shang and M. S. Ryoo, “Active Vision Reinforcement Learning under Limited Visual Observability,” in Conference on Neural Information Processing Systems (NeurIPS), 2023.

[36] O. Khatib, “A unified approach for motion and force control of robot manipulators: The operational space formulation,” IEEE Journal on Robotics and Automation, vol. 3, no. 1, pp. 43–53, 1987.

[1] [1]

Pre-Trained Image Encoder for Generalizable Visual Reinforcement Learning

Z. Yuan, Z. Xue, B. Yuan, X. Wang, Y . Wu, Y . Gao, and H. Xu, “Pre-Trained Image Encoder for Generalizable Visual Reinforcement Learning.” in Conference on Neural Information Processing Systems (NeurIPS), 2022

work page 2022

[2] [2]

Polyfit: A Peg- in-hole Assembly Framework for Unseen Polygon Shapes via Sim-to- real Adaptation

G. Lee, J. Lee, S. Noh, M. Ko, K. Kim, and K. Lee, “Polyfit: A Peg- in-hole Assembly Framework for Unseen Polygon Shapes via Sim-to- real Adaptation.” in IEEE/RJS International Conference on Intelligent Robots and Systems (IROS) , vol. abs/2312.02531, 2024, pp. 533–540

work page arXiv 2024

[3] [3]

The Power of the Senses: Generalizable Manipulation from Vision and Touch through Masked Multimodal Learning

C. Sferrazza, Y . Seo, H. Liu, Y . Lee, and P. Abbeel, “The Power of the Senses: Generalizable Manipulation from Vision and Touch through Masked Multimodal Learning.” in IEEE/RJS International Conference on Intelligent Robots and Systems (IROS) , vol. abs/2311.00924, 2024, pp. 9698–9705

work page arXiv 2024

[4] [4]

On Pre-Training for Visuo-Motor Control: Revisiting a Learning-from-Scratch Baseline

N. Hansen, Z. Yuan, Y . Ze, T. Mu, A. Rajeswaran, H. Su, H. Xu, and X. Wang, “On Pre-Training for Visuo-Motor Control: Revisiting a Learning-from-Scratch Baseline.” in International Conference on Machine Learning (ICML) , 2023, pp. 12 511–12 526

work page 2023

[5] [5]

Learning to Manipulate Anywhere: A Visual Generalizable Framework For Re- inforcement Learning,

Z. Yuan, T. Wei, S. Cheng, G. Zhang, Y . Chen, and H. Xu, “Learning to Manipulate Anywhere: A Visual Generalizable Framework For Re- inforcement Learning,” in 8th Annual Conference on Robot Learning , 2024

work page 2024

[6] [6]

Stabilizing Deep Q-Learning with ConvNets and Vision Transformers under Data Augmentation

N. Hansen, H. Su, and X. Wang, “Stabilizing Deep Q-Learning with ConvNets and Vision Transformers under Data Augmentation.” in Conference on Neural Information Processing Systems (NeurIPS) , 2021, pp. 3680–3693

work page 2021

[7] [7]

Augmenting Reinforcement Learn- ing with Behavior Primitives for Diverse Manipulation Tasks,

S. Nasiriany, H. Liu, and Y . Zhu, “Augmenting Reinforcement Learn- ing with Behavior Primitives for Diverse Manipulation Tasks,” in2022 International Conference on Robotics and Automation (ICRA) . IEEE, 2022, pp. 7477–7484

work page 2022

[8] [8]

Learning Sequences of Manip- ulation Primitives for Robotic Assembly,

N. Vuong, H. Pham, and Q.-C. Pham, “Learning Sequences of Manip- ulation Primitives for Robotic Assembly,” in 2021 IEEE International Conference on Robotics and Automation (ICRA) . IEEE, 2021

work page 2021

[9] [9]

Randomized Ensembled Double Q-Learning: Learning Fast Without a Model

X. Chen, C. Wang, Z. Zhou, and K. W. Ross, “Randomized Ensembled Double Q-Learning: Learning Fast Without a Model.” in International Conference on Learning Representations (ICLR) , 2021

work page 2021

[10] [10]

Revisiting Plasticity in Visual Reinforcement Learning: Data, Modules and Training Stages,

G. Ma, L. Li, S. Zhang, Z. Liu, Z. Wang, Y . Chen, L. Shen, X. Wang, and D. Tao, “Revisiting Plasticity in Visual Reinforcement Learning: Data, Modules and Training Stages,” in International Conference on Learning Representations (ICLR) , 2024

work page 2024

[11] [11]

Reinforcement Learning of Impedance Policies for Peg-in-Hole Tasks: Role of Asymmetric Matrices,

S. Kozlovsky, E. Newman, and M. Zacksenhouse, “Reinforcement Learning of Impedance Policies for Peg-in-Hole Tasks: Role of Asymmetric Matrices,” IEEE Robotics and Automation Letters , vol. 7, no. 4, pp. 10 898–10 905, 2022

work page 2022

[12] [12]

Benchmarking Protocols for Evaluating Small Parts Robotic Assem- bly Systems,

Kimble, Kenneth, Van, Wyk, Karl, Falco, Joe, Messina, Elena, Sun, Yu, Shibata, Mizuho, Uemura, Wataru, Yokokohji, and Yasuyoshi, “Benchmarking Protocols for Evaluating Small Parts Robotic Assem- bly Systems,” IEEE Robotics and Automation Letters , 2020

work page 2020

[13] [13]

Multimodality Driven Impedance-Based Sim2Real Transfer Learning for Robotic Multiple Peg-in-Hole Assembly,

W. Chen, C. Zeng, H. Liang, F. Sun, and J. Zhang, “Multimodality Driven Impedance-Based Sim2Real Transfer Learning for Robotic Multiple Peg-in-Hole Assembly,” IEEE Transactions on Cybernetics , pp. 1–14, 2024

work page 2024

[14] [14]

Visual-Force- Tactile Fusion for Gentle Intricate Insertion Tasks,

P. Jin, B. Huang, W. W. Lee, T. Li, and W. Yang, “Visual-Force- Tactile Fusion for Gentle Intricate Insertion Tasks,” IEEE Robotics and Automation Letters , pp. 1–8, 2024

work page 2024

[15] [15]

Tactile-RL for Insertion: Generalization to Objects of Un- known Geometry,

S. Dong, D. K. Jha, D. Romeres, S. Kim, D. Nikovski, and A. Ro- driguez, “Tactile-RL for Insertion: Generalization to Objects of Un- known Geometry,” in2021 IEEE International Conference on Robotics and Automation (ICRA) . IEEE, 2021, pp. 6437–6443

work page 2021

[16] [16]

Reinforcement Learning on Variable Impedance Con- troller for High-Precision Robotic Assembly,

J. Luo, E. Solowjow, C. Wen, J. A. Ojea, A. M. Agogino, A. Tamar, and P. Abbeel, “Reinforcement Learning on Variable Impedance Con- troller for High-Precision Robotic Assembly,” in 2019 International Conference on Robotics and Automation (ICRA) . IEEE, 2019, pp. 3080–3087

work page 2019

[17] [17]

Tacsl: A Library for Visuotactile Sensor Simulation and Learning,

I. Akinola, J. Xu, J. Carius, D. Fox, and Y . S. Narang, “Tacsl: A Library for Visuotactile Sensor Simulation and Learning,” IEEE Transactions on robotics , vol. abs/2408.06506, 2024

work page arXiv 2024

[18] A. Y. Yasutomi, H. Ichiwara, H. Ito, H. Mori, and T. Ogata, “Visual Spatial Attention and Proprioceptive Data-Driven Reinforcement Learning for Robust Peg-in-Hole Task Under Variable Conditions,” IEEE Robotics and Automation Letters, vol. 8, no. 3, pp. 1834–1841, 2023

[19] Y. Shi, Z. Chen, H. Liu, S. Riedel, C. Gao, Q. Feng, J. Deng, and J. Zhang, “Proactive Action Visual Residual Reinforcement Learning for Contact-Rich Tasks Using a Torque-Controlled Robot,” in IEEE International Conference on Robotics and Automation (ICRA), 2021, pp. 765–771

[20] B. Tang, I. Akinola, J. Xu, B. Wen, A. Handa, K. Van Wyk, D. Fox, G. S. Sukhatme, F. Ramos, and Y. S. Narang, “AutoMate: Specialist and Generalist Assembly Policies over Diverse Geometries,” in Robotics: Science and Systems Conference (RSS), 2024

[21] X. Zhang, S. Jin, C. Wang, X. Zhu, and M. Tomizuka, “Learning Insertion Primitives with Discrete-Continuous Hybrid Action Space for Robotic Assembly Tasks,” in 2022 International Conference on Robotics and Automation (ICRA). IEEE, 2022, pp. 9881–9887

[22] Y. Ze, Y. Liu, R. Shi, J. Qin, Z. Yuan, J. Wang, and H. Xu, “H-InDex: Visual Reinforcement Learning with Hand-Informed Representations for Dexterous Manipulation,” in Conference on Neural Information Processing Systems (NeurIPS), 2023

[23] D. Bertoin, A. Zouitine, M. Zouitine, and E. Rachelson, “Look where you look! Saliency-guided Q-networks for generalization in visual Reinforcement Learning,” in Conference on Neural Information Processing Systems (NeurIPS), 2022

[24] M. Laskin, K. Lee, A. Stooke, L. Pinto, P. Abbeel, and A. Srinivas, “Reinforcement Learning with Augmented Data,” in Conference on Neural Information Processing Systems (NeurIPS), 2020

[25] D. Yarats, I. Kostrikov, and R. Fergus, “Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels,” in International Conference on Learning Representations (ICLR), 2021

[26] D. Yarats, R. Fergus, A. Lazaric, and L. Pinto, “Mastering Visual Continuous Control: Improved Data-Augmented Reinforcement Learning,” in International Conference on Learning Representations (ICLR), 2022

[27] A. Almuzairee, N. Hansen, and H. I. Christensen, “A Recipe for Unbounded Data Augmentation in Visual Reinforcement Learning,” in Reinforcement Learning Conference (RLC), vol. 1, pp. 130–157, 2024

[28] R. Zheng, X. Wang, Y. Sun, S. Ma, J. Zhao, H. Xu, H. Daumé III, and F. Huang, “TACO: Temporal Latent Action-Driven Contrastive Loss for Visual Reinforcement Learning,” in Conference on Neural Information Processing Systems (NeurIPS), 2023

[29] S. Nair, A. Rajeswaran, V. Kumar, C. Finn, and A. Gupta, “R3M: A Universal Visual Representation for Robot Manipulation,” in Conference on Robot Learning (CoRL), 2022, pp. 892–909

[30] G. Jiang, Y. Sun, T. Huang, H. Li, Y. Liang, and H. Xu, “Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Dataset,” in The Thirteenth International Conference on Learning Representations, arXiv:2410.22325, 2025

[31] R. Bellman, “A Markovian decision process,” Journal of Mathematics and Mechanics, pp. 679–684, 1957

[32] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, “Continuous control with deep reinforcement learning,” arXiv, 2015

[33] S. Fujimoto, H. van Hoof, and D. Meger, “Addressing function approximation error in actor-critic methods,” in International Conference on Machine Learning. PMLR, 2018, pp. 1587–1596

[34] W. Huang, I. Mordatch, and D. Pathak, “One policy to control them all: Shared modular policies for agent-agnostic control,” in International Conference on Machine Learning. PMLR, 2020, pp. 4455–4464

[35] J. Shang and M. S. Ryoo, “Active Vision Reinforcement Learning under Limited Visual Observability,” in Conference on Neural Information Processing Systems (NeurIPS). arXiv, 2023

[36] O. Khatib, “A unified approach for motion and force control of robot manipulators: The operational space formulation,” IEEE Journal on Robotics and Automation, vol. 3, no. 1, pp. 43–53, 1987