arXiv: 2605.03290 · v1 · submitted 2026-05-05 · 💻 cs.RO · cs.SY · eess.SY


On Surprising Effects of Risk-Aware Domain Randomization for Contact-Rich Sampling-based Predictive Control

Sergio A. Esteban, Junheng Li, Vince Kurtz, Aaron D. Ames


Pith reviewed 2026-05-07 16:04 UTC · model grok-4.3

classification 💻 cs.RO · cs.SY · eess.SY

keywords domain randomization · sampling-based predictive control · contact-rich tasks · risk-aware aggregation · Push-T · model uncertainty · basin of attraction · predictive sampling


The pith

Risk-aware domain randomization changes which contact actions a sampling optimizer prefers by reshaping their basin of attraction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Domain randomization is normally applied to make learned policies robust to model error. Here the authors apply it inside sampling-based predictive control for a contact-rich Push-T task and compare three ways of aggregating rollouts across randomized models: average, optimistic, and pessimistic. Their tests show that the randomization does more than add robustness; it alters the effective cost surface the optimizer sees, making sequences that produce contact more or less likely to be chosen depending on the risk attitude used. The result suggests a new lever for influencing contact behavior under uncertainty without rewriting the original cost function.

Core claim

When domain-randomized model instances are rolled out and aggregated with risk-aware statistics before the sampling optimizer selects actions, the basin of attraction around contact-producing controls expands or contracts on the simple Push-T task. Average aggregation preserves a broad basin, while pessimistic aggregation narrows it and optimistic aggregation widens it, producing measurable shifts in the frequency of successful contact even when the underlying task cost remains unchanged.
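The aggregation step described above can be sketched as follows. This is only an illustration under stated assumptions: `rollout_cost` is a hypothetical one-dimensional stand-in for a simulator rollout (contact happens when the control lands within 0.1 of a randomized contact position), and min, mean, and max are assumed as the optimistic, average, and pessimistic operators. The paper's actual cost, operators, and randomization ranges are not given on this page.

```python
import random
import statistics

# Assumed risk operators applied to the per-model costs of ONE candidate control.
AGGREGATORS = {
    "average": statistics.fmean,  # mean over randomized model instances
    "optimistic": min,            # best case over randomized model instances
    "pessimistic": max,           # worst case over randomized model instances
}


def rollout_cost(u, contact_pos):
    """Toy stand-in for a simulator rollout (hypothetical, not the paper's cost).

    A control within 0.1 of the contact position makes contact and avoids
    a flat penalty of 1.0; a small effort term is always added.
    """
    effort = 0.1 * u * u
    if abs(u - contact_pos) < 0.1:
        return effort        # contact made
    return 1.0 + effort      # no contact


def predictive_sampling_step(mean_u, models, mode, num_samples=64, sigma=0.3, rng=None):
    """Sample controls around mean_u, score each by aggregating its cost across
    all randomized models, and return the best (aggregated cost, control) pair."""
    rng = rng or random.Random(0)
    aggregate = AGGREGATORS[mode]
    candidates = [mean_u] + [rng.gauss(mean_u, sigma) for _ in range(num_samples - 1)]
    scored = [(aggregate([rollout_cost(u, m) for m in models]), u) for u in candidates]
    return min(scored)
```

In this toy setting, a control that makes contact under only one of the randomized models still scores well under the optimistic operator, while the pessimistic operator penalizes it as if contact always failed. That is one way the set of attractive contact-producing controls can grow or shrink without changing the underlying cost.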

What carries the argument

Risk-aware aggregation of rollouts across randomized model instances, which alters the basin of attraction around contact-producing actions.

Load-bearing premise

The basin-of-attraction reshaping observed on the simple simulated Push-T task will appear in more complex contact-rich tasks and on real hardware without other confounding effects.

What would settle it

Measure the fraction of sampled trajectories that produce contact when the same sampling optimizer is run on the Push-T task once with domain randomization plus risk aggregation and once without; if the fractions are statistically indistinguishable, the claimed reshaping effect is absent.
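One way to make "statistically indistinguishable" concrete is a two-proportion z-test on the contact counts from the two runs. The sketch below is an assumption about how such a check could be set up, not a procedure taken from the paper; the function name and its arguments are hypothetical.

```python
import math


def two_proportion_z_test(contacts_a, n_a, contacts_b, n_b):
    """Two-sided z-test for equality of two contact fractions (pooled variance).

    contacts_x: number of sampled trajectories that produced contact
    n_x:        total number of sampled trajectories in that condition
    Returns (z statistic, two-sided p-value under the normal approximation).
    """
    p_a = contacts_a / n_a
    p_b = contacts_b / n_b
    pooled = (contacts_a + contacts_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    if se == 0.0:
        # Both fractions are exactly 0 or exactly 1: no evidence of a difference.
        return 0.0, 1.0
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return z, p_value
```

Calling `two_proportion_z_test(contacts_dr, n_dr, contacts_nominal, n_nominal)` with counts logged from the two conditions gives a p-value. A large p-value would be consistent with the reshaping effect being absent. The normal approximation is only reasonable when both conditions have many samples.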

Figures

Figures reproduced from arXiv: 2605.03290 by Aaron D. Ames, Junheng Li, Sergio A. Esteban, Vince Kurtz.

**Figure 1.** In contact-rich control settings, successful contact may correspond to a narrow low-cost region. We hypothesize that while risk-averse aggregation can suppress this region and shrink its basin of attraction, risk-seeking aggregation can enlarge the basin around promising contact-producing actions. … randomized model instances and aggregate the results using three risk operators: 1) Average: mean performance …

**Figure 2.** Comparison of risk-sensitive domain randomization strategies on the Push-T task, averaged over $S = 20$ simulations with distinct randomization seeds. (a) Time-averaged total cost over $T_{\mathrm{sim}} = 7.0$ second trajectories (± SE). (b) Block position error over time (± SE) for selected values of R

**Figure 3.** Effect of risk-sensitive domain randomization on a scalar cost landscape $J(u)$ with two local minima. Each row corresponds to a different perturbation magnitude $\delta$. The optimistic strategy $\bar{J}_{\mathrm{opt}}$ widens the basins of attraction around local minima, while the pessimistic strategy $\bar{J}_{\mathrm{pes}}$ narrows them. As $\delta$ increases, these basin-shaping effects become more pronounced. … landscape in complicated ways, but for ease …
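The basin-shaping effect in the Figure 3 caption can be reproduced on a toy problem. The sketch below rests on assumptions that are not taken from the paper: the cost `J` is a hypothetical double well, the perturbation is modeled as a shift of the control within `[-delta, +delta]`, the optimistic and pessimistic costs are the min and max over those shifts, and "basin size" is approximated by the total width of a sublevel set.

```python
def J(u):
    """Hypothetical scalar cost with two local minima (near u = -1 and u = +1)."""
    return (u * u - 1.0) ** 2 + 0.2 * u


def aggregated(u, delta, op, k=21):
    """Apply op (min or max) to J over k evenly spaced shifts in [-delta, +delta].

    k is odd so that the unperturbed control (shift 0) is always included.
    """
    shifts = [delta * (2.0 * i / (k - 1) - 1.0) for i in range(k)]
    return op(J(u + s) for s in shifts)


def sublevel_width(cost_fn, threshold=0.3, lo=-2.0, hi=2.0, n=2001):
    """Total length of {u : cost_fn(u) <= threshold} on a grid.

    A crude proxy for the combined size of the low-cost basins.
    """
    step = (hi - lo) / (n - 1)
    return step * sum(1 for i in range(n) if cost_fn(lo + i * step) <= threshold)


delta = 0.1
w_nominal = sublevel_width(J)
w_opt = sublevel_width(lambda u: aggregated(u, delta, min))  # optimistic
w_pes = sublevel_width(lambda u: aggregated(u, delta, max))  # pessimistic
```

Because the shift set contains zero, the optimistic cost never exceeds `J` and the pessimistic cost never falls below it, so the low-cost regions can only grow under min and shrink under max. Increasing `delta` strengthens both effects in this toy model, which matches the trend the caption describes.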

Original abstract

Domain randomization (DR) is widely used in policy learning to improve robustness to modeling error, but remains underexplored in contact-rich sampling-based predictive control (SPC), where rollout quality is highly sensitive to uncertainty. In this work, we take the first step by studying risk-aware DR in predictive sampling on a simple yet representative Push-T task, comparing average, optimistic, and pessimistic rollout aggregations under randomized model instances. Our initial results suggest that DR affects not only robustness to model error, but also the effective cost landscape seen by the sampling-based optimizer, by reshaping the basin of attraction around contact-producing actions. This opens up potential for exploring better grounded risk-aware contact-rich SPC under model uncertainty. Video: https://youtu.be/f1F0ALXxhSM

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript presents an initial empirical study of risk-aware domain randomization (DR) within sampling-based predictive control (SPC) for contact-rich tasks. On a simple Push-T task, the authors compare average, optimistic, and pessimistic aggregation of rollouts drawn from randomized model instances and report that DR appears to reshape the effective cost landscape seen by the optimizer, specifically by altering the basin of attraction around contact-producing actions, in addition to its usual robustness benefits.

Significance. If the reported reshaping effect is confirmed, the work could open a new line of inquiry into how domain randomization influences not only robustness but the geometry of the optimization landscape in sampling-based contact-rich controllers. This would be a useful observation for the robotics community working on model-based planning under uncertainty. The provision of a video demonstration is a positive step toward reproducibility.

major comments (1)

[Abstract] Abstract and initial-results presentation: the claim that DR 'reshapes the basin of attraction around contact-producing actions' rests on qualitative observations from a single simple task. No quantitative metrics, error bars, statistical tests, or explicit measurement of basin size/shape are described, which makes the strength and repeatability of the effect difficult to evaluate.

minor comments (2)

The manuscript would benefit from a clearer description of the exact procedure used to generate and aggregate the randomized rollouts (number of samples, randomization ranges, cost function details) so that the experiments can be reproduced.
Consider adding a short discussion of how the observed landscape change might be quantified (e.g., via sampling density around contact states or success-rate heatmaps) in a revision.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our initial empirical study. We address the major comment point by point below and outline revisions that will strengthen the presentation of our findings on risk-aware domain randomization in sampling-based predictive control.

Point-by-point responses

Referee: [Abstract] Abstract and initial-results presentation: the claim that DR 'reshapes the basin of attraction around contact-producing actions' rests on qualitative observations from a single simple task. No quantitative metrics, error bars, statistical tests, or explicit measurement of basin size/shape are described, which makes the strength and repeatability of the effect difficult to evaluate.

Authors: We agree that the current evidence for the reshaping effect is preliminary and relies on qualitative observations from the Push-T task, consistent with the manuscript's framing as an initial study. This limitation makes it difficult to fully assess repeatability without additional quantification. In the revised version, we will add quantitative metrics, including explicit measurements of basin size and shape obtained by systematically varying initial conditions around contact-producing actions, success rates aggregated over multiple random seeds with error bars, and basic statistical comparisons (e.g., t-tests) between average, optimistic, and pessimistic aggregation strategies. These additions will be incorporated into both the abstract and the results section while preserving the initial-study scope.

Revision: yes
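The error bars and pairwise comparisons mentioned in the rebuttal could be computed along the following lines. This is a sketch under assumptions: the rebuttal only says "t-tests", so Welch's unequal-variance form is my choice here, and the function names are hypothetical.

```python
import math
import statistics


def mean_and_se(samples):
    """Sample mean and standard error of the mean (needs at least 2 samples)."""
    mean = statistics.fmean(samples)
    se = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, se


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    a, b: per-seed metrics (e.g. success rate or time-averaged cost) for two
    aggregation strategies. Assumes at least one sample has nonzero variance.
    """
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va + vb)
    dof = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, dof
```

Turning `t` and `dof` into a p-value requires the Student t distribution, which the Python standard library does not provide. If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same Welch test and returns the p-value directly.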

Circularity Check

0 steps flagged

No significant circularity: purely empirical observations from simulation experiments

full rationale

The paper presents an empirical comparative study on a Push-T task using sampling-based predictive control with domain randomization and different risk-aware aggregations. No mathematical derivations, equations, or parameter-fitting steps are claimed or present that could reduce to self-definition or fitted inputs. All reported effects on the cost landscape are direct outcomes of the described simulation runs, with the central claim explicitly qualified as 'initial results' that 'open up potential' for further work rather than asserting universality or parameter-free guarantees. No self-citations are invoked as load-bearing uniqueness theorems or ansatzes, and the work is self-contained against external benchmarks via the reported experiments.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is an empirical study on a simulation task and introduces no new mathematical axioms, free parameters, or invented entities; it relies on standard assumptions from robotics simulation and sampling-based control.

pith-pipeline@v0.9.0 · 5445 in / 1073 out tokens · 48593 ms · 2026-05-07T16:04:51.309234+00:00 · methodology

