OVOW reconstructs instance-level, simulation-ready 4D mesh scenes from monocular video via a four-stage training-free pipeline and introduces a new benchmark for structured Video-to-4D evaluation.
hub Canonical reference
Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning
Canonical reference. 76% of citing Pith papers cite this work as background.
abstract
Isaac Gym offers a high performance learning platform to train policies for wide variety of robotics tasks directly on GPU. Both physics simulation and the neural network policy training reside on GPU and communicate by directly passing data from physics buffers to PyTorch tensors without ever going through any CPU bottlenecks. This leads to blazing fast training times for complex robotics tasks on a single GPU with 2-3 orders of magnitude improvements compared to conventional RL training that uses a CPU based simulator and GPU for neural networks. We host the results and videos at \url{https://sites.google.com/view/isaacgym-nvidia} and isaac gym can be downloaded at \url{https://developer.nvidia.com/isaac-gym}.
hub tools
citation-role summary
citation-polarity summary
representative citing papers
Dynamic isotropy, quantifying uniform center-of-mass acceleration capability, improves robot performance and enables omnidirectional locomotion, terrain traversal, and failure resilience in a spherical robot design.
PhysEditWorld is a new dataset of over 60 million frames from 12 UE5 cinematic scenes with synchronized multimodal signals and explicit gravity labels, built via replay to support physics-editable world models.
CPPO is an on-policy contrastive RL method that derives advantages from contrastive Q-values for PPO optimization, outperforming prior CRL baselines in 14/18 tasks and matching or exceeding reward-based PPO in 12/18 tasks.
CoDi decomposes the multi-agent diffusion score into pre-trained single-agent policies plus a gradient-free cost guidance term to generate coordinated behavior from single-agent data alone.
A two-stage framework augments HOI data with dynamic priors and blends pre-trained dynamic motion and static interaction agents via a composer network to enable long-term dynamic human-object interactions with higher success rates and reduced training time.
HiPAN enables quadruped robots to navigate unstructured 3D environments more successfully by combining a high-level posture-adaptive policy with a low-level controller and curriculum learning on depth images.
HANDFUL learns resource-aware grasps using finger contact rewards and curriculum learning to improve success on sequential dexterous tasks in simulation and on a real LEAP hand.
Any-ttach shows that rapid end-effector swapping combined with demonstration collection and task planning enables reliable multi-tool skills in long-horizon tasks such as sandwich making.
RoboWits benchmark with 238 tasks shows pre-trained VLAs succeed on seed tasks but fail on mutated ones, highlighting brittleness in reasoning.
UniLab is a CPU/GPU heterogeneous system for robot RL training using MuJoCoUni and MotrixSim backends that reports 3-10x end-to-end efficiency improvements and cross-platform compatibility beyond CUDA.
SCSP is a cascaded optimization framework using a surrogate contact model and discrete-continuous search to enable simultaneous contact selection and planning for robust contact-rich manipulation.
X-DiffVLA proposes a diffusion VLA model using Embodiment Forcing and Morphological Tree Diffusion to achieve SOTA cross-embodied performance on simulation benchmarks with 15.3% and 12.5% gains.
SCRIPT presents a scalable diffusion policy with JAST-DiT architecture, nonlinear history conditioning, and RLHR post-training that claims to outperform prior methods on text alignment, motion quality, and physical realism while scaling on a 1200-hour dataset.
Imagine2Real enables zero-shot humanoid-object interaction by unifying motions as 4D point trajectories, tracking only base/hands/object keypoints inside a BFM latent space, and training with progressive simple rewards for mocap deployment.
ARC-RL is a new suite of four MuJoCo continuous-control environments featuring game-inspired hexapod and quadruped morphologies, a single closed-form multi-component reward function, CPG demonstrators, and empirical comparisons of online and offline-to-online RL algorithms.
Recasts sampling-based nonconvex optimization as smoothed gradient descent to obtain non-asymptotic convergence guarantees and introduces the DIDA annealed algorithm that converges to the global optimum.
SECOND-Grasp integrates semantic contact proposals from vision-language reasoning with geometric refinement to achieve 98%+ lifting success and improved intent-aware grasping on seen and unseen objects.
NavOL collects expert trajectory labels online from a global planner during policy rollouts in simulation to train a diffusion navigation policy, mitigating distribution shift and improving performance on visual navigation tasks.
Explicit conditioning of a PPO policy on interpretable stair parameters (height, depth, yaw) yields improved generalization to unseen stairs and reliable real-world traversal on the Unitree G1, including 33 consecutive outdoor steps.
DRIS improves zero-shot sim-to-real transfer for reactive catching by maintaining and acting on sets of randomized dynamics instances instead of single instances per episode.
RigidFormer learns mesh-free rigid dynamics from point clouds using object-centric anchors, Anchor-Vertex Pooling, Anchor-based RoPE, and differentiable Kabsch alignment to enforce rigidity.
ANO derives a robust policy optimizer from geometric principles that replaces clipping with a smooth redescending gradient, showing better performance and stability than PPO, SPO, and GRPO in MuJoCo, Atari, and RLHF experiments.
GS-Playground delivers a high-throughput photorealistic simulator for vision-informed robot learning via parallel physics integrated with batch 3D Gaussian Splatting at 10^4 FPS and an automated Real2Sim workflow for consistent environments.
citing papers explorer
-
One Video, One World: Turning Monocular Video into Physical 4D Scenes
OVOW reconstructs instance-level, simulation-ready 4D mesh scenes from monocular video via a four-stage training-free pipeline and introduces a new benchmark for structured Video-to-4D evaluation.
-
Extreme dynamic symmetry enables omnidirectional and multifunctional robots
Dynamic isotropy, quantifying uniform center-of-mass acceleration capability, improves robot performance and enables omnidirectional locomotion, terrain traversal, and failure resilience in a spherical robot design.
-
PhysEditWorld: A Large-Scale Dataset Toward Physics-Editable World Models
PhysEditWorld is a new dataset of over 60 million frames from 12 UE5 cinematic scenes with synchronized multimodal signals and explicit gravity labels, built via replay to support physics-editable world models.
-
Self-Supervised On-Policy Reinforcement Learning via Contrastive Proximal Policy Optimisation
CPPO is an on-policy contrastive RL method that derives advantages from contrastive Q-values for PPO optimization, outperforming prior CRL baselines in 14/18 tasks and matching or exceeding reward-based PPO in 12/18 tasks.
-
Coordinated Diffusion: Generating Multi-Agent Behavior Without Multi-Agent Demonstrations
CoDi decomposes the multi-agent diffusion score into pre-trained single-agent policies plus a gradient-free cost guidance term to generate coordinated behavior from single-agent data alone.
-
Dynamic Full-body Motion Agent with Object Interaction via Blending Pre-trained Modular Controllers
A two-stage framework augments HOI data with dynamic priors and blends pre-trained dynamic motion and static interaction agents via a composer network to enable long-term dynamic human-object interactions with higher success rates and reduced training time.
-
HiPAN: Hierarchical Posture-Adaptive Navigation for Quadruped Robots in Unstructured 3D Environments
HiPAN enables quadruped robots to navigate unstructured 3D environments more successfully by combining a high-level posture-adaptive policy with a low-level controller and curriculum learning on depth images.
-
HANDFUL: Sequential Grasp-Conditioned Dexterous Manipulation with Resource Awareness
HANDFUL learns resource-aware grasps using finger contact rewards and curriculum learning to improve success on sequential dexterous tasks in simulation and on a real LEAP hand.
-
Any-ttach: Quick End-effector Swapping Enables Manipulation Dexterity with Simplicity
Any-ttach shows that rapid end-effector swapping combined with demonstration collection and task planning enables reliable multi-tool skills in long-horizon tasks such as sandwich making.
-
RoboWits: Unexpected Challenges for Robotic Creative Problem Solving
RoboWits benchmark with 238 tasks shows pre-trained VLAs succeed on seed tasks but fail on mutated ones, highlighting brittleness in reasoning.
-
UniLab: A Heterogeneous Architecture for Robot RL Beyond GPU-Dominant Paradigms
UniLab is a CPU/GPU heterogeneous system for robot RL training using MuJoCoUni and MotrixSim backends that reports 3-10x end-to-end efficiency improvements and cross-platform compatibility beyond CUDA.
-
Simultaneous Contact Selection and Planning for Contact-Rich Manipulation with Cascaded Optimization
SCSP is a cascaded optimization framework using a surrogate contact model and discrete-continuous search to enable simultaneous contact selection and planning for robust contact-rich manipulation.
-
X-DiffVLA: X-Embodied Diffusion Action Heads for Vision-Language-Action Models
X-DiffVLA proposes a diffusion VLA model using Embodiment Forcing and Morphological Tree Diffusion to achieve SOTA cross-embodied performance on simulation benchmarks with 15.3% and 12.5% gains.
-
SCRIPT: Scalable Diffusion Policy with Multi-stage Training for Language-driven Physics-based Humanoid Control
SCRIPT presents a scalable diffusion policy with JAST-DiT architecture, nonlinear history conditioning, and RLHR post-training that claims to outperform prior methods on text alignment, motion quality, and physical realism while scaling on a 1200-hour dataset.
-
Imagine2Real: Towards Zero-shot Humanoid-Object Interaction via Video Generative Priors
Imagine2Real enables zero-shot humanoid-object interaction by unifying motions as 4D point trajectories, tracking only base/hands/object keypoints inside a BFM latent space, and training with progressive simple rewards for mocap deployment.
-
ARC-RL: A Reinforcement Learning Playground Inspired by ARC Raiders
ARC-RL is a new suite of four MuJoCo continuous-control environments featuring game-inspired hexapod and quadruped morphologies, a single closed-form multi-component reward function, CPG demonstrators, and empirical comparisons of online and offline-to-online RL algorithms.
-
Global Convergence of Sampling-Based Nonconvex Optimization through Diffusion-Style Smoothing
Recasts sampling-based nonconvex optimization as smoothed gradient descent to obtain non-asymptotic convergence guarantees and introduces the DIDA annealed algorithm that converges to the global optimum.
-
SECOND-Grasp: Semantic Contact-guided Dexterous Grasping
SECOND-Grasp integrates semantic contact proposals from vision-language reasoning with geometric refinement to achieve 98%+ lifting success and improved intent-aware grasping on seen and unseen objects.
-
NavOL: Navigation Policy with Online Imitation Learning
NavOL collects expert trajectory labels online from a global planner during policy rollouts in simulation to train a diffusion navigation policy, mitigating distribution shift and improving performance on visual navigation tasks.
-
Explicit Stair Geometry Conditioning for Robust Humanoid Locomotion
Explicit conditioning of a PPO policy on interpretable stair parameters (height, depth, yaw) yields improved generalization to unseen stairs and reliable real-world traversal on the Unitree G1, including 33 consecutive outdoor steps.
-
Zero-Shot Sim-to-Real Robot Learning: A Dexterous Manipulation Study on Reactive Catching
DRIS improves zero-shot sim-to-real transfer for reactive catching by maintaining and acting on sets of randomized dynamics instances instead of single instances per episode.
-
RigidFormer: Learning Rigid Dynamics using Transformers
RigidFormer learns mesh-free rigid dynamics from point clouds using object-centric anchors, Anchor-Vertex Pooling, Anchor-based RoPE, and differentiable Kabsch alignment to enforce rigidity.
-
ANO: A Principled Approach to Robust Policy Optimization
ANO derives a robust policy optimizer from geometric principles that replaces clipping with a smooth redescending gradient, showing better performance and stability than PPO, SPO, and GRPO in MuJoCo, Atari, and RLHF experiments.
-
GS-Playground: A High-Throughput Photorealistic Simulator for Vision-Informed Robot Learning
GS-Playground delivers a high-throughput photorealistic simulator for vision-informed robot learning via parallel physics integrated with batch 3D Gaussian Splatting at 10^4 FPS and an automated Real2Sim workflow for consistent environments.
-
dWorldEval: Scalable Robotic Policy Evaluation via Discrete Diffusion World Model
A discrete diffusion model tokenizes multimodal robotic data and uses a progress token to predict future states and task completion for scalable policy evaluation.
-
ETac: A Lightweight and Efficient Tactile Simulation Framework for Learning Dexterous Manipulation
ETac is a data-driven tactile simulation framework that matches FEM deformation accuracy at high speed, supporting 4096 parallel environments at 869 FPS and yielding 84.45% success in blind grasping across four object types.
-
FLASH: Fast Learning via GPU-Accelerated Simulation for High-Fidelity Deformable Manipulation in Minutes
A new GPU-accelerated deformable simulation framework trains manipulation policies in minutes using only synthetic data, achieving robust zero-shot transfer to physical robots.
-
Chain of Uncertain Rewards with Large Language Models for Reinforcement Learning
CoUR uses LLMs for efficient RL reward design through uncertainty quantification and similarity selection, achieving better performance and lower evaluation costs on IsaacGym and Bidexterous Manipulation benchmarks.
-
Trajectory-based actuator identification via differentiable simulation
Differentiable simulation enables torque-sensor-free actuator model identification from trajectory data, achieving 1.88x better position tracking than a stand-trained baseline and 46% longer travel in downstream locomotion policies.
-
FlashSAC: Fast and Stable Off-Policy Reinforcement Learning for High-Dimensional Robot Control
FlashSAC improves training speed and final performance of off-policy RL on high-dimensional robot tasks by reducing update frequency, increasing model scale, and bounding norms to limit critic error accumulation.
-
Veo-Act: How Far Can Frontier Video Models Advance Generalizable Robot Manipulation?
Veo-3 video predictions enable approximate task-level robot trajectories in zero-shot settings but require hierarchical integration with low-level VLA policies for reliable manipulation performance.
-
Physically Accurate Rigid-Body Dynamics in Particle-Based Simulation
PBD-R adds a momentum-conservation constraint to position-based dynamics to deliver physically accurate rigid-body dynamics while remaining computationally lighter than MuJoCo.
-
One Hand to Rule Them All: Canonical Representations for Unified Dexterous Manipulation
A unified parameter space and canonical URDF enable cross-embodiment dexterous grasping policies with 81.9% zero-shot success on unseen hands like the 3-finger LEAP Hand.
-
Semantic-Contact Fields for Category-Level Generalizable Tactile Tool Manipulation
SCFields fuses semantics and contact data in a sim-to-real pipeline to enable category-level generalization for tactile tool manipulation with diffusion policies.
-
Phase-Aware Policy Learning for Skateboard Riding of Quadruped Robots via Feature-wise Linear Modulation
PAPL uses phase-conditioned FiLM layers in RL networks to create a unified policy for quadruped robots to ride skateboards by capturing phase-dependent behaviors while sharing knowledge across phases.
-
HUSKY: Humanoid Skateboarding System via Physics-Aware Whole-Body Control
HUSKY combines humanoid-skateboard dynamics modeling with adversarial motion priors and physics-guided lean-to-steer strategies to achieve real-world stable skateboarding on a humanoid robot.
-
Learning to Plan, Planning to Learn: Adaptive Hierarchical RL-MPC for Sample-Efficient Decision Making
An adaptive RL-MPC framework uses RL to inform MPPI sampling and aggregates MPPI samples for value estimation, delivering up to 72% higher success rates and 2.1x faster convergence on tasks like race driving and Lunar Lander with obstacles.
-
Unify Robot Actions in Camera Frame
CalibAll estimates camera extrinsics on existing datasets to convert robot actions into a unified camera-frame representation, enabling stronger cross-embodiment pretraining.
-
Humanoid Whole-Body Badminton via Multi-Stage Reinforcement Learning
A multi-stage RL curriculum produces a unified whole-body controller enabling humanoid robots to sustain badminton rallies in simulation and return shuttles at up to 19.1 m/s in real hardware, with both EKF-based and prediction-free variants.
-
Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation
Genie Envisioner unifies robotic policy learning, simulation, and evaluation inside one instruction-conditioned video diffusion framework using GE-Base, GE-Act, and GE-Sim.
-
A Reconfigured Wheel-Legged Robot for Enhanced Steering and Adaptability
FLORES is a wheel-legged robot with front-leg hip-yaw DoFs replacing hip-roll, paired with a custom RL controller using adapted HIM and tailored rewards for smooth wheeled-to-legged transitions and efficient gaits.
-
AnyPos: Automated Task-Agnostic Actions for Bimanual Manipulation
AnyPos automates task-agnostic action collection and inverse-dynamics modeling with arm/end-effector decoupling plus a direction-aware decoder, delivering 51% higher test accuracy and 30-40% better success rates on bimanual tasks.
-
DreamPolicy: A Unified World-model Policy for Scalable Humanoid Locomotion
DreamPolicy integrates an autoregressive diffusion world model with policy learning to produce a single scalable policy that generalizes to unseen composite terrains for humanoid locomotion.
-
The Hive Mind is a Single Reinforcement Learning Agent
Bee hive mind from weighted voter imitation equals a single RL agent using a new multi-armed bandit rule called Maynard-Cross Learning.
-
Diffusion Policy Policy Optimization
DPPO fine-tunes diffusion policies via policy gradients and outperforms prior RL approaches for diffusion policies and PG-tuned alternatives on robot benchmarks while enabling stable training and hardware deployment.
-
Learning Locomotion on Discrete Terrain via Minimal Proximity Sensing
Foot-mounted infrared proximity sensors supply pre-contact signals that, when incorporated into RL, improve quadrupedal traversal of discrete terrain such as gaps and stepping stones.
-
SWIM: Single-Instance Whole-Body Imitation for swiMming
SWIM is a single-instance imitation method for learning and generalizing physically simulated swimming motions to new environments, bodies, and styles.
-
SSR: Scaling Surefooted and Symmetric Humanoid Traversal to the Open World
SSR is an end-to-end vision-based framework for humanoid traversal that learns imagined foothold guidance, equivariant latent-space symmetry augmentation, and terrain-specific multi-discriminator motion priors to enable safe locomotion on diverse real-world terrains.
-
SPRINT: Efficient Spectral Priors for Humanoid Athletic Sprints
SPRINT generates sprint trajectories for humanoids via spectral priors from five human motion sequences, achieving 6 m/s peak velocity with zero-shot sim-to-real transfer on Unitree G1.
-
MuJoCoUni:Persistent Batched Runtime Primitives for MuJoCo
MuJoCoUni introduces BatchEnvPool, a C++/pybind11-based executor providing persistent batched stateful MuJoCo environments with domain randomization, sensor queries, and short stepping.