Soft RTC uses partially denoised states for overlap tokens and token-wise blending to reduce action delta and jerk by ~9% versus hard RTC while matching solve rates on Kinetix levels.
Canonical reference
Consistency Policy: Accelerated Visuomotor Policies via Consistency Distillation, June 2024
Canonical reference. 80% of citing Pith papers cite this work as background.
representative citing papers
EvoScene-VLA maintains an action-updated scene prior across control chunks in VLA policies, raising success rates on RoboTwin tasks from 87.2% to 89.1% in the fixed setting and from 86.1% to 88.5% in the randomized setting, while outperforming baselines on a real robot.
DSSP is a history-conditioned diffusion state space policy that uses SSMs to encode full observation streams with an auxiliary dynamics objective and hierarchical fusion, achieving SOTA results with reduced model size in robot manipulation.
Test-time sparsity with a parallel pipeline and omnidirectional feature reuse accelerates action diffusion by 5x, to 47.5 Hz, while cutting FLOPs by 92% with no performance loss.
Muninn accelerates diffusion trajectory planners up to 4.6x by spending an uncertainty budget to decide when to cache denoiser outputs, preserving performance and certifying bounded deviation from full computation.
HDP3 is a pocket-scale 3D diffusion policy with a Diffusion Mixer decoder that achieves state-of-the-art visuomotor control using two-step DDIM inference and under 1% of the parameters of prior 3D diffusion policies.
MISTY delivers state-of-the-art closed-loop scores on nuPlan Test14-hard (80.32 non-reactive, 82.21 reactive) at 10.1 ms latency via single-step MLP-Mixer inference and a latent drifting loss that encourages proactive maneuvers.
Variational Regularization imposes an adaptive information bottleneck on noisy intermediate features in DP3-UNet and DP3-DiT policies, consistently raising task success rates on RoboTwin2.0, Adroit, and MetaWorld while achieving new state-of-the-art results.
Chronos elevates full observation history to the policy's latent state via selective SSM tokens and a Schrödinger-inspired acceleration bridge, achieving large gains on memory-dependent robot tasks with fewer parameters.
TISED decomposes inference optimization effects on embodied tasks and identifies paradoxical outcomes where faster per-step inference can increase task completion time on static tasks or raise success rates on dynamic tasks.
IDP generates one-step robot actions by adaptively weighting a scalar potential objective using conditional expert geometry derived from local variations of observation-similar expert actions, combined with expert-proximal terminal evaluation.
VADF adds an Adaptive Loss Network for hard-negative training sampling and a Hierarchical Vision Task Segmenter for adaptive noise scheduling during inference to speed convergence and reduce timeouts in diffusion robotic policies.
SnapFlow compresses multi-step denoising in flow-matching VLAs into one step via progressive self-distillation using two-step Euler shortcuts from marginal velocities, matching 10-step teacher success rates with 9.6x speedup on pi0.5.
SeedPolicy introduces self-evolving gated attention to extend the temporal horizon of diffusion policies, yielding 36.8% and 169% relative gains over standard DP on clean and randomized RoboTwin 2.0 tasks.
R2RGen introduces a simulator-free three-stage pipeline that parses, augments, and post-processes real point-cloud observation-action pairs to improve spatial generalization in robotic manipulation policies.
Real-time chunking (RTC) allows diffusion- and flow-based action chunking policies to execute smoothly and asynchronously, maintaining high success rates on dynamic tasks even with significant inference latency.
HybridVLA unifies diffusion and autoregression in a single VLA model via collaborative training and ensemble to raise robot manipulation success rates by 14% in simulation and 19% in real-world tasks.
DexVLA combines a scaled diffusion action expert with embodiment curriculum learning to achieve better generalization and performance than prior VLA models on diverse robot hardware and long-horizon tasks.
RoamFlow applies MeanFlow to predict average velocity fields for one-step action policies in image-goal navigation, trained via expert imitation followed by RL refinement.
BORA combines offline RL critic training with online chunk-wise residual adaptation to raise average success rates of real-world dexterous VLA policies by 33% and up to 43% on unseen objects across five tasks.
FocalPolicy introduces frequency-optimized chunking and locally anchored flow matching with a foresight composite objective to reduce inter-chunk discontinuities in visuomotor policies.
Sparse ActionGen accelerates diffusion policies up to 4x for robot control via rollout-adaptive pruning and zig-zag activation reuse without performance loss.
OmniVLA-RL uses a mix-of-transformers architecture and flow matching reformulated as an SDE, with group segmented policy optimization, to surpass prior VLA models on LIBERO benchmarks.
The Cosmos platform supplies open-source pre-trained world models and supporting tools for building fine-tunable digital world simulations to train Physical AI.
citing papers explorer
- EvoScene-VLA: Evolving Scene Beliefs Inside the Action Decoder for Chunked Robot Control
- DSSP: Diffusion State Space Policy with Full-History Encoding
- SeedPolicy: Horizon Scaling via Self-Evolving Diffusion Policy for Robot Manipulation