SCRIPT presents a scalable diffusion policy with JAST-DiT architecture, nonlinear history conditioning, and RLHR post-training that claims to outperform prior methods on text alignment, motion quality, and physical realism while scaling on a 1200-hour dataset.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.GR 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
No citing papers match the current filters.