PlanViz is a new benchmark with three sub-tasks and a PlanScore metric for evaluating planning-oriented image generation and editing by unified multimodal models on computer-use tasks.
arXiv preprint arXiv:2511.01571
5 Pith papers cite this work.
Citing papers by year: 2026 (5)
Representative citing papers
MotionVLA converts short past video windows into compact trajectory-field tokens to supply motion-consistent evidence for vision-language-action robot policies, improving long-horizon manipulation.
SPARC generates reliable spatial annotations for robot demonstrations by leveraging spatio-temporal task structure, outperforming detection baselines on localization accuracy while retaining more samples and enabling competitive model performance without manual annotations.
FocalPolicy introduces frequency-optimized chunking and locally anchored flow matching with a foresight composite objective to reduce inter-chunk discontinuities in visuomotor policies.