PlanViz is a new benchmark with three sub-tasks and PlanScore metric to evaluate planning-oriented image generation and editing by unified multimodal models for computer-use tasks.
arXiv preprint arXiv:2511.01571 , year=
5 Pith papers cite this work. Polarity classification is still indexing.
years
2026 5representative citing papers
MotionVLA converts short past video windows into compact trajectory-field tokens to supply motion-consistent evidence for vision-language-action robot policies, improving long-horizon manipulation.
SPARC generates reliable spatial annotations for robot demonstrations by leveraging spatio-temporal task structure, outperforming detection baselines on localization accuracy while retaining more samples and enabling competitive model performance without manual annotations.
FocalPolicy introduces frequency-optimized chunking and locally anchored flow matching with a foresight composite objective to reduce inter-chunk discontinuities in visuomotor policies.
citing papers explorer
-
PlanViz: Evaluating Planning-Oriented Image Generation and Editing for Computer-Use Tasks
PlanViz is a new benchmark with three sub-tasks and PlanScore metric to evaluate planning-oriented image generation and editing by unified multimodal models for computer-use tasks.
-
MotionVLA: Injecting Geometric Motion into Vision-Language-Action Model
MotionVLA converts short past video windows into compact trajectory-field tokens to supply motion-consistent evidence for vision-language-action robot policies, improving long-horizon manipulation.
-
SPARC: Reliable Spatial Annotations from Robot Demonstrations at Scale
SPARC generates reliable spatial annotations for robot demonstrations by leveraging spatio-temporal task structure, outperforming detection baselines on localization accuracy while retaining more samples and enabling competitive model performance without manual annotations.
-
FocalPolicy: Frequency-Optimized Chunking and Locally Anchored Flow Matching for Coherent Visuomotor Policy
FocalPolicy introduces frequency-optimized chunking and locally anchored flow matching with a foresight composite objective to reduce inter-chunk discontinuities in visuomotor policies.