arXiv preprint arXiv:2511.01571 · 2026

5 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

PlanViz: Evaluating Planning-Oriented Image Generation and Editing for Computer-Use Tasks

cs.CV · 2026-02-06 · unverdicted · novelty 7.0

PlanViz is a new benchmark with three sub-tasks and a PlanScore metric for evaluating planning-oriented image generation and editing by unified multimodal models on computer-use tasks.

MotionVLA: Injecting Geometric Motion into Vision-Language-Action Model

cs.RO · 2026-06-06 · unverdicted · novelty 6.0

MotionVLA converts short past video windows into compact trajectory-field tokens to supply motion-consistent evidence for vision-language-action robot policies, improving long-horizon manipulation.

SPARC: Reliable Spatial Annotations from Robot Demonstrations at Scale

cs.RO · 2026-06-11 · unverdicted · novelty 5.0

SPARC generates reliable spatial annotations for robot demonstrations by leveraging spatio-temporal task structure, outperforming detection baselines on localization accuracy while retaining more samples and enabling competitive model performance without manual annotations.

FocalPolicy: Frequency-Optimized Chunking and Locally Anchored Flow Matching for Coherent Visuomotor Policy

cs.RO · 2026-05-15 · unverdicted · novelty 5.0 · 2 refs

FocalPolicy introduces frequency-optimized chunking and locally anchored flow matching with a foresight composite objective to reduce inter-chunk discontinuities in visuomotor policies.

AnySlot: Goal-Conditioned Vision-Language-Action Policies for Zero-Shot Slot-Level Placement

cs.RO · 2026-04-12

citing papers explorer

Showing 4 of 4 citing papers after filters.

PlanViz: Evaluating Planning-Oriented Image Generation and Editing for Computer-Use Tasks · cs.CV · 2026-02-06 · unverdicted · none · ref 27
MotionVLA: Injecting Geometric Motion into Vision-Language-Action Model · cs.RO · 2026-06-06 · unverdicted · none · ref 47
SPARC: Reliable Spatial Annotations from Robot Demonstrations at Scale · cs.RO · 2026-06-11 · unverdicted · none · ref 4
FocalPolicy: Frequency-Optimized Chunking and Locally Anchored Flow Matching for Coherent Visuomotor Policy · cs.RO · 2026-05-15 · unverdicted · none · ref 51 · 2 links
