SeqVLA: Sequential task execution for long-horizon manipulation with completion-aware vision-language-action model
5 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.RO (5)
Years: 2026 (5)
Verdicts: UNVERDICTED (5)
citing papers explorer
- FurnitureVLA: Learning Long-Horizon Bimanual Furniture Assembly with Vision-Language-Action Model
  A progress-enhanced VLA model raises simulated bimanual furniture assembly success from 48% to 80% across three furniture types and shows a 16% drop on a real Kinova robot.
- SADP: Subgoal-Aware Diffusion Policy for Explainable Robots Learned from Foundation Model Generated Demonstrations
  SADP trains diffusion policies on foundation-model-generated, subgoal-annotated demonstrations and adds a completion predictor, giving robots built-in, subgoal-level explainability alongside improved task performance.
- ROG-Grasp: Root-Oriented Geometry for Robotic Grasping and Placement
  ROG-Grasp estimates the orientation of produce from root surface geometry, using YOLO detection and point cloud plane fitting, to generate stable grasp poses and constrained motion plans. It achieves higher reliability and speed than VLA policies in tomato and onion experiments.
- VILAS: A VLA-Integrated Low-cost Architecture with Soft Grasping for Robotic Manipulation
  VILAS integrates low-cost modular hardware with a kirigami soft gripper and evaluates fine-tuned pi_0, pi_0.5, and GR00T N1.6 models on grape grasping, using a ZMQ-based teleoperation and deployment framework.
- Threading Optimization for Vision-Language-Action Model Inference in Low-Cost Smart Agricultural Manipulation
  Threading optimization of RTAC for VLA models reduces end-to-end latency and improves stability on low-cost agricultural robotic arms without changing the policy.