The Lie We Tell: Correcting the Euclidean Fallacy in Vision Language Action Policies via Score Matching on Tangent Space

Bing-Cheng Chuang; Bor-Jiun Lin; Chun-Yi Lee; I-Hsuan Chu; Min Sun; YuanFu Yang

arxiv: 2606.01847 · v1 · pith:PHCQ6K63new · submitted 2026-06-01 · 💻 cs.RO · cs.LG

classification 💻 cs.RO cs.LG

keywords: drift, equivariance, euclidean, fallacy, manifold, method, policies, space

Diffusion-based Vision-Language-Action policies achieve remarkable success in robotic manipulation, yet commit a fundamental geometric error we term the $\textbf{Euclidean Fallacy}$: representing SE(3) poses as flat $\mathbb{R}^{12}$ vectors. This approximation induces (1) manifold drift violating SO(3) constraints, (2) broken equivariance under coordinate transformations, and (3) non-geodesic trajectories with excessive kinematic cost. We introduce $\textbf{Lie Diffuser Actor (LDA)}$, a diffusion framework operating intrinsically on SE(3). Our method injects noise through left-invariant SDEs, predicts scores in the tangent space, and retracts samples via the exponential map. This formulation eliminates manifold drift by construction while guaranteeing coordinate-frame equivariance and geodesic optimality. On CALVIN ABC$\rightarrow$D, LDA improves average task length from $3.27$ to $3.51$ ($+7.3\%$). We further validate our method on a real robot, where it outperforms the baseline on the majority of tasks.
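The manifold drift described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it covers only the rotation part SO(3) of SE(3), uses a simple discretized random walk in place of the paper's SDE, and contains no score network. All function names (`hat`, `exp_so3`, `perturb_on_group`, `perturb_euclidean`) and the step size and step count are assumptions chosen for illustration. The sketch contrasts noise sampled in the tangent space and retracted through the exponential map with noise added directly to the flattened matrix entries.

```python
import numpy as np

def hat(w):
    """Map a 3-vector to its skew-symmetric matrix in so(3)."""
    x, y, z = w
    return np.array([[0.0, -z, y],
                     [z, 0.0, -x],
                     [-y, x, 0.0]])

def exp_so3(w):
    """Rodrigues' formula: exponential map from so(3) to SO(3)."""
    theta = np.linalg.norm(w)
    K = hat(w)
    if theta < 1e-8:
        # Second-order Taylor expansion avoids dividing by a tiny angle.
        return np.eye(3) + K + 0.5 * K @ K
    a = np.sin(theta) / theta
    b = (1.0 - np.cos(theta)) / theta**2
    return np.eye(3) + a * K + b * K @ K

def orthogonality_error(R):
    """Frobenius norm of R^T R - I; zero for a valid rotation."""
    return np.linalg.norm(R.T @ R - np.eye(3))

def perturb_on_group(R, sigma, steps, rng):
    """Sample noise in the tangent space and retract with exp.
    Right-multiplication makes each step commute with any fixed
    left-multiplied rotation (left-invariant noise)."""
    for _ in range(steps):
        xi = sigma * rng.standard_normal(3)
        R = R @ exp_so3(xi)
    return R

def perturb_euclidean(R, sigma, steps, rng):
    """Add Gaussian noise directly to the nine matrix entries."""
    for _ in range(steps):
        R = R + sigma * rng.standard_normal((3, 3))
    return R

# Hypothetical settings: 50 steps with noise scale 0.1.
R0 = exp_so3(np.array([0.3, -0.2, 0.5]))
R_group = perturb_on_group(R0, 0.1, 50, np.random.default_rng(0))
R_flat = perturb_euclidean(R0, 0.1, 50, np.random.default_rng(0))

print("orthogonality error, tangent-space noise:", orthogonality_error(R_group))
print("orthogonality error, Euclidean noise:    ", orthogonality_error(R_flat))

# Equivariance check: rotating the start pose by G rotates the result by G.
G = exp_so3(np.array([-1.0, 0.4, 0.2]))
R_group_G = perturb_on_group(G @ R0, 0.1, 50, np.random.default_rng(0))
print("equivariance gap:", np.linalg.norm(R_group_G - G @ R_group))
```

Under these assumptions, the tangent-space samples stay on SO(3) up to floating-point error, while the Euclidean samples leave the manifold and would need a projection step to become valid rotations again. The equivariance gap is likewise at floating-point level because the noise enters by right-multiplication.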
