Neural Implicit Action Fields: From Discrete Waypoints to Continuous Functions for Vision-Language-Action Models

Chuanzhang Meng; Dongjie Huo; Feng Xiong; Haoyun Liu; Jianzhuang Zhao; Jiayuan Tan; Mu Xu; Sheng Zhong; SongLin Dong; Tianle Shi

arxiv: 2603.01766 · v2 · pith:WGWDT7WTnew · submitted 2026-03-02 · 💻 cs.RO


Haoyun Liu, Jianzhuang Zhao, Xinyuan Chang, Tianle Shi, Chuanzhang Meng, Jiayuan Tan, Feng Xiong, Tong Lin, Dongjie Huo, Mu Xu, SongLin Dong, Zhiheng Ma, Yihong Gong, Sheng Zhong



keywords: action, control, discrete, models, NIAF, waypoints, continuous, fields


Abstract

Despite the rapid progress of vision-language-action (VLA) models, the prevailing practice of predicting action chunks as discrete waypoints remains structurally misaligned with the intrinsic continuity of physical motion. This discretization arises naturally from fixed-rate robot data collection and the token-by-token prediction paradigm of large language models, but ties actions to rigid sampling rates, does not naturally support analytically consistent higher-order derivatives, and introduces quantization artifacts that hinder precise, compliant interaction. We propose Neural Implicit Action Fields (NIAF), which reformulates chunk-level action representation from discrete waypoints to continuous action functions. Using a vision-language model as a hierarchical spectral modulator over a learnable motion prior, NIAF synthesizes continuous-time action manifolds with arbitrary temporal resolution. This formulation enables analytical differentiation, allowing explicit supervision of velocity and regularization of higher-order derivative signals to promote mathematical consistency, physical plausibility, and control smoothness. Our approach achieves strong results on CALVIN and LIBERO across diverse backbones. Real-world experiments further confirm that NIAF supports stable impedance control, bridging policy-side action generation and execution-side smooth control.
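The idea of replacing waypoints with a continuous action function can be illustrated with a minimal sketch. It is not the authors' architecture: it assumes a one-dimensional action expressed as a sum of sinusoids, and the class `SinusoidalActionField`, its parameters (`freqs`, `amps`, `phases`, `bias`), and the helpers `sample` and `velocity_loss` are hypothetical names introduced here. In the paper a vision-language model modulates a learnable motion prior; below, fixed numbers stand in for whatever such a model would predict. The sketch only shows why a closed-form function can be queried at any temporal resolution and differentiated analytically.

```python
import math
from typing import List, Sequence


class SinusoidalActionField:
    """Toy continuous action a(t) = bias + sum_k amp_k * sin(freq_k * t + phase_k).

    Assumption: a fixed sinusoidal basis stands in for a learned motion prior;
    amps and phases stand in for values a conditioning model would output.
    """

    def __init__(self, freqs: Sequence[float], amps: Sequence[float],
                 phases: Sequence[float], bias: float = 0.0) -> None:
        if not (len(freqs) == len(amps) == len(phases)):
            raise ValueError("freqs, amps and phases must have equal length")
        self.terms = list(zip(freqs, amps, phases))
        self.bias = bias

    def value(self, t: float) -> float:
        """Action at continuous time t."""
        return self.bias + sum(a * math.sin(w * t + p) for w, a, p in self.terms)

    def velocity(self, t: float) -> float:
        """Closed-form first time derivative of the action."""
        return sum(a * w * math.cos(w * t + p) for w, a, p in self.terms)

    def acceleration(self, t: float) -> float:
        """Closed-form second time derivative of the action."""
        return sum(-a * w * w * math.sin(w * t + p) for w, a, p in self.terms)


def sample(field: SinusoidalActionField, duration: float, n: int) -> List[float]:
    """Query the same field at n evenly spaced times in [0, duration]; n >= 2."""
    if n < 2:
        raise ValueError("n must be at least 2")
    return [field.value(duration * (i / (n - 1))) for i in range(n)]


def velocity_loss(field: SinusoidalActionField, times: Sequence[float],
                  target_velocities: Sequence[float]) -> float:
    """Mean squared error between analytic and target velocities."""
    errors = [(field.velocity(t) - v) ** 2 for t, v in zip(times, target_velocities)]
    return sum(errors) / len(errors)


# One field, two sampling rates: no retraining or interpolation is needed.
field = SinusoidalActionField(freqs=[1.0, 3.0], amps=[0.5, 0.1], phases=[0.0, 0.7])
coarse = sample(field, duration=1.0, n=10)
fine = sample(field, duration=1.0, n=100)
```

Because the derivatives are available in closed form, a velocity term such as `velocity_loss` or a penalty on `acceleration` can be computed directly from the function parameters rather than from finite differences between waypoints.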


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Robotic Manipulation is Vision-to-Geometry Mapping ($f(v) \rightarrow G$): Vision-Geometry Backbones over Language and Video Models
cs.RO · 2026-04 · unverdicted · novelty 6.0

Vision-geometry backbones using pretrained 3D world models outperform vision-language and video models for robotic manipulation by enabling direct mapping from visual input to geometric actions.