Backward token warping in ViT-based MLLMs enables reliable reasoning from nearby viewpoints by preserving semantic coherence better than pixel-wise warping or fine-tuning baselines.
Mental rotation of three-dimensional objects.Science, 171(3972):701–703
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
background 1
citation-polarity summary
fields
cs.CV 2verdicts
UNVERDICTED 2roles
background 1polarities
background 1representative citing papers
C3G creates compact 3D Gaussian representations with 2K points by guiding placement via learnable tokens that aggregate multi-view features through attention, yielding better efficiency and performance than dense methods.
citing papers explorer
-
Token Warping Helps MLLMs Look from Nearby Viewpoints
Backward token warping in ViT-based MLLMs enables reliable reasoning from nearby viewpoints by preserving semantic coherence better than pixel-wise warping or fine-tuning baselines.
-
C3G: Learning Compact 3D Representations with 2K Gaussians
C3G creates compact 3D Gaussian representations with 2K points by guiding placement via learnable tokens that aggregate multi-view features through attention, yielding better efficiency and performance than dense methods.