pith. sign in

VGGT: Visual geometry grounded transformer

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

fields

cs.CV 4

years

2026 4

verdicts

UNVERDICTED 4

representative citing papers

Stitched Value Model for Diffusion Alignment

cs.CV · 2026-05-19 · unverdicted · novelty 6.0

StitchVM stitches clean-image reward models with diffusion backbones to enable efficient value estimation for noisy latents, speeding up diffusion alignment methods like DPS by 3.2x and halving memory.

Vista4D: Video Reshooting with 4D Point Clouds

cs.CV · 2026-04-23 · unverdicted · novelty 6.0

Vista4D re-synthesizes dynamic videos from new viewpoints by grounding them in a 4D point cloud built with static segmentation and multiview training.

HorizonStream: Long-Horizon Attention for Streaming 3D Reconstruction

cs.CV · 2026-05-22 · unverdicted · novelty 5.0

HorizonStream is a long-horizon Transformer that factorizes geometric evidence influence into channel-wise linear attention for long-range temporal propagation and local spatiotemporal attention for short-range matching, claiming stable generalization from 48-frame training to over 10,000-frame test

citing papers explorer

Showing 4 of 4 citing papers.

  • Stitched Value Model for Diffusion Alignment cs.CV · 2026-05-19 · unverdicted · none · ref 82

    StitchVM stitches clean-image reward models with diffusion backbones to enable efficient value estimation for noisy latents, speeding up diffusion alignment methods like DPS by 3.2x and halving memory.

  • Vista4D: Video Reshooting with 4D Point Clouds cs.CV · 2026-04-23 · unverdicted · none · ref 42

    Vista4D re-synthesizes dynamic videos from new viewpoints by grounding them in a 4D point cloud built with static segmentation and multiview training.

  • HorizonStream: Long-Horizon Attention for Streaming 3D Reconstruction cs.CV · 2026-05-22 · unverdicted · none · ref 43

    HorizonStream is a long-horizon Transformer that factorizes geometric evidence influence into channel-wise linear attention for long-range temporal propagation and local spatiotemporal attention for short-range matching, claiming stable generalization from 48-frame training to over 10,000-frame test

  • VGGT-Occ: Geometry-Grounded and Density-Aware Gated Fusion for 3D Occupancy Prediction cs.CV · 2026-05-16 · unverdicted · none · ref 36

    VGGT-Occ embeds geometric tokens via PA-DA and uses sequential coarse-to-fine gated fusion to reach 33.00% IoU and 21.08% mIoU on SurroundOcc-nuScenes while using only ~41M parameters in the occupancy head.