hub Baseline reference

Scannet: Richly-annotated 3d reconstructions of indoor scenes

Angela Dai, Angel X Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, Matthias Nießner · 2017

Baseline reference. 80% of citing Pith papers use this work as a benchmark or comparison.

11 Pith papers citing it

Baseline 80% of classified citations

browse 11 citing papers

hub tools

JSON dossier citing papers JSON

citation-role summary

dataset 4 background 1

citation-polarity summary

use dataset 3 background 1 baseline 1

representative citing papers

VGenST-Bench: A Benchmark for Spatio-Temporal Reasoning via Active Video Synthesis

cs.CV · 2026-05-21 · unverdicted · novelty 7.0

VGenST-Bench is a new video benchmark for MLLM spatio-temporal reasoning built via generative synthesis, a multi-agent pipeline with human oversight, a 3x2x2 taxonomy, and hierarchical tasks separating perception from reasoning.

TurboVGGT: Fast Visual Geometry Reconstruction with Adaptive Alternating Attention

cs.CV · 2026-05-14 · unverdicted · novelty 7.0

TurboVGGT uses adaptive sparse global attention with varying sparsity levels across frames and layers plus frame attention to enable faster multi-view 3D reconstruction while keeping competitive quality versus prior state-of-the-art methods.

RelFlexformer: Efficient Attention 3D-Transformers for Integrable Relative Positional Encodings

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

RelFlexformers enable flexible integrable 3D RPE in attention via NU-FFT, generalizing prior methods to heterogeneous token positions with O(L log L) complexity.

ViSRA: A Video-based Spatial Reasoning Agent for Multi-modal Large Language Models

cs.CV · 2026-05-11 · unverdicted · novelty 7.0

ViSRA boosts MLLM 3D spatial reasoning performance by up to 28.9% on unseen tasks via a plug-and-play video-based agent that extracts explicit spatial cues from expert models without any post-training.

Rethinking the State Update Gate for Long-Sequence Recurrent 3D Reconstruction

cs.CV · 2026-05-16 · unverdicted · novelty 6.0

A closed-form scalar frame-level gate α_t derived from internal feature changes extends effective memory in recurrent 3D reconstruction and improves accuracy on long sequences up to 4541 frames.

Invaria: Learning Scale and Density Invariance in Point Clouds via Next-Resolution Prediction

cs.CV · 2026-05-15 · unverdicted · novelty 6.0

Invaria trains point cloud encoders with next-resolution prediction to learn scale and density invariant features, yielding higher mIoU on ScanNet under lower resolution and scaled objects while using a smaller model.

SpatialForge: Bootstrapping 3D-Aware Spatial Reasoning from Open-World 2D Images

cs.CV · 2026-05-12 · unverdicted · novelty 6.0

SpatialForge synthesizes 10 million spatial QA pairs from in-the-wild 2D images to train VLMs for better depth ordering, layout, and viewpoint-dependent reasoning.

ST-Gen4D: Embedding 4D Spatiotemporal Cognition into World Model for 4D Generation

cs.CV · 2026-05-08 · unverdicted · novelty 5.0

ST-Gen4D uses a world model that fuses global appearance and local dynamic graphs into a 4D cognition representation to guide consistent 4D Gaussian generation.

OpenGaFF: Open-Vocabulary Gaussian Feature Field with Codebook Attention

cs.CV · 2026-05-07 · unverdicted · novelty 5.0 · 2 refs

OpenGaFF adds a geometry-conditioned Gaussian Feature Field and codebook-guided attention to 3D Gaussian Splatting for spatially consistent open-vocabulary 3D semantic understanding.

StreamCacheVGGT: Streaming Visual Geometry Transformers with Robust Scoring and Hybrid Cache Compression

cs.CV · 2026-04-16 · unverdicted · novelty 5.0

StreamCacheVGGT improves streaming 3D geometry reconstruction accuracy and stability under fixed memory by using cross-layer token importance scoring and hybrid cache compression instead of pure eviction.

HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds

cs.CV · 2026-04-15 · unverdicted · novelty 4.0

HY-World 2.0 generates and reconstructs high-fidelity navigable 3D Gaussian Splatting worlds from text, images, or videos via upgraded panorama, planning, expansion, and composition modules, with released code claiming open-source SOTA performance.

citing papers explorer

Showing 11 of 11 citing papers.

VGenST-Bench: A Benchmark for Spatio-Temporal Reasoning via Active Video Synthesis cs.CV · 2026-05-21 · unverdicted · none · ref 13
VGenST-Bench is a new video benchmark for MLLM spatio-temporal reasoning built via generative synthesis, a multi-agent pipeline with human oversight, a 3x2x2 taxonomy, and hierarchical tasks separating perception from reasoning.
TurboVGGT: Fast Visual Geometry Reconstruction with Adaptive Alternating Attention cs.CV · 2026-05-14 · unverdicted · none · ref 7
TurboVGGT uses adaptive sparse global attention with varying sparsity levels across frames and layers plus frame attention to enable faster multi-view 3D reconstruction while keeping competitive quality versus prior state-of-the-art methods.
RelFlexformer: Efficient Attention 3D-Transformers for Integrable Relative Positional Encodings cs.LG · 2026-05-11 · unverdicted · none · ref 15
RelFlexformers enable flexible integrable 3D RPE in attention via NU-FFT, generalizing prior methods to heterogeneous token positions with O(L log L) complexity.
ViSRA: A Video-based Spatial Reasoning Agent for Multi-modal Large Language Models cs.CV · 2026-05-11 · unverdicted · none · ref 9
ViSRA boosts MLLM 3D spatial reasoning performance by up to 28.9% on unseen tasks via a plug-and-play video-based agent that extracts explicit spatial cues from expert models without any post-training.
Rethinking the State Update Gate for Long-Sequence Recurrent 3D Reconstruction cs.CV · 2026-05-16 · unverdicted · none · ref 6
A closed-form scalar frame-level gate α_t derived from internal feature changes extends effective memory in recurrent 3D reconstruction and improves accuracy on long sequences up to 4541 frames.
Invaria: Learning Scale and Density Invariance in Point Clouds via Next-Resolution Prediction cs.CV · 2026-05-15 · unverdicted · none · ref 41
Invaria trains point cloud encoders with next-resolution prediction to learn scale and density invariant features, yielding higher mIoU on ScanNet under lower resolution and scaled objects while using a smaller model.
SpatialForge: Bootstrapping 3D-Aware Spatial Reasoning from Open-World 2D Images cs.CV · 2026-05-12 · unverdicted · none · ref 17
SpatialForge synthesizes 10 million spatial QA pairs from in-the-wild 2D images to train VLMs for better depth ordering, layout, and viewpoint-dependent reasoning.
ST-Gen4D: Embedding 4D Spatiotemporal Cognition into World Model for 4D Generation cs.CV · 2026-05-08 · unverdicted · none · ref 31
ST-Gen4D uses a world model that fuses global appearance and local dynamic graphs into a 4D cognition representation to guide consistent 4D Gaussian generation.
OpenGaFF: Open-Vocabulary Gaussian Feature Field with Codebook Attention cs.CV · 2026-05-07 · unverdicted · none · ref 33 · 2 links
OpenGaFF adds a geometry-conditioned Gaussian Feature Field and codebook-guided attention to 3D Gaussian Splatting for spatially consistent open-vocabulary 3D semantic understanding.
StreamCacheVGGT: Streaming Visual Geometry Transformers with Robust Scoring and Hybrid Cache Compression cs.CV · 2026-04-16 · unverdicted · none · ref 45
StreamCacheVGGT improves streaming 3D geometry reconstruction accuracy and stability under fixed memory by using cross-layer token importance scoring and hybrid cache compression instead of pure eviction.
HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds cs.CV · 2026-04-15 · unverdicted · none · ref 12
HY-World 2.0 generates and reconstructs high-fidelity navigable 3D Gaussian Splatting worlds from text, images, or videos via upgraded panorama, planning, expansion, and composition modules, with released code claiming open-source SOTA performance.

Scannet: Richly-annotated 3d reconstructions of indoor scenes

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer