Grounding Image Matching in 3D with MASt3R

Vincent Leroy, Yohann Cabon, Jérôme Revaud · 2024 · arXiv 2406.09756

21 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 1 · method: 1

citation-polarity summary

background: 1 · use method: 1

representative citing papers

Geo-Align: Video Generation Alignment via Metric Geometry Reward

cs.CV · 2026-05-22 · unverdicted · novelty 7.0

Geo-Align applies RL with a perceptual reward derived from 3D camera trajectory estimation to improve controllability and fidelity in video generation without paired training data.

GenRecon: Bridging Generative Priors for Multi-View 3D Scene Reconstruction

cs.CV · 2026-05-22 · unverdicted · novelty 7.0

GenRecon lifts object-level generative priors to scene-scale reconstruction by chunking scenes and using projection-based conditioning on multi-view features, claiming 16% better results than prior methods.

No Pose, No Problem in 4D: Feed-Forward Dynamic Gaussians from Unposed Multi-View Videos

cs.CV · 2026-05-21 · unverdicted · novelty 7.0

NoPo4D is the first feed-forward system for dynamic 4D Gaussian splatting from unposed multi-view videos, using velocity decomposition supervised by optical flow and a bidirectional motion encoder.

EvoScene-VLA: Evolving Scene Beliefs Inside the Action Decoder for Chunked Robot Control

cs.RO · 2026-05-21 · conditional · novelty 7.0

EvoScene-VLA maintains an action-updated scene prior across control chunks in VLA policies, raising success rates on RoboTwin tasks from 87.2% to 89.1% fixed and 86.1% to 88.5% randomized while outperforming baselines on a real robot.

CRePE: Curved Ray Expectation Positional Encoding for Unified-Camera-Controlled Video Generation

cs.CV · 2026-05-13 · unverdicted · novelty 7.0

CRePE supplies depth-aware positional distributions along curved rays for stable unified-camera control in frozen video DiT models.

WildSplatter: Feed-forward 3D Gaussian Splatting with Appearance Control from Unconstrained Images

cs.CV · 2026-04-23 · unverdicted · novelty 7.0

WildSplatter jointly learns 3D Gaussians and appearance embeddings from unconstrained photo collections to enable fast feed-forward reconstruction and flexible lighting control in 3D Gaussian Splatting.

LuMon: A Comprehensive Benchmark and Development Suite with Novel Datasets for Lunar Monocular Depth Estimation

cs.CV · 2026-04-10 · unverdicted · novelty 7.0

A new benchmark with real lunar stereo ground truth and analog data shows that sim-to-real fine-tuned monocular depth models achieve large in-domain gains but minimal generalization to actual lunar images.

Affostruction: 3D Affordance Grounding with Generative Reconstruction

cs.CV · 2026-01-14 · unverdicted · novelty 7.0

Affostruction reconstructs full 3D object geometry from partial RGBD views and grounds text-based affordances on both visible and unobserved surfaces, reporting large gains over prior methods.

A Scene is Worth a Thousand Features: Feed-Forward Camera Localization from a Collection of Image Features

cs.CV · 2025-10-01 · unverdicted · novelty 7.0

FastForward represents scenes as collections of 3D-anchored image features and performs camera pose estimation via feed-forward correspondence prediction, achieving competitive accuracy with minimal mapping time.

SADGE: Structure and Appearance Domain Gap Estimation of Synthetic and Real Data

cs.CV · 2026-05-21 · unverdicted · novelty 6.0

SADGE is a new fused similarity metric combining DINOv3 appearance and MASt3R geometry via constrained bilinear interaction that correlates with downstream synthetic-to-real performance at Pearson r=0.88 across multiple benchmarks.

SpaceMind++: Toward Allocentric Cognitive Maps for Spatially Grounded Video MLLMs

cs.CV · 2026-05-10 · unverdicted · novelty 6.0

SpaceMind++ adds an explicit voxelized allocentric cognitive map and coordinate-guided fusion to video MLLMs, claiming SOTA on VSI-Bench and improved out-of-distribution generalization on three other 3D benchmarks.

FluSplat: Sparse-View 3D Editing without Test-Time Optimization

cs.CV · 2026-04-21 · unverdicted · novelty 6.0

FluSplat trains a model with geometric alignment constraints on multi-view edits to produce consistent 3D scene edits from sparse views in a single forward pass without test-time optimization.

VGGT-HPE: Reframing Head Pose Estimation as Relative Pose Prediction

cs.CV · 2026-04-11 · conditional · novelty 6.0

Reframing head pose estimation as relative pose prediction between image pairs enables a synthetic-only trained model to outperform absolute regression methods on real benchmarks.

Diff4Splat: Controllable 4D Scene Generation with Latent Dynamic Reconstruction Models

cs.CV · 2025-11-01 · unverdicted · novelty 6.0

A feed-forward video latent transformer that predicts time-varying 3D Gaussian primitives from one image to produce controllable 4D scenes with appearance, geometry, and motion.

Streaming 4D Visual Geometry Transformer

cs.CV · 2025-07-15 · unverdicted · novelty 6.0

A causal transformer with key-value caching and distillation from a bidirectional VGGT model enables efficient online 4D geometry reconstruction from videos.

RoDyGS: Robust Dynamic Gaussian Splatting for Casual Videos

cs.CV · 2024-12-04 · unverdicted · novelty 6.0

RoDyGS separates static and dynamic elements in monocular videos using Gaussian splatting with regularization and introduces the Kubric-MRig benchmark for pose-free dynamic novel view synthesis.

Bundle Adjustment in the Eager Mode

cs.RO · 2024-09-18 · unverdicted · novelty 6.0

Introduces an eager-mode PyTorch BA library with GPU-accelerated sparse ops claiming 18.5-23x speedups over GTSAM, g2o, and Ceres.

ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis

cs.CV · 2024-09-03 · unverdicted · novelty 6.0

ViewCrafter tames video diffusion models with point-based 3D guidance and iterative trajectory planning to produce high-fidelity novel views from single or sparse images.

PatchPoison: Poisoning Multi-View Datasets to Degrade 3D Reconstruction

cs.CV · 2026-04-14 · unverdicted · novelty 5.0

PatchPoison injects 12x12 pixel checkerboard patches into multi-view images to disrupt SfM feature matching, causing 3DGS reconstructions to diverge with 6.8x higher LPIPS error on NeRF-Synthetic while remaining unobtrusive.

UniDepthV2: Universal Monocular Metric Depth Estimation Made Simpler

cs.CV · 2025-02-27 · conditional · novelty 5.0

UniDepthV2 predicts metric 3D points directly from single images using a self-promptable camera module, pseudo-spherical representation, and new losses for improved cross-domain generalization.

Splatt3R: Zero-shot Gaussian Splatting from Uncalibrated Image Pairs

cs.CV · 2024-08-25 · unverdicted · novelty 5.0

Splatt3R is a feed-forward network that predicts 3D Gaussian splats directly from uncalibrated stereo image pairs by extending MASt3R with appearance attributes and a two-stage training procedure.

citing papers explorer

Showing 21 of 21 citing papers.

Geo-Align: Video Generation Alignment via Metric Geometry Reward · cs.CV · 2026-05-22 · unverdicted · none · ref 40
GenRecon: Bridging Generative Priors for Multi-View 3D Scene Reconstruction · cs.CV · 2026-05-22 · unverdicted · none · ref 11
No Pose, No Problem in 4D: Feed-Forward Dynamic Gaussians from Unposed Multi-View Videos · cs.CV · 2026-05-21 · unverdicted · none · ref 36
EvoScene-VLA: Evolving Scene Beliefs Inside the Action Decoder for Chunked Robot Control · cs.RO · 2026-05-21 · conditional · none · ref 16
CRePE: Curved Ray Expectation Positional Encoding for Unified-Camera-Controlled Video Generation · cs.CV · 2026-05-13 · unverdicted · none · ref 22
WildSplatter: Feed-forward 3D Gaussian Splatting with Appearance Control from Unconstrained Images · cs.CV · 2026-04-23 · unverdicted · none · ref 29
LuMon: A Comprehensive Benchmark and Development Suite with Novel Datasets for Lunar Monocular Depth Estimation · cs.CV · 2026-04-10 · unverdicted · none · ref 29
Affostruction: 3D Affordance Grounding with Generative Reconstruction · cs.CV · 2026-01-14 · unverdicted · none · ref 16
A Scene is Worth a Thousand Features: Feed-Forward Camera Localization from a Collection of Image Features · cs.CV · 2025-10-01 · unverdicted · none · ref 10
SADGE: Structure and Appearance Domain Gap Estimation of Synthetic and Real Data · cs.CV · 2026-05-21 · unverdicted · none · ref 22
SpaceMind++: Toward Allocentric Cognitive Maps for Spatially Grounded Video MLLMs · cs.CV · 2026-05-10 · unverdicted · none · ref 28
FluSplat: Sparse-View 3D Editing without Test-Time Optimization · cs.CV · 2026-04-21 · unverdicted · none · ref 27
VGGT-HPE: Reframing Head Pose Estimation as Relative Pose Prediction · cs.CV · 2026-04-11 · conditional · none · ref 23
Diff4Splat: Controllable 4D Scene Generation with Latent Dynamic Reconstruction Models · cs.CV · 2025-11-01 · unverdicted · none · ref 30
Streaming 4D Visual Geometry Transformer · cs.CV · 2025-07-15 · unverdicted · none · ref 6
RoDyGS: Robust Dynamic Gaussian Splatting for Casual Videos · cs.CV · 2024-12-04 · unverdicted · none · ref 34
Bundle Adjustment in the Eager Mode · cs.RO · 2024-09-18 · unverdicted · none · ref 58
ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis · cs.CV · 2024-09-03 · unverdicted · none · ref 20
PatchPoison: Poisoning Multi-View Datasets to Degrade 3D Reconstruction · cs.CV · 2026-04-14 · unverdicted · none · ref 9
UniDepthV2: Universal Monocular Metric Depth Estimation Made Simpler · cs.CV · 2025-02-27 · conditional · none · ref 51
Splatt3R: Zero-shot Gaussian Splatting from Uncalibrated Image Pairs · cs.CV · 2024-08-25 · unverdicted · none · ref 31
