Efficient 3D Content Reconstruction and Generation

Jiahao Li

arxiv: 2605.18052 · v1 · pith:YAIQGPHXnew · submitted 2026-05-18 · 💻 cs.CV

Efficient 3D Content Reconstruction and Generation

Jiahao Li This is my paper

Pith reviewed 2026-05-20 11:35 UTC · model grok-4.3

classification 💻 cs.CV

keywords 3D generation3D reconstructionmulti-view diffusionstructure-from-motionfeed-forward reconstructionGPU optimizationpose estimation

0 comments

The pith

Instant3D generates high-quality 3D assets from text or images in 5-20 seconds while FastMap speeds up structure-from-motion by up to 10 times with comparable accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to replace slow manual 3D modeling and scanning with automated systems that synthesize or recover assets directly from text or images. Instant3D pairs multi-view diffusion with feed-forward sparse-view reconstruction to deliver usable 3D models in seconds. FastMap replaces earlier iterative solvers in structure-from-motion with first-order optimization and fused GPU kernels to cut runtime by an order of magnitude. A reader would care because these changes make rapid asset creation practical for games, virtual reality, robotics, and the collection of training data for larger 3D models.

Core claim

The thesis advances automatic 3D content creation by presenting Instant3D, which combines multi-view diffusion with feed-forward sparse-view 3D reconstruction to produce high-quality assets in 5-20 seconds, and FastMap, a structure-from-motion pipeline that achieves up to 10x speedup over prior state-of-the-art by using first-order optimization with fused GPU kernels extensively while maintaining comparable pose accuracy and downstream novel view synthesis quality.

What carries the argument

Instant3D's combination of multi-view diffusion followed by feed-forward reconstruction, and FastMap's replacement of conventional solvers with first-order optimization plus fused GPU kernels inside the structure-from-motion pipeline.

If this is right

Text or single-image inputs become sufficient to produce usable 3D assets without lengthy manual modeling.
Large image collections can be processed for camera poses and geometry at speeds that support real-time or near-real-time workflows.
Novel view synthesis quality stays comparable to slower pipelines, so downstream rendering and simulation tasks remain reliable.
Rapid iteration on 3D content becomes feasible for applications such as video-game prototyping and virtual-world construction.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the two components are chained, a single short pipeline could turn text prompts into editable 3D scenes with minimal additional compute.
The GPU-kernel approach used for pose optimization may transfer to other iterative refinement steps common in multi-view geometry.
Wider adoption could lower the cost of curating large-scale 3D training sets for foundation models.

Load-bearing premise

The premise that first-order optimization with fused GPU kernels can deliver the claimed speedup in pose estimation without introducing accuracy losses that would degrade later novel view synthesis.

What would settle it

A side-by-side run of FastMap against a prior structure-from-motion method on a standard multi-view benchmark that shows either pose error rising or novel view synthesis quality dropping at the reported 10x speedup.

Figures

Figures reproduced from arXiv: 2605.18052 by Jiahao Li.

**Figure 3.1.** Figure 3.1: Our method generates high-quality 3D NeRF assets from the given text prompts [PITH_FULL_IMAGE:figures/full_fig_p031_3_1.png] view at source ↗

**Figure 3.2.** Figure 3.2: Overview of our method. Given a text prompt (‘a car made out of sushi’), [PITH_FULL_IMAGE:figures/full_fig_p032_3_2.png] view at source ↗

**Figure 3.3.** Figure 3.3: Architecture of our sparse-view reconstructor. The model applies a pretrained [PITH_FULL_IMAGE:figures/full_fig_p034_3_3.png] view at source ↗

**Figure 3.4.** Figure 3.4: Qualitative comparisons on text-to-3D against previous methods. [PITH_FULL_IMAGE:figures/full_fig_p036_3_4.png] view at source ↗

**Figure 3.5.** Figure 3.5: Qualitative comparisons on results generated with and without Gaussian blob [PITH_FULL_IMAGE:figures/full_fig_p039_3_5.png] view at source ↗

**Figure 3.6.** Figure 3.6: Our Carve3D algorithm steadily improves the 3D consistency of a multi-view [PITH_FULL_IMAGE:figures/full_fig_p044_3_6.png] view at source ↗

**Figure 3.7.** Figure 3.7: Overview of Carve3D. Given a prompt sampled from our curated prompt set [PITH_FULL_IMAGE:figures/full_fig_p045_3_7.png] view at source ↗

**Figure 3.8.** Figure 3.8: Qualitative correlation between MRC and multi-view inconsistency with in [PITH_FULL_IMAGE:figures/full_fig_p045_3_8.png] view at source ↗

**Figure 3.9.** Figure 3.9: Quantitative correlation between MRC and multi-view inconsistency with in [PITH_FULL_IMAGE:figures/full_fig_p046_3_9.png] view at source ↗

**Figure 3.10.** Figure 3.10: Comparing the IS and the SF versions of Carve3D reward curves on the testing [PITH_FULL_IMAGE:figures/full_fig_p050_3_10.png] view at source ↗

**Figure 3.11.** Figure 3.11: We observe qualitative correlation between the KL value and the prompt [PITH_FULL_IMAGE:figures/full_fig_p051_3_11.png] view at source ↗

**Figure 3.12.** Figure 3.12: Scaling law for Carve3D’s diffusion model RLFT algorithm. When we scale [PITH_FULL_IMAGE:figures/full_fig_p054_3_12.png] view at source ↗

**Figure 3.13.** Figure 3.13: We conducted a user study with 20 randomly selected testing prompts and the [PITH_FULL_IMAGE:figures/full_fig_p056_3_13.png] view at source ↗

**Figure 3.14.** Figure 3.14: Top left: our approach achieves fast 3D generation from text or single-image [PITH_FULL_IMAGE:figures/full_fig_p061_3_14.png] view at source ↗

**Figure 3.15.** Figure 3.15: Overview of our method. We denoise multiple views (three shown in the figure; four used in experiments) for 3D generation. Our multi-view denoiser is a large transformer model that reconstructs a noise-free triplane NeRF from input noisy images with camera poses (parameterized by Plucker rays). During training, we supervise the triplane NeRF with a rendering loss at input and novel viewpoints. During in… view at source ↗

**Figure 3.16.** Figure 3.16: Qualitative comparisons on single-image reconstruction. Method VIT-B/32 ViT-L/14 R-Prec AP R-Prec AP Point-E 33.33 40.06 46.4 54.13 Shap-E 38.39 46.02 51.40 58.03 Ours 39.72 47.96 55.14 61.32 [PITH_FULL_IMAGE:figures/full_fig_p068_3_16.png] view at source ↗

**Figure 3.17.** Figure 3.17: Qualitative comparison on Text-to-3D . with different numbers (1, 2, 4, 6) of input views in Tab. 3.7. We can see that our model consistently achieves better quality when using more images, benefiting from capturing more shape and appearance information. However, the performance improvement of 6 views over four views is marginal, where some metrics (like PSNR) from the 4-view model is even better. We th… view at source ↗

**Figure 3.18.** Figure 3.18: Robustness on out-of-domain inputs of synthetic, real, and generated images. a large margin. In general, the novel view supervision is crucial for our model to achieve meaningful 3D generation, avoiding the model to learn a local minima that merely recovers the sparse multi-view images. In addition, our design of Plucker coordinate-based camera conditioning is also effective, leading to better quantitat… view at source ↗

**Figure 4.1.** Figure 4.1: Timing of FastMap compared to COLMAP and GLOMAP (all with GPU acceleration on a single A6000) on scenes from eight datasets, excluding the matching stage for all methods. Note the logarithmic time scale. Lines represent a least squares power function fit to timing across multiple datasets, as a function of the number of matched keypoint pairs. have been adopted to speed up LM optimization, such as the Sc… view at source ↗

**Figure 4.2.** Figure 4.2: An overview of FastMap. Input images are processed using feature extraction and matching. Given the matching results, FastMap estimates the intrinsics and extrinsics of the cameras. Finally a sparse point cloud is generated by triangulation. Matching FastMap’s matching stage is identical to that of both COLMAP and GLOMAP: it involves first extracting and matching keypoints from the input images, followed… view at source ↗

**Figure 4.3.** Figure 4.3: Effect of distortion on focal length estimation. Curves in [PITH_FULL_IMAGE:figures/full_fig_p078_4_3.png] view at source ↗

read the original abstract

Automatic 3D content creation seeks to replace labor-intensive modeling and scanning pipelines with systems that can synthesize or recover 3D assets directly from text or images. Its applications span video games, virtual reality, robotics, and simulation, enabling rapid asset prototyping, diverse interactive world generation, and efficient 3D data collection for training foundation models. Contemporary solutions largely follow two complementary paradigms: (i) text- or image-to-3D generation, which learns priors over 3D geometry and appearance to create novel assets from natural language or a single view image; and (ii) 3D reconstruction, which estimates camera poses and geometry from RGB images. This thesis advances both directions. On the generation side, I introduce Instant3D, which combines multi-view diffusion with feed-forward sparse-view 3D reconstruction to produce high-quality assets in 5-20 seconds. On the reconstruction side, I develop FastMap, a structure-from-motion pipeline that achieves up to 10x speedup over prior state-of-the-art by using first-order optimization with fused GPU kernels extensively, while maintaining comparable pose accuracy and downstream novel view synthesis quality.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Instant3D and FastMap deliver claimed speed gains in 3D generation and SfM but the FastMap accuracy numbers rest on thin evidence.

read the letter

Hi, the main things to know are that this thesis puts forward Instant3D for turning text or single images into 3D assets in 5-20 seconds via multi-view diffusion plus feed-forward reconstruction, and FastMap for cutting structure-from-motion runtime by up to 10x through first-order optimization and fused GPU kernels while claiming to hold pose accuracy and novel-view quality steady. Both target real bottlenecks in games, VR, and robotics pipelines. The generation side looks like a sensible stacking of existing diffusion and reconstruction tricks to hit practical latency targets. The reconstruction side correctly identifies that standard second-order solvers are slow on GPU and tries to replace them with cheaper first-order steps plus kernel fusion. That engineering focus is the paper's clearest strength. The soft spot is the FastMap accuracy claim. First-order methods are known to be sensitive to step-size choice, initialization, and scene conditioning, and fused kernels can add rounding issues over many views. The abstract states comparable rotation/translation errors and downstream NVS quality, yet supplies no iteration counts, convergence plots, or controlled comparisons against Levenberg-Marquardt baselines on hard scenes. Without those tables it is hard to judge whether the 10x wall-clock win is free or whether accuracy quietly degrades on some inputs. The work is aimed at applied CV and graphics groups that need faster asset tools rather than new theory. A reader building production pipelines could pull useful implementation patterns if the numbers hold. It is coherent enough and the claims are falsifiable, so it deserves a serious referee who can ask for the missing ablation data on the optimizer.

Referee Report

1 major / 1 minor

Summary. The manuscript introduces Instant3D, which combines multi-view diffusion with feed-forward sparse-view 3D reconstruction to generate high-quality 3D assets from text or images in 5-20 seconds, and FastMap, a structure-from-motion pipeline that uses first-order optimization with fused GPU kernels to achieve up to 10x speedup over prior state-of-the-art while maintaining comparable pose accuracy and downstream novel view synthesis quality.

Significance. If the empirical claims hold, the work would advance efficient 3D content creation by reducing generation and reconstruction times substantially, with direct relevance to VR, robotics, and large-scale 3D data pipelines. The practical focus on GPU-accelerated first-order methods for SfM represents a useful engineering contribution if accuracy preservation is rigorously demonstrated.

major comments (1)

The FastMap description claims that first-order optimization with fused GPU kernels preserves rotation/translation errors and NVS quality at ~10x speedup relative to second-order baselines (e.g., Levenberg-Marquardt or bundle adjustment). No iteration counts, convergence curves, per-scene error tables, or ablation on step-size/conditioning are referenced, leaving open whether accuracy holds after controlling for initialization sensitivity and kernel artifacts.

minor comments (1)

The abstract and introduction would benefit from explicit dataset names and scene counts used for the 5-20s generation and 10x speedup claims to allow direct comparison with prior work.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. We address the major comment on FastMap below and will incorporate the suggested clarifications and additional results in the revised version.

read point-by-point responses

Referee: The FastMap description claims that first-order optimization with fused GPU kernels preserves rotation/translation errors and NVS quality at ~10x speedup relative to second-order baselines (e.g., Levenberg-Marquardt or bundle adjustment). No iteration counts, convergence curves, per-scene error tables, or ablation on step-size/conditioning are referenced, leaving open whether accuracy holds after controlling for initialization sensitivity and kernel artifacts.

Authors: We agree that the current presentation would benefit from more explicit empirical details. In the revised manuscript we will add iteration counts for our first-order method versus the second-order baselines, convergence curves across representative scenes, per-scene rotation/translation error tables, and ablations on step-size and conditioning. These will appear in Section 4.2. We already control for initialization by using identical COLMAP-derived starting poses for all compared methods and report mean and standard deviation over the full test set; we will make this protocol explicit. Numerical equivalence of the fused kernels to their unfused counterparts (within FP32 precision) is verified in the supplementary material, which we will reference in the main text. revision: yes

Circularity Check

0 steps flagged

No load-bearing circularity; empirical claims rest on proposed pipelines without self-referential definitions or fitted predictions

full rationale

The abstract and description introduce Instant3D (multi-view diffusion + feed-forward reconstruction) and FastMap (first-order optimization + fused kernels) as new combinations for 3D tasks. No equations, fitted parameters, or self-citations appear in the provided text that would define a result in terms of itself or rename an input fit as a prediction. Speedup and accuracy claims are stated as outcomes of the methods rather than derived by construction. This matches the default non-finding for a methods paper whose central content is independent of any visible self-referential math.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only; no explicit free parameters, axioms, or invented entities are stated. The work appears to rely on standard assumptions from diffusion models and structure-from-motion literature.

pith-pipeline@v0.9.0 · 5717 in / 1130 out tokens · 36970 ms · 2026-05-20T11:35:21.505016+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

FastMap ... using first-order optimization with fused GPU kernels extensively, while maintaining comparable pose accuracy
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Instant3D ... transformer-based sparse-view 3D reconstructor ... triplane representation

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

294 extracted references · 294 canonical work pages · 28 internal anchors

[1]

Learning representations and generative models for 3d point clouds

Panos Achlioptas, Olga Diamanti, Ioannis Mitliagkas, and Leonidas Guibas. Learning representations and generative models for 3d point clouds. In Jennifer Dy and Andreas Krause, editors,Proceedings of the 35th International Conference on Machine Learn- ing, volume 80 ofProceedings of Machine Learning Research, pages 40–49. PMLR, 10–15 Jul 2018. URLhttps://...

work page 2018
[2]

Adobe firefly.https://www.adobe.com/sensei/generative-ai/firefly

Adobe. Adobe firefly.https://www.adobe.com/sensei/generative-ai/firefly. html, 2023. Accessed: 2023-11-15

work page 2023
[3]

Building rome in a day

Sameer Agarwal, Noah Snavely, Ian Simon, Steven M Seitz, and Richard Szeliski. Building rome in a day. In2009 IEEE 12th International Conference on Computer Vision, pages 72–79, September 2009. doi: 10.1109/ICCV.2009.5459148. URLhttp: //dx.doi.org/10.1109/ICCV.2009.5459148

work page doi:10.1109/iccv.2009.5459148 2009
[4]

Seitz, and Richard Szeliski

Sameer Agarwal, Noah Snavely, Steven M. Seitz, and Richard Szeliski. Bundle adjust- ment in the large. InComputer Vision – ECCV 2010, pages 29–42, 2010. doi: 10.1007/ 978-3-642-15552-9 3. URLhttp://dx.doi.org/10.1007/978-3-642-15552-9_3

work page doi:10.1007/978-3-642-15552-9_3 2010
[5]

Seitz, and Richard Szeliski

Sameer Agarwal, Yasutaka Furukawa, Noah Snavely, Ian Simon, Brian Curless, Steven M. Seitz, and Richard Szeliski. Building rome in a day.Communications of the ACM, 54(10):105–112, October 2011. doi: 10.1145/2001269.2001293. URL http://dx.doi.org/10.1145/2001269.2001293

work page doi:10.1145/2001269.2001293 2011
[6]

Ceres Solver

Sameer Agarwal, Keir Mierle, and The Ceres Solver Team. Ceres Solver. Software, 10

work page
[7]

URLhttps://github.com/ceres-solver/ceres-solver. 88

work page
[8]

Mixvpr: Feature mixing for visual place recognition

Amar Ali-bey, Brahim Chaib-draa, and Philippe Gigu` ere. Mixvpr: Feature mixing for visual place recognition. InProceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 2998–3007, January 2023

work page 2023
[9]

Renderdiffusion: Image diffusion for 3d recon- struction, inpainting and generation

Titas Anciukeviˇ cius, Zexiang Xu, Matthew Fisher, Paul Henderson, Hakan Bilen, Niloy J Mitra, and Paul Guerrero. Renderdiffusion: Image diffusion for 3d recon- struction, inpainting and generation. InCVPR, 2023

work page 2023
[10]

Netvlad: Cnn architecture for weakly supervised place recognition

Relja Arandjelovic, Petr Gronat, Akihiko Torii, Tomas Pajdla, and Josef Sivic. Netvlad: Cnn architecture for weakly supervised place recognition. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016

work page 2016
[11]

Training a helpful and harmless assistant with reinforcement learning from human feedback, 2022

Yuntao Bai, Andy Jones, Kamal Ndousse, Amanda Askell, Anna Chen, Nova Das- Sarma, Dawn Drain, Stanislav Fort, Deep Ganguli, Tom Henighan, Nicholas Joseph, Saurav Kadavath, Jackson Kernion, Tom Conerly, Sheer El-Showk, Nelson Elhage, Zac Hatfield-Dodds, Danny Hernandez, Tristan Hume, Scott Johnston, Shauna Kravec, Liane Lovitt, Neel Nanda, Catherine Olsson...

work page 2022
[12]

Bowman, Zac Hatfield-Dodds, Ben Mann, Dario Amodei, Nicholas Joseph, Sam McCandlish, Tom Brown, and Jared Kaplan

Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, Carol Chen, Catherine Olsson, Christopher Olah, Danny Hernandez, Dawn Drain, Deep Ganguli, Dustin Li, Eli Tran-Johnson, Ethan Perez, Jamie Kerr, Jared Mueller, Jeffrey Ladish, Joshua Landau, Kamal Ndousse, K...

work page 2022
[13]

A minimal solution for two-view focal-length estimation using two affine correspondences

Daniel Barath, Tekla Toth, and Levente Hajder. A minimal solution for two-view focal-length estimation using two affine correspondences. InCVPR, 2017

work page 2017
[14]

Fundamental matrix for cameras with radial distortion

Jo˜ ao Pedro Barreto and Kostas Daniilidis. Fundamental matrix for cameras with radial distortion. InICCV, 2005

work page 2005
[15]

Masked feature prediction for self-supervised visual pre-training

Jonathan T. Barron, Ben Mildenhall, Dor Verbin, Pratul P. Srinivasan, and Peter Hed- man. Mip-nerf 360: Unbounded anti-aliased neural radiance fields. InIEEE/CVF Con- ference on Computer Vision and Pattern Recognition (CVPR), pages 5460–5469, 2022. doi: 10.1109/CVPR52688.2022.00539. URLhttps://doi.org/10.1109/CVPR52688. 2022.00539

work page doi:10.1109/cvpr52688.2022.00539 2022
[16]

2023 , url =

Jonathan T. Barron, Ben Mildenhall, Dor Verbin, Pratul P. Srinivasan, and Peter Hedman. Zip-nerf: Anti-aliased grid-based neural radiance fields. InIEEE/CVF In- ternational Conference on Computer Vision (ICCV), pages 19640–19648, 2023. doi: 10.1109/ICCV51070.2023.01804. URLhttps://doi.org/10.1109/ICCV51070.2023. 01804

work page doi:10.1109/iccv51070.2023.01804 2023
[17]

Key.net: Keypoint detection by handcrafted and learned cnn filters, 2019

Axel Barroso-Laguna, Edgar Riba, Daniel Ponsa, and Krystian Mikolajczyk. Key.net: Keypoint detection by handcrafted and learned cnn filters, 2019. URLhttps://arxiv. org/abs/1904.00889

work page arXiv 2019
[18]

Rethinking visual geo- localization for large-scale applications

Gabriele Berton, Carlo Masone, and Barbara Caputo. Rethinking visual geo- localization for large-scale applications. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4878–4888, June 2022

work page 2022
[19]

DynaSLAM: Tracking, Mapping and Inpainting in Dynamic Scenes

Berta Bescos, Jos´ e M. F´ acil, Javier Civera, and Jos´ e Neira. Dynaslam: Tracking, mapping and inpainting in dynamic scenes, 2018. URLhttps://arxiv.org/abs/ 1806.05620

work page internal anchor Pith review Pith/arXiv arXiv 2018
[20]

ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth

Shariq Farooq Bhat, Reiner Birkl, Diana Wofk, Peter Wonka, and Matthias M¨ uller. Zoedepth: Zero-shot transfer by combining relative and metric depth.arXiv preprint arXiv:2302.12288, 2023. 90

work page internal anchor Pith review Pith/arXiv arXiv 2023
[21]

Training diffusion models with reinforcement learning, 2023

Kevin Black, Michael Janner, Yilun Du, Ilya Kostrikov, and Sergey Levine. Training diffusion models with reinforcement learning, 2023

work page 2023
[22]

Deep- calib: a deep learning approach for automatic intrinsic calibration of wide field-of-view cameras

Oleksandr Bogdan, Viktor Eckstein, Francois Rameau, and Jean-Charles Bazin. Deep- calib: a deep learning approach for automatic intrinsic calibration of wide field-of-view cameras. InProceedings of the 15th ACM SIGGRAPH European Conference on Visual Media Production, 2018

work page 2018
[23]

Neural-guided ransac: Learning where to sample model hypotheses

Eric Brachmann and Carsten Rother. Neural-guided ransac: Learning where to sample model hypotheses. In2019 IEEE/CVF International Conference on Computer Vision (ICCV), page 4321–4330. IEEE, October 2019. doi: 10.1109/iccv.2019.00442. URL http://dx.doi.org/10.1109/iccv.2019.00442

work page doi:10.1109/iccv.2019.00442 2019
[24]

DSAC - Differentiable RANSAC for Camera Localization

Eric Brachmann, Alexander Krull, Sebastian Nowozin, Jamie Shotton, Frank Michel, Stefan Gumhold, and Carsten Rother. Dsac - differentiable ransac for camera local- ization, 2018. URLhttps://arxiv.org/abs/1611.05705

work page internal anchor Pith review Pith/arXiv arXiv 2018
[25]

Accelerated co- ordinate encoding: Learning to relocalize in minutes using rgb and poses

Eric Brachmann, Tommaso Cavallari, and Victor Adrian Prisacariu. Accelerated co- ordinate encoding: Learning to relocalize in minutes using rgb and poses. InCVPR, 2023

work page 2023
[26]

Scene coordinate recon- struction: Posing of image collections via incremental learning of a relocalizer

Eric Brachmann, Jamie Wynn, Shuai Chen, Tommaso Cavallari, ´Aron Monszpart, Daniyar Turmukhambetov, and Victor Adrian Prisacariu. Scene coordinate recon- struction: Posing of image collections via incremental learning of a relocalizer. In ECCV, 2024

work page 2024
[27]

Language models are few-shot learners.Advances in neural information processing systems, 33, 2020

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners.Advances in neural information processing systems, 33, 2020

work page 2020
[28]

Springer International Publishing, 2020

Bingyi Cao, Andr´ e Araujo, and Jack Sim.Unifying Deep Local and Global Features for Image Search, page 726–743. Springer International Publishing, 2020. ISBN 91 9783030585655. doi: 10.1007/978-3-030-58565-5 43. URLhttp://dx.doi.org/10. 1007/978-3-030-58565-5_43

work page doi:10.1007/978-3-030-58565-5 2020
[29]

Emerging properties in self-supervised vision trans- formers

Mathilde Caron, Hugo Touvron, Ishan Misra, Herv´ e J´ egou, Julien Mairal, Piotr Bo- janowski, and Armand Joulin. Emerging properties in self-supervised vision trans- formers. InProceedings of the International Conference on Computer Vision (ICCV), 2021

work page 2021
[30]

Michaud, Jacob Pfau, Dmitrii Krasheninnikov, Xin Chen, Lauro Langosco, Peter Hase, Erdem Bıyık, Anca Dragan, David Krueger, Dorsa Sadigh, and Dylan Hadfield-Menell

Stephen Casper, Xander Davies, Claudia Shi, Thomas Krendl Gilbert, J´ er´ emy Scheurer, Javier Rando, Rachel Freedman, Tomasz Korbak, David Lindner, Pedro Freire, Tony Wang, Samuel Marks, Charbel-Rapha¨ el Segerie, Micah Carroll, Andi Peng, Phillip Christoffersen, Mehul Damani, Stewart Slocum, Usman Anwar, Anand Siththaranjan, Max Nadeau, Eric J. Michaud,...

work page 2023
[31]

Chan, Mariano Monteiro, Petr Kellnhofer, Jiajun Wu, and Gordon Wetzstein

Eric R. Chan, Mariano Monteiro, Petr Kellnhofer, Jiajun Wu, and Gordon Wetzstein. pi-gan: Periodic implicit generative adversarial networks for 3d-aware image synthesis. InThe IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2021

work page 2021
[32]

Chan, Connor Z

Eric R. Chan, Connor Z. Lin, Matthew A. Chan, Koki Nagano, Boxiao Pan, Shalini De Mello, Orazio Gallo, Leonidas Guibas, Jonathan Tremblay, Sameh Khamis, Tero Karras, and Gordon Wetzstein. Efficient geometry-aware 3d generative adversarial networks. InThe IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2022

work page 2022
[33]

Efficient geometry-aware 3d generative adversarial networks

Eric R Chan, Connor Z Lin, Matthew A Chan, Koki Nagano, Boxiao Pan, Shalini De Mello, Orazio Gallo, Leonidas J Guibas, Jonathan Tremblay, Sameh Khamis, et al. Efficient geometry-aware 3d generative adversarial networks. InCVPR, 2022

work page 2022
[34]

Efficient and robust large-scale ro- tation averaging

Avishek Chatterjee and Venu Madhav Govindu. Efficient and robust large-scale ro- tation averaging. In2013 IEEE International Conference on Computer Vision, pages 92 521–528, December 2013. doi: 10.1109/ICCV.2013.70. URLhttp://dx.doi.org/10. 1109/ICCV.2013.70

work page doi:10.1109/iccv.2013.70 2013
[35]

Robust relative rotation averaging

Avishek Chatterjee and Venu Madhav Govindu. Robust relative rotation averaging. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4):958–972, April

work page
[36]

URLhttp://dx.doi.org/10.1109/TPAMI

doi: 10.1109/TPAMI.2017.2693984. URLhttp://dx.doi.org/10.1109/TPAMI. 2017.2693984

work page doi:10.1109/tpami.2017.2693984 2017
[37]

Tensorf: Ten- sorial radiance fields

Anpei Chen, Zexiang Xu, Andreas Geiger, Jingyi Yu, and Hao Su. Tensorf: Ten- sorial radiance fields. InComputer Vision – ECCV 2022 – 17th European Confer- ence, Tel Aviv, Israel, October 23-27, 2022, Proceedings, Part XXXII, volume 13692 ofLecture Notes in Computer Science, pages 333–350. Springer, 2022. doi: 10.1007/ 978-3-031-19824-3 20. URLhttps://doi....

work page doi:10.1007/978-3-031-19824-3_20 2022
[38]

Single-stage diffusion nerf: A unified approach to 3d generation and reconstruction

Hansheng Chen, Jiatao Gu, Anpei Chen, Wei Tian, Zhuowen Tu, Lingjie Liu, and Hao Su. Single-stage diffusion nerf: A unified approach to 3d generation and reconstruction. InICCV, 2023

work page 2023
[39]

Aspanformer: Detector-free image matching with adaptive span transformer, 2022

Hongkai Chen, Zixin Luo, Lei Zhou, Yurun Tian, Mingmin Zhen, Tian Fang, David Mckinnon, Yanghai Tsin, and Long Quan. Aspanformer: Detector-free image matching with adaptive span transformer, 2022. URLhttps://arxiv.org/abs/2208.14201

work page arXiv 2022
[40]

Fantasia3d: Disentangling geometry and appearance for high-quality text-to-3d content creation

Rui Chen, Yongwei Chen, Ningxin Jiao, and Kui Jia. Fantasia3d: Disentangling geometry and appearance for high-quality text-to-3d content creation. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 20622– 20633, October 2023

work page 2023
[41]

Fantasia3d: Disentangling geometry and appearance for high-quality text-to-3d content creation.arXiv preprint arXiv:2303.13873, 2023

Rui Chen, Yongwei Chen, Ningxin Jiao, and Kui Jia. Fantasia3d: Disentangling geometry and appearance for high-quality text-to-3d content creation.arXiv preprint arXiv:2303.13873, 2023

work page arXiv 2023
[42]

Learning implicit fields for generative shape modeling

Zhiqin Chen and Hao Zhang. Learning implicit fields for generative shape modeling. InThe IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019. 93

work page 2019
[43]

V3d: Video diffusion models are effective 3d generators, 2024

Zilong Chen, Yikai Wang, Feng Wang, Zhengyi Wang, and Huaping Liu. V3d: Video diffusion models are effective 3d generators, 2024. URLhttps://arxiv.org/abs/ 2403.06738

work page arXiv 2024
[44]

Luciddreamer: Domain-free generation of 3d gaussian splatting scenes.arXiv preprint arXiv:2311.13384, 2023

Jaeyoung Chung, Suyoung Lee, Hyeongjin Nam, Jaerin Lee, and Kyoung Mu Lee. Luciddreamer: Domain-free generation of 3d gaussian splatting scenes.arXiv preprint arXiv:2311.13384, 2023

work page arXiv 2023
[45]

Abo: Dataset and benchmarks for real-world 3d object understanding

Jasmine Collins, Shubham Goel, Kenan Deng, Achleshwar Luthra, Leon Xu, Erhan Gundogdu, Xi Zhang, Tomas F Yago Vicente, Thomas Dideriksen, Himanshu Arora, et al. Abo: Dataset and benchmarks for real-world 3d object understanding. InCVPR, pages 21126–21136, 2022

work page 2022
[46]

Discrete- continuous optimization for large-scale structure from motion

David Crandall, Andrew Owens, Noah Snavely, and Dan Huttenlocher. Discrete- continuous optimization for large-scale structure from motion. InCVPR 2011, June

work page 2011
[47]

URLhttp://dx.doi.org/10.1109/CVPR

doi: 10.1109/CVPR.2011.5995626. URLhttp://dx.doi.org/10.1109/CVPR. 2011.5995626

work page doi:10.1109/cvpr.2011.5995626 2011
[48]

Global structure-from-motion by similarity averaging

Zhaopeng Cui and Ping Tan. Global structure-from-motion by similarity averaging. In2015 IEEE International Conference on Computer Vision (ICCV), pages 864–872, December 2015. doi: 10.1109/ICCV.2015.105. URLhttp://dx.doi.org/10.1109/ ICCV.2015.105

work page doi:10.1109/iccv.2015.105 2015
[49]

Obja- verse: A universe of annotated 3d objects, 2022

Matt Deitke, Dustin Schwenk, Jordi Salvador, Luca Weihs, Oscar Michel, Eli Vander- Bilt, Ludwig Schmidt, Kiana Ehsani, Aniruddha Kembhavi, and Ali Farhadi. Obja- verse: A universe of annotated 3d objects, 2022

work page 2022
[50]

Objaverse-XL: A Universe of 10M+ 3D Objects

Matt Deitke, Ruoshi Liu, Matthew Wallingford, Huong Ngo, Oscar Michel, Aditya Kusupati, Alan Fan, Christian Laforte, Vikram Voleti, Samir Yitzhak Gadre, et al. Objaverse-xl: A universe of 10m+ 3d objects.arXiv preprint arXiv:2307.05663, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[51]

Obja- 94 verse: A universe of annotated 3d objects

Matt Deitke, Dustin Schwenk, Jordi Salvador, Luca Weihs, Oscar Michel, Eli Vander- Bilt, Ludwig Schmidt, Kiana Ehsani, Aniruddha Kembhavi, and Ali Farhadi. Obja- 94 verse: A universe of annotated 3d objects. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13142–13153, 2023

work page 2023
[52]

Obja- verse: A universe of annotated 3d objects

Matt Deitke, Dustin Schwenk, Jordi Salvador, Luca Weihs, Oscar Michel, Eli Vander- Bilt, Ludwig Schmidt, Kiana Ehsani, Aniruddha Kembhavi, and Ali Farhadi. Obja- verse: A universe of annotated 3d objects. InCVPR, pages 13142–13153, 2023

work page 2023
[53]

Superpoint: Self- supervised interest point detection and description

Daniel DeTone, Tomasz Malisiewicz, and Andrew Rabinovich. Superpoint: Self- supervised interest point detection and description. In2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), page 337–33712. IEEE, June 2018. doi: 10.1109/cvprw.2018.00060. URLhttp://dx.doi.org/10. 1109/cvprw.2018.00060

work page doi:10.1109/cvprw.2018.00060 2018
[54]

Theia: A library for structure-from-motion

Theia Developers. Theia: A library for structure-from-motion. urlhttp://www.theia-sfm.org/, 2026. Accessed 2026-02-03

work page 2026
[55]

Reloc3r: Large-scale training of relative camera pose regression for generalizable, fast, and accurate visual localization

Siyan Dong, Shuzhe Wang, Shaohui Liu, Lulu Cai, Qingnan Fan, Juho Kannala, and Yanchao Yang. Reloc3r: Large-scale training of relative camera pose regression for generalizable, fast, and accurate visual localization. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16739–16752, June 2025

work page 2025
[56]

Google scanned objects: A high-quality dataset of 3d scanned household items

Laura Downs, Anthony Francis, Nate Koenig, Brandon Kinman, Ryan Hickman, Krista Reymann, Thomas B McHugh, and Vincent Vanhoucke. Google scanned objects: A high-quality dataset of 3d scanned household items. In2022 International Conference on Robotics and Automation (ICRA), pages 2553–2560. IEEE, 2022

work page 2022
[57]

Close-range camera calibration.Photogramm

C Brown Duane. Close-range camera calibration.Photogramm. Eng, 37(8), 1971

work page 1971
[58]

MASt3R-SfM: a Fully-Integrated Solution for Unconstrained Structure-from-Motion

Bardienus Duisterhof, Lojze Zust, Philippe Weinzaepfel, Vincent Leroy, Yohann Cabon, and Jerome Revaud. MASt3R-SfM: A fully-integrated solution for uncon- strained structure-from-motion.arXiv preprint arXiv:2409.19152, 2024

work page arXiv 2024
[59]

MASt3r-sfm: a fully-integrated solution for unconstrained 95 structure-from-motion

Bardienus Pieter Duisterhof, Lojze Zust, Philippe Weinzaepfel, Vincent Leroy, Yohann Cabon, and Jerome Revaud. MASt3r-sfm: a fully-integrated solution for unconstrained 95 structure-from-motion. InInternational Conference on 3D Vision (3DV), 2025. URL https://openreview.net/forum?id=5uw1GRBFoT

work page 2025
[60]

D2-Net: A Trainable CNN for Joint Detection and Description of Local Features

Mihai Dusmanu, Ignacio Rocco, Tomas Pajdla, Marc Pollefeys, Josef Sivic, Akihiko Torii, and Torsten Sattler. D2-net: A trainable cnn for joint detection and description of local features, 2019. URLhttps://arxiv.org/abs/1905.03561

work page internal anchor Pith review Pith/arXiv arXiv 2019
[62]

Hyperparameters in rein- forcement learning and how to tune them, 2023

Theresa Eimer, Marius Lindauer, and Roberta Raileanu. Hyperparameters in rein- forcement learning and how to tune them, 2023

work page 2023
[63]

Syncity: Training-free generation of 3d worlds

Jakob Engstler, Gaisi Xia, Brandon Trabucco, Huicheng Yan, Rui Ding, Yun-Chun Chen, and Gordon Wetzstein. Syncity: Training-free generation of 3d worlds. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 18577–18587, 2025

work page 2025
[64]

Disentangled 3d scene gener- ation with layout learning

Dave Epstein, Alisa Kim, Xueyue Wang, Hsin-Ying Chen, Hung-Yu Tseng, Hsiao- Yu Fish Chen, Ming-Hsuan Yang, and Wei Wang. Disentangled 3d scene gener- ation with layout learning. In Dylan Fortune, Paul Vicol, Serena Yeung, Han- nah Zhang, Frank Hutter, and Hugo Larochelle, editors,Proceedings of the 41st International Conference on Machine Learning, volume...

work page 2024
[65]

A review of the one-parameter division undistortion model.ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 1, 2021

Bastian Erdn¨ uß. A review of the one-parameter division undistortion model.ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 1, 2021

work page 2021
[66]

Haoqiang Fan, Hao Su, and Leonidas J. Guibas. A point set generation network for 96 3d object reconstruction from a single image. InThe IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017

work page 2017
[67]

Dpok: Reinforcement learning for fine-tuning text-to-image diffusion models, 2023

Ying Fan, Olivia Watkins, Yuqing Du, Hao Liu, Moonkyung Ryu, Craig Boutilier, Pieter Abbeel, Mohammad Ghavamzadeh, Kangwook Lee, and Kimin Lee. Dpok: Reinforcement learning for fine-tuning text-to-image diffusion models, 2023

work page 2023
[68]

Ctrl-room: Controllable text-to-3d room meshes generation with layout constraints

Chuan Fang, Yuan Dong, Kunming Luo, Xiaotao Hu, Rakesh Shrestha, and Ping Tan. Ctrl-room: Controllable text-to-3d room meshes generation with layout constraints. InInternational Conference on 3D Vision (3DV), pages 692–701, 2025. doi: 10.1109/ 3DV66043.2025.00069

work page arXiv 2025
[69]

MIT Press, 1993

Olivier Faugeras.Three-dimensional computer vision: A geometric viewpoint. MIT Press, 1993

work page 1993
[70]

Simultaneous linear estimation of multiple view geometry and lens distortion

Andrew W Fitzgibbon. Simultaneous linear estimation of multiple view geometry and lens distortion. InCVPR, 2001

work page 2001
[71]

Get3d: A generative model of high quality 3d textured shapes learned from images.Advances in Neural Information Processing Systems, 35:31841–31854, 2022

Jun Gao, Tianchang Shen, Zian Wang, Wenzheng Chen, Kangxue Yin, Daiqing Li, Or Litany, Zan Gojcic, and Sanja Fidler. Get3d: A generative model of high quality 3d textured shapes learned from images.Advances in Neural Information Processing Systems, 35:31841–31854, 2022

work page 2022
[72]

SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation

Yuying Ge, Sijie Zhao, Jinguo Zhu, Yixiao Ge, Kun Yi, Lin Song, Chen Li, Xiaohan Ding, and Ying Shan. SEED-X: multimodal models with unified multi-granularity comprehension and generation.CoRR, abs/2404.14396, 2024. doi: 10.48550/ARXIV. 2404.14396. URLhttps://doi.org/10.48550/arXiv.2404.14396

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv 2024
[73]

Learning shape templates with structured implicit functions

Kyle Genova, Ethan Highlander, Ozgur Sirin, Wonmin Liao, Kostas Daniilidis, and Thomas Funkhouser. Learning shape templates with structured implicit functions. In The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020

work page 2020
[74]

S2dnet: Learning accurate 97 correspondences for sparse-to-dense feature matching, 2020

Hugo Germain, Guillaume Bourmaud, and Vincent Lepetit. S2dnet: Learning accurate 97 correspondences for sparse-to-dense feature matching, 2020. URLhttps://arxiv. org/abs/2004.01673

work page arXiv 2020
[75]

Vivid-1-to-3: Novel view synthesis with video diffusion models, 2023

Jeong gi Kwak, Erqun Dong, Yuhe Jin, Hanseok Ko, Shweta Mahajan, and Kwang Moo Yi. Vivid-1-to-3: Novel view synthesis with video diffusion models, 2023. URLhttps: //arxiv.org/abs/2312.01305

work page arXiv 2023
[77]

All for one, and one for all: Urbansyn dataset, the third musketeer of synthetic driving scenes.Neurocomputing, 637:130038, 2025

Jose L G´ omez, Manuel Silva, Antonio Seoane, Agn` es Borr´ as, Mario Noriega, Germ´ an Ros, Jose A Iglesias-Guitian, and Antonio M L´ opez. All for one, and one for all: Urbansyn dataset, the third musketeer of synthetic driving scenes.Neurocomputing, 637:130038, 2025

work page 2025
[79]

V. M. Govindu. Lie-algebraic averaging for globally consistent motion estimation. In Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004., volume 1, pages 684–691, 2004. doi: 10.1109/ CVPR.2004.1315098. URLhttp://dx.doi.org/10.1109/CVPR.2004.1315098

work page doi:10.1109/cvpr.2004.1315098 2004
[80]

Kubric: A scalable dataset generator

Klaus Greff, Francois Belletti, Lucas Beyer, Carl Doersch, Yilun Du, Daniel Duck- worth, David J Fleet, Dan Gnanapragasam, Florian Golemo, Charles Herrmann, et al. Kubric: A scalable dataset generator. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 3749–3761, 2022

work page 2022
[81]

Kim, Bryan C

Thibault Groueix, Matthew Fisher, Vladimir G. Kim, Bryan C. Russell, and Mathieu 98 Aubry. A papier-mache approach to learning 3d surface generation. InThe IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018

work page 2018
[82]

Cas- cade cost volume for high-resolution multi-view stereo and stereo matching, 2020

Xiaodong Gu, Zhiwen Fan, Zuozhuo Dai, Siyu Zhu, Feitong Tan, and Ping Tan. Cas- cade cost volume for high-resolution multi-view stereo and stereo matching, 2020. URL https://arxiv.org/abs/1912.06378

work page arXiv 2020
[83]

threestudio: A unified framework for 3d content generation.https://github.com/ threestudio-project/threestudio, 2023

Yuan-Chen Guo, Ying-Tian Liu, Ruizhi Shao, Christian Laforte, Vikram Voleti, Guan Luo, Chia-Hao Chen, Zi-Xin Zou, Chen Wang, Yan-Pei Cao, and Song-Hai Zhang. threestudio: A unified framework for 3d content generation.https://github.com/ threestudio-project/threestudio, 2023

work page 2023

Showing first 80 references.

[1] [1]

Learning representations and generative models for 3d point clouds

Panos Achlioptas, Olga Diamanti, Ioannis Mitliagkas, and Leonidas Guibas. Learning representations and generative models for 3d point clouds. In Jennifer Dy and Andreas Krause, editors,Proceedings of the 35th International Conference on Machine Learn- ing, volume 80 ofProceedings of Machine Learning Research, pages 40–49. PMLR, 10–15 Jul 2018. URLhttps://...

work page 2018

[2] [2]

Adobe firefly.https://www.adobe.com/sensei/generative-ai/firefly

Adobe. Adobe firefly.https://www.adobe.com/sensei/generative-ai/firefly. html, 2023. Accessed: 2023-11-15

work page 2023

[3] [3]

Building rome in a day

Sameer Agarwal, Noah Snavely, Ian Simon, Steven M Seitz, and Richard Szeliski. Building rome in a day. In2009 IEEE 12th International Conference on Computer Vision, pages 72–79, September 2009. doi: 10.1109/ICCV.2009.5459148. URLhttp: //dx.doi.org/10.1109/ICCV.2009.5459148

work page doi:10.1109/iccv.2009.5459148 2009

[4] [4]

Seitz, and Richard Szeliski

Sameer Agarwal, Noah Snavely, Steven M. Seitz, and Richard Szeliski. Bundle adjust- ment in the large. InComputer Vision – ECCV 2010, pages 29–42, 2010. doi: 10.1007/ 978-3-642-15552-9 3. URLhttp://dx.doi.org/10.1007/978-3-642-15552-9_3

work page doi:10.1007/978-3-642-15552-9_3 2010

[5] [5]

Seitz, and Richard Szeliski

Sameer Agarwal, Yasutaka Furukawa, Noah Snavely, Ian Simon, Brian Curless, Steven M. Seitz, and Richard Szeliski. Building rome in a day.Communications of the ACM, 54(10):105–112, October 2011. doi: 10.1145/2001269.2001293. URL http://dx.doi.org/10.1145/2001269.2001293

work page doi:10.1145/2001269.2001293 2011

[6] [6]

Ceres Solver

Sameer Agarwal, Keir Mierle, and The Ceres Solver Team. Ceres Solver. Software, 10

work page

[7] [7]

URLhttps://github.com/ceres-solver/ceres-solver. 88

work page

[8] [8]

Mixvpr: Feature mixing for visual place recognition

Amar Ali-bey, Brahim Chaib-draa, and Philippe Gigu` ere. Mixvpr: Feature mixing for visual place recognition. InProceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 2998–3007, January 2023

work page 2023

[9] [9]

Renderdiffusion: Image diffusion for 3d recon- struction, inpainting and generation

Titas Anciukeviˇ cius, Zexiang Xu, Matthew Fisher, Paul Henderson, Hakan Bilen, Niloy J Mitra, and Paul Guerrero. Renderdiffusion: Image diffusion for 3d recon- struction, inpainting and generation. InCVPR, 2023

work page 2023

[10] [10]

Netvlad: Cnn architecture for weakly supervised place recognition

Relja Arandjelovic, Petr Gronat, Akihiko Torii, Tomas Pajdla, and Josef Sivic. Netvlad: Cnn architecture for weakly supervised place recognition. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016

work page 2016

[11] [11]

Training a helpful and harmless assistant with reinforcement learning from human feedback, 2022

Yuntao Bai, Andy Jones, Kamal Ndousse, Amanda Askell, Anna Chen, Nova Das- Sarma, Dawn Drain, Stanislav Fort, Deep Ganguli, Tom Henighan, Nicholas Joseph, Saurav Kadavath, Jackson Kernion, Tom Conerly, Sheer El-Showk, Nelson Elhage, Zac Hatfield-Dodds, Danny Hernandez, Tristan Hume, Scott Johnston, Shauna Kravec, Liane Lovitt, Neel Nanda, Catherine Olsson...

work page 2022

[12] [12]

Bowman, Zac Hatfield-Dodds, Ben Mann, Dario Amodei, Nicholas Joseph, Sam McCandlish, Tom Brown, and Jared Kaplan

Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, Carol Chen, Catherine Olsson, Christopher Olah, Danny Hernandez, Dawn Drain, Deep Ganguli, Dustin Li, Eli Tran-Johnson, Ethan Perez, Jamie Kerr, Jared Mueller, Jeffrey Ladish, Joshua Landau, Kamal Ndousse, K...

work page 2022

[13] [13]

A minimal solution for two-view focal-length estimation using two affine correspondences

Daniel Barath, Tekla Toth, and Levente Hajder. A minimal solution for two-view focal-length estimation using two affine correspondences. InCVPR, 2017

work page 2017

[14] [14]

Fundamental matrix for cameras with radial distortion

Jo˜ ao Pedro Barreto and Kostas Daniilidis. Fundamental matrix for cameras with radial distortion. InICCV, 2005

work page 2005

[15] [15]

Masked feature prediction for self-supervised visual pre-training

Jonathan T. Barron, Ben Mildenhall, Dor Verbin, Pratul P. Srinivasan, and Peter Hed- man. Mip-nerf 360: Unbounded anti-aliased neural radiance fields. InIEEE/CVF Con- ference on Computer Vision and Pattern Recognition (CVPR), pages 5460–5469, 2022. doi: 10.1109/CVPR52688.2022.00539. URLhttps://doi.org/10.1109/CVPR52688. 2022.00539

work page doi:10.1109/cvpr52688.2022.00539 2022

[16] [16]

2023 , url =

Jonathan T. Barron, Ben Mildenhall, Dor Verbin, Pratul P. Srinivasan, and Peter Hedman. Zip-nerf: Anti-aliased grid-based neural radiance fields. InIEEE/CVF In- ternational Conference on Computer Vision (ICCV), pages 19640–19648, 2023. doi: 10.1109/ICCV51070.2023.01804. URLhttps://doi.org/10.1109/ICCV51070.2023. 01804

work page doi:10.1109/iccv51070.2023.01804 2023

[17] [17]

Key.net: Keypoint detection by handcrafted and learned cnn filters, 2019

Axel Barroso-Laguna, Edgar Riba, Daniel Ponsa, and Krystian Mikolajczyk. Key.net: Keypoint detection by handcrafted and learned cnn filters, 2019. URLhttps://arxiv. org/abs/1904.00889

work page arXiv 2019

[18] [18]

Rethinking visual geo- localization for large-scale applications

Gabriele Berton, Carlo Masone, and Barbara Caputo. Rethinking visual geo- localization for large-scale applications. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4878–4888, June 2022

work page 2022

[19] [19]

DynaSLAM: Tracking, Mapping and Inpainting in Dynamic Scenes

Berta Bescos, Jos´ e M. F´ acil, Javier Civera, and Jos´ e Neira. Dynaslam: Tracking, mapping and inpainting in dynamic scenes, 2018. URLhttps://arxiv.org/abs/ 1806.05620

work page internal anchor Pith review Pith/arXiv arXiv 2018

[20] [20]

ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth

Shariq Farooq Bhat, Reiner Birkl, Diana Wofk, Peter Wonka, and Matthias M¨ uller. Zoedepth: Zero-shot transfer by combining relative and metric depth.arXiv preprint arXiv:2302.12288, 2023. 90

work page internal anchor Pith review Pith/arXiv arXiv 2023

[21] [21]

Training diffusion models with reinforcement learning, 2023

Kevin Black, Michael Janner, Yilun Du, Ilya Kostrikov, and Sergey Levine. Training diffusion models with reinforcement learning, 2023

work page 2023

[22] [22]

Deep- calib: a deep learning approach for automatic intrinsic calibration of wide field-of-view cameras

Oleksandr Bogdan, Viktor Eckstein, Francois Rameau, and Jean-Charles Bazin. Deep- calib: a deep learning approach for automatic intrinsic calibration of wide field-of-view cameras. InProceedings of the 15th ACM SIGGRAPH European Conference on Visual Media Production, 2018

work page 2018

[23] [23]

Neural-guided ransac: Learning where to sample model hypotheses

Eric Brachmann and Carsten Rother. Neural-guided ransac: Learning where to sample model hypotheses. In2019 IEEE/CVF International Conference on Computer Vision (ICCV), page 4321–4330. IEEE, October 2019. doi: 10.1109/iccv.2019.00442. URL http://dx.doi.org/10.1109/iccv.2019.00442

work page doi:10.1109/iccv.2019.00442 2019

[24] [24]

DSAC - Differentiable RANSAC for Camera Localization

Eric Brachmann, Alexander Krull, Sebastian Nowozin, Jamie Shotton, Frank Michel, Stefan Gumhold, and Carsten Rother. Dsac - differentiable ransac for camera local- ization, 2018. URLhttps://arxiv.org/abs/1611.05705

work page internal anchor Pith review Pith/arXiv arXiv 2018

[25] [25]

Accelerated co- ordinate encoding: Learning to relocalize in minutes using rgb and poses

Eric Brachmann, Tommaso Cavallari, and Victor Adrian Prisacariu. Accelerated co- ordinate encoding: Learning to relocalize in minutes using rgb and poses. InCVPR, 2023

work page 2023

[26] [26]

Scene coordinate recon- struction: Posing of image collections via incremental learning of a relocalizer

Eric Brachmann, Jamie Wynn, Shuai Chen, Tommaso Cavallari, ´Aron Monszpart, Daniyar Turmukhambetov, and Victor Adrian Prisacariu. Scene coordinate recon- struction: Posing of image collections via incremental learning of a relocalizer. In ECCV, 2024

work page 2024

[27] [27]

Language models are few-shot learners.Advances in neural information processing systems, 33, 2020

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners.Advances in neural information processing systems, 33, 2020

work page 2020

[28] [28]

Springer International Publishing, 2020

Bingyi Cao, Andr´ e Araujo, and Jack Sim.Unifying Deep Local and Global Features for Image Search, page 726–743. Springer International Publishing, 2020. ISBN 91 9783030585655. doi: 10.1007/978-3-030-58565-5 43. URLhttp://dx.doi.org/10. 1007/978-3-030-58565-5_43

work page doi:10.1007/978-3-030-58565-5 2020

[29] [29]

Emerging properties in self-supervised vision trans- formers

Mathilde Caron, Hugo Touvron, Ishan Misra, Herv´ e J´ egou, Julien Mairal, Piotr Bo- janowski, and Armand Joulin. Emerging properties in self-supervised vision trans- formers. InProceedings of the International Conference on Computer Vision (ICCV), 2021

work page 2021

[30] [30]

Michaud, Jacob Pfau, Dmitrii Krasheninnikov, Xin Chen, Lauro Langosco, Peter Hase, Erdem Bıyık, Anca Dragan, David Krueger, Dorsa Sadigh, and Dylan Hadfield-Menell

Stephen Casper, Xander Davies, Claudia Shi, Thomas Krendl Gilbert, J´ er´ emy Scheurer, Javier Rando, Rachel Freedman, Tomasz Korbak, David Lindner, Pedro Freire, Tony Wang, Samuel Marks, Charbel-Rapha¨ el Segerie, Micah Carroll, Andi Peng, Phillip Christoffersen, Mehul Damani, Stewart Slocum, Usman Anwar, Anand Siththaranjan, Max Nadeau, Eric J. Michaud,...

work page 2023

[31] [31]

Chan, Mariano Monteiro, Petr Kellnhofer, Jiajun Wu, and Gordon Wetzstein

Eric R. Chan, Mariano Monteiro, Petr Kellnhofer, Jiajun Wu, and Gordon Wetzstein. pi-gan: Periodic implicit generative adversarial networks for 3d-aware image synthesis. InThe IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2021

work page 2021

[32] [32]

Chan, Connor Z

Eric R. Chan, Connor Z. Lin, Matthew A. Chan, Koki Nagano, Boxiao Pan, Shalini De Mello, Orazio Gallo, Leonidas Guibas, Jonathan Tremblay, Sameh Khamis, Tero Karras, and Gordon Wetzstein. Efficient geometry-aware 3d generative adversarial networks. InThe IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2022

work page 2022

[33] [33]

Efficient geometry-aware 3d generative adversarial networks

Eric R Chan, Connor Z Lin, Matthew A Chan, Koki Nagano, Boxiao Pan, Shalini De Mello, Orazio Gallo, Leonidas J Guibas, Jonathan Tremblay, Sameh Khamis, et al. Efficient geometry-aware 3d generative adversarial networks. InCVPR, 2022

work page 2022

[34] [34]

Efficient and robust large-scale ro- tation averaging

Avishek Chatterjee and Venu Madhav Govindu. Efficient and robust large-scale ro- tation averaging. In2013 IEEE International Conference on Computer Vision, pages 92 521–528, December 2013. doi: 10.1109/ICCV.2013.70. URLhttp://dx.doi.org/10. 1109/ICCV.2013.70

work page doi:10.1109/iccv.2013.70 2013

[35] [35]

Robust relative rotation averaging

Avishek Chatterjee and Venu Madhav Govindu. Robust relative rotation averaging. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4):958–972, April

work page

[36] [36]

URLhttp://dx.doi.org/10.1109/TPAMI

doi: 10.1109/TPAMI.2017.2693984. URLhttp://dx.doi.org/10.1109/TPAMI. 2017.2693984

work page doi:10.1109/tpami.2017.2693984 2017

[37] [37]

Tensorf: Ten- sorial radiance fields

Anpei Chen, Zexiang Xu, Andreas Geiger, Jingyi Yu, and Hao Su. Tensorf: Ten- sorial radiance fields. InComputer Vision – ECCV 2022 – 17th European Confer- ence, Tel Aviv, Israel, October 23-27, 2022, Proceedings, Part XXXII, volume 13692 ofLecture Notes in Computer Science, pages 333–350. Springer, 2022. doi: 10.1007/ 978-3-031-19824-3 20. URLhttps://doi....

work page doi:10.1007/978-3-031-19824-3_20 2022

[38] [38]

Single-stage diffusion nerf: A unified approach to 3d generation and reconstruction

Hansheng Chen, Jiatao Gu, Anpei Chen, Wei Tian, Zhuowen Tu, Lingjie Liu, and Hao Su. Single-stage diffusion nerf: A unified approach to 3d generation and reconstruction. InICCV, 2023

work page 2023

[39] [39]

Aspanformer: Detector-free image matching with adaptive span transformer, 2022

Hongkai Chen, Zixin Luo, Lei Zhou, Yurun Tian, Mingmin Zhen, Tian Fang, David Mckinnon, Yanghai Tsin, and Long Quan. Aspanformer: Detector-free image matching with adaptive span transformer, 2022. URLhttps://arxiv.org/abs/2208.14201

work page arXiv 2022

[40] [40]

Fantasia3d: Disentangling geometry and appearance for high-quality text-to-3d content creation

Rui Chen, Yongwei Chen, Ningxin Jiao, and Kui Jia. Fantasia3d: Disentangling geometry and appearance for high-quality text-to-3d content creation. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 20622– 20633, October 2023

work page 2023

[41] [41]

Fantasia3d: Disentangling geometry and appearance for high-quality text-to-3d content creation.arXiv preprint arXiv:2303.13873, 2023

Rui Chen, Yongwei Chen, Ningxin Jiao, and Kui Jia. Fantasia3d: Disentangling geometry and appearance for high-quality text-to-3d content creation.arXiv preprint arXiv:2303.13873, 2023

work page arXiv 2023

[42] [42]

Learning implicit fields for generative shape modeling

Zhiqin Chen and Hao Zhang. Learning implicit fields for generative shape modeling. InThe IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019. 93

work page 2019

[43] [43]

V3d: Video diffusion models are effective 3d generators, 2024

Zilong Chen, Yikai Wang, Feng Wang, Zhengyi Wang, and Huaping Liu. V3d: Video diffusion models are effective 3d generators, 2024. URLhttps://arxiv.org/abs/ 2403.06738

work page arXiv 2024

[44] [44]

Luciddreamer: Domain-free generation of 3d gaussian splatting scenes.arXiv preprint arXiv:2311.13384, 2023

Jaeyoung Chung, Suyoung Lee, Hyeongjin Nam, Jaerin Lee, and Kyoung Mu Lee. Luciddreamer: Domain-free generation of 3d gaussian splatting scenes.arXiv preprint arXiv:2311.13384, 2023

work page arXiv 2023

[45] [45]

Abo: Dataset and benchmarks for real-world 3d object understanding

Jasmine Collins, Shubham Goel, Kenan Deng, Achleshwar Luthra, Leon Xu, Erhan Gundogdu, Xi Zhang, Tomas F Yago Vicente, Thomas Dideriksen, Himanshu Arora, et al. Abo: Dataset and benchmarks for real-world 3d object understanding. InCVPR, pages 21126–21136, 2022

work page 2022

[46] [46]

Discrete- continuous optimization for large-scale structure from motion

David Crandall, Andrew Owens, Noah Snavely, and Dan Huttenlocher. Discrete- continuous optimization for large-scale structure from motion. InCVPR 2011, June

work page 2011

[47] [47]

URLhttp://dx.doi.org/10.1109/CVPR

doi: 10.1109/CVPR.2011.5995626. URLhttp://dx.doi.org/10.1109/CVPR. 2011.5995626

work page doi:10.1109/cvpr.2011.5995626 2011

[48] [48]

Global structure-from-motion by similarity averaging

Zhaopeng Cui and Ping Tan. Global structure-from-motion by similarity averaging. In2015 IEEE International Conference on Computer Vision (ICCV), pages 864–872, December 2015. doi: 10.1109/ICCV.2015.105. URLhttp://dx.doi.org/10.1109/ ICCV.2015.105

work page doi:10.1109/iccv.2015.105 2015

[49] [49]

Obja- verse: A universe of annotated 3d objects, 2022

Matt Deitke, Dustin Schwenk, Jordi Salvador, Luca Weihs, Oscar Michel, Eli Vander- Bilt, Ludwig Schmidt, Kiana Ehsani, Aniruddha Kembhavi, and Ali Farhadi. Obja- verse: A universe of annotated 3d objects, 2022

work page 2022

[50] [50]

Objaverse-XL: A Universe of 10M+ 3D Objects

Matt Deitke, Ruoshi Liu, Matthew Wallingford, Huong Ngo, Oscar Michel, Aditya Kusupati, Alan Fan, Christian Laforte, Vikram Voleti, Samir Yitzhak Gadre, et al. Objaverse-xl: A universe of 10m+ 3d objects.arXiv preprint arXiv:2307.05663, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[51] [51]

Obja- 94 verse: A universe of annotated 3d objects

Matt Deitke, Dustin Schwenk, Jordi Salvador, Luca Weihs, Oscar Michel, Eli Vander- Bilt, Ludwig Schmidt, Kiana Ehsani, Aniruddha Kembhavi, and Ali Farhadi. Obja- 94 verse: A universe of annotated 3d objects. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13142–13153, 2023

work page 2023

[52] [52]

Obja- verse: A universe of annotated 3d objects

Matt Deitke, Dustin Schwenk, Jordi Salvador, Luca Weihs, Oscar Michel, Eli Vander- Bilt, Ludwig Schmidt, Kiana Ehsani, Aniruddha Kembhavi, and Ali Farhadi. Obja- verse: A universe of annotated 3d objects. InCVPR, pages 13142–13153, 2023

work page 2023

[53] [53]

Superpoint: Self- supervised interest point detection and description

Daniel DeTone, Tomasz Malisiewicz, and Andrew Rabinovich. Superpoint: Self- supervised interest point detection and description. In2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), page 337–33712. IEEE, June 2018. doi: 10.1109/cvprw.2018.00060. URLhttp://dx.doi.org/10. 1109/cvprw.2018.00060

work page doi:10.1109/cvprw.2018.00060 2018

[54] [54]

Theia: A library for structure-from-motion

Theia Developers. Theia: A library for structure-from-motion. urlhttp://www.theia-sfm.org/, 2026. Accessed 2026-02-03

work page 2026

[55] [55]

Reloc3r: Large-scale training of relative camera pose regression for generalizable, fast, and accurate visual localization

Siyan Dong, Shuzhe Wang, Shaohui Liu, Lulu Cai, Qingnan Fan, Juho Kannala, and Yanchao Yang. Reloc3r: Large-scale training of relative camera pose regression for generalizable, fast, and accurate visual localization. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16739–16752, June 2025

work page 2025

[56] [56]

Google scanned objects: A high-quality dataset of 3d scanned household items

Laura Downs, Anthony Francis, Nate Koenig, Brandon Kinman, Ryan Hickman, Krista Reymann, Thomas B McHugh, and Vincent Vanhoucke. Google scanned objects: A high-quality dataset of 3d scanned household items. In2022 International Conference on Robotics and Automation (ICRA), pages 2553–2560. IEEE, 2022

work page 2022

[57] [57]

Close-range camera calibration.Photogramm

C Brown Duane. Close-range camera calibration.Photogramm. Eng, 37(8), 1971

work page 1971

[58] [58]

MASt3R-SfM: a Fully-Integrated Solution for Unconstrained Structure-from-Motion

Bardienus Duisterhof, Lojze Zust, Philippe Weinzaepfel, Vincent Leroy, Yohann Cabon, and Jerome Revaud. MASt3R-SfM: A fully-integrated solution for uncon- strained structure-from-motion.arXiv preprint arXiv:2409.19152, 2024

work page arXiv 2024

[59] [59]

MASt3r-sfm: a fully-integrated solution for unconstrained 95 structure-from-motion

Bardienus Pieter Duisterhof, Lojze Zust, Philippe Weinzaepfel, Vincent Leroy, Yohann Cabon, and Jerome Revaud. MASt3r-sfm: a fully-integrated solution for unconstrained 95 structure-from-motion. InInternational Conference on 3D Vision (3DV), 2025. URL https://openreview.net/forum?id=5uw1GRBFoT

work page 2025

[60] [60]

D2-Net: A Trainable CNN for Joint Detection and Description of Local Features

Mihai Dusmanu, Ignacio Rocco, Tomas Pajdla, Marc Pollefeys, Josef Sivic, Akihiko Torii, and Torsten Sattler. D2-net: A trainable cnn for joint detection and description of local features, 2019. URLhttps://arxiv.org/abs/1905.03561

work page internal anchor Pith review Pith/arXiv arXiv 2019

[61] [62]

Hyperparameters in rein- forcement learning and how to tune them, 2023

Theresa Eimer, Marius Lindauer, and Roberta Raileanu. Hyperparameters in rein- forcement learning and how to tune them, 2023

work page 2023

[62] [63]

Syncity: Training-free generation of 3d worlds

Jakob Engstler, Gaisi Xia, Brandon Trabucco, Huicheng Yan, Rui Ding, Yun-Chun Chen, and Gordon Wetzstein. Syncity: Training-free generation of 3d worlds. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 18577–18587, 2025

work page 2025

[63] [64]

Disentangled 3d scene gener- ation with layout learning

Dave Epstein, Alisa Kim, Xueyue Wang, Hsin-Ying Chen, Hung-Yu Tseng, Hsiao- Yu Fish Chen, Ming-Hsuan Yang, and Wei Wang. Disentangled 3d scene gener- ation with layout learning. In Dylan Fortune, Paul Vicol, Serena Yeung, Han- nah Zhang, Frank Hutter, and Hugo Larochelle, editors,Proceedings of the 41st International Conference on Machine Learning, volume...

work page 2024

[64] [65]

A review of the one-parameter division undistortion model.ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 1, 2021

Bastian Erdn¨ uß. A review of the one-parameter division undistortion model.ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 1, 2021

work page 2021

[65] [66]

Haoqiang Fan, Hao Su, and Leonidas J. Guibas. A point set generation network for 96 3d object reconstruction from a single image. InThe IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017

work page 2017

[66] [67]

Dpok: Reinforcement learning for fine-tuning text-to-image diffusion models, 2023

Ying Fan, Olivia Watkins, Yuqing Du, Hao Liu, Moonkyung Ryu, Craig Boutilier, Pieter Abbeel, Mohammad Ghavamzadeh, Kangwook Lee, and Kimin Lee. Dpok: Reinforcement learning for fine-tuning text-to-image diffusion models, 2023

work page 2023

[67] [68]

Ctrl-room: Controllable text-to-3d room meshes generation with layout constraints

Chuan Fang, Yuan Dong, Kunming Luo, Xiaotao Hu, Rakesh Shrestha, and Ping Tan. Ctrl-room: Controllable text-to-3d room meshes generation with layout constraints. InInternational Conference on 3D Vision (3DV), pages 692–701, 2025. doi: 10.1109/ 3DV66043.2025.00069

work page arXiv 2025

[68] [69]

MIT Press, 1993

Olivier Faugeras.Three-dimensional computer vision: A geometric viewpoint. MIT Press, 1993

work page 1993

[69] [70]

Simultaneous linear estimation of multiple view geometry and lens distortion

Andrew W Fitzgibbon. Simultaneous linear estimation of multiple view geometry and lens distortion. InCVPR, 2001

work page 2001

[70] [71]

Get3d: A generative model of high quality 3d textured shapes learned from images.Advances in Neural Information Processing Systems, 35:31841–31854, 2022

Jun Gao, Tianchang Shen, Zian Wang, Wenzheng Chen, Kangxue Yin, Daiqing Li, Or Litany, Zan Gojcic, and Sanja Fidler. Get3d: A generative model of high quality 3d textured shapes learned from images.Advances in Neural Information Processing Systems, 35:31841–31854, 2022

work page 2022

[71] [72]

SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation

Yuying Ge, Sijie Zhao, Jinguo Zhu, Yixiao Ge, Kun Yi, Lin Song, Chen Li, Xiaohan Ding, and Ying Shan. SEED-X: multimodal models with unified multi-granularity comprehension and generation.CoRR, abs/2404.14396, 2024. doi: 10.48550/ARXIV. 2404.14396. URLhttps://doi.org/10.48550/arXiv.2404.14396

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv 2024

[72] [73]

Learning shape templates with structured implicit functions

Kyle Genova, Ethan Highlander, Ozgur Sirin, Wonmin Liao, Kostas Daniilidis, and Thomas Funkhouser. Learning shape templates with structured implicit functions. In The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020

work page 2020

[73] [74]

S2dnet: Learning accurate 97 correspondences for sparse-to-dense feature matching, 2020

Hugo Germain, Guillaume Bourmaud, and Vincent Lepetit. S2dnet: Learning accurate 97 correspondences for sparse-to-dense feature matching, 2020. URLhttps://arxiv. org/abs/2004.01673

work page arXiv 2020

[74] [75]

Vivid-1-to-3: Novel view synthesis with video diffusion models, 2023

Jeong gi Kwak, Erqun Dong, Yuhe Jin, Hanseok Ko, Shweta Mahajan, and Kwang Moo Yi. Vivid-1-to-3: Novel view synthesis with video diffusion models, 2023. URLhttps: //arxiv.org/abs/2312.01305

work page arXiv 2023

[75] [77]

All for one, and one for all: Urbansyn dataset, the third musketeer of synthetic driving scenes.Neurocomputing, 637:130038, 2025

Jose L G´ omez, Manuel Silva, Antonio Seoane, Agn` es Borr´ as, Mario Noriega, Germ´ an Ros, Jose A Iglesias-Guitian, and Antonio M L´ opez. All for one, and one for all: Urbansyn dataset, the third musketeer of synthetic driving scenes.Neurocomputing, 637:130038, 2025

work page 2025

[76] [79]

V. M. Govindu. Lie-algebraic averaging for globally consistent motion estimation. In Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004., volume 1, pages 684–691, 2004. doi: 10.1109/ CVPR.2004.1315098. URLhttp://dx.doi.org/10.1109/CVPR.2004.1315098

work page doi:10.1109/cvpr.2004.1315098 2004

[77] [80]

Kubric: A scalable dataset generator

Klaus Greff, Francois Belletti, Lucas Beyer, Carl Doersch, Yilun Du, Daniel Duck- worth, David J Fleet, Dan Gnanapragasam, Florian Golemo, Charles Herrmann, et al. Kubric: A scalable dataset generator. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 3749–3761, 2022

work page 2022

[78] [81]

Kim, Bryan C

Thibault Groueix, Matthew Fisher, Vladimir G. Kim, Bryan C. Russell, and Mathieu 98 Aubry. A papier-mache approach to learning 3d surface generation. InThe IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018

work page 2018

[79] [82]

Cas- cade cost volume for high-resolution multi-view stereo and stereo matching, 2020

Xiaodong Gu, Zhiwen Fan, Zuozhuo Dai, Siyu Zhu, Feitong Tan, and Ping Tan. Cas- cade cost volume for high-resolution multi-view stereo and stereo matching, 2020. URL https://arxiv.org/abs/1912.06378

work page arXiv 2020

[80] [83]

threestudio: A unified framework for 3d content generation.https://github.com/ threestudio-project/threestudio, 2023

Yuan-Chen Guo, Ying-Tian Liu, Ruizhi Shao, Christian Laforte, Vikram Voleti, Guan Luo, Chia-Hao Chen, Zi-Xin Zou, Chen Wang, Yan-Pei Cao, and Song-Hai Zhang. threestudio: A unified framework for 3d content generation.https://github.com/ threestudio-project/threestudio, 2023

work page 2023