pith. machine review for the scientific record.

arXiv: 2604.13416 · v1 · submitted 2026-04-15 · 💻 cs.CV · cs.AI

Recognition: unknown

DF3DV-1K: A Large-Scale Dataset and Benchmark for Distractor-Free Novel View Synthesis

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 13:22 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords distractor-free · novel view synthesis · radiance fields · dataset · benchmark · 3D reconstruction · real-world scenes · casual capture

The pith

DF3DV-1K supplies 1,048 real scenes each with clean and cluttered image sets to benchmark distractor-free novel view synthesis.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to close the data gap for methods that reconstruct scenes while ignoring unwanted objects in casual photographs. It releases DF3DV-1K, a collection of 1,048 scenes captured with ordinary cameras that supplies both clean and cluttered image sets for every scene, plus a focused 41-scene subset for hard cases. The authors run nine recent distractor-free radiance-field techniques plus 3D Gaussian Splatting on the data and measure which ones hold up across indoor and outdoor conditions. They further show that fine-tuning a diffusion enhancer on the new pairs raises PSNR by roughly one decibel on held-out scenes. A shared, large-scale resource of this kind would let researchers compare approaches on consistent real-world clutter instead of isolated small tests.

Core claim

DF3DV-1K comprises 1,048 real-world scenes, each furnished with paired clean and cluttered image sets totaling 89,924 frames taken by consumer cameras, covering 128 distractor types and 161 scene themes. A curated 41-scene subset, DF3DV-41, isolates challenging capture conditions. Benchmarking of nine distractor-free radiance field methods and 3D Gaussian Splatting identifies relative robustness, while fine-tuning a diffusion-based 2D enhancer on DF3DV-1K yields average gains of 0.96 dB PSNR and 0.057 LPIPS on held-out data including DF3DV-41 and the On-the-go dataset.
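
To make the headline number concrete, here is a minimal sketch of the standard PSNR computation (not code from the paper): a +0.96 dB average gain corresponds to roughly a 1.25x reduction in mean squared error against the clean ground truth.

```python
import numpy as np

def psnr(img: np.ndarray, ref: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_val]."""
    mse = np.mean((img.astype(np.float64) - ref.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

# A +0.96 dB gain means the MSE shrinks by a factor of 10 ** (0.96 / 10) ≈ 1.25.
```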

What carries the argument

The DF3DV-1K dataset itself, which pairs clean and cluttered photographs of the same 1,048 scenes to let methods be tested on their ability to suppress distractors during novel-view reconstruction.

If this is right

  • Researchers gain a standardized way to compare how different methods handle the same real-world clutter across hundreds of scenes.
  • Benchmark results highlight which current techniques cope best with particular distractor categories and environments.
  • The paired clean-cluttered images enable direct measurement of improvement when enhancers or filters are added to existing pipelines (a minimal sketch follows this list).
  • The scale supports training of more general models that do not require per-scene tuning.
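
As a concrete illustration of paired before-and-after measurement, the sketch below averages per-view PSNR deltas between baseline and enhanced renders against clean targets. The directory layout and file names are hypothetical; the released dataset may organize scenes differently.

```python
from pathlib import Path

import numpy as np
from PIL import Image

# Hypothetical layout (the actual DF3DV-1K release may differ):
#   <scene>/clean/*.png         held-out clean target views
#   <scene>/renders_base/*.png  renders trained on the cluttered set
#   <scene>/renders_enh/*.png   the same renders after a 2D enhancer

def load(path: Path) -> np.ndarray:
    """Load an image as a float array in [0, 1]."""
    return np.asarray(Image.open(path)).astype(np.float64) / 255.0

def psnr(img: np.ndarray, ref: np.ndarray) -> float:
    mse = np.mean((img - ref) ** 2)
    return 10.0 * np.log10(1.0 / mse)

def scene_psnr_gain(scene: Path) -> float:
    """Average per-view PSNR improvement of enhanced over baseline renders."""
    deltas = []
    for gt_path in sorted((scene / "clean").glob("*.png")):
        ref = load(gt_path)
        base = load(scene / "renders_base" / gt_path.name)
        enh = load(scene / "renders_enh" / gt_path.name)
        deltas.append(psnr(enh, ref) - psnr(base, ref))
    return float(np.mean(deltas))
```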

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Widespread use of the dataset could shift evaluation away from controlled lab captures toward everyday uncontrolled photography.
  • The clean-cluttered pairs could be used to train models that remove distractors jointly with reconstruction rather than as a separate step.
  • Applications such as AR overlays or virtual tours from tourist photos would become more reliable if methods prove robust on this data.

Load-bearing premise

The collected scenes, distractor types, and capture conditions are representative enough of everyday casual photography to support general claims about method robustness.

What would settle it

A distractor-free method that scores high on DF3DV-1K but produces visibly degraded views on an independent collection of casual photos containing new distractors or scene types would show the benchmark does not support broad conclusions.

Figures

Figures reproduced from arXiv: 2604.13416 by Charlie Li-Ting Tsai, Cheng-You Lu, Chin-Teng Lin, Hao-Ping Wang, Thomas Do, Wei-Ling Chi, Yi-Shan Hung, Yu-Cheng Chang, Yu-Lun Liu.

Figure 1: Features of the DF3DV-1K dataset and benchmark. (a) DF3DV-1K …

Figure 2: Comparison of radiance fields. (a) Distractor-free tasks [63] use casually captured images within a short period. (b) In-the-wild tasks [24, 51], with large temporal gaps, often target images collected across seasons. (c) Dynamic tasks [89] target 4D scene synthesis and assume densely captured sequential data. Their variants such as in-the-wild (with large temporal gap) [3, 10, 11, 14, 18, 30, 33, 37, 51, 5…

Figure 3: Number of distractor-free radiance field papers. The first distractor-free radiance field method [65], targeting images captured over short time spans, was introduced in 2023 together with a benchmark. The research area rapidly gained attention in 2024 with the release of the On-the-go benchmark [63]. Although the number of works doubled in 2025, a public, large-scale, challenging dataset and benchmark sys…

Figure 4: Samples of systematically designed scenarios in DF3DV-41.

Figure 5: Dataset difficulty analysis via per-image performance distributions.

Figure 6: Overview of the data collection and curation pipeline. (a) Scene de…

Figure 7: Distribution of DF3DV-1K by capture device, resolution, and month…

Figure 8: Scene count distribution. Clean images per scene skew slightly toward lower bins for efficient novel-view benchmarking, whereas cluttered images extend toward higher bins to capture diverse distractor conditions. Total images per scene remain comparable to typical non-sparse radiance field settings. …

Figure 9: Qualitative results of the radiance field methods on DF3DV-41.

Figure 10: Qualitative results of enhancers. Leveraging DF3DV-1K*, a large-scale and diverse dataset, DI2FIX effectively removes distractor artifacts (e.g., dynamic chess pieces and vegetable artifacts) while inpainting occluded regions in target views. (Column labels: GT, Target (RobustSplat), DF3DV-250, DF3DV-500, DF3DV-750, DF3DV-1K; GT, Target (3DGS), DF3DV-250, DF3DV-500, DF3DV-750, DF3DV-1K.)

Figure 11: Qualitative results of DI2FIX trained with different data scales. Increasing the amount and diversity of training data improves robustness. In particular, DI2FIX progressively removes distractor artifacts (e.g., the chess pieces) while avoiding incorrect modifications to static scene content (e.g., the animal toy). …

Figure 12: Qualitative results of DI2FIX trained using different LPIPS filtering thresholds. Overly strict thresholds (e.g., 0.1) exclude many challenging image pairs, making artifacts difficult to remove (e.g., floaters around the cutting board). Overly loose thresholds (e.g., 0.9) introduce excessive noisy training samples, which may lead to undesired modifications of scene content (e.g., disappearing game cards). …
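
Figure 12's caption describes filtering candidate training pairs by an LPIPS threshold. The sketch below shows one plausible form of that filter using the public lpips package; the paper's exact backbone, preprocessing, and threshold value are assumptions here, with 0.5 standing in as a hypothetical midpoint between the 0.1 and 0.9 extremes the caption discusses.

```python
import lpips  # pip install lpips
import numpy as np
import torch
from PIL import Image

loss_fn = lpips.LPIPS(net="alex")  # perceptual distance; lower = more similar

def to_tensor(path: str) -> torch.Tensor:
    arr = np.asarray(Image.open(path)).astype(np.float32) / 255.0
    t = torch.from_numpy(arr).permute(2, 0, 1).unsqueeze(0)  # (1, 3, H, W)
    return t * 2.0 - 1.0  # lpips expects inputs in [-1, 1]

def keep_pair(render_path: str, target_path: str, threshold: float = 0.5) -> bool:
    """Keep a (render, target) training pair only if perceptually close enough."""
    with torch.no_grad():
        dist = loss_fn(to_tensor(render_path), to_tensor(target_path)).item()
    # Too strict (e.g., 0.1) drops hard-but-useful pairs; too loose (e.g., 0.9)
    # admits noisy supervision -- the two failure modes Figure 12 illustrates.
    return dist < threshold
```
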
Original abstract

Advances in radiance fields have enabled photorealistic novel view synthesis. In several domains, large-scale real-world datasets have been developed to support comprehensive benchmarking and to facilitate progress beyond scene-specific reconstruction. However, for distractor-free radiance fields, a large-scale dataset with clean and cluttered images per scene remains lacking, limiting the development. To address this gap, we introduce DF3DV-1K, a large-scale real-world dataset comprising 1,048 scenes, each providing clean and cluttered image sets for benchmarking. In total, the dataset contains 89,924 images captured using consumer cameras to mimic casual capture, spanning 128 distractor types and 161 scene themes across indoor and outdoor environments. A curated subset of 41 scenes, DF3DV-41, is systematically designed to evaluate the robustness of distractor-free radiance field methods under challenging scenarios. Using DF3DV-1K, we benchmark nine recent distractor-free radiance field methods and 3D Gaussian Splatting, identifying the most robust methods and the most challenging scenarios. Beyond benchmarking, we demonstrate an application of DF3DV-1K by fine-tuning a diffusion-based 2D enhancer to improve radiance field methods, achieving average improvements of 0.96 dB PSNR and 0.057 LPIPS on the held-out set (e.g., DF3DV-41) and the On-the-go dataset. We hope DF3DV-1K facilitates the development of distractor-free vision and promotes progress beyond scene-specific approaches.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper introduces DF3DV-1K, a large-scale real-world dataset comprising 1,048 scenes with clean and cluttered image sets, totaling 89,924 images across 128 distractor types and 161 scene themes. Using this dataset, the authors benchmark nine recent distractor-free radiance field methods and 3D Gaussian Splatting, identify robust methods and challenging scenarios, and demonstrate an application by fine-tuning a diffusion-based enhancer that achieves average improvements of 0.96 dB PSNR and 0.057 LPIPS on held-out sets including DF3DV-41 and the On-the-go dataset.

Significance. Should the dataset prove representative of real-world casual photography challenges, DF3DV-1K would be a significant contribution as the first large-scale benchmark specifically designed for distractor-free novel view synthesis. It enables comprehensive evaluation of method robustness, highlights challenging scenarios, and provides a resource for developing enhancers, potentially accelerating progress in handling distractors in radiance fields beyond controlled settings. The paired clean/cluttered design per scene is particularly valuable.

major comments (1)
  1. [Dataset Construction and DF3DV-41 Subset] The central assumption that the 1,048 scenes and the systematically designed DF3DV-41 subset are representative of real-world distractor challenges lacks supporting evidence. No quantitative analysis or external validation is provided for the distribution of distractor types, scene themes, capture conditions (e.g., lighting, camera motion), or potential selection biases. This is load-bearing for all benchmarking conclusions and generalizability claims.
minor comments (1)
  1. [Abstract] The total image count and scene numbers are clearly stated, but the paper could benefit from a brief mention of the capture protocol or camera types used to enhance reproducibility.

Simulated Authors' Rebuttal

1 response · 1 unresolved

We thank the referee for the constructive feedback on the importance of dataset representativeness. We address the major comment below.

Point-by-point responses
  1. Referee: The central assumption that the 1,048 scenes and the systematically designed DF3DV-41 subset are representative of real-world distractor challenges lacks supporting evidence. No quantitative analysis or external validation is provided for the distribution of distractor types, scene themes, capture conditions (e.g., lighting, camera motion), or potential selection biases. This is load-bearing for all benchmarking conclusions and generalizability claims.

    Authors: We appreciate the referee highlighting this point, as the dataset's utility for benchmarking and generalizability does depend on its scope. DF3DV-1K was constructed by capturing 1,048 real scenes with consumer cameras to emulate casual photography, deliberately spanning 128 distractor types and 161 scene themes across indoor and outdoor environments, with each scene providing paired clean and cluttered image sets. The DF3DV-41 subset was curated to include challenging combinations of distractors, lighting, and motion. However, we agree that the manuscript currently lacks explicit quantitative breakdowns (e.g., frequency distributions or tables of distractor/scene coverage) and a dedicated discussion of selection biases or capture-condition statistics. We will revise the paper to include these: (1) summary statistics and visualizations of distractor-type and theme distributions, (2) details on capture variations where recorded, and (3) an expanded limitations section that clarifies the design process for DF3DV-41 and tempers generalizability claims to the observed diversity rather than asserting full real-world representativeness. External validation against independent large-scale statistics on casual photography distractors is not feasible within the current scope without new data collection, but the added internal analysis will make the benchmarking conclusions more transparent and defensible.

    revision: partial

standing simulated objections not resolved
  • External validation of representativeness against independent real-world statistics on distractor distributions and casual photography conditions

Circularity Check

0 steps flagged

No circularity: empirical dataset and benchmark with no derivations or self-referential predictions

Full rationale

The paper introduces DF3DV-1K as a new real-world dataset of 1,048 scenes with clean/cluttered pairs and benchmarks nine prior distractor-free radiance field methods plus 3DGS on it, followed by a standard fine-tuning demonstration of a diffusion enhancer evaluated on held-out data. No equations, fitted parameters renamed as predictions, self-definitional claims, or load-bearing self-citations appear in the derivation chain; the work consists entirely of data collection, empirical evaluation against external methods, and an application that uses the dataset as training input with separate held-out testing. The representativeness assumption for general conclusions is an empirical limitation but does not create circularity by reducing any claimed result to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The paper contributes an empirical dataset and benchmark rather than a derivation; it rests on standard computer-vision assumptions about image formation and radiance-field applicability but introduces no new free parameters or invented entities.

axioms (2)
  • domain assumption: Radiance fields enable photorealistic novel view synthesis from image collections.
    Invoked in the opening sentence to motivate the need for distractor-free methods.
  • domain assumption: Consumer-camera captures with casual distractors represent realistic usage scenarios.
    Used to justify the dataset's relevance to practical applications.

pith-pipeline@v0.9.0 · 5616 in / 1443 out tokens · 53737 ms · 2026-05-10T13:22:00.776002+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

102 extracted references · 16 canonical work pages · 3 internal anchors

  1. Bai, N., Yang, A., Chen, H., Du, C.: Satgs: Remote sensing novel view synthesis using multi-temporal satellite images with appearance-adaptive 3dgs. Remote Sensing 17(9), 1609 (2025)
  2. Bao, Y., Liao, J., Huo, J., Gao, Y.: Distractor-free generalizable 3d gaussian splatting. In: The Fourteenth International Conference on Learning Representations (2026)
  3. Bao, Y., Tang, C., Wang, Y., Li, H.: Seg-wild: Interactive segmentation based on 3d gaussian splatting for unconstrained image collections. In: Proceedings of the 33rd ACM International Conference on Multimedia. pp. 8567–8576 (2025)
  4. Blau, Y., Michaeli, T.: The perception-distortion tradeoff. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 6228–6237 (2018)
  5. Blau, Y., Michaeli, T.: Rethinking lossy compression: The rate-distortion-perception tradeoff. In: International Conference on Machine Learning. pp. 675–
  6. Broxton, M., Flynn, J., Overbeck, R., Erickson, D., Hedman, P., Duvall, M., Dourgarian, J., Busch, J., Whalen, M., Debevec, P.: Immersive light field video with a layered mesh representation. ACM Transactions on Graphics (TOG) 39(4), 86–1 (2020)
  7. Cao, A., Johnson, J.: Hexplane: A fast representation for dynamic scenes. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 130–141 (2023)
  8. Cao, J., Wu, H., Feng, Z., Bao, H., Zhou, X., Peng, S.: Universe: Unleashing the scene prior of video diffusion models for robust radiance field reconstruction. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 27031–27041 (2025)
  9. Charatan, D., Li, S.L., Tagliasacchi, A., Sitzmann, V.: pixelsplat: 3d gaussian splats from image pairs for scalable generalizable 3d reconstruction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 19457–19467 (2024)
  10. Chen, J., Qin, Y., Liu, L., Lu, J., Li, G.: Nerf-hugs: Improved neural radiance fields in non-static scenes using heuristics-guided segmentation. CVPR (2024)
  11. Chen, X., Zhang, Q., Li, X., Chen, Y., Feng, Y., Wang, X., Wang, J.: Hallucinated neural radiance fields in the wild. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 12943–12952 (2022)
  12. Chen, X., Zhou, W., Cheng, Z.: Wildrayzer: Self-supervised large view synthesis in dynamic environments. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2026)
  13. Chen, Y., Xu, H., Zheng, C., Zhuang, B., Pollefeys, M., Geiger, A., Cham, T.J., Cai, J.: Mvsplat: Efficient 3d gaussian splatting from sparse multi-view images. In: European Conference on Computer Vision. pp. 370–386. Springer (2024)
  14. Dahmani, H., Bennehar, M., Piasco, N., Roldao, L., Tsishkou, D.: Swag: Splatting in the wild images with appearance-conditioned gaussians. In: European Conference on Computer Vision. pp. 325–340. Springer (2024)
  15. Dey, A., Lu, C.Y., Comport, A.I., Sridhar, S., Lin, C.T., Martinet, J.: Hfgaussian: Learning generalizable gaussian human with integrated human features. IEEE Transactions on Artificial Intelligence (2025)
  16. Du, S., Liu, J., Chen, Q., Chen, H.X., Mu, T.J., Yang, S.: Rge-gs: Reward-guided expansive driving scene reconstruction via diffusion priors. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 25756–25764 (2025)
  17. Fridovich-Keil, S., Meanti, G., Warburg, F.R., Recht, B., Kanazawa, A.: K-planes: Explicit radiance fields in space, time, and appearance. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 12479–12488 (2023)
  18. Fu, C., Chen, G., Zhang, Y., Yao, K., Xiong, Y., Huang, C., Cui, S., Matsushita, Y., Cao, X.: Robustsplat++: Decoupling densification, dynamics, and illumination for in-the-wild 3dgs. arXiv preprint arXiv:2512.04815 (2025)
  19. Fu, C., Zhang, Y., Yao, K., Chen, G., Xiong, Y., Huang, C., Cui, S., Cao, X.: Robustsplat: Decoupling densification and dynamics for transient-free 3dgs. In: ICCV (2025)
  20. Hosseinzadeh, M., Chng, S.F., Xu, Y., Lucey, S., Reid, I., Garg, R.: G3splat: Geometrically consistent generalizable gaussian splatting. arXiv preprint arXiv:2512.17547 (2025)
  21. Huang, X., Liu, X., Wan, Y., Zheng, Z., Zhang, B., Xiong, M., Pei, Y., Zhang, Y.: Skysplat: Generalizable 3d gaussian splatting from multi-temporal sparse satellite images. arXiv preprint arXiv:2508.09479 (2025)
  22. Jensen, R., Dahl, A., Vogiatzis, G., Tola, E., Aanæs, H.: Large scale multi-view stereopsis evaluation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 406–413 (2014)
  23. Jiang, L., Mao, Y., Xu, L., Lu, T., Ren, K., Jin, Y., Xu, X., Yu, M., Pang, J., Zhao, F., et al.: Anysplat: Feed-forward 3d gaussian splatting from unconstrained views. ACM Transactions on Graphics (TOG) 44(6), 1–16 (2025)
  24. Jin, Y., Mishkin, D., Mishchuk, A., Matas, J., Fua, P., Yi, K.M., Trulls, E.: Image matching across wide baselines: From paper to practice. International Journal of Computer Vision 129(2), 517–547 (2021)
  25. Kang, G., Nam, S., Yang, S., Sun, X., Khamis, S., Mohamed, A., Park, E.: ilrm: An iterative large 3d reconstruction model. arXiv preprint arXiv:2507.23277 (2025)
  26. Kästingschäfer, M., Gieruc, T., Bernhard, S., Campbell, D., Insafutdinov, E., Najafli, E., Brox, T.: Seed4d: A synthetic ego-exo dynamic 4d data generator, driving dataset and benchmark. In: 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). pp. 7752–7764. IEEE (2025)
  27. Kerbl, B., Kopanas, G., Leimkühler, T., Drettakis, G.: 3d gaussian splatting for real-time radiance field rendering. ACM Trans. Graph. 42(4), 139–1 (2023)
  28. Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A.C., Lo, W.Y., et al.: Segment anything. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 4015–4026 (2023)
  29. Kong, H., Yang, X., Wang, X.: Rogsplat: Robust gaussian splatting via generative priors. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 25735–25745 (2025)
  30. Kulhanek, J., Peng, S., Kukelova, Z., Pollefeys, M., Sattler, T.: WildGaussians: 3D gaussian splatting in the wild. In: Proceedings of the 38th International Conference on Neural Information Processing Systems (2024)
  31. Kulhanek, J., Sattler, T.: Nerfbaselines: Consistent and reproducible evaluation of novel view synthesis methods. In: Proceedings of the 39th International Conference on Neural Information Processing Systems (NeurIPS 2025) (2025)
  32. Lee, J., Yang, G., Ma, S., Cho, Y.: Freeze-frame with staticnerf: Uncertainty-guided nerf map reconstruction in dynamic scenes. IEEE Robotics and Automation Letters 11(1), 778–785 (2025)
  33. Li, C., Shi, Z., Lu, Y., He, W., Xu, X.: Robust neural rendering in the wild with asymmetric dual 3d gaussian splatting. In: The Thirty-ninth Annual Conference on Neural Information Processing Systems (2025)
  34. Li, D., Feng, J., Chen, J., Dong, W., Li, G., Shi, G., Jiao, L.: Egosplat: Open-vocabulary egocentric scene understanding with language embedded 3d gaussian splatting. arXiv preprint arXiv:2503.11345 (2025)
  35. Li, J., Cao, J., Guo, Y., Li, W., Zhang, Y.: One diffusion step to real-world super-resolution via flow trajectory distillation. In: International Conference on Machine Learning. pp. 34044–34053. PMLR (2025)
  36. Li, L., Shen, Z., Wang, Z., Shen, L., Tan, P.: Streaming radiance fields for 3d video synthesis. Advances in Neural Information Processing Systems 35, 13485–13498 (2022)
  37. Li, M., Zhai, S., Zhao, Z., Sun, L., Wang, X., Li, D., Liu, S., Wang, H.: Wild3a: Novel view synthesis from any dynamic images in seconds. In: Proceedings of the 33rd ACM International Conference on Multimedia. pp. 7472–7480 (2025)
  38. Li, R., Cheung, Y.m.: Modeling and identifying distractors with curriculum for robust 3d gaussian splatting. In: Proceedings of the 33rd ACM International Conference on Multimedia. pp. 10122–10131 (2025)
  39. Li, Y., Wu, J., Zhao, L., Liu, P.: Derainnerf: 3d scene estimation with adhesive waterdrop removal. In: 2024 IEEE International Conference on Robotics and Automation (ICRA). pp. 2787–2793. IEEE (2024)
  40. Li, Z., Wang, Q., Cole, F., Tucker, R., Snavely, N.: Dynibar: Neural dynamic image-based rendering. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 4273–4284 (2023)
  41. Lin, J., Gu, J., Fan, L., Wu, B., Lou, Y., Chen, R., Liu, L., Ye, J.: Hybridgs: Decoupling transients and statics with 2d and 3d gaussian splatting. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 788–797 (2025)
  42. Lin, K.E., Xiao, L., Liu, F., Yang, G., Ramamoorthi, R.: Deep 3d mask volume for view synthesis of dynamic scenes. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 1749–1758 (2021)
  43. Lin, X., Yu, F., Hu, J., You, Z., Shi, W., Ren, J.S., Gu, J., Dong, C.: Harnessing diffusion-yielded score priors for image restoration. ACM Transactions on Graphics (TOG) 44(6), 1–21 (2025)
  44. Ling, H., Xu, X., Sun, Y., Sun, Q.: Ocsplats: Observation completeness quantification and label noise separation in 3dgs. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 25680–25689 (2025)
  45. Ling, L., Sheng, Y., Tu, Z., Zhao, W., Xin, C., Wan, K., Yu, L., Guo, Q., Yu, Z., Lu, Y., et al.: Dl3dv-10k: A large-scale scene dataset for deep learning-based 3d vision. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 22160–22169 (2024)
  46. Liu, A., Tucker, R., Jampani, V., Makadia, A., Snavely, N., Kanazawa, A.: Infinite nature: Perpetual view generation of natural scenes from a single image. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 14458–14467 (2021)
  47. Liu, S., Bao, C., Cui, Z., Liu, Y., Chu, X., Gu, L., Conde, M.V., Umagami, R., Hashimoto, T., Hu, Z., et al.: Realx3d: A physically-degraded 3d benchmark for multi-view visual restoration and reconstruction. arXiv preprint arXiv:2512.23437 (2025)
  48. Liu, S., Chen, X., Chen, H., Xu, Q., Li, M.: Deraings: Gaussian splatting for enhanced scene reconstruction in rainy environments. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 39, pp. 5558–5566 (2025)
  49. Lu, C.Y., Zhou, P., Xing, A., Pokhariya, C., Dey, A., Shah, I.N., Mavidipalli, R., Hu, D., Comport, A.I., Chen, K., et al.: Diva-360: The dynamic visual dataset for immersive neural fields. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 22466–22476 (2024)
  50. Markin, A., Pryadilshchikov, V., Komarichev, A., Rakhimov, R., Wonka, P., Burnaev, E.: T-3dgs: Removing transient objects for 3d scene reconstruction. arXiv preprint arXiv:2412.00155 (2024)
  51. Martin-Brualla, R., Radwan, N., Sajjadi, M.S., Barron, J.T., Dosovitskiy, A., Duckworth, D.: Nerf in the wild: Neural radiance fields for unconstrained photo collections. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 7210–7219 (2021)
  52. Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: Nerf: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM 65(1), 99–106 (2021)
  53. Mithun, N.C., Pham, T., Wang, Q., Southall, B., Minhas, K., Matei, B., Mandt, S., Samarasekera, S., Kumar, R.: Diffusion-guided gaussian splatting for large-scale unconstrained 3d reconstruction and novel view synthesis. arXiv preprint arXiv:2504.01960 (2025)
  54. Müller, T., Evans, A., Schied, C., Keller, A.: Instant neural graphics primitives with a multiresolution hash encoding. ACM Transactions on Graphics (TOG) 41(4), 1–15 (2022)
  55. Oquab, M., Darcet, T., Moutakanni, T., Vo, H., Szafraniec, M., Khalidov, V., Fernandez, P., Haziza, D., Massa, F., El-Nouby, A., et al.: Dinov2: Learning robust visual features without supervision. Transactions on Machine Learning Research Journal (2024)
  56. Otonari, T., Ikehata, S., Aizawa, K.: Entity-nerf: Detecting and removing moving entities in urban scenes. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 20892–20901 (2024)
  57. Park, W., Nam, M., Kim, S., Jo, S., Lee, S.: Forestsplats: Deformable transient field for gaussian splatting in the wild. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. pp. 6978–6987 (2026)
  58. Parmar, G., Park, T., Narasimhan, S., Zhu, J.Y.: One-step image translation with text-to-image models. arXiv preprint arXiv:2403.12036 (2024)
  59. Prabakaran, A., Shukla, P.: Semantic-guided 3d gaussian splatting for transient object removal. arXiv preprint arXiv:2602.15516 (2026)
  60. Ravi, N., Gabeur, V., Hu, Y.T., Hu, R., Ryali, C., Ma, T., Khedr, H., Rädle, R., Rolland, C., Gustafson, L., et al.: Sam 2: Segment anything in images and videos. In: The Thirteenth International Conference on Learning Representations (2025)
  61. Reizenstein, J., Shapovalov, R., Henzler, P., Sbordone, L., Labatut, P., Novotny, D.: Common objects in 3d: Large-scale learning and evaluation of real-life 3d category reconstruction. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 10901–10911 (2021)
  62. Rematas, K., Liu, A., Srinivasan, P.P., Barron, J.T., Tagliasacchi, A., Funkhouser, T., Ferrari, V.: Urban radiance fields. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 12932–12942 (2022)
  63. Ren, W., Zhu, Z., Sun, B., Chen, J., Pollefeys, M., Peng, S.: Nerf on-the-go: Exploiting uncertainty for distractor-free nerfs in the wild. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024)
  64. Sabour, S., Goli, L., Kopanas, G., Matthews, M., Lagun, D., Guibas, L., Jacobson, A., Fleet, D., Tagliasacchi, A.: Spotlesssplats: Ignoring distractors in 3d gaussian splatting. ACM Transactions on Graphics 44(2), 1–11 (2025)
  65. Sabour, S., Vora, S., Duckworth, D., Krasin, I., Fleet, D.J., Tagliasacchi, A.: Robustnerf: Ignoring distractors with robust losses. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 20626–20636 (2023)
  66. Sauer, A., Lorenz, D., Blattmann, A., Rombach, R.: Adversarial diffusion distillation. In: European Conference on Computer Vision. pp. 87–103. Springer (2024)
  67. Schönberger, J.L., Frahm, J.M.: Structure-from-motion revisited. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
  68. Schönberger, J.L., Zheng, E., Pollefeys, M., Frahm, J.M.: Pixelwise view selection for unstructured multi-view stereo. In: European Conference on Computer Vision (ECCV) (2016)
  69. Song, L., Chen, A., Li, Z., Chen, Z., Chen, L., Yuan, J., Xu, Y., Geiger, A.: Nerfplayer: A streamable dynamic scene representation with decomposed neural radiance fields. IEEE Transactions on Visualization and Computer Graphics 29(5), 2732–2742 (2023)
  70. Sun, D., Guan, H., Zhang, K., Xie, X., Zhou, S.K.: Sdd-4dgs: Static-dynamic aware decoupling in gaussian splatting for 4d scene reconstruction. arXiv preprint arXiv:2503.09332 (2025)
  71. Tang, J., Gao, Y., Yang, D., Yan, L., Yue, Y., Yang, Y.: Dronesplat: 3d gaussian splatting for robust 3d reconstruction from in-the-wild drone imagery. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 833–843 (2025)
  72. Tang, L., Jia, M., Wang, Q., Phoo, C.P., Hariharan, B.: Emergent correspondence from image diffusion. Advances in Neural Information Processing Systems 36, 1363–1389 (2023)
  73. Tang, Y., Xu, D., Hou, Y., Wang, Z., Jiang, M.: Nexussplats: Efficient 3d gaussian splatting in the wild. arXiv preprint arXiv:2411.14514 (2024)
  74. Trevithick, A., Paiss, R., Henzler, P., Verbin, D., Wu, R., Alzayer, H., Gao, R., Poole, B., Barron, J.T., Holynski, A., et al.: Simvs: Simulating world inconsistencies for robust view synthesis. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 16464–16474 (2025)
  75. Tung, J., Chou, G., Cai, R., Yang, G., Zhang, K., Wetzstein, G., Hariharan, B., Snavely, N.: Megascenes: Scene-level view synthesis at scale. In: European Conference on Computer Vision. pp. 197–214. Springer (2024)
  76. Ungermann, P., Ettenhofer, A., Nießner, M., Roessle, B.: Robust 3d gaussian splatting for novel view synthesis in presence of distractors. In: DAGM German Conference on Pattern Recognition. pp. 153–167. Springer (2024)
  77. Wang, F., Tan, S., Li, X., Tian, Z., Song, Y., Liu, H.: Mixed neural voxels for fast multi-view video synthesis. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 19706–19716 (2023)
  78. Wang, Q., Wang, Z., Genova, K., Srinivasan, P.P., Zhou, H., Barron, J.T., Martin-Brualla, R., Snavely, N., Funkhouser, T.: Ibrnet: Learning multi-view image-based rendering. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 4690–4699 (2021)
  79. Wang, R., Lohmeyer, Q., Meboldt, M., Tang, S.: Degauss: Dynamic-static decomposition with gaussian splatting for distractor-free 3d reconstruction. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 6294–6303 (2025)
  80. Wang, S., Xu, H., Li, Y., Chen, J., Tan, G.: Ie-nerf: Exploring transient mask inpainting to enhance neural radiance fields in the wild. Neurocomputing 618, 129112 (2025)

Showing first 80 references.