pith. machine review for the scientific record.

arXiv: 2604.13416 · v1 · submitted 2026-04-15 · 💻 cs.CV · cs.AI

Recognition: unknown

DF3DV-1K: A Large-Scale Dataset and Benchmark for Distractor-Free Novel View Synthesis

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 13:22 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords distractor-free · novel view synthesis · radiance fields · dataset · benchmark · 3D reconstruction · real-world scenes · casual capture

The pith

DF3DV-1K supplies 1,048 real scenes each with clean and cluttered image sets to benchmark distractor-free novel view synthesis.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to close the data gap for methods that reconstruct scenes while ignoring unwanted objects in casual photographs. It releases DF3DV-1K, a collection of 1,048 scenes captured with ordinary cameras that supplies both clean and cluttered image sets for every scene, plus a focused 41-scene subset for hard cases. The authors run nine recent distractor-free radiance-field techniques plus 3D Gaussian Splatting on the data and measure which ones hold up across indoor and outdoor conditions. They further show that fine-tuning a diffusion enhancer on the new pairs raises PSNR by roughly one decibel on held-out scenes. A shared, large-scale resource of this kind would let researchers compare approaches on consistent real-world clutter instead of isolated small tests.

Core claim

DF3DV-1K comprises 1,048 real-world scenes, each furnished with paired clean and cluttered image sets totaling 89,924 frames taken by consumer cameras, covering 128 distractor types and 161 scene themes. A curated 41-scene subset, DF3DV-41, isolates challenging capture conditions. Benchmarking of nine distractor-free radiance field methods and 3D Gaussian Splatting identifies relative robustness, while fine-tuning a diffusion-based 2D enhancer on DF3DV-1K yields average gains of 0.96 dB PSNR and 0.057 LPIPS on held-out data including DF3DV-41 and the On-the-go dataset.
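
To make the headline number concrete, here is a minimal sketch of the standard PSNR computation (not code from the paper): a +0.96 dB average gain corresponds to roughly a 1.25x reduction in mean squared error against the clean ground truth.

```python
import numpy as np

def psnr(img: np.ndarray, ref: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_val]."""
    mse = np.mean((img.astype(np.float64) - ref.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

# A +0.96 dB gain means the MSE shrinks by a factor of 10 ** (0.96 / 10) ≈ 1.25.
```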

What carries the argument

The DF3DV-1K dataset itself, which pairs clean and cluttered photographs of the same 1,048 scenes to let methods be tested on their ability to suppress distractors during novel-view reconstruction.

If this is right

  • Researchers gain a standardized way to compare how different methods handle the same real-world clutter across hundreds of scenes.
  • Benchmark results highlight which current techniques cope best with particular distractor categories and environments.
  • The paired clean-cluttered images enable direct measurement of improvement when enhancers or filters are added to existing pipelines (a minimal sketch follows this list).
  • The scale supports training of more general models that do not require per-scene tuning.
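
As a concrete illustration of paired before-and-after measurement, the sketch below averages per-view PSNR deltas between baseline and enhanced renders against clean targets. The directory layout and file names are hypothetical; the released dataset may organize scenes differently.

```python
from pathlib import Path

import numpy as np
from PIL import Image

# Hypothetical layout (the actual DF3DV-1K release may differ):
#   <scene>/clean/*.png         held-out clean target views
#   <scene>/renders_base/*.png  renders trained on the cluttered set
#   <scene>/renders_enh/*.png   the same renders after a 2D enhancer

def load(path: Path) -> np.ndarray:
    """Load an image as a float array in [0, 1]."""
    return np.asarray(Image.open(path)).astype(np.float64) / 255.0

def psnr(img: np.ndarray, ref: np.ndarray) -> float:
    mse = np.mean((img - ref) ** 2)
    return 10.0 * np.log10(1.0 / mse)

def scene_psnr_gain(scene: Path) -> float:
    """Average per-view PSNR improvement of enhanced over baseline renders."""
    deltas = []
    for gt_path in sorted((scene / "clean").glob("*.png")):
        ref = load(gt_path)
        base = load(scene / "renders_base" / gt_path.name)
        enh = load(scene / "renders_enh" / gt_path.name)
        deltas.append(psnr(enh, ref) - psnr(base, ref))
    return float(np.mean(deltas))
```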

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Widespread use of the dataset could shift evaluation away from controlled lab captures toward everyday uncontrolled photography.
  • The clean-cluttered pairs could be used to train models that remove distractors jointly with reconstruction rather than as a separate step.
  • Applications such as AR overlays or virtual tours from tourist photos would become more reliable if methods prove robust on this data.

Load-bearing premise

The collected scenes, distractor types, and capture conditions are representative enough of everyday casual photography to support general claims about method robustness.

What would settle it

A distractor-free method that scores high on DF3DV-1K but produces visibly degraded views on an independent collection of casual photos containing new distractors or scene types would show the benchmark does not support broad conclusions.

Figures

Figures reproduced from arXiv: 2604.13416 by Charlie Li-Ting Tsai, Cheng-You Lu, Chin-Teng Lin, Hao-Ping Wang, Thomas Do, Wei-Ling Chi, Yi-Shan Hung, Yu-Cheng Chang, Yu-Lun Liu.

Figure 1: Features of the DF3DV-1K dataset and benchmark. (a) DF3DV-1K …

Figure 2: Comparison of radiance fields. (a) Distractor-free tasks [63] use casually captured images within a short period. (b) In-the-wild tasks [24, 51], with large temporal gaps, often target images collected across seasons. (c) Dynamic tasks [89] target 4D scene synthesis and assume densely captured sequential data. Their variants such as in-the-wild (with large temporal gap) [3, 10, 11, 14, 18, 30, 33, 37, 51, 5…

Figure 3: Number of distractor-free radiance field papers. The first distractor-free radiance field method [65], targeting images captured over short time spans, was introduced in 2023 together with a benchmark. The research area rapidly gained attention in 2024 with the release of the On-the-go benchmark [63]. Although the number of works doubled in 2025, a public, large-scale, challenging dataset and benchmark sys…

Figure 4: Samples of systematically designed scenarios in DF3DV-41.

Figure 5: Dataset difficulty analysis via per-image performance distributions.

Figure 6: Overview of the data collection and curation pipeline. (a) Scene de…

Figure 7: Distribution of DF3DV-1K by capture device, resolution, and month…

Figure 8: Scene count distribution. Clean images per scene skew slightly toward lower bins for efficient novel-view benchmarking, whereas cluttered images extend toward higher bins to capture diverse distractor conditions. Total images per scene remain comparable to typical non-sparse radiance field settings. …

Figure 9: Qualitative results of the radiance field methods on DF3DV-41.

Figure 10: Qualitative results of enhancers. Leveraging DF3DV-1K*, a large-scale and diverse dataset, DI2FIX effectively removes distractor artifacts (e.g., dynamic chess pieces and vegetable artifacts) while inpainting occluded regions in target views. (Column labels: GT, Target (RobustSplat), DF3DV-250, DF3DV-500, DF3DV-750, DF3DV-1K; GT, Target (3DGS), DF3DV-250, DF3DV-500, DF3DV-750, DF3DV-1K.)

Figure 11: Qualitative results of DI2FIX trained with different data scales. Increasing the amount and diversity of training data improves robustness. In particular, DI2FIX progressively removes distractor artifacts (e.g., the chess pieces) while avoiding incorrect modifications to static scene content (e.g., the animal toy). …

Figure 12: Qualitative results of DI2FIX trained using different LPIPS filtering thresholds. Overly strict thresholds (e.g., 0.1) exclude many challenging image pairs, making artifacts difficult to remove (e.g., floaters around the cutting board). Overly loose thresholds (e.g., 0.9) introduce excessive noisy training samples, which may lead to undesired modifications of scene content (e.g., disappearing game cards). …
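
Figure 12's caption describes filtering candidate training pairs by an LPIPS threshold. The sketch below shows one plausible form of that filter using the public lpips package; the paper's exact backbone, preprocessing, and threshold value are assumptions here, with 0.5 standing in as a hypothetical midpoint between the 0.1 and 0.9 extremes the caption discusses.

```python
import lpips  # pip install lpips
import numpy as np
import torch
from PIL import Image

loss_fn = lpips.LPIPS(net="alex")  # perceptual distance; lower = more similar

def to_tensor(path: str) -> torch.Tensor:
    arr = np.asarray(Image.open(path)).astype(np.float32) / 255.0
    t = torch.from_numpy(arr).permute(2, 0, 1).unsqueeze(0)  # (1, 3, H, W)
    return t * 2.0 - 1.0  # lpips expects inputs in [-1, 1]

def keep_pair(render_path: str, target_path: str, threshold: float = 0.5) -> bool:
    """Keep a (render, target) training pair only if perceptually close enough."""
    with torch.no_grad():
        dist = loss_fn(to_tensor(render_path), to_tensor(target_path)).item()
    # Too strict (e.g., 0.1) drops hard-but-useful pairs; too loose (e.g., 0.9)
    # admits noisy supervision -- the two failure modes Figure 12 illustrates.
    return dist < threshold
```
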
Original abstract

Advances in radiance fields have enabled photorealistic novel view synthesis. In several domains, large-scale real-world datasets have been developed to support comprehensive benchmarking and to facilitate progress beyond scene-specific reconstruction. However, for distractor-free radiance fields, a large-scale dataset with clean and cluttered images per scene remains lacking, limiting the development. To address this gap, we introduce DF3DV-1K, a large-scale real-world dataset comprising 1,048 scenes, each providing clean and cluttered image sets for benchmarking. In total, the dataset contains 89,924 images captured using consumer cameras to mimic casual capture, spanning 128 distractor types and 161 scene themes across indoor and outdoor environments. A curated subset of 41 scenes, DF3DV-41, is systematically designed to evaluate the robustness of distractor-free radiance field methods under challenging scenarios. Using DF3DV-1K, we benchmark nine recent distractor-free radiance field methods and 3D Gaussian Splatting, identifying the most robust methods and the most challenging scenarios. Beyond benchmarking, we demonstrate an application of DF3DV-1K by fine-tuning a diffusion-based 2D enhancer to improve radiance field methods, achieving average improvements of 0.96 dB PSNR and 0.057 LPIPS on the held-out set (e.g., DF3DV-41) and the On-the-go dataset. We hope DF3DV-1K facilitates the development of distractor-free vision and promotes progress beyond scene-specific approaches.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper introduces DF3DV-1K, a large-scale real-world dataset comprising 1,048 scenes with clean and cluttered image sets, totaling 89,924 images across 128 distractor types and 161 scene themes. Using this dataset, the authors benchmark nine recent distractor-free radiance field methods and 3D Gaussian Splatting, identify robust methods and challenging scenarios, and demonstrate an application by fine-tuning a diffusion-based enhancer that achieves average improvements of 0.96 dB PSNR and 0.057 LPIPS on held-out sets including DF3DV-41 and the On-the-go dataset.

Significance. Should the dataset prove representative of real-world casual photography challenges, DF3DV-1K would be a significant contribution as the first large-scale benchmark specifically designed for distractor-free novel view synthesis. It enables comprehensive evaluation of method robustness, highlights challenging scenarios, and provides a resource for developing enhancers, potentially accelerating progress in handling distractors in radiance fields beyond controlled settings. The paired clean/cluttered design per scene is particularly valuable.

major comments (1)
  1. [Dataset Construction and DF3DV-41 Subset] The central assumption that the 1,048 scenes and the systematically designed DF3DV-41 subset are representative of real-world distractor challenges lacks supporting evidence. No quantitative analysis or external validation is provided for the distribution of distractor types, scene themes, capture conditions (e.g., lighting, camera motion), or potential selection biases. This is load-bearing for all benchmarking conclusions and generalizability claims.
minor comments (1)
  1. [Abstract] The total image count and scene numbers are clearly stated, but the paper could benefit from a brief mention of the capture protocol or camera types used to enhance reproducibility.

Simulated Authors' Rebuttal

1 response · 1 unresolved

We thank the referee for the constructive feedback on the importance of dataset representativeness. We address the major comment below.

Point-by-point responses
  1. Referee: The central assumption that the 1,048 scenes and the systematically designed DF3DV-41 subset are representative of real-world distractor challenges lacks supporting evidence. No quantitative analysis or external validation is provided for the distribution of distractor types, scene themes, capture conditions (e.g., lighting, camera motion), or potential selection biases. This is load-bearing for all benchmarking conclusions and generalizability claims.

    Authors: We appreciate the referee highlighting this point, as the dataset's utility for benchmarking and generalizability does depend on its scope. DF3DV-1K was constructed by capturing 1,048 real scenes with consumer cameras to emulate casual photography, deliberately spanning 128 distractor types and 161 scene themes across indoor and outdoor environments, with each scene providing paired clean and cluttered image sets. The DF3DV-41 subset was curated to include challenging combinations of distractors, lighting, and motion. However, we agree that the manuscript currently lacks explicit quantitative breakdowns (e.g., frequency distributions or tables of distractor/scene coverage) and a dedicated discussion of selection biases or capture-condition statistics. We will revise the paper to include these: (1) summary statistics and visualizations of distractor-type and theme distributions, (2) details on capture variations where recorded, and (3) an expanded limitations section that clarifies the design process for DF3DV-41 and tempers generalizability claims to the observed diversity rather than asserting full real-world representativeness. External validation against independent large-scale statistics on casual photography distractors is not feasible within the current scope without new data collection, but the added internal analysis will make the benchmarking conclusions more transparent and defensible.

    revision: partial

standing simulated objections not resolved
  • External validation of representativeness against independent real-world statistics on distractor distributions and casual photography conditions

Circularity Check

0 steps flagged

No circularity: empirical dataset and benchmark with no derivations or self-referential predictions

Full rationale

The paper introduces DF3DV-1K as a new real-world dataset of 1,048 scenes with clean/cluttered pairs and benchmarks nine prior distractor-free radiance field methods plus 3DGS on it, followed by a standard fine-tuning demonstration of a diffusion enhancer evaluated on held-out data. No equations, fitted parameters renamed as predictions, self-definitional claims, or load-bearing self-citations appear in the derivation chain; the work consists entirely of data collection, empirical evaluation against external methods, and an application that uses the dataset as training input with separate held-out testing. The representativeness assumption for general conclusions is an empirical limitation but does not create circularity by reducing any claimed result to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The paper contributes an empirical dataset and benchmark rather than a derivation; it rests on standard computer-vision assumptions about image formation and radiance-field applicability but introduces no new free parameters or invented entities.

axioms (2)
  • domain assumption: Radiance fields enable photorealistic novel view synthesis from image collections.
    Invoked in the opening sentence to motivate the need for distractor-free methods.
  • domain assumption: Consumer-camera captures with casual distractors represent realistic usage scenarios.
    Used to justify the dataset's relevance to practical applications.

pith-pipeline@v0.9.0 · 5616 in / 1443 out tokens · 53737 ms · 2026-05-10T13:22:00.776002+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

102 extracted references · 16 canonical work pages · 3 internal anchors

  1. Bai, N., Yang, A., Chen, H., Du, C.: Satgs: Remote sensing novel view synthesis using multi-temporal satellite images with appearance-adaptive 3dgs. Remote Sensing 17(9), 1609 (2025)
  2. Bao, Y., Liao, J., Huo, J., Gao, Y.: Distractor-free generalizable 3d gaussian splatting. In: The Fourteenth International Conference on Learning Representations (2026)
  3. Bao, Y., Tang, C., Wang, Y., Li, H.: Seg-wild: Interactive segmentation based on 3d gaussian splatting for unconstrained image collections. In: Proceedings of the 33rd ACM International Conference on Multimedia. pp. 8567–8576 (2025)
  4. Blau, Y., Michaeli, T.: The perception-distortion tradeoff. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 6228–6237 (2018)
  5. Blau, Y., Michaeli, T.: Rethinking lossy compression: The rate-distortion-perception tradeoff. In: International Conference on Machine Learning. pp. 675–
  6. Broxton, M., Flynn, J., Overbeck, R., Erickson, D., Hedman, P., Duvall, M., Dourgarian, J., Busch, J., Whalen, M., Debevec, P.: Immersive light field video with a layered mesh representation. ACM Transactions on Graphics (TOG) 39(4), 86–1 (2020)
  7. Cao, A., Johnson, J.: Hexplane: A fast representation for dynamic scenes. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 130–141 (2023)
  8. Cao, J., Wu, H., Feng, Z., Bao, H., Zhou, X., Peng, S.: Universe: Unleashing the scene prior of video diffusion models for robust radiance field reconstruction. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 27031–27041 (2025)
  9. Charatan, D., Li, S.L., Tagliasacchi, A., Sitzmann, V.: pixelsplat: 3d gaussian splats from image pairs for scalable generalizable 3d reconstruction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 19457–19467 (2024)
  10. Chen, J., Qin, Y., Liu, L., Lu, J., Li, G.: Nerf-hugs: Improved neural radiance fields in non-static scenes using heuristics-guided segmentation. CVPR (2024)
  11. Chen, X., Zhang, Q., Li, X., Chen, Y., Feng, Y., Wang, X., Wang, J.: Hallucinated neural radiance fields in the wild. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 12943–12952 (2022)
  12. Chen, X., Zhou, W., Cheng, Z.: Wildrayzer: Self-supervised large view synthesis in dynamic environments. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2026)
  13. Chen, Y., Xu, H., Zheng, C., Zhuang, B., Pollefeys, M., Geiger, A., Cham, T.J., Cai, J.: Mvsplat: Efficient 3d gaussian splatting from sparse multi-view images. In: European Conference on Computer Vision. pp. 370–386. Springer (2024)
  14. Dahmani, H., Bennehar, M., Piasco, N., Roldao, L., Tsishkou, D.: Swag: Splatting in the wild images with appearance-conditioned gaussians. In: European Conference on Computer Vision. pp. 325–340. Springer (2024)
  15. Dey, A., Lu, C.Y., Comport, A.I., Sridhar, S., Lin, C.T., Martinet, J.: Hfgaussian: Learning generalizable gaussian human with integrated human features. IEEE Transactions on Artificial Intelligence (2025)
  16. Du, S., Liu, J., Chen, Q., Chen, H.X., Mu, T.J., Yang, S.: Rge-gs: Reward-guided expansive driving scene reconstruction via diffusion priors. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 25756–25764 (2025)
  17. Fridovich-Keil, S., Meanti, G., Warburg, F.R., Recht, B., Kanazawa, A.: K-planes: Explicit radiance fields in space, time, and appearance. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 12479–12488 (2023)
  18. Fu, C., Chen, G., Zhang, Y., Yao, K., Xiong, Y., Huang, C., Cui, S., Matsushita, Y., Cao, X.: Robustsplat++: Decoupling densification, dynamics, and illumination for in-the-wild 3dgs. arXiv preprint arXiv:2512.04815 (2025)
  19. Fu, C., Zhang, Y., Yao, K., Chen, G., Xiong, Y., Huang, C., Cui, S., Cao, X.: Robustsplat: Decoupling densification and dynamics for transient-free 3dgs. In: ICCV (2025)
  20. Hosseinzadeh, M., Chng, S.F., Xu, Y., Lucey, S., Reid, I., Garg, R.: G3splat: Geometrically consistent generalizable gaussian splatting. arXiv preprint arXiv:2512.17547 (2025)
  21. Huang, X., Liu, X., Wan, Y., Zheng, Z., Zhang, B., Xiong, M., Pei, Y., Zhang, Y.: Skysplat: Generalizable 3d gaussian splatting from multi-temporal sparse satellite images. arXiv preprint arXiv:2508.09479 (2025)
  22. Jensen, R., Dahl, A., Vogiatzis, G., Tola, E., Aanæs, H.: Large scale multi-view stereopsis evaluation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 406–413 (2014)
  23. Jiang, L., Mao, Y., Xu, L., Lu, T., Ren, K., Jin, Y., Xu, X., Yu, M., Pang, J., Zhao, F., et al.: Anysplat: Feed-forward 3d gaussian splatting from unconstrained views. ACM Transactions on Graphics (TOG) 44(6), 1–16 (2025)
  24. Jin, Y., Mishkin, D., Mishchuk, A., Matas, J., Fua, P., Yi, K.M., Trulls, E.: Image matching across wide baselines: From paper to practice. International Journal of Computer Vision 129(2), 517–547 (2021)
  25. Kang, G., Nam, S., Yang, S., Sun, X., Khamis, S., Mohamed, A., Park, E.: ilrm: An iterative large 3d reconstruction model. arXiv preprint arXiv:2507.23277 (2025)
  26. Kästingschäfer, M., Gieruc, T., Bernhard, S., Campbell, D., Insafutdinov, E., Najafli, E., Brox, T.: Seed4d: A synthetic ego-exo dynamic 4d data generator, driving dataset and benchmark. In: 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). pp. 7752–7764. IEEE (2025)
  27. Kerbl, B., Kopanas, G., Leimkühler, T., Drettakis, G.: 3d gaussian splatting for real-time radiance field rendering. ACM Trans. Graph. 42(4), 139–1 (2023)
  28. Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A.C., Lo, W.Y., et al.: Segment anything. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 4015–4026 (2023)
  29. Kong, H., Yang, X., Wang, X.: Rogsplat: Robust gaussian splatting via generative priors. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 25735–25745 (2025)
  30. Kulhanek, J., Peng, S., Kukelova, Z., Pollefeys, M., Sattler, T.: WildGaussians: 3D gaussian splatting in the wild. In: Proceedings of the 38th International Conference on Neural Information Processing Systems (2024)
  31. Kulhanek, J., Sattler, T.: Nerfbaselines: Consistent and reproducible evaluation of novel view synthesis methods. In: Proceedings of the 39th International Conference on Neural Information Processing Systems (NeurIPS 2025) (2025)
  32. Lee, J., Yang, G., Ma, S., Cho, Y.: Freeze-frame with staticnerf: Uncertainty-guided nerf map reconstruction in dynamic scenes. IEEE Robotics and Automation Letters 11(1), 778–785 (2025)
  33. Li, C., Shi, Z., Lu, Y., He, W., Xu, X.: Robust neural rendering in the wild with asymmetric dual 3d gaussian splatting. In: The Thirty-ninth Annual Conference on Neural Information Processing Systems (2025)
  34. Li, D., Feng, J., Chen, J., Dong, W., Li, G., Shi, G., Jiao, L.: Egosplat: Open-vocabulary egocentric scene understanding with language embedded 3d gaussian splatting. arXiv preprint arXiv:2503.11345 (2025)
  35. Li, J., Cao, J., Guo, Y., Li, W., Zhang, Y.: One diffusion step to real-world super-resolution via flow trajectory distillation. In: International Conference on Machine Learning. pp. 34044–34053. PMLR (2025)
  36. Li, L., Shen, Z., Wang, Z., Shen, L., Tan, P.: Streaming radiance fields for 3d video synthesis. Advances in Neural Information Processing Systems 35, 13485–13498 (2022)
  37. Li, M., Zhai, S., Zhao, Z., Sun, L., Wang, X., Li, D., Liu, S., Wang, H.: Wild3a: Novel view synthesis from any dynamic images in seconds. In: Proceedings of the 33rd ACM International Conference on Multimedia. pp. 7472–7480 (2025)
  38. Li, R., Cheung, Y.m.: Modeling and identifying distractors with curriculum for robust 3d gaussian splatting. In: Proceedings of the 33rd ACM International Conference on Multimedia. pp. 10122–10131 (2025)
  39. Li, Y., Wu, J., Zhao, L., Liu, P.: Derainnerf: 3d scene estimation with adhesive waterdrop removal. In: 2024 IEEE International Conference on Robotics and Automation (ICRA). pp. 2787–2793. IEEE (2024)
  40. Li, Z., Wang, Q., Cole, F., Tucker, R., Snavely, N.: Dynibar: Neural dynamic image-based rendering. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 4273–4284 (2023)
  41. Lin, J., Gu, J., Fan, L., Wu, B., Lou, Y., Chen, R., Liu, L., Ye, J.: Hybridgs: Decoupling transients and statics with 2d and 3d gaussian splatting. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 788–797 (2025)
  42. Lin, K.E., Xiao, L., Liu, F., Yang, G., Ramamoorthi, R.: Deep 3d mask volume for view synthesis of dynamic scenes. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 1749–1758 (2021)
  43. Lin, X., Yu, F., Hu, J., You, Z., Shi, W., Ren, J.S., Gu, J., Dong, C.: Harnessing diffusion-yielded score priors for image restoration. ACM Transactions on Graphics (TOG) 44(6), 1–21 (2025)
  44. Ling, H., Xu, X., Sun, Y., Sun, Q.: Ocsplats: Observation completeness quantification and label noise separation in 3dgs. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 25680–25689 (2025)
  45. Ling, L., Sheng, Y., Tu, Z., Zhao, W., Xin, C., Wan, K., Yu, L., Guo, Q., Yu, Z., Lu, Y., et al.: Dl3dv-10k: A large-scale scene dataset for deep learning-based 3d vision. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 22160–22169 (2024)
  46. Liu, A., Tucker, R., Jampani, V., Makadia, A., Snavely, N., Kanazawa, A.: Infinite nature: Perpetual view generation of natural scenes from a single image. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 14458–14467 (2021)
  47. Liu, S., Bao, C., Cui, Z., Liu, Y., Chu, X., Gu, L., Conde, M.V., Umagami, R., Hashimoto, T., Hu, Z., et al.: Realx3d: A physically-degraded 3d benchmark for multi-view visual restoration and reconstruction. arXiv preprint arXiv:2512.23437 (2025)
  48. Liu, S., Chen, X., Chen, H., Xu, Q., Li, M.: Deraings: Gaussian splatting for enhanced scene reconstruction in rainy environments. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 39, pp. 5558–5566 (2025)
  49. Lu, C.Y., Zhou, P., Xing, A., Pokhariya, C., Dey, A., Shah, I.N., Mavidipalli, R., Hu, D., Comport, A.I., Chen, K., et al.: Diva-360: The dynamic visual dataset for immersive neural fields. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 22466–22476 (2024)
  50. Markin, A., Pryadilshchikov, V., Komarichev, A., Rakhimov, R., Wonka, P., Burnaev, E.: T-3dgs: Removing transient objects for 3d scene reconstruction. arXiv preprint arXiv:2412.00155 (2024)
  51. Martin-Brualla, R., Radwan, N., Sajjadi, M.S., Barron, J.T., Dosovitskiy, A., Duckworth, D.: Nerf in the wild: Neural radiance fields for unconstrained photo collections. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 7210–7219 (2021)
  52. Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: Nerf: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM 65(1), 99–106 (2021)
  53. Mithun, N.C., Pham, T., Wang, Q., Southall, B., Minhas, K., Matei, B., Mandt, S., Samarasekera, S., Kumar, R.: Diffusion-guided gaussian splatting for large-scale unconstrained 3d reconstruction and novel view synthesis. arXiv preprint arXiv:2504.01960 (2025)
  54. Müller, T., Evans, A., Schied, C., Keller, A.: Instant neural graphics primitives with a multiresolution hash encoding. ACM Transactions on Graphics (TOG) 41(4), 1–15 (2022)
  55. Oquab, M., Darcet, T., Moutakanni, T., Vo, H., Szafraniec, M., Khalidov, V., Fernandez, P., Haziza, D., Massa, F., El-Nouby, A., et al.: Dinov2: Learning robust visual features without supervision. Transactions on Machine Learning Research Journal (2024)
  56. Otonari, T., Ikehata, S., Aizawa, K.: Entity-nerf: Detecting and removing moving entities in urban scenes. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 20892–20901 (2024)
  57. Park, W., Nam, M., Kim, S., Jo, S., Lee, S.: Forestsplats: Deformable transient field for gaussian splatting in the wild. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. pp. 6978–6987 (2026)
  58. Parmar, G., Park, T., Narasimhan, S., Zhu, J.Y.: One-step image translation with text-to-image models. arXiv preprint arXiv:2403.12036 (2024)
  59. Prabakaran, A., Shukla, P.: Semantic-guided 3d gaussian splatting for transient object removal. arXiv preprint arXiv:2602.15516 (2026)
  60. Ravi, N., Gabeur, V., Hu, Y.T., Hu, R., Ryali, C., Ma, T., Khedr, H., Rädle, R., Rolland, C., Gustafson, L., et al.: Sam 2: Segment anything in images and videos. In: The Thirteenth International Conference on Learning Representations (2025)
  61. Reizenstein, J., Shapovalov, R., Henzler, P., Sbordone, L., Labatut, P., Novotny, D.: Common objects in 3d: Large-scale learning and evaluation of real-life 3d category reconstruction. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 10901–10911 (2021)
  62. Rematas, K., Liu, A., Srinivasan, P.P., Barron, J.T., Tagliasacchi, A., Funkhouser, T., Ferrari, V.: Urban radiance fields. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 12932–12942 (2022)
  63. Ren, W., Zhu, Z., Sun, B., Chen, J., Pollefeys, M., Peng, S.: Nerf on-the-go: Exploiting uncertainty for distractor-free nerfs in the wild. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024)
  64. Sabour, S., Goli, L., Kopanas, G., Matthews, M., Lagun, D., Guibas, L., Jacobson, A., Fleet, D., Tagliasacchi, A.: Spotlesssplats: Ignoring distractors in 3d gaussian splatting. ACM Transactions on Graphics 44(2), 1–11 (2025)
  65. Sabour, S., Vora, S., Duckworth, D., Krasin, I., Fleet, D.J., Tagliasacchi, A.: Robustnerf: Ignoring distractors with robust losses. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 20626–20636 (2023)
  66. Sauer, A., Lorenz, D., Blattmann, A., Rombach, R.: Adversarial diffusion distillation. In: European Conference on Computer Vision. pp. 87–103. Springer (2024)
  67. Schönberger, J.L., Frahm, J.M.: Structure-from-motion revisited. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
  68. Schönberger, J.L., Zheng, E., Pollefeys, M., Frahm, J.M.: Pixelwise view selection for unstructured multi-view stereo. In: European Conference on Computer Vision (ECCV) (2016)
  69. Song, L., Chen, A., Li, Z., Chen, Z., Chen, L., Yuan, J., Xu, Y., Geiger, A.: Nerfplayer: A streamable dynamic scene representation with decomposed neural radiance fields. IEEE Transactions on Visualization and Computer Graphics 29(5), 2732–2742 (2023)
  70. Sun, D., Guan, H., Zhang, K., Xie, X., Zhou, S.K.: Sdd-4dgs: Static-dynamic aware decoupling in gaussian splatting for 4d scene reconstruction. arXiv preprint arXiv:2503.09332 (2025)
  71. Tang, J., Gao, Y., Yang, D., Yan, L., Yue, Y., Yang, Y.: Dronesplat: 3d gaussian splatting for robust 3d reconstruction from in-the-wild drone imagery. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 833–843 (2025)
  72. Tang, L., Jia, M., Wang, Q., Phoo, C.P., Hariharan, B.: Emergent correspondence from image diffusion. Advances in Neural Information Processing Systems 36, 1363–1389 (2023)
  73. Tang, Y., Xu, D., Hou, Y., Wang, Z., Jiang, M.: Nexussplats: Efficient 3d gaussian splatting in the wild. arXiv preprint arXiv:2411.14514 (2024)
  74. Trevithick, A., Paiss, R., Henzler, P., Verbin, D., Wu, R., Alzayer, H., Gao, R., Poole, B., Barron, J.T., Holynski, A., et al.: Simvs: Simulating world inconsistencies for robust view synthesis. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 16464–16474 (2025)
  75. Tung, J., Chou, G., Cai, R., Yang, G., Zhang, K., Wetzstein, G., Hariharan, B., Snavely, N.: Megascenes: Scene-level view synthesis at scale. In: European Conference on Computer Vision. pp. 197–214. Springer (2024)
  76. Ungermann, P., Ettenhofer, A., Nießner, M., Roessle, B.: Robust 3d gaussian splatting for novel view synthesis in presence of distractors. In: DAGM German Conference on Pattern Recognition. pp. 153–167. Springer (2024)
  77. Wang, F., Tan, S., Li, X., Tian, Z., Song, Y., Liu, H.: Mixed neural voxels for fast multi-view video synthesis. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 19706–19716 (2023)
  78. Wang, Q., Wang, Z., Genova, K., Srinivasan, P.P., Zhou, H., Barron, J.T., Martin-Brualla, R., Snavely, N., Funkhouser, T.: Ibrnet: Learning multi-view image-based rendering. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 4690–4699 (2021)
  79. Wang, R., Lohmeyer, Q., Meboldt, M., Tang, S.: Degauss: Dynamic-static decomposition with gaussian splatting for distractor-free 3d reconstruction. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 6294–6303 (2025)
  80. Wang, S., Xu, H., Li, Y., Chen, J., Tan, G.: Ie-nerf: Exploring transient mask inpainting to enhance neural radiance fields in the wild. Neurocomputing 618, 129112 (2025)

Showing first 80 references.