DENSER: Depth-Guided Ensemble with Staged EFA-GS Reconstruction for Soccer Novel View Synthesis
Pith reviewed 2026-06-28 17:06 UTC · model grok-4.3
The pith
DENSER extends EFA-GS for soccer novel view synthesis by adding camera-height loss weighting, monocular depth supervision, and a three-model ensemble.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DENSER extends EFA-GS with camera-height-based loss weighting that prioritises ground-level broadcast views, monocular depth supervision from Depth-Anything-V2 to regularise geometry in textureless regions, and a three-model pixel-average ensemble whose members diverge from a shared base checkpoint by varying training length and Gaussian scale clamping. On five held-out challenge scenes we achieve a mean PSNR of 29.89 dB, SSIM of 0.791, and LPIPS of 0.366.
What carries the argument
The depth-guided ensemble with staged EFA-GS reconstruction: camera-height loss weighting prioritizes ground-level views and monocular depth estimates regularize the scene geometry, after which the renders of the three models are pixel-averaged.
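The abstract does not give the form of the camera-height weighting, so the following is only a minimal sketch under stated assumptions. The exponential falloff, the `boost` and `scale_m` parameters, and the function names are all hypothetical; the sketch shows one way lower cameras could receive more influence in a per-view loss average.

```python
import math

def height_weight(camera_height_m, boost=1.0, scale_m=10.0):
    """Hypothetical weight 1 + boost * exp(-h / scale_m): lower cameras weigh more.

    The functional form and both parameters are assumptions, not the paper's.
    """
    if camera_height_m < 0:
        raise ValueError("camera height must be non-negative")
    return 1.0 + boost * math.exp(-camera_height_m / scale_m)

def weighted_mean_loss(per_view_losses, camera_heights_m, **kwargs):
    """Average per-view losses using normalised height weights."""
    if len(per_view_losses) != len(camera_heights_m):
        raise ValueError("one height is required per view")
    weights = [height_weight(h, **kwargs) for h in camera_heights_m]
    total = sum(weights)
    return sum(w * l for w, l in zip(weights, per_view_losses)) / total
```

Normalising by the sum of weights keeps the loss scale comparable to an unweighted mean, so a learning rate tuned for the base model would not need to change for this reason alone.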
If this is right
- Ground-level views receive stronger influence during optimization, aligning the reconstruction with common broadcast camera placements.
- Textureless areas such as uniform stretches of pitch gain geometric constraints that reduce floating artifacts in synthesized frames.
- Averaging three models trained from the same checkpoint but with different lengths and scale clamps lowers variance in the final output.
- Staged reconstruction allows progressive refinement of both appearance and geometry without restarting from scratch.
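The pixel-average ensemble in the third point can be sketched as follows. This is an illustrative sketch, not the authors' code: images are plain nested lists of floats in [0, 1], and the helper names are hypothetical. PSNR is computed with the standard definition, 10 * log10(max^2 / MSE).

```python
import math

def pixel_average(renders):
    """Average equally shaped images (lists of rows of floats) pixel by pixel."""
    if not renders:
        raise ValueError("need at least one render")
    n = len(renders)
    height, width = len(renders[0]), len(renders[0][0])
    return [[sum(r[y][x] for r in renders) / n for x in range(width)]
            for y in range(height)]

def psnr(image, reference, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally shaped images."""
    errors = [(a - b) ** 2
              for row_a, row_b in zip(image, reference)
              for a, b in zip(row_a, row_b)]
    mse = sum(errors) / len(errors)
    return math.inf if mse == 0 else 10.0 * math.log10(max_value ** 2 / mse)

# Toy data, invented purely for illustration: three renders whose errors
# against the reference partly cancel when averaged.
reference = [[0.5, 0.5]]
members = [[[0.6, 0.5]], [[0.4, 0.5]], [[0.5, 0.6]]]
ensemble = pixel_average(members)
```

Averaging helps only to the extent that the members' errors are not fully correlated, which is presumably why the members are made to diverge in training length and scale clamping.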
Where Pith is reading between the lines
- The same height-weighting and depth terms could be tested on other multi-camera sports datasets to check transfer beyond soccer.
- Replacing the specific monocular depth model with alternatives would reveal how much the final metrics depend on the choice of depth estimator.
- The ensemble averaging step might be replaced by a single longer training run with scheduled scale clamping to measure whether multiple models are strictly necessary.
Load-bearing premise
Monocular depth estimates provide reliable regularization of scene geometry in textureless regions.
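Monocular estimators of this kind typically predict depth only up to an unknown scale and shift, so a depth term usually aligns the estimate to the rendered depth before comparing them. The abstract does not state the loss that DENSER uses; the sketch below assumes a closed-form least-squares alignment followed by a mean absolute residual, with hypothetical function names, over flattened per-pixel depth lists.

```python
def fit_scale_shift(mono, rendered):
    """Least-squares (scale, shift) minimising sum((scale * m + shift - r) ** 2)."""
    n = len(mono)
    if n != len(rendered) or n < 2:
        raise ValueError("need two or more paired samples")
    mean_m = sum(mono) / n
    mean_r = sum(rendered) / n
    var_m = sum((m - mean_m) ** 2 for m in mono)
    if var_m == 0:
        raise ValueError("monocular depth is constant; scale is undetermined")
    cov = sum((m - mean_m) * (r - mean_r) for m, r in zip(mono, rendered))
    scale = cov / var_m
    return scale, mean_r - scale * mean_m

def aligned_depth_loss(mono, rendered):
    """Mean absolute residual after aligning monocular depth to rendered depth."""
    scale, shift = fit_scale_shift(mono, rendered)
    return sum(abs(scale * m + shift - r)
               for m, r in zip(mono, rendered)) / len(mono)
```

Under this formulation a perfectly affine-related pair yields zero loss, so the term constrains only relative geometry. Any systematic non-affine bias in the estimator on uniform pitch would pass straight into the reconstruction, which is the risk the premise carries.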
What would settle it
Running the base EFA-GS model versus the full DENSER pipeline on the same five held-out scenes and checking whether PSNR, SSIM, and LPIPS improve only when the depth supervision term is active.
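The comparison described above amounts to paired per-scene scores for two configurations. A minimal sketch of the bookkeeping follows, assuming each configuration is summarised as a mapping from metric name to a list of per-scene scores; the function names and data layout are hypothetical, and no scores from the paper are used.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)

def ablation_deltas(base_scores, full_scores):
    """Per-metric mean change (full minus base) over paired per-scene scores.

    Each argument maps a metric name to a list with one score per scene.
    """
    deltas = {}
    for metric, base in base_scores.items():
        full = full_scores[metric]
        if len(base) != len(full):
            raise ValueError(f"scene count mismatch for {metric}")
        deltas[metric] = mean(full) - mean(base)
    return deltas

def improved(deltas, lower_is_better=("LPIPS",)):
    """Flag each metric whose mean moved in the better direction."""
    return {m: (d < 0 if m in lower_is_better else d > 0)
            for m, d in deltas.items()}
```

PSNR and SSIM improve when they rise, whereas LPIPS improves when it falls, so the direction is handled per metric. With only five scenes, per-scene differences are worth reporting alongside the means.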
Original abstract
We propose DENSER, a Depth-guided ENSemble with Staged EFA-GS Reconstruction for soccer novel view synthesis. DENSER extends EFA-GS with three key contributions: (1) camera-height-based loss weighting that prioritises ground-level broadcast views, (2) monocular depth supervision from Depth-Anything-V2 to regularise geometry in textureless regions, and (3) a three-model pixel-average ensemble whose members diverge from a shared base checkpoint by varying training length and Gaussian scale clamping. On five held-out challenge scenes we achieve a mean PSNR of 29.89 dB, SSIM of 0.791, and LPIPS of 0.366.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes DENSER, an extension of EFA-GS for soccer novel view synthesis. It adds three contributions: (1) camera-height-based loss weighting prioritizing ground-level views, (2) monocular depth supervision from Depth-Anything-V2 to regularize geometry in textureless regions, and (3) a three-model pixel-average ensemble with members diverging via training length and Gaussian scale clamping. On five held-out challenge scenes, it reports mean PSNR 29.89 dB, SSIM 0.791, and LPIPS 0.366.
Significance. If the depth supervision and ensemble claims are substantiated with ablations and baselines, the work could provide a practical engineering advance for geometry regularization in broadcast soccer NVS, where textureless pitch regions and varying camera heights are common. The staged reconstruction and ensemble approach are straightforward extensions that could be adopted if shown to be robust.
major comments (1)
- [Abstract, contribution (2)] The claim that monocular depth supervision from Depth-Anything-V2 regularizes geometry where RGB losses are weak is load-bearing for the reported metrics, yet no depth-error quantification, GT comparison on soccer scenes, or ablation isolating this term is provided. If Depth-Anything-V2 exhibits systematic bias or high variance on uniform pitches and moving players, the geometry improvements and final PSNR/SSIM/LPIPS values cannot be attributed to this component.
minor comments (1)
- [Abstract] No baselines, ablation studies, error bars, or dataset details are supplied, preventing verification that the stated metrics exceed prior EFA-GS results or other methods on the same five scenes.
Simulated Author's Rebuttal
We thank the editor and the referee for the constructive feedback on our manuscript. We address the major comment below and outline the revisions we will make.
- Referee: [Abstract, contribution (2)] The claim that monocular depth supervision from Depth-Anything-V2 regularizes geometry where RGB losses are weak is load-bearing for the reported metrics, yet no depth-error quantification, GT comparison on soccer scenes, or ablation isolating this term is provided. If Depth-Anything-V2 exhibits systematic bias or high variance on uniform pitches and moving players, the geometry improvements and final PSNR/SSIM/LPIPS values cannot be attributed to this component.
Authors: We agree that an ablation isolating the monocular depth supervision term is necessary to substantiate its contribution. In the revised manuscript we will add a controlled ablation that trains identical base models with and without the Depth-Anything-V2 depth loss on the same five held-out scenes and reports the resulting differences in PSNR, SSIM, and LPIPS. This will directly quantify the performance impact of the term. We note, however, that the soccer broadcast dataset provides no ground-truth depth maps, so quantitative depth-error metrics or direct GT comparisons are not feasible; we will instead supply qualitative depth-map visualizations from novel views (with and without the supervision) to demonstrate regularization in textureless regions. These additions will allow readers to assess whether Depth-Anything-V2 introduces systematic bias on pitches or players.
Revision: yes
- Ground-truth depth maps are unavailable for the soccer scenes, which rules out any quantitative depth-error measurement or GT comparison.
Circularity Check
No circularity: empirical extension evaluated on held-out data
The paper describes an engineering extension of prior EFA-GS work via camera-height loss weighting, Depth-Anything-V2 depth supervision, and a three-model ensemble, with results reported on five held-out challenge scenes. No equations, derivations, or self-citation chains are present that reduce any claimed prediction or result to fitted inputs by construction. The central metrics (PSNR 29.89 dB etc.) are obtained from external test scenes and an off-the-shelf depth model, so the derivation chain is self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
- [1] Jianchao Wang, Peng Zhou, Cen Li, Rong Quan, and Jie Qin. Low-frequency first: Eliminating floating artifacts in 3d gaussian splatting, 2025.
- [2] Zehao Yu, Anpei Chen, Binbin Huang, Torsten Sattler, and Andreas Geiger. Mip-splatting: Alias-free 3d gaussian splatting. Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
- [3] Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics, 42(4), 2023.
- [4] SoccerNet-NVS. SoccerNet novel view synthesis challenge. https://github.com/SoccerNet/sn-nvs, 2025.
- [5] Lihe Yang, Bingyi Kang, Zilong Huang, Xiaogang Xu, Jiashi Feng, and Hengshuang Zhao. Depth anything: Unleashing the power of large-scale unlabeled data. In CVPR, 2024.
- [6] Lihe Yang, Bingyi Kang, Zilong Huang, Zhen Zhao, Xiaogang Xu, Jiashi Feng, and Hengshuang Zhao. Depth anything v2. arXiv:2406.09414, 2024.
- [7] Jan Held, Renaud Vandeghen, Adrien Deliege, Abdullah Hamdi, Anthony Cioppa, Silvio Giancola, Andrea Vedaldi, Bernard Ghanem, Andrea Tagliasacchi, and Marc Van Droogenbroeck. Triangle splatting for real-time radiance field rendering. arXiv, 2025.