HarmoGS: Robust 3D Gaussian Splatting in the Wild via Conflict-Aware Gradient Harmonization
Pith reviewed 2026-05-19 16:58 UTC · model grok-4.3
The pith
Rotating view-specific gradients into orthogonal directions reduces conflicts and improves 3D Gaussian Splatting quality in scenes with transient objects and lighting changes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We propose a conflict-aware 3DGS framework that addresses this problem from both image-space supervision and gradient-level optimization. Semantic Consistency-Guided Masking learns pixel-wise consistency scores to adaptively refine prior masks and suppress unreliable supervision before gradient formation. A dual-view Conflict-Aware Gradient Harmonization strategy further reconciles view-specific gradients by mutually rotating them into an orthogonal configuration, reducing negative directional interference across views. We also introduce conflict-aware densification and pruning to stabilize Gaussian growth and remove persistently conflicting primitives.
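The masking step described above can be pictured with a small sketch. It assumes per-pixel consistency scores in [0, 1] are already available (in the paper they are learned; here they are plain inputs) and that the refined weight is simply the prior mask multiplied by the score. The function names `refine_weights` and `masked_l1` and the multiplicative rule are illustrative assumptions, not the paper's formulation.

```python
def refine_weights(prior_mask, consistency):
    """Combine a binary prior mask (1 = pixel believed static) with
    per-pixel consistency scores in [0, 1]. Hypothetical rule: multiply."""
    return [m * min(1.0, max(0.0, s)) for m, s in zip(prior_mask, consistency)]


def masked_l1(rendered, target, weights, eps=1e-8):
    """Weighted L1 photometric loss; pixels with weight 0 contribute nothing,
    so they produce no gradient in a differentiable implementation."""
    num = sum(w * abs(r - t) for w, r, t in zip(weights, rendered, target))
    return num / (sum(weights) + eps)


prior = [1, 1, 0, 1]             # prior mask from an off-the-shelf segmenter (assumed)
scores = [1.0, 0.2, 0.9, 0.0]    # consistency scores (assumed given)
weights = refine_weights(prior, scores)
loss = masked_l1([0.5, 0.5, 0.5, 0.5], [0.5, 1.0, 0.0, 0.0], weights)
```

In this toy case the second pixel is down-weighted rather than dropped, and the fourth pixel is suppressed even though the prior mask kept it, which is the "adaptive refinement" behaviour the claim refers to.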
What carries the argument
Dual-view Conflict-Aware Gradient Harmonization, which rotates gradients from different input views into an orthogonal configuration to reduce negative directional interference during optimization.
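One way to read "mutually rotating into an orthogonal configuration" is a symmetric rotation inside the plane spanned by the two gradients. The sketch below is an assumption about the operator, not the paper's definition: it treats a pair as conflicting when the cosine is negative, rotates each gradient toward the other by half the excess angle beyond 90 degrees, and keeps both norms unchanged. The name `harmonize` is hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def harmonize(g1, g2, eps=1e-12):
    """Rotate two conflicting gradients (negative cosine) symmetrically in
    their common plane until they are orthogonal. Norms are preserved and
    non-conflicting pairs are returned unchanged."""
    n1, n2 = norm(g1), norm(g2)
    if n1 < eps or n2 < eps:
        return list(g1), list(g2)
    c = dot(g1, g2) / (n1 * n2)
    if c >= 0.0:
        return list(g1), list(g2)
    # Orthonormal basis (u1, u2) of the plane spanned by g1 and g2.
    u1 = [x / n1 for x in g1]
    r = [y - n2 * c * x for x, y in zip(u1, g2)]  # g2 minus its component along u1
    rn = norm(r)
    if rn < eps:  # exactly anti-parallel: the plane is undefined, leave as is
        return list(g1), list(g2)
    u2 = [x / rn for x in r]
    theta = math.acos(max(-1.0, min(1.0, c)))  # angle between g1 and g2
    delta = (theta - math.pi / 2) / 2          # each vector moves by delta
    a1, a2 = delta, theta - delta              # new in-plane angles, 90 degrees apart
    h1 = [n1 * (math.cos(a1) * x + math.sin(a1) * y) for x, y in zip(u1, u2)]
    h2 = [n2 * (math.cos(a2) * x + math.sin(a2) * y) for x, y in zip(u1, u2)]
    return h1, h2


h1, h2 = harmonize([1.0, 0.0, 0.0], [-1.0, 1.0, 0.0])
```

A symmetric rotation is chosen here because it treats both views equally and keeps gradient magnitudes; a projection-based variant would shrink the vectors instead.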
If this is right
- Residual occlusions and illumination inconsistencies are suppressed before they form conflicting gradients.
- Gaussian primitives grow and are pruned according to conflict levels rather than raw density or opacity alone.
- Rendering quality improves on standard benchmarks containing transient distractors and cross-view appearance changes.
- Optimization remains stable even when prior masks leave some unreliable pixels.
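The second consequence above, pruning by conflict level, could be sketched as follows. The exponential moving average, the `conflict_thresh` value, and the function names are illustrative assumptions; the summary does not state how the paper accumulates or thresholds conflict.

```python
def update_conflict_ema(ema, conflicted, momentum=0.9):
    """Track, per Gaussian, how often its view gradients conflicted.
    `conflicted[i]` is True when primitive i saw a negative-cosine pair
    in the current iteration (assumed conflict test)."""
    return [momentum * e + (1.0 - momentum) * (1.0 if c else 0.0)
            for e, c in zip(ema, conflicted)]


def keep_mask(ema, opacity, conflict_thresh=0.5, min_opacity=0.005):
    """Keep a primitive only if it is neither persistently conflicting
    nor nearly transparent. Both thresholds are illustrative values."""
    return [e < conflict_thresh and o >= min_opacity
            for e, o in zip(ema, opacity)]


ema = [0.0, 0.0]
for _ in range(10):
    ema = update_conflict_ema(ema, [True, False])  # primitive 0 conflicts every step
keep = keep_mask(ema, [0.5, 0.5])
```

Using a running average rather than a single-step test means a primitive is removed only when conflict persists, which matches the "persistently conflicting primitives" wording in the abstract.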
Where Pith is reading between the lines
- The same orthogonal-gradient idea could be tested on other radiance-field representations that also optimize per-view contributions.
- If the rotation operation preserves gradient magnitude, it may extend naturally to multi-view consistency losses in dynamic scene capture.
- Scenes with extreme view-dependent effects might still require additional appearance modeling beyond gradient alignment.
Load-bearing premise
That rotating view-specific gradients into an orthogonal configuration reliably reduces negative directional interference without introducing new optimization instabilities or artifacts in the Gaussian primitives.
What would settle it
A controlled comparison on a scene with known illumination variation where the orthogonal rotation step is disabled and the resulting increase in visible artifacts or drop in PSNR is measured against the full method.
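The comparison described above reduces to measuring a PSNR gap between the full method and the ablated one on the same held-out view. The sketch below uses the standard PSNR definition on flattened pixel lists with values in [0, 1]; the helper name `ablation_gap` and the toy pixel values are illustrative only and are not results from the paper.

```python
import math


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val^2 / MSE)."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0.0:
        return math.inf
    return 10.0 * math.log10(max_val ** 2 / mse)


def ablation_gap(target, full_render, ablated_render):
    """Positive values mean the full method renders closer to the target."""
    return psnr(full_render, target) - psnr(ablated_render, target)


target = [0.5, 0.5, 0.5, 0.5]
full = [0.5, 0.5, 0.5, 0.6]      # toy render with rotation enabled
ablated = [0.6, 0.6, 0.6, 0.6]   # toy render with rotation disabled
gap = ablation_gap(target, full, ablated)
```

Averaging this gap over several test views, and over scenes with controlled illumination variation, would give the number the test calls for.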
Original abstract
In-the-wild 3D Gaussian Splatting remains challenging due to transient distractors and illumination-induced cross-view appearance inconsistencies. Existing methods mainly rely on image-level masking to suppress unreliable supervision, but masking alone cannot fully eliminate residual occlusions or resolve illumination-induced inconsistencies, both of which can introduce conflicting cross-view gradients. These unresolved conflicts may destabilize Gaussian optimization and lead to visible reconstruction artifacts. We propose a conflict-aware 3DGS framework that addresses this problem from both image-space supervision and gradient-level optimization. Semantic Consistency-Guided Masking learns pixel-wise consistency scores to adaptively refine prior masks and suppress unreliable supervision before gradient formation. A dual-view Conflict-Aware Gradient Harmonization strategy further reconciles view-specific gradients by mutually rotating them into an orthogonal configuration, reducing negative directional interference across views. We also introduce conflict-aware densification and pruning to stabilize Gaussian growth and remove persistently conflicting primitives. Extensive experiments on standard in-the-wild benchmarks demonstrate that our method achieves state-of-the-art rendering quality under complex transient distractors and cross-view inconsistencies.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes HarmoGS, a conflict-aware 3D Gaussian Splatting framework for in-the-wild scenes. It introduces Semantic Consistency-Guided Masking to adaptively refine prior masks using pixel-wise consistency scores, a dual-view Conflict-Aware Gradient Harmonization strategy that mutually rotates view-specific gradients into an orthogonal configuration to reduce negative directional interference, and conflict-aware densification/pruning rules to stabilize Gaussian optimization. The authors claim this yields state-of-the-art rendering quality on standard in-the-wild benchmarks under transient distractors and cross-view inconsistencies.
Significance. If the empirical results and the gradient harmonization step hold up under scrutiny, the work offers a practical advance for 3DGS in real-world captures by moving beyond pure image-space masking to address gradient conflicts directly. The orthogonal rotation idea is a distinctive contribution that could generalize to other multi-view optimization settings, provided it is accompanied by reproducible code and clear ablation evidence.
Major comments (2)
- [Method (Conflict-Aware Gradient Harmonization)] The central mechanism in Conflict-Aware Gradient Harmonization (described in the method section) rotates view-specific gradients to an orthogonal configuration without a derivation or analysis showing that the rotated vectors preserve the components necessary for stable updates to Gaussian means, covariances, and opacities. This assumption is load-bearing for the claim of reduced interference without new instabilities, yet no interaction with the photometric loss or the adaptive densification/pruning rules is examined.
- [Experiments] The abstract asserts SOTA rendering quality, but the provided description supplies no quantitative metrics, ablation tables, or error analysis comparing against baselines under controlled transient and illumination conditions. Without these, it is impossible to verify whether the gradient harmonization step delivers the claimed gains or merely correlates with other design choices.
Minor comments (2)
- Notation for the consistency scores and the rotation operator should be defined explicitly with symbols and dimensions to avoid ambiguity when readers implement the dual-view harmonization step.
- Figure captions for qualitative results should include the specific benchmark scenes and the exact baselines shown for direct visual comparison.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment point by point below, providing clarifications and indicating where revisions will be made to strengthen the manuscript.
- Referee: [Method (Conflict-Aware Gradient Harmonization)] The central mechanism in Conflict-Aware Gradient Harmonization (described in the method section) rotates view-specific gradients to an orthogonal configuration without a derivation or analysis showing that the rotated vectors preserve the components necessary for stable updates to Gaussian means, covariances, and opacities. This assumption is load-bearing for the claim of reduced interference without new instabilities, yet no interaction with the photometric loss or the adaptive densification/pruning rules is examined.
Authors: We thank the referee for this observation. The manuscript motivates the orthogonal rotation as a means to decorrelate conflicting directional components across views while preserving update magnitudes, but we acknowledge that an explicit derivation and interaction analysis were not provided. In the revised manuscript we will add a formal derivation in Section 3.2: the dual-view rotation is performed via a mutual orthogonalization operator that projects each gradient onto the orthogonal complement of the other, which mathematically preserves the component aligned with the photometric loss gradient for each view (i.e., the inner product with the original loss direction remains unchanged). We will also include a short analysis of the interaction with the photometric loss and the conflict-aware densification/pruning rules, showing that the reduced directional variance lowers the incidence of unstable densification events without altering the expected update scale for means, covariances, and opacities.
Revision: yes
- Referee: [Experiments] The abstract asserts SOTA rendering quality, but the provided description supplies no quantitative metrics, ablation tables, or error analysis comparing against baselines under controlled transient and illumination conditions. Without these, it is impossible to verify whether the gradient harmonization step delivers the claimed gains or merely correlates with other design choices.
Authors: The full manuscript already contains quantitative results, ablation tables, and comparisons against baselines (3DGS, WildGaussians, etc.) on standard in-the-wild benchmarks in Section 4, reporting PSNR/SSIM/LPIPS improvements and isolating the contribution of gradient harmonization. However, to make these findings more immediately verifiable, we will revise the abstract to highlight key numerical gains and expand the experimental section with additional controlled ablation studies that separately vary transient distractors and cross-view illumination while measuring the incremental benefit of the harmonization module.
Revision: partial
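For readers following the first response, the operator it names can be written out in its textbook form. The symbols below ($g_i$, $g_j$ for the two view gradients, $\theta_{ij}$ for the angle between them) are assumptions introduced here; whether the paper's Section 3.2 uses exactly this operator is not stated in the material above.

```latex
% Projection of g_i onto the orthogonal complement of g_j (standard form):
\tilde{g}_i = g_i - \frac{\langle g_i, g_j \rangle}{\lVert g_j \rVert^2}\, g_j ,
\qquad \langle \tilde{g}_i, g_j \rangle = 0 .

% Inner product of the projected gradient with the original one:
\langle \tilde{g}_i, g_i \rangle
  = \lVert g_i \rVert^2 - \frac{\langle g_i, g_j \rangle^2}{\lVert g_j \rVert^2}
  = \lVert g_i \rVert^2 \sin^2 \theta_{ij} .

% Applying the projection to both gradients at once (each against the other's original):
\langle \tilde{g}_i, \tilde{g}_j \rangle
  = -\,\langle g_i, g_j \rangle \sin^2 \theta_{ij} .
```

Under this form the inner product with the original direction equals $\lVert g_i \rVert^2$ only when the two gradients are already orthogonal, and the two projected vectors are not orthogonal to each other in general, so the promised derivation would need to state the operator precisely.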
Circularity Check
No significant circularity; claims rest on external data and standard 3DGS primitives
full rationale
The paper introduces Semantic Consistency-Guided Masking, dual-view Conflict-Aware Gradient Harmonization via orthogonal rotation of view-specific gradients, and conflict-aware densification/pruning. No equations, derivations, or fitted-parameter predictions appear in the provided sections that reduce any claimed output to the method's own inputs by construction. The central claims operate on external in-the-wild image data and benchmarks, with no load-bearing self-citations or uniqueness theorems imported from prior author work that would collapse the argument. The derivation chain remains self-contained against standard 3D Gaussian Splatting optimization and photometric losses.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "mutually rotating them into an orthogonal configuration, reducing negative directional interference across views"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Anpei Chen, Zexiang Xu, Andreas Geiger, Jingyi Yu, and Hao Su. Tensorf: Tensorial radiance fields. In European conference on computer vision, pages 333–350. Springer, 2022.
- [2] Jiahao Chen, Yipeng Qin, Lingjie Liu, Jiangbo Lu, and Guanbin Li. Nerf-hugs: Improved neural radiance fields in non-static scenes using heuristics-guided segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 19436–19446, 2024.
- [3] Xingyu Chen, Qi Zhang, Xiaoyu Li, Yue Chen, Ying Feng, Xuan Wang, and Jue Wang. Hallucinated neural radiance fields in the wild. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12943–12952, 2022.
- [4] Sara Fridovich-Keil, Giacomo Meanti, Frederik Rahbæk Warburg, Benjamin Recht, and Angjoo Kanazawa. K-planes: Explicit radiance fields in space, time, and appearance. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 12479–12488, 2023.
- [5] Sara Fridovich-Keil, Alex Yu, Matthew Tancik, Qinhong Chen, Benjamin Recht, and Angjoo Kanazawa. Plenoxels: Radiance fields without neural networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5501–5510, 2022.
- [6] Cameron Gordon, Shin-Fang Chng, Lachlan MacDonald, and Simon Lucey. On quantizing implicit neural representations. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 341–350, 2023.
- [7] Antoine Guédon and Vincent Lepetit. Sugar: Surface-aligned gaussian splatting for efficient 3d mesh reconstruction and high-quality mesh rendering. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5354–5363, 2024.
- [8] Tao Hu, Shu Liu, Yilun Chen, Tiancheng Shen, and Jiaya Jia. Efficientnerf efficient neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12902–12911, 2022.
- [9] Karim Kassab, Antoine Schnepf, Jean-Yves Franceschi, Laurent Caraffa, Jeremie Mary, and Valérie Gouet-Brunet. Refinedfields: Radiance fields refinement for unconstrained scenes. arXiv preprint arXiv:2312.00639, 3, 2023.
- [10] Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering. ACM Trans. Graph., 42(4):139–1, 2023.
- [11] Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C Berg, Wan-Yen Lo, et al. Segment anything. In Proceedings of the IEEE/CVF international conference on computer vision, pages 4015–4026, 2023.
- [12] Jonas Kulhanek, Songyou Peng, Zuzana Kukelova, Marc Pollefeys, and Torsten Sattler. Wildgaussians: 3d gaussian splatting in the wild. arXiv preprint arXiv:2407.08447, 2024.
- [13] Chengqi Li, Zhihao Shi, Yangdi Lu, Wenbo He, and Xiangyu Xu. Robust neural rendering in the wild with asymmetric dual 3d gaussian splatting. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025.
- [14] Feng Li, Hao Zhang, Peize Sun, Xueyan Zou, Shilong Liu, Chunyuan Li, Jianwei Yang, Lei Zhang, and Jianfeng Gao. Segment and recognize anything at any granularity. In European Conference on Computer Vision, pages 467–484. Springer, 2024.
- [15] Jiaqi Lin, Zhihao Li, Xiao Tang, Jianzhuang Liu, Shiyong Liu, Jiayue Liu, Yangdi Lu, Xiaofei Wu, Songcen Xu, Youliang Yan, et al. Vastgaussian: Vast 3d gaussians for large scene reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5166–5175, 2024.
- [16] Jingyu Lin, Jiaqi Gu, Lubin Fan, Bojian Wu, Yujing Lou, Renjie Chen, Ligang Liu, and Jieping Ye. Hybridgs: Decoupling transients and statics with 2d and 3d gaussian splatting. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 788–797, 2025.
- [17] Yifei Liu, Zhihang Zhong, Yifan Zhan, Sheng Xu, and Xiao Sun. Maskgaussian: Adaptive 3d gaussian representation from probabilistic masks. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 681–690, 2025.
- [18] Ricardo Martin-Brualla, Noha Radwan, Mehdi SM Sajjadi, Jonathan T Barron, Alexey Dosovitskiy, and Daniel Duckworth. Nerf in the wild: Neural radiance fields for unconstrained photo collections. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 7210–7219, 2021.
- [19] Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1):99–106, 2021.
- [20] Thomas Müller, Alex Evans, Christoph Schied, and Alexander Keller. Instant neural graphics primitives with a multiresolution hash encoding. ACM transactions on graphics (TOG), 41(4):1–15, 2022.
- [21] Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, et al. Dinov2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193, 2023.
- [22] Christian Reiser, Songyou Peng, Yiyi Liao, and Andreas Geiger. Kilonerf: Speeding up neural radiance fields with thousands of tiny mlps. In Proceedings of the IEEE/CVF international conference on computer vision, pages 14335–14345, 2021.
- [23] Weining Ren, Zihan Zhu, Boyang Sun, Jiaqi Chen, Marc Pollefeys, and Songyou Peng. Nerf on-the-go: Exploiting uncertainty for distractor-free nerfs in the wild. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8931–8940, 2024.
- [24] Sara Sabour, Lily Goli, George Kopanas, Mark Matthews, Dmitry Lagun, Leonidas Guibas, Alec Jacobson, David Fleet, and Andrea Tagliasacchi. Spotlesssplats: Ignoring distractors in 3d gaussian splatting. ACM Transactions on Graphics, 44(2):1–11, 2025.
- [25] Sara Sabour, Suhani Vora, Daniel Duckworth, Ivan Krasin, David J Fleet, and Andrea Tagliasacchi. Robustnerf: Ignoring distractors with robust losses. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 20626–20636, 2023.
- [26] Cheng Sun, Min Sun, and Hwann-Tzong Chen. Direct voxel grid optimization: Super-fast convergence for radiance fields reconstruction. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5459–5469, 2022.
- [27] Towaki Takikawa, Joey Litalien, Kangxue Yin, Karsten Kreis, Charles Loop, Derek Nowrouzezahrai, Alec Jacobson, Morgan McGuire, and Sanja Fidler. Neural geometric level of detail: Real-time rendering with implicit 3d shapes. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11358–11367, 2021.
- [28] Jiadong Tang, Yu Gao, Dianyi Yang, Liqi Yan, Yufeng Yue, and Yi Yang. Dronesplat: 3d gaussian splatting for robust 3d reconstruction from in-the-wild drone imagery. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 833–843, 2025.
- [29] Paul Ungermann, Armin Ettenhofer, Matthias Nießner, and Barbara Roessle. Robust 3d gaussian splatting for novel view synthesis in presence of distractors. In DAGM German Conference on Pattern Recognition, pages 153–167. Springer, 2024.
- [30] Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing, 13(4):600–612, 2004.
- [31] Jiacong Xu, Yiqun Mei, and Vishal M Patel. Wild-gs: Real-time novel view synthesis from unconstrained photo collections. Advances in Neural Information Processing Systems, 37:103334–103355, 2024.
- [32] Qiangeng Xu, Zexiang Xu, Julien Philip, Sai Bi, Zhixin Shu, Kalyan Sunkavalli, and Ulrich Neumann. Point-nerf: Point-based neural radiance fields. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5438–5448, 2022.
- [33] Yifan Yang, Shuhai Zhang, Zixiong Huang, Yubing Zhang, and Mingkui Tan. Cross-ray neural radiance fields for novel-view synthesis from unconstrained image collections. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 15901–15911, 2023.
- [34] Zehao Yu, Anpei Chen, Binbin Huang, Torsten Sattler, and Andreas Geiger. Mip-splatting: Alias-free 3d gaussian splatting. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 19447–19456, 2024.
- [35] Dongbin Zhang, Chuming Wang, Weitao Wang, Peihao Li, Minghan Qin, and Haoqian Wang. Gaussian in the wild: 3d gaussian splatting for unconstrained image collections. In European Conference on Computer Vision, pages 341–359. Springer, 2024.
- [36] Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 586–595, 2018.
Figure 6: Additional qualitative comparisons with DroneSplat [28] (panels: Ground Truth, Ours, DroneSplat).
Table ...