pith.

arxiv: 2605.07287 · v2 · pith:UPRKJDSOnew · submitted 2026-05-08 · 💻 cs.CV

SplatWeaver: Learning to Allocate Gaussian Primitives for Generalizable Novel View Synthesis

Pith reviewed 2026-05-22 10:40 UTC · model grok-4.3

classification 💻 cs.CV
keywords: 3D Gaussian Splatting · Generalizable Novel View Synthesis · Dynamic Primitive Allocation · Feed-forward Rendering · High-frequency Guidance · Expert Routing · Adaptive Scene Representation

The pith

SplatWeaver learns to assign varying numbers of 3D Gaussians to different scene regions from uncalibrated images in a single forward pass.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper targets the uniform primitive budget used in most feed-forward 3D Gaussian Splatting methods for generalizable novel view synthesis. Instead of giving every pixel or voxel the same number of primitives, SplatWeaver predicts a location-specific count so that smooth areas receive few or zero primitives while fine structures and textured regions receive more. It does this with a set of cardinality experts, each responsible for a fixed primitive count from zero to M, plus a learned pixel-level router that selects among them. A high-frequency prior together with a guidance module and routing regularization biases the router toward complexity-aware choices. Experiments show the resulting representations produce higher-fidelity renderings of unseen views while using fewer total primitives than prior uniform-allocation baselines.

Core claim

SplatWeaver introduces cardinality Gaussian experts and a pixel-level routing scheme that together allow the model to predict, for every spatial location, how many Gaussian primitives to instantiate, with a high-frequency prior and guidance module that stabilize the routing toward higher counts in complex regions and lower counts in smooth ones.

What carries the argument

Cardinality Gaussian experts (each producing a fixed number of primitives from 0 to M) coordinated by a pixel-level routing network, stabilized by a high-frequency prior and guidance module plus routing regularization.
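To make the expert-plus-router mechanism concrete, here is a minimal NumPy sketch under stated assumptions. Each cardinality expert E_e is modelled as a hypothetical linear head that emits exactly e Gaussian centres, the router is a hypothetical linear layer followed by a hard argmax, and all names and sizes (`route_and_gather`, H = W = 8, M = 3) are illustrative rather than taken from the paper. It only shows how a per-pixel expert choice becomes a variable primitive count; the paper's actual expert architecture, Gaussian parameterization, and training-time routing are not described on this page.

```python
import numpy as np


def route_and_gather(features, router_w, expert_heads):
    """Hypothetical hard pixel-level routing over cardinality experts.

    features:     (H, W, C) per-pixel features
    router_w:     (C, M + 1) router weights; column e scores expert E_e
    expert_heads: list of M + 1 arrays, expert_heads[e] of shape (C, 3 * e),
                  mapping one pixel feature to e Gaussian centres (x, y, z)
    Returns per-pixel counts (H, W) and gathered centres (counts.sum(), 3).
    """
    logits = features @ router_w            # (H, W, M + 1) router scores
    counts = logits.argmax(axis=-1)         # chosen expert index == primitive count
    centres = []
    for e, head in enumerate(expert_heads):
        mask = counts == e
        if e == 0 or not mask.any():
            continue                        # E_0 emits no primitives
        out = features[mask] @ head         # (n_e, 3 * e)
        centres.append(out.reshape(-1, 3))  # n_e * e centres
    gathered = np.concatenate(centres) if centres else np.zeros((0, 3))
    return counts, gathered


rng = np.random.default_rng(0)
H, W, C, M = 8, 8, 16, 3
features = rng.normal(size=(H, W, C))
router_w = rng.normal(size=(C, M + 1))
expert_heads = [rng.normal(size=(C, 3 * e)) for e in range(M + 1)]

counts, centres = route_and_gather(features, router_w, expert_heads)
print("adaptive primitives:", centres.shape[0])
print("uniform budget of M per pixel:", H * W * M)
```

A hard argmax is not differentiable, so training such a router end to end would need a soft or straight-through relaxation. This page does not say which approach SplatWeaver uses.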

If this is right

  • Fewer total primitives suffice for the same or better rendering quality because capacity is concentrated where scene complexity is highest.
  • Feed-forward inference becomes viable for scenes whose detail varies sharply across space without requiring later per-scene refinement.
  • The same routing logic can be applied to any primitive-based scene representation whose local density can be adjusted at inference time.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The expert-routing pattern may transfer to other adaptive representations such as neural radiance fields or voxel grids where local resolution should vary with content.
  • Real-time rendering pipelines could exploit the resulting sparsity by skipping empty or low-cardinality regions entirely during splatting.
  • If the router generalizes across datasets, it could serve as a learned complexity estimator for downstream tasks such as view selection or compression of 3D captures.

Load-bearing premise

The high-frequency prior with its guidance module and routing regularization can reliably drive the router to assign more primitives to complex regions and fewer to smooth ones from uncalibrated input images without any per-scene optimization.

What would settle it

A controlled ablation in which the high-frequency guidance module is removed and the router is observed to revert to near-uniform primitive counts across textured and smooth regions on the same test scenes.
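One way to run this check is to compare the router's mean primitive count on high-frequency pixels against smooth pixels, with and without the guidance module. The statistic below, the median split, and the name `allocation_contrast` are hypothetical choices made for illustration, not a protocol from the paper. A contrast near zero would indicate the near-uniform allocation that would undercut the claim.

```python
import numpy as np


def allocation_contrast(counts, hf_energy, q=0.5):
    """Mean routed count on high-frequency pixels minus mean count on smooth pixels.

    counts:    (H, W) integer primitive counts chosen by the router
    hf_energy: (H, W) high-frequency energy map of the same view
    q:         quantile of hf_energy that separates "textured" from "smooth"
    """
    textured = hf_energy > np.quantile(hf_energy, q)
    smooth = ~textured
    if not textured.any() or not smooth.any():
        return float("nan")               # degenerate split, e.g. constant energy
    return float(counts[textured].mean() - counts[smooth].mean())


hf = np.zeros((8, 8))
hf[:, 4:] = 1.0                           # right half textured
adaptive = np.where(hf > 0, 3, 0)         # complexity-aware routing
uniform = np.full((8, 8), 2)              # near-uniform routing
print(allocation_contrast(adaptive, hf))  # 3.0
print(allocation_contrast(uniform, hf))   # 0.0
```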

Figures

Figures reproduced from arXiv: 2605.07287 by Fan Li, Mingwen Shao, Wangmeng Zuo, Yecong Wan.

Figure 1: Comparison of paradigms for generalizable novel view synthesis. In contrast to prior methods that struggle with redundant primitives, fixed budgets, or rigid allocation, SplatWeaver adaptively allocates a dynamic number of Gaussian primitives according to scene complexity, enabling a more principled and flexible distribution of scene representations. Earlier paradigms aimed to directly reconstruct scene g…
Figure 2: Comparison of predicted Gaussian distributions and novel view synthesis performance. SplatWeaver dynamically distributes …
Figure 3: SplatWeaver achieves consistent state-of-the-art per…
Figure 4: Overall framework of SplatWeaver. Given $N$ uncalibrated images, a geometry transformer first estimates camera poses and extracts pixel-level features $\{F_n\}_{n=1}^{N}$. Subsequently, guided by a frequency prior injection module, a router assigns each pixel to the most suitable cardinality Gaussian expert $E_e$, which predicts a set of hidden Gaussians comprising spatial positions $\mu$ and latent features $F_l$. After gath…
Figure 5: Left: Illustration of the proposed high-frequency prior, where the high-frequency energy map, derived from the discrete wavelet transform as $\left(\sqrt{\mathrm{HH}^2+\mathrm{LH}^2+\mathrm{HL}^2}\right)\uparrow_2$, exhibits strong alignment with the Gaussian distribution obtained from full scene reconstruction via 3DGS. Right: Diagram of the proposed frequency prior guidance module and the pixel-level Gaussian expert router. remaining expert $E_e$ is implement…
Figure 6: Diagram of the proposed frequency prior-guided routing …
Figure 8: Comparative analysis of rendering quality versus Gaussian complexity across benchmarks under varying view settings.
Figure 9: Qualitative comparisons on the DL3DV [76] dataset. From top to bottom, every two rows correspond to rendering results under 4, 8, 16, and 24 view settings, respectively. Our method yields more coherent fine structures and sharper details. B. Comparison with State-of-the-Art Models To rigorously evaluate the effectiveness of SplatWeaver, we conduct a comprehensive comparative analysis against several state-…
Figure 10: Qualitative comparisons on the RealEstate10K […
Figure 11: Qualitative comparisons on the Mip-NeRF 360 […
Figure 12: Visualization of the cardinality Gaussian expert routing and the resulting Gaussian distribution with or without the …
Figure 13: Visualization of Gaussian scales predicted across …
Figure 14: Visualization of scene geometry and novel view …
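The high-frequency energy map described in the Figure 5 caption can be approximated with a single-level 2D Haar transform. The sketch below is a minimal NumPy version under assumptions not stated on this page: a grayscale input with even height and width, the orthonormal Haar wavelet, and nearest-neighbour interpolation for the ×2 upsampling (↑2). The helper names `haar_dwt2` and `high_frequency_energy` are hypothetical.

```python
import numpy as np


def haar_dwt2(img):
    """One level of the orthonormal 2D Haar DWT for an (H, W) array, H and W even."""
    a, b = img[0::2, 0::2], img[0::2, 1::2]   # top-left, top-right of each 2x2 block
    c, d = img[1::2, 0::2], img[1::2, 1::2]   # bottom-left, bottom-right
    ll = (a + b + c + d) / 2.0                # low-pass approximation
    lh = (a - b + c - d) / 2.0                # detail across columns
    hl = (a + b - c - d) / 2.0                # detail across rows
    hh = (a - b - c + d) / 2.0                # diagonal detail
    # Libraries disagree on which band is called LH vs HL; the energy map below
    # is symmetric in the two, so the naming choice does not affect it.
    return ll, lh, hl, hh


def high_frequency_energy(img):
    """sqrt(HH^2 + LH^2 + HL^2), upsampled x2 back to the input resolution."""
    _, lh, hl, hh = haar_dwt2(img.astype(np.float64))
    energy = np.sqrt(hh ** 2 + lh ** 2 + hl ** 2)
    return np.repeat(np.repeat(energy, 2, axis=0), 2, axis=1)  # nearest-neighbour


# Left half flat, right half a one-pixel checkerboard.
img = np.zeros((8, 8))
img[:, 4:] = np.indices((8, 4)).sum(axis=0) % 2
energy_map = high_frequency_energy(img)
print(energy_map[0])  # zeros on the flat half, ones on the textured half
```

Flat regions produce zero detail coefficients, so their energy is exactly zero, which matches the intuition that smooth areas should receive few or no primitives.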
Abstract

Generalizable novel view synthesis aims to render unseen views from uncalibrated input images without requiring per-scene optimization. Recent feed-forward approaches based on 3D Gaussian Splatting have achieved promising efficiency and rendering quality. However, most of them assign a fixed number of Gaussians to each pixel or voxel, ignoring the spatially varying complexity of real-world scenes. Such uniform allocation often wastes Gaussian primitives in smooth regions while providing insufficient capacity for fine structures, complex geometry, and high-frequency details. This motivates us to predict region-dependent primitive cardinalities rather than impose a fixed primitive budget everywhere, enabling a more expressive 3D scene representation. Therefore, we propose SplatWeaver, a generalizable novel view synthesis framework that is able to dynamically allocate Gaussian primitives over different regions in a feed-forward manner. Specifically, SplatWeaver introduces cardinality Gaussian experts and a pixel-level routing scheme, wherein each expert specializes in producing a specific number of primitives from 0 to M, and the routing scheme coordinates these experts to adaptively determine how many Gaussian primitives should be allocated to each spatial location. Moreover, SplatWeaver incorporates a high-frequency prior with an attendant guidance module and routing regularization to stabilize expert selection and promote complexity-aware allocation. By leveraging high-frequency cues, the routing process is encouraged to assign more Gaussian primitives to fine structures and textured regions, while suppressing redundancy in smooth areas. Extensive experiments across diverse scenarios show that SplatWeaver consistently outperforms state-of-the-art methods, delivering more faithful novel-view renderings with fewer Gaussian primitives. Project Page: https://yecongwan.github.io/SplatWeaver/

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces SplatWeaver, a feed-forward framework for generalizable novel view synthesis from uncalibrated images. It replaces uniform Gaussian primitive allocation with a dynamic scheme using cardinality Gaussian experts (each specialized for a fixed count from 0 to M) and a pixel-level router. A high-frequency prior, guidance module, and routing regularization are added to bias allocation toward textured and geometrically complex regions. The central claim is that this yields higher-fidelity novel-view renderings than prior feed-forward 3DGS methods while using fewer total primitives.

Significance. If the routing and high-frequency guidance prove stable across views, the work offers a principled way to make generalizable NVS more efficient by matching primitive density to local scene complexity rather than imposing a global budget. The expert-plus-router architecture is a clear architectural contribution that could influence subsequent feed-forward splatting pipelines.

major comments (2)
  1. [Abstract and §3] Abstract and §3 (method description): the claim that the high-frequency prior plus guidance module and routing regularization produce stable, complexity-aware expert selection rests on the untested assumption that 2D frequency content reliably signals view-consistent 3D geometric complexity. Occlusions, specularities, and depth discontinuities can generate misleading cues; without ablations that isolate the prior's effect on routing decisions or visualizations of per-expert activation maps on held-out views, it is unclear whether the mechanism actually prevents over- or under-allocation that only appears after novel-view rendering.
  2. [§4] §4 (experiments): the abstract asserts consistent outperformance with fewer primitives, yet no quantitative tables, per-scene primitive counts, or cross-method comparisons (e.g., PSNR/SSIM deltas versus fixed-budget baselines) are referenced. Without these data it is impossible to judge whether the reported gains are load-bearing for the central claim or whether the reduction in primitives comes at the cost of quality in high-complexity regions.
minor comments (1)
  1. [§3] Notation for the maximum primitives per expert (M) and the precise form of the routing regularization loss should be defined explicitly with equations rather than left at the level of the abstract description.
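The page does not give the routing regularization loss in closed form. Purely to illustrate what an explicit definition could look like, the sketch below assumes one possible form: normalise the high-frequency energy $h_p$ of each pixel $p$ to $\hat h_p \in [0, 1]$, set a target count $k_p = \operatorname{round}(M \hat h_p)$, and penalise the router distribution $\pi_p$ with $\mathcal{L}_{\text{route}} = -\frac{1}{HW}\sum_p \log \pi_p(k_p)$. The linear energy-to-count mapping and the function names are hypothetical assumptions; SplatWeaver's actual regulariser may differ substantially.

```python
import numpy as np


def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def hf_target_counts(hf_energy, M):
    """Hypothetical target: linearly map normalised energy to counts 0..M."""
    span = hf_energy.max() - hf_energy.min()
    norm = (hf_energy - hf_energy.min()) / (span + 1e-8)
    return np.rint(norm * M).astype(int)


def routing_regularizer(router_logits, hf_energy):
    """Mean per-pixel cross-entropy between router probabilities and HF targets.

    router_logits: (H, W, M + 1) scores over cardinality experts E_0..E_M
    hf_energy:     (H, W) high-frequency energy map
    """
    M = router_logits.shape[-1] - 1
    probs = softmax(router_logits)
    target = hf_target_counts(hf_energy, M)
    p_target = np.take_along_axis(probs, target[..., None], axis=-1)[..., 0]
    return float(-np.log(p_target + 1e-12).mean())


M = 3
hf = np.zeros((4, 4))
hf[:, 2:] = 1.0
target = hf_target_counts(hf, M)               # 0 on the left, 3 on the right
aligned = 10.0 * np.eye(M + 1)[target]         # router agrees with the prior
misaligned = 10.0 * np.eye(M + 1)[M - target]  # router inverts the prior
print(routing_regularizer(aligned, hf), routing_regularizer(misaligned, hf))
```

A hard per-pixel target like this is a strong assumption; a softer variant might only penalise the expected count for falling below the energy-derived target.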

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our work. Below we respond point-by-point to the major comments, clarifying our design rationale and experimental evidence while committing to targeted revisions where appropriate.

  1. Referee: [Abstract and §3] Abstract and §3 (method description): the claim that the high-frequency prior plus guidance module and routing regularization produce stable, complexity-aware expert selection rests on the untested assumption that 2D frequency content reliably signals view-consistent 3D geometric complexity. Occlusions, specularities, and depth discontinuities can generate misleading cues; without ablations that isolate the prior's effect on routing decisions or visualizations of per-expert activation maps on held-out views, it is unclear whether the mechanism actually prevents over- or under-allocation that only appears after novel-view rendering.

    Authors: We agree that the link between 2D high-frequency content and view-consistent 3D complexity is an inductive bias rather than a rigorously proven mapping, and that occlusions or specularities can produce noisy cues. In §3.3 we motivate the prior by noting that high-frequency 2D regions typically correspond to geometric detail that benefits from additional primitives; §4.3 reports ablation results showing that removing the guidance module and routing regularization degrades both PSNR and the adaptivity of primitive counts. However, these ablations measure end-to-end rendering quality rather than isolating routing decisions per se. We will therefore add (i) per-expert activation visualizations on held-out views and (ii) an ablation that disables only the high-frequency prior while keeping the router intact, to directly demonstrate stability of expert selection across views. revision: yes

  2. Referee: [§4] §4 (experiments): the abstract asserts consistent outperformance with fewer primitives, yet no quantitative tables, per-scene primitive counts, or cross-method comparisons (e.g., PSNR/SSIM deltas versus fixed-budget baselines) are referenced. Without these data it is impossible to judge whether the reported gains are load-bearing for the central claim or whether the reduction in primitives comes at the cost of quality in high-complexity regions.

    Authors: We regret that the main-text narrative did not explicitly point readers to the supporting numbers. Table 1 already reports average PSNR/SSIM/LPIPS together with mean primitive counts for SplatWeaver versus prior feed-forward 3DGS methods; supplementary material contains per-scene breakdowns. To make the efficiency claim fully transparent, we will insert a new column in Table 1 showing PSNR/SSIM deltas relative to fixed-budget baselines (e.g., 64 or 128 primitives per pixel) and will add a short paragraph in §4.2 that quantifies quality retention in high-complexity regions (measured by local PSNR on edge/texture masks). These additions will allow direct assessment of whether dynamic allocation preserves or improves fidelity where it matters most. revision: yes
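The proposed local PSNR on edge/texture masks can be sketched in a few lines of NumPy. The gradient-magnitude mask, its threshold of 0.05, and the function names are assumptions for illustration, since the page does not say how the masks would be built. The toy example shows why the metric matters: the same error confined to textured pixels looks milder in full-image PSNR than on the mask itself.

```python
import numpy as np


def texture_mask(gray, thr=0.05):
    """Hypothetical edge/texture mask: gradient magnitude above a threshold."""
    gy, gx = np.gradient(gray.astype(np.float64))
    return np.hypot(gx, gy) > thr


def masked_psnr(pred, gt, mask, max_val=1.0):
    """PSNR over pixels where mask is True; works for (H, W) or (H, W, C) images."""
    diff = (pred.astype(np.float64) - gt)[mask]
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / mse))


rng = np.random.default_rng(0)
gt = np.zeros((16, 16))
gt[:, 8:] = rng.uniform(size=(16, 8))  # textured right half, flat left half
mask = texture_mask(gt)
pred = gt.copy()
pred[mask] += 0.1                      # error only where texture is present

full = masked_psnr(pred, gt, np.ones_like(mask))
local = masked_psnr(pred, gt, mask)
print(f"full-image PSNR {full:.2f} dB, texture-mask PSNR {local:.2f} dB")
```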

Circularity Check

0 steps flagged

No circularity detected in architectural derivation

The paper introduces an independent neural architecture (cardinality experts + pixel routing + high-frequency guidance) trained end-to-end for feed-forward allocation. No equations or claims reduce by construction to fitted inputs, self-definitions, or self-citation chains; the high-frequency prior is an explicit design choice justified by empirical motivation rather than tautology. Central performance claims rest on external benchmark comparisons, not internal re-derivations.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 1 invented entities

The framework rests on the assumption that high-frequency image cues can guide primitive allocation and that a mixture of experts can be trained to specialize in different cardinalities without instability.

free parameters (1)
  • M (maximum primitives per expert)
    Upper bound on the number of Gaussians each expert can produce; treated as a design choice.
axioms (1)
  • domain assumption: High-frequency image features reliably indicate regions that require more Gaussian primitives for accurate reconstruction.
    Invoked to justify the high-frequency prior guidance module.
invented entities (1)
  • Cardinality Gaussian experts (no independent evidence)
    purpose: Specialized modules that each output a fixed number of Gaussian primitives from 0 to M.
    New architectural component introduced to enable discrete cardinality choices.

pith-pipeline@v0.9.0 · 5832 in / 1225 out tokens · 48285 ms · 2026-05-22T10:40:34.028681+00:00 · methodology

Reference graph

Works this paper leans on

90 extracted references · 90 canonical work pages · 8 internal anchors

  1. [1]

    Nerf: Representing scenes as neural radiance fields for view synthesis,

    B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng, “Nerf: Representing scenes as neural radiance fields for view synthesis,”Communications of the ACM, vol. 65, no. 1, pp. 99–106, 2021

  2. [2]

    3d gaussian splatting for real-time radiance field rendering

    B. Kerbl, G. Kopanas, T. Leimkühler, and G. Drettakis, “3d gaussian splatting for real-time radiance field rendering.”ACM Trans. Graph., vol. 42, no. 4, pp. 139–1, 2023

  3. [3]

    Tensorf: Tensorial radiance fields,

    A. Chen, Z. Xu, A. Geiger, J. Yu, and H. Su, “Tensorf: Tensorial radiance fields,” inEuropean conference on computer vision. Springer, 2022, pp. 333–350

  4. [4]

    Plenoxels: Radiance fields without neural networks,

    S. Fridovich-Keil, A. Yu, M. Tancik, Q. Chen, B. Recht, and A. Kanazawa, “Plenoxels: Radiance fields without neural networks,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 5501–5510

  5. [5]

    Fastnerf: High-fidelity neural rendering at 200fps,

    S. J. Garbin, M. Kowalski, M. Johnson, J. Shotton, and J. Valentin, “Fastnerf: High-fidelity neural rendering at 200fps,” inProceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 14 346–14 355

  6. [6]

    Instant neural graphics primitives with a multiresolution hash encoding,

    T. Müller, A. Evans, C. Schied, and A. Keller, “Instant neural graphics primitives with a multiresolution hash encoding,”ACM transactions on graphics (TOG), vol. 41, no. 4, pp. 1–15, 2022

  7. [7]

    Mip-splatting: Alias- free 3d gaussian splatting,

    Z. Yu, A. Chen, B. Huang, T. Sattler, and A. Geiger, “Mip-splatting: Alias- free 3d gaussian splatting,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 19 447–19 456

  8. [8]

    Gaussianpro: 3d gaussian splatting with progressive propa- gation,

    K. Cheng, X. Long, K. Yang, Y . Yao, W. Yin, Y . Ma, W. Wang, and X. Chen, “Gaussianpro: 3d gaussian splatting with progressive propa- gation,” inForty-first International Conference on Machine Learning, 2024

  9. [9]

    4d gaussian splatting for real-time dynamic scene rendering,

    G. Wu, T. Yi, J. Fang, L. Xie, X. Zhang, W. Wei, W. Liu, Q. Tian, and X. Wang, “4d gaussian splatting for real-time dynamic scene rendering,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 20 310–20 320

  10. [10]

    Scaffold-gs: Structured 3d gaussians for view-adaptive rendering,

    T. Lu, M. Yu, L. Xu, Y . Xiangli, L. Wang, D. Lin, and B. Dai, “Scaffold-gs: Structured 3d gaussians for view-adaptive rendering,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 20 654–20 664

  11. [11]

    pixelsplat: 3d gaussian splats from image pairs for scalable generalizable 3d reconstruction,

    D. Charatan, S. L. Li, A. Tagliasacchi, and V . Sitzmann, “pixelsplat: 3d gaussian splats from image pairs for scalable generalizable 3d reconstruction,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2024, pp. 19 457–19 467

  12. [12]

    Mvsplat: Efficient 3d gaussian splatting from sparse multi-view images,

    Y . Chen, H. Xu, C. Zheng, B. Zhuang, M. Pollefeys, A. Geiger, T.- J. Cham, and J. Cai, “Mvsplat: Efficient 3d gaussian splatting from sparse multi-view images,” inEuropean conference on computer vision. Springer, 2024, pp. 370–386

  13. [13]

    Long-lrm: Long-sequence large reconstruction model for wide- coverage gaussian splats,

    C. Ziwen, H. Tan, K. Zhang, S. Bi, F. Luan, Y . Hong, L. Fuxin, and Z. Xu, “Long-lrm: Long-sequence large reconstruction model for wide- coverage gaussian splats,” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2025, pp. 4349–4359

  14. [14]

    Anysplat: Feed-forward 3d gaussian splatting from unconstrained views,

    L. Jiang, Y . Mao, L. Xu, T. Lu, K. Ren, Y . Jin, X. Xu, M. Yu, J. Pang, F. Zhaoet al., “Anysplat: Feed-forward 3d gaussian splatting from unconstrained views,”ACM Transactions on Graphics (TOG), vol. 44, no. 6, pp. 1–16, 2025

  15. [15]

    Wavenerf: Wavelet-based generalizable neural radiance fields,

    M. Xu, F. Zhan, J. Zhang, Y . Yu, X. Zhang, C. Theobalt, L. Shao, and S. Lu, “Wavenerf: Wavelet-based generalizable neural radiance fields,” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 18 195–18 204

  16. [16]

    Depthsplat: Connecting gaussian splatting and depth,

    H. Xu, S. Peng, F. Wang, H. Blum, D. Barath, A. Geiger, and M. Pollefeys, “Depthsplat: Connecting gaussian splatting and depth,” inProceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 16 453–16 463

  17. [17]

    Gs-lrm: Large reconstruction model for 3d gaussian splatting,

    K. Zhang, S. Bi, H. Tan, Y . Xiangli, N. Zhao, K. Sunkavalli, and Z. Xu, “Gs-lrm: Large reconstruction model for 3d gaussian splatting,” inEuropean Conference on Computer Vision. Springer, 2024, pp. 1–19

  18. [18]

    Epipolar-free 3d gaussian splatting for generalizable novel view synthesis,

    Z. Min, Y . Luo, J. Sun, and Y . Yang, “Epipolar-free 3d gaussian splatting for generalizable novel view synthesis,”Advances in Neural Information Processing Systems, vol. 37, pp. 39 573–39 596, 2024. JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 15

  19. [19]

    Hisplat: Hierarchical 3d gaussian splatting for generalizable sparse-view reconstruction,

    S. Tang, W. Ye, P. Ye, W. Lin, Y . Zhou, T. Chen, and W. Ouyang, “Hisplat: Hierarchical 3d gaussian splatting for generalizable sparse-view reconstruction,”arXiv preprint arXiv:2410.06245, 2024

  20. [20]

    Zpressor: Bottleneck-aware compression for scalable feed-forward 3dgs,

    W. Wang, D. Y . Chen, Z. Zhang, D. Shi, A. Liu, and B. Zhuang, “Zpressor: Bottleneck-aware compression for scalable feed-forward 3dgs,”arXiv preprint arXiv:2505.23734, 2025

  21. [21]

    Yonosplat: You only need one model for feedforward 3d gaussian splatting,

    B. Ye, B. Chen, H. Xu, D. Barath, and M. Pollefeys, “Yonosplat: You only need one model for feedforward 3d gaussian splatting,” inInternational Conference on Learning Representations (ICLR), 2026

  22. [22]

    No pose, no problem: Surprisingly simple 3d gaussian splats from sparse unposed images

    B. Ye, S. Liu, H. Xu, X. Li, M. Pollefeys, M.-H. Yang, and S. Peng, “No pose, no problem: Surprisingly simple 3d gaussian splats from sparse unposed images,”arXiv preprint arXiv:2410.24207, 2024

  23. [23]

    Flare: Feed-forward geometry, appearance and camera estimation from uncalibrated sparse views,

    S. Zhang, J. Wang, Y . Xu, N. Xue, C. Rupprecht, X. Zhou, Y . Shen, and G. Wetzstein, “Flare: Feed-forward geometry, appearance and camera estimation from uncalibrated sparse views,” inProceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 21 936– 21 947

  24. [24]

    Pf3plat: Pose-free feed-forward 3d gaussian splatting, 2025

    S. Hong, J. Jung, H. Shin, J. Han, J. Yang, C. Luo, and S. Kim, “Pf3plat: Pose-free feed-forward 3d gaussian splatting,”arXiv preprint arXiv:2410.22128, 2024

  25. [25]

    Splatt3R: Zero-shot Gaussian Splatting from Uncalibrated Image Pairs

    B. Smart, C. Zheng, I. Laina, and V . A. Prisacariu, “Splatt3r: Zero- shot gaussian splatting from uncalibrated image pairs,”arXiv preprint arXiv:2408.13912, 2024

  26. [26]

    Evolsplat: Efficient volume-based gaussian splatting for urban view synthesis,

    S. Miao, J. Huang, D. Bai, X. Yan, H. Zhou, Y . Wang, B. Liu, A. Geiger, and Y . Liao, “Evolsplat: Efficient volume-based gaussian splatting for urban view synthesis,” inProceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 11 286–11 296

  27. [27]

    V olsplat: Rethinking feed-forward 3d gaussian splatting with voxel-aligned prediction,

    W. Wang, Y . Chen, Z. Zhang, H. Liu, H. Wang, Z. Feng, W. Qin, Z. Zhu, D. Y . Chen, and B. Zhuang, “V olsplat: Rethinking feed-forward 3d gaussian splatting with voxel-aligned prediction,”arXiv preprint arXiv:2509.19297, 2025

  28. [28]

    Tokensplat: Token- aligned 3d gaussian splatting for feed-forward pose-free reconstruction,

    Y . Li, C. Lv, Z. Tang, H. Yang, and D. Huang, “Tokensplat: Token- aligned 3d gaussian splatting for feed-forward pose-free reconstruction,” arXiv preprint arXiv:2603.00697, 2026

  29. [29]

    Worldmirror: Universal 3d world reconstruction with any-prior prompting,

    Y . Liu, Z. Min, Z. Wang, J. Wu, T. Wang, Y . Yuan, Y . Luo, and C. Guo, “Worldmirror: Universal 3d world reconstruction with any-prior prompting,”arXiv preprint arXiv:2510.10726, 2025

  30. [30]

    Gaussian graph network: Learning efficient and generalizable gaussian representations from multi-view images,

    S. Zhang, X. Fei, F. Liu, H. Song, and Y . Duan, “Gaussian graph network: Learning efficient and generalizable gaussian representations from multi-view images,”Advances in Neural Information Processing Systems, vol. 37, pp. 50 361–50 380, 2024

  31. [31]

    Ecosplat: Efficiency-controllable feed-forward 3d gaussian splatting from multi-view images,

    J. Park, M.-Q. V . Bui, J. L. G. Bello, J. Moon, J. Oh, and M. Kim, “Ecosplat: Efficiency-controllable feed-forward 3d gaussian splatting from multi-view images,”arXiv preprint arXiv:2512.18692, 2025

  32. [32]

    arXiv preprint arXiv:2512.15508 (2025)

    A. Moreau, R. Shaw, M. Nazarczuk, J. Shin, T. Tanay, Z. Zhang, S. Xu, and E. Pérez-Pellitero, “Off the grid: Detection of primitives for feed- forward 3d gaussian splatting,”arXiv preprint arXiv:2512.15508, 2025

  33. [33]

    Gaus- siantrim3r: Controllable 3d gaussians pruning for feedforward models

    B. Singhal, K. Srihari, A. Dhiman, and V . B. Radhakrishnan, “Gaus- siantrim3r: Controllable 3d gaussians pruning for feedforward models.”

  34. [34]

    C3G: Learning Compact 3D Representations with 2K Gaussians

    H. An, J. Jung, M. Kim, S. Hong, C. Kim, K. Fukuda, M. Jeon, J. Han, T. Narihira, H. Koet al., “C3g: Learning compact 3d representations with 2k gaussians,”arXiv preprint arXiv:2512.04021, 2025

  35. [35]

    Tokengs: Decoupling 3d gaussian prediction from pixels with learnable tokens,

    J. Ren, M. Tyszkiewicz, J. Huang, and Z. Gojcic, “Tokengs: Decoupling 3d gaussian prediction from pixels with learnable tokens,”Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2026

  36. [36]

    Mip-nerf: A multiscale representation for anti-aliasing neural radiance fields,

    J. T. Barron, B. Mildenhall, M. Tancik, P. Hedman, R. Martin-Brualla, and P. P. Srinivasan, “Mip-nerf: A multiscale representation for anti-aliasing neural radiance fields,” inProceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 5855–5864

  37. [37]

    Mip-nerf 360: Unbounded anti-aliased neural radiance fields,

    J. T. Barron, B. Mildenhall, D. Verbin, P. P. Srinivasan, and P. Hedman, “Mip-nerf 360: Unbounded anti-aliased neural radiance fields,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 5470–5479

  38. [38]

    Zip-nerf: Anti-aliased grid-based neural radiance fields,

    ——, “Zip-nerf: Anti-aliased grid-based neural radiance fields,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 19 697–19 705

  39. [39]

    Ref-nerf: Structured view-dependent appearance for neural radiance fields,

    D. Verbin, P. Hedman, B. Mildenhall, T. Zickler, J. T. Barron, and P. P. Srinivasan, “Ref-nerf: Structured view-dependent appearance for neural radiance fields,” in2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2022, pp. 5481–5490

  40. [40]

    Nerfies: Deformable neural radiance fields,

    K. Park, U. Sinha, J. T. Barron, S. Bouaziz, D. B. Goldman, S. M. Seitz, and R. Martin-Brualla, “Nerfies: Deformable neural radiance fields,” in Proceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 5865–5874

  41. [41]

    Hypernerf: A higher-dimensional representation for topologically varying neural radiance fields,

    K. Park, U. Sinha, P. Hedman, J. T. Barron, S. Bouaziz, D. B. Goldman, R. Martin-Brualla, and S. M. Seitz, “Hypernerf: A higher-dimensional representation for topologically varying neural radiance fields,”arXiv preprint arXiv:2106.13228, 2021

  42. [42]

    Masked space-time hash encoding for efficient dynamic scene reconstruction,

    F. Wang, Z. Chen, G. Wang, Y . Song, and H. Liu, “Masked space-time hash encoding for efficient dynamic scene reconstruction,”Advances in neural information processing systems, vol. 36, pp. 70 497–70 510, 2023

  43. [43]

    Fast dynamic radiance fields with time-aware neural voxels,

    J. Fang, T. Yi, X. Wang, L. Xie, X. Zhang, W. Liu, M. Nießner, and Q. Tian, “Fast dynamic radiance fields with time-aware neural voxels,” inSIGGRAPH Asia 2022 Conference Papers, 2022, pp. 1–9

  44. [44]

    Robust dynamic radiance fields,

    Y .-L. Liu, C. Gao, A. Meuleman, H.-Y . Tseng, A. Saraf, C. Kim, Y .-Y . Chuang, J. Kopf, and J.-B. Huang, “Robust dynamic radiance fields,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 13–23

  45. [45]

    Forward flow for novel view synthesis of dynamic scenes,

    X. Guo, J. Sun, Y . Dai, G. Chen, X. Ye, X. Tan, E. Ding, Y . Zhang, and J. Wang, “Forward flow for novel view synthesis of dynamic scenes,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 16 022–16 033

  46. [46]

    Tensor4d: Efficient neural 4d decomposition for high-fidelity dynamic reconstruction and rendering,

    R. Shao, Z. Zheng, H. Tu, B. Liu, H. Zhang, and Y . Liu, “Tensor4d: Efficient neural 4d decomposition for high-fidelity dynamic reconstruction and rendering,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 16 632–16 642

  47. [47]

    DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation

    J. Tang, J. Ren, H. Zhou, Z. Liu, and G. Zeng, “Dreamgaussian: Generative gaussian splatting for efficient 3d content creation,”arXiv preprint arXiv:2309.16653, 2023

  48. [48]

    Gps-gaussian+: Generalizable pixel-wise 3d gaussian splatting for real-time human-scene rendering from sparse views

    B. Zhou, S. Zheng, H. Tu, R. Shao, B. Liu, S. Zhang, L. Nie, and Y. Liu, “Gps-gaussian+: Generalizable pixel-wise 3d gaussian splatting for real-time human-scene rendering from sparse views,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

  49. [49]

    Efficient scene modeling via structure-aware and region-prioritized 3d gaussians

    G. Fang and B. Wang, “Efficient scene modeling via structure-aware and region-prioritized 3d gaussians,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

  50. [50]

    Gir: 3d gaussian inverse rendering for relightable scene factorization

    Y. Shi, Y. Wu, C. Wu, X. Liu, C. Zhao, H. Feng, J. Zhang, B. Zhou, E. Ding, and J. Wang, “Gir: 3d gaussian inverse rendering for relightable scene factorization,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

  51. [51]

    Stylizedgs: Controllable stylization for 3d gaussian splatting

    D. Zhang, Y.-J. Yuan, Z. Chen, F.-L. Zhang, Z. He, S. Shan, and L. Gao, “Stylizedgs: Controllable stylization for 3d gaussian splatting,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

  52. [52]

    Mvsnerf: Fast generalizable radiance field reconstruction from multi-view stereo

    A. Chen, Z. Xu, F. Zhao, X. Zhang, F. Xiang, J. Yu, and H. Su, “Mvsnerf: Fast generalizable radiance field reconstruction from multi-view stereo,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 14 124–14 133

  53. [53]

    Is attention all that nerf needs?

    P. Wang, X. Chen, T. Chen, S. Venugopalan, Z. Wang et al., “Is attention all that nerf needs?” arXiv preprint arXiv:2207.13298, 2022

  54. [54]

    Skipnet: Learning dynamic routing in convolutional networks

    X. Wang, F. Yu, Z.-Y. Dou, T. Darrell, and J. E. Gonzalez, “Skipnet: Learning dynamic routing in convolutional networks,” in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 409–424

  55. [55]

    Convolutional networks with adaptive inference graphs

    A. Veit and S. Belongie, “Convolutional networks with adaptive inference graphs,” in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 3–18

  56. [56]

    Dynamic filter networks

    X. Jia, B. De Brabandere, T. Tuytelaars, and L. V. Gool, “Dynamic filter networks,” Advances in Neural Information Processing Systems, vol. 29, 2016

  57. [57]

    Deformable convolutional networks

    J. Dai, H. Qi, Y. Xiong, Y. Li, G. Zhang, H. Hu, and Y. Wei, “Deformable convolutional networks,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 764–773

  58. [58]

    Spatio-temporal filter adaptive network for video deblurring

    S. Zhou, J. Zhang, J. Pan, H. Xie, W. Zuo, and J. Ren, “Spatio-temporal filter adaptive network for video deblurring,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 2482–2491

  59. [59]

    Deformable kernels: Adapting effective receptive fields for object deformation

    H. Gao, X. Zhu, S. Lin, and J. Dai, “Deformable kernels: Adapting effective receptive fields for object deformation,” arXiv preprint arXiv:1910.02940, 2019

  60. [60]

    Leaving some stones unturned: dynamic feature prioritization for activity detection in streaming video

    Y.-C. Su and K. Grauman, “Leaving some stones unturned: dynamic feature prioritization for activity detection in streaming video,” in European Conference on Computer Vision. Springer, 2016, pp. 783–800

  61. [61]

    Adaframe: Adaptive frame selection for fast video recognition

    Z. Wu, C. Xiong, C.-Y. Ma, R. Socher, and L. S. Davis, “Adaframe: Adaptive frame selection for fast video recognition,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 1278–1287

  62. [62]

    Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation

    Y. Bengio, N. Léonard, and A. Courville, “Estimating or propagating gradients through stochastic neurons for conditional computation,” arXiv preprint arXiv:1308.3432, 2013

  63. [63]

    From sparse to soft mixtures of experts

    J. Puigcerver, C. Riquelme, B. Mustafa, and N. Houlsby, “From sparse to soft mixtures of experts,” arXiv preprint arXiv:2308.00951, 2023

  64. [64]

    Scaling vision with sparse mixture of experts

    C. Riquelme, J. Puigcerver, B. Mustafa, M. Neumann, R. Jenatton, A. Susano Pinto, D. Keysers, and N. Houlsby, “Scaling vision with sparse mixture of experts,” Advances in Neural Information Processing Systems, vol. 34, pp. 8583–8595, 2021

  65. [65]

    Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

    N. Shazeer, A. Mirhoseini, K. Maziarz, A. Davis, Q. Le, G. Hinton, and J. Dean, “Outrageously large neural networks: The sparsely-gated mixture-of-experts layer,” arXiv preprint arXiv:1701.06538, 2017

  66. [66]

    Uni-moe: Scaling unified multimodal llms with mixture of experts

    Y. Li, S. Jiang, B. Hu, L. Wang, W. Zhong, W. Luo, L. Ma, and M. Zhang, “Uni-moe: Scaling unified multimodal llms with mixture of experts,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

  67. [67]

    Mome: Mixture of multimodal experts for generalist multimodal large language models

    L. Shen, G. Chen, R. Shao, W. Guan, and L. Nie, “Mome: Mixture of multimodal experts for generalist multimodal large language models,” Advances in Neural Information Processing Systems, vol. 37, pp. 42 048–42 070, 2024

  68. [68]

    Mixture-of-shape-experts (mose): End-to-end shape dictionary framework to prompt sam for generalizable medical segmentation

    J. Wei, X. Zhao, J. Woo, J. Ouyang, G. El Fakhri, Q. Chen, and X. Liu, “Mixture-of-shape-experts (mose): End-to-end shape dictionary framework to prompt sam for generalizable medical segmentation,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 6448–6458

  69. [69]

    Sam-med3d-moe: Towards a non-forgetting segment anything model via mixture of experts for 3d medical image segmentation

    G. Wang, J. Ye, J. Cheng, T. Li, Z. Chen, J. Cai, J. He, and B. Zhuang, “Sam-med3d-moe: Towards a non-forgetting segment anything model via mixture of experts for 3d medical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2024, pp. 552–561

  70. [70]

    Complexity experts are task-discriminative learners for any image restoration

    E. Zamfir, Z. Wu, N. Mehta, Y. Tan, D. P. Paudel, Y. Zhang, and R. Timofte, “Complexity experts are task-discriminative learners for any image restoration,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 12 753–12 763

  71. [71]

    Unirestorer: Universal image restoration via adaptively estimating image degradation at proper granularity

    J. Lin, Z. Zhang, W. Li, R. Pei, H. Xu, H. Zhang, and W. Zuo, “Unirestorer: Universal image restoration via adaptively estimating image degradation at proper granularity,” arXiv preprint arXiv:2412.20157, 2024

  72. [72]

    DINOv2: Learning Robust Visual Features without Supervision

    M. Oquab, T. Darcet, T. Moutakanni, H. Vo, M. Szafraniec, V. Khalidov, P. Fernandez, D. Haziza, F. Massa, A. El-Nouby et al., “Dinov2: Learning robust visual features without supervision,” arXiv preprint arXiv:2304.07193, 2023

  73. [73]

    Vggt: Visual geometry grounded transformer

    J. Wang, M. Chen, N. Karaev, A. Vedaldi, C. Rupprecht, and D. Novotny, “Vggt: Visual geometry grounded transformer,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2025, pp. 5294–5306

  74. [74]

    Vision transformers for dense prediction

    R. Ranftl, A. Bochkovskiy, and V. Koltun, “Vision transformers for dense prediction,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 12 179–12 188

  75. [75]

    Billion-scale similarity search with gpus

    J. Johnson, M. Douze, and H. Jégou, “Billion-scale similarity search with gpus,” IEEE Transactions on Big Data, 2019

  76. [76]

    Dl3dv-10k: A large-scale scene dataset for deep learning-based 3d vision

    L. Ling, Y. Sheng, Z. Tu, W. Zhao, C. Xin, K. Wan, L. Yu, Q. Guo, Z. Yu, Y. Lu et al., “Dl3dv-10k: A large-scale scene dataset for deep learning-based 3d vision,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 22 160–22 169

  77. [77]

    Stereo Magnification: Learning View Synthesis using Multiplane Images

    T. Zhou, R. Tucker, J. Flynn, G. Fyffe, and N. Snavely, “Stereo magnification: Learning view synthesis using multiplane images,” arXiv preprint arXiv:1805.09817, 2018

  78. [78]

    No pose at all: Self-supervised pose-free 3d gaussian splatting from sparse views

    R. Huang and K. Mikolajczyk, “No pose at all: Self-supervised pose-free 3d gaussian splatting from sparse views,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025, pp. 27 947–27 957

  79. [79]

    Lightgaussian: Unbounded 3d gaussian compression with 15x reduction and 200+ fps

    Z. Fan, K. Wang, K. Wen, Z. Zhu, D. Xu, and Z. Wang, “Lightgaussian: Unbounded 3d gaussian compression with 15x reduction and 200+ fps,” Advances in Neural Information Processing Systems, vol. 37, pp. 140 138–140 158, 2024

  80. [80]

    Point transformer

    H. Zhao, L. Jiang, J. Jia, P. H. Torr, and V. Koltun, “Point transformer,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 16 259–16 268
