DLGStream: Dynamic Language-embedded Guassian Splatting for Open-vocabulary Enabled Free-viewpoint Video Streaming

Tie Qiu; Xiaobo Zhou; Yuyang Liu; Zhihui Ke

arxiv: 2606.28840 · v1 · pith:MLF65BF4new · submitted 2026-06-27 · 💻 cs.CV

DLGStream: Dynamic Language-embedded Guassian Splatting for Open-vocabulary Enabled Free-viewpoint Video Streaming

Zhihui Ke , Yuyang Liu , Xiaobo Zhou , Tie Qiu This is my paper

Pith reviewed 2026-06-30 10:09 UTC · model grok-4.3

classification 💻 cs.CV

keywords 3D Gaussian SplattingFree-viewpoint VideoOpen-vocabulary SegmentationLanguage EmbeddingDynamic Scene ReconstructionVideo Streaming4D Gaussian Representation

0 comments

The pith

DLGStream streams language features with 3D Gaussians for open-vocabulary free-viewpoint video at 43 KB per frame using dual opacity and deformation fields.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces DLGStream to support streaming of time-varying language features embedded in 3D Gaussian splats for free-viewpoint video. It maintains separate opacity attributes for color and language modalities to prevent quality loss during joint optimization. An interpolation-based deformation field compresses temporal changes across frames while also enabling upsampling to higher frame rates. Experiments show gains in both open-vocabulary segmentation accuracy and visual reconstruction quality under the strict size and speed constraints of FVV streaming. A reader would care because this makes interactive editing and querying of dynamic 3D scenes feasible over limited bandwidth.

Core claim

DLGStream proposes a dual-opacity dynamic language Gaussian representation that maintains two separate opacity attributes for color and language features to avoid performance degradation during joint optimization, paired with an interpolation-based deformation field that reduces temporal redundancy in the Gaussian attributes and language features while supporting 4D frame interpolation to increase FPS.

What carries the argument

Dual-opacity dynamic language Gaussian representation that keeps independent opacity values for the color and language modalities, together with an interpolation-based deformation field that models temporal changes across frames.

If this is right

Open-vocabulary queries become available on streamed FVV without inflating frame size beyond 43 KB on average.
The deformation field enables both compression of temporal redundancy and conversion of low-FPS sequences to high-FPS output.
Scene editing and spatial intelligence applications can operate directly on the streamed language-embedded Gaussians.
Joint color-language optimization no longer forces a tradeoff between reconstruction quality and segmentation performance.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The dual-opacity split could generalize to other auxiliary feature channels such as depth or semantic labels in dynamic Gaussian representations.
Low per-frame size may support real-time mobile AR/VR use cases where both visual quality and queryability are required.
The deformation field might be combined with existing 3DGS compression methods to achieve even smaller transmission sizes.

Load-bearing premise

The claim that two separate opacity attributes fully prevent quality loss when colors and language features are optimized together, and that an interpolation deformation field can cut temporal redundancy without adding artifacts or reducing fidelity in either channel.

What would settle it

Measure open-vocabulary segmentation mIoU and reconstruction PSNR on the same FVV test sequences using a single shared opacity versus the dual-opacity version; also compare artifact visibility and fidelity in frames generated by the deformation field at different interpolation ratios.

Figures

Figures reproduced from arXiv: 2606.28840 by Tie Qiu, Xiaobo Zhou, Yuyang Liu, Zhihui Ke.

**Figure 1.** Figure 1: Overview of the DLGStream framework. We adopt a GOP-by-GOP training strategy. In the first GOP, static and dynamic Gaussians are decoupled, while in subsequent GOPs, the residuals of these Gaussians are learned from the previous GOP. Besides, we interpolate key time features and use a deformation field to predict deformed dynamic Gaussian attributes. Finally, dual-opacity language Gaussian splatting is a… view at source ↗

**Figure 2.** Figure 2: Qualitative comparisons of 4D open-vocabulary querying. 4.2 4D Open-vocabulary Querying Results Quantitative Comparisons . We perform 4D open-vocabulary querying on N3DV dataset, with mIoU(%) and mAcc(%) results detailed in [PITH_FULL_IMAGE:figures/full_fig_p011_2.png] view at source ↗

**Figure 3.** Figure 3: RD-Cave results of N3DV and MeetRoom datasets [PITH_FULL_IMAGE:figures/full_fig_p012_3.png] view at source ↗

**Figure 4.** Figure 4: Qualitative comparison of opacity ablation. (a) Opacity of Cook Spinach (b) Opacity of Flame Salmon (c) Opacity of Sear Steak (d) Opacity time difference of Cook Spinach (e) Opacity time difference of Flame Salmon [PITH_FULL_IMAGE:figures/full_fig_p014_4.png] view at source ↗

**Figure 5.** Figure 5: Opacity statistical distributions of color and language feature. sharing opacity can interfere with the learning of color and semantics [PITH_FULL_IMAGE:figures/full_fig_p014_5.png] view at source ↗

**Figure 6.** Figure 6: Qualitative results of the long video on Flame Salmon scene of N3DV dataset. marked with ∗, represent our re-implementation [PITH_FULL_IMAGE:figures/full_fig_p025_6.png] view at source ↗

**Figure 7.** Figure 7: Qualitative result of Ours(3DGS)’s frame interpolation on Coffee Martini scene of N3DV dataset. (a) 70th - reconstructed (b) 71th - interpolated (c) 72th - reconstructed (d) 73th - interpolated (e) 70th - reconstructed (f ) 71th - interpolated (g) 72th - interpolated (h) 73th - interpolated [PITH_FULL_IMAGE:figures/full_fig_p031_7.png] view at source ↗

**Figure 8.** Figure 8: Qualitative result of Ours(3DGS)’s frame interpolation on Cut Roasted Beef scene of N3DV dataset. (a) 261th - reconstructed (b) 262th - interpolated (c) 263th - reconstructed (d) 264th - interpolated (e) 261th - reconstructed (f ) 262th - interpolated (g) 263th - interpolated (h) 264th - interpolated [PITH_FULL_IMAGE:figures/full_fig_p031_8.png] view at source ↗

**Figure 9.** Figure 9: Qualitative result of Ours(Scaffold-GS)’s frame interpolation on Flame Steak scene of N3DV dataset [PITH_FULL_IMAGE:figures/full_fig_p031_9.png] view at source ↗

**Figure 10.** Figure 10: Qualitative comparison on Cook Spinach scene of N3DV dataset. FVV Qualitative Comparisons From [PITH_FULL_IMAGE:figures/full_fig_p032_10.png] view at source ↗

**Figure 11.** Figure 11: Qualitative comparison on Flame Salmon scene of N3DV dataset. (a) StreamSTGS [29] (b) QUEEN [17] (c) HiCoM [15] (d) Ours (3DGS) (e) GIFStream [37] (f ) iFVC [71] (g) Ours(ScaffoldGS) (h) Ground Truth [PITH_FULL_IMAGE:figures/full_fig_p033_11.png] view at source ↗

**Figure 12.** Figure 12: Qualitative comparison on Flame Steak scene of N3DV dataset. (a) StreamSTGS [29] (b) QUEEN [17] (c) HiCoM [15] (d) Ours (3DGS) (e) GIFStream [37] (f ) iFVC [71] (g) Ours(ScaffoldGS) (h) Ground Truth [PITH_FULL_IMAGE:figures/full_fig_p033_12.png] view at source ↗

**Figure 13.** Figure 13: Qualitative comparison on Sear Steak scene of N3DV dataset. (a) StreamSTGS [29] (b) QUEEN [17] (c) HiCoM [15] (d) Ours (3DGS) (e) GIFStream [37] (f ) iFVC [71] (g) Ours(ScaffoldGS) (h) Ground Truth [PITH_FULL_IMAGE:figures/full_fig_p033_13.png] view at source ↗

**Figure 14.** Figure 14: Qualitative comparison on Discussion scene of MeetRoom dataset [PITH_FULL_IMAGE:figures/full_fig_p033_14.png] view at source ↗

**Figure 15.** Figure 15: Qualitative comparison on Trimming scene of MeetRoom dataset. (a) StreamSTGS [29] (b) QUEEN [17] (c) HiCoM [15] (d) Ours (3DGS) (e) GIFStream [37] (f ) iFVC [71] (g) Ours(ScaffoldGS) (h) Ground Truth [PITH_FULL_IMAGE:figures/full_fig_p034_15.png] view at source ↗

**Figure 16.** Figure 16: Qualitative comparison on Vrheadset scene of MeetRoom dataset. (a) iFVC [71] (b) Ours (Scaffold-GS) (c) Ground Truth [PITH_FULL_IMAGE:figures/full_fig_p034_16.png] view at source ↗

**Figure 17.** Figure 17: Qualitative comparison on Dance boy city scene of WideRange4D dataset. (a) iFVC [71] (b) Ours (Scaffold-GS) (c) Ground Truth [PITH_FULL_IMAGE:figures/full_fig_p034_17.png] view at source ↗

**Figure 18.** Figure 18: Qualitative comparison on Findfood rhinoceros village scene of WideRange4D dataset [PITH_FULL_IMAGE:figures/full_fig_p034_18.png] view at source ↗

read the original abstract

3D Gaussian Splatting~(3DGS) has emerged as a promising paradigm for reconstructing streamable free-viewpoint video~(FVV) from multi-view videos. However, 3DGS-based FVVs typically lack user interaction and editing capabilities, which diminishes the immersive experience. Recent research has integrated language features from CLIP into 3DGS via distillation, enabling open-vocabulary queries and supporting many downstream applications. Nevertheless, the stringent requirements of FVV, low frame size and high FPS, make current language Gaussian representations unsuitable for language-embedded FVV. In this paper, we propose DLGStream, a novel language-embedded FVV representation that streams time-varying language features alongside Gaussian attributes to support 4D environment interaction, scene editing, and spatial intelligence. Specifically, we propose a dual-opacity dynamic language Gaussian representation, which maintains two opacity attributes for color and language features to deal with performance degradation that occurs when colors and features are jointly optimized. Furthermore, we introduce an interpolation-based deformation field to reduce temporal redundancy. This deformation field can also be used for 4D frame interpolation, boosting FVV sequences from low to high FPS. Experimental results demonstrate that DLGStream achieves superior performance in both on open-vocabulary segmentation and reconstruction quality with an average frame size of merely 43 KB. The code is available on \href{https://github.com/kkkzh/DLGStream}{https://github.com/kkkzh/DLGStream}.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Dual-opacity Gaussians plus an interpolation deformation field is the concrete new piece, but the abstract supplies no numbers or ablations to show the mechanisms actually deliver the claimed gains.

read the letter

The new elements are the dual-opacity attributes (separate for color and language) to sidestep joint-optimization degradation, and the shared interpolation-based deformation field that both compresses temporal redundancy and supports frame upsampling. These are presented as direct responses to the low-frame-size and high-FPS demands of language-embedded FVV.

The paper does a clear job naming the practical constraints that existing language Gaussian methods hit when moved to streaming video. The dual-opacity choice is a straightforward engineering response to the color-feature conflict, and reusing the deformation field for interpolation is efficient on paper.

The main weakness is that the abstract asserts superior open-vocabulary segmentation, better reconstruction, and 43 KB frames without any tables, baselines, ablations, or per-modality consistency checks. The stress-test concern lands: the headline results depend on the two opacities fully decoupling the objectives and the deformation field introducing no view- or time-inconsistent artifacts in the language features, yet nothing in the text demonstrates either. Without those measurements it is impossible to tell whether residual cross-talk or drift remains.

This is aimed at people working on practical 3D video streaming and language-guided interaction in graphics and vision. A reader who needs the specific dual-opacity and deformation tricks for their own pipeline could extract value once the experiments are filled in. The work is coherent on its own terms and the ideas are grounded enough to warrant referee time, even if heavy revision on the validation side is likely.

Referee Report

3 major / 1 minor

Summary. The manuscript introduces DLGStream, a dynamic language-embedded 3D Gaussian Splatting representation for open-vocabulary free-viewpoint video (FVV) streaming. It proposes maintaining separate opacity attributes for color and language features to avoid degradation from joint optimization, plus an interpolation-based deformation field to compress temporal redundancy and enable 4D frame interpolation. The central claim is that this yields superior open-vocabulary segmentation and reconstruction quality at an average 43 KB per frame, with code released publicly.

Significance. If the performance claims are substantiated, the approach would enable practical language-queryable, editable FVV streaming at low bandwidth, with direct relevance to interactive AR/VR and spatial applications. The public code release is a clear strength for reproducibility.

major comments (3)

[Abstract] Abstract: The headline claim of 'superior performance in both open-vocabulary segmentation and reconstruction quality' with a 43 KB average frame size is presented without any quantitative tables, baselines, ablation studies, or error metrics, rendering the central empirical contribution impossible to assess from the manuscript text.
[Abstract] Abstract: The dual-opacity mechanism is asserted to 'deal with performance degradation' from joint color-language optimization, yet no comparison to a shared-opacity baseline, no per-modality consistency metrics, and no ablation isolating cross-talk are supplied; this assumption is load-bearing for the claim of no fidelity loss in either modality.
[Abstract] Abstract: The interpolation-based deformation field is stated to reduce temporal redundancy 'without introducing artifacts' while supporting high-FPS interpolation, but the text provides no view-consistency, time-consistency, or reconstruction-error metrics to verify this, which directly underpins the 43 KB/frame and high-FPS claims.

minor comments (1)

[Abstract] Abstract: The phrase 'both on open-vocabulary segmentation' contains an apparent typo and should be corrected for clarity.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments focused on the abstract. The full manuscript contains experimental results in Section 4, but we agree that the abstract presentation can be strengthened by better referencing these details and by adding targeted ablations and metrics where they are currently limited.

read point-by-point responses

Referee: [Abstract] Abstract: The headline claim of 'superior performance in both open-vocabulary segmentation and reconstruction quality' with a 43 KB average frame size is presented without any quantitative tables, baselines, ablation studies, or error metrics, rendering the central empirical contribution impossible to assess from the manuscript text.

Authors: The abstract is a high-level summary; the supporting quantitative tables, baseline comparisons, ablation studies, and error metrics for segmentation and reconstruction quality (including the 43 KB/frame figure) appear in Section 4 of the manuscript. We will revise the abstract to explicitly reference key metrics and point readers to the experimental section for assessment. revision: yes
Referee: [Abstract] Abstract: The dual-opacity mechanism is asserted to 'deal with performance degradation' from joint color-language optimization, yet no comparison to a shared-opacity baseline, no per-modality consistency metrics, and no ablation isolating cross-talk are supplied; this assumption is load-bearing for the claim of no fidelity loss in either modality.

Authors: Section 3.2 describes the dual-opacity design and its motivation. We acknowledge that an explicit shared-opacity baseline comparison and cross-talk metrics are not currently present and will add a dedicated ablation study with per-modality consistency metrics in the revised manuscript. revision: yes
Referee: [Abstract] Abstract: The interpolation-based deformation field is stated to reduce temporal redundancy 'without introducing artifacts' while supporting high-FPS interpolation, but the text provides no view-consistency, time-consistency, or reconstruction-error metrics to verify this, which directly underpins the 43 KB/frame and high-FPS claims.

Authors: Section 3.3 introduces the interpolation deformation field. While some supporting results exist in the experiments, we agree that dedicated view-consistency, time-consistency, and reconstruction-error metrics are needed to fully substantiate the claims and will include them in the revised version. revision: yes

Circularity Check

0 steps flagged

No significant circularity; claims rest on empirical validation rather than self-referential definitions or fitted predictions.

full rationale

The paper introduces dual-opacity attributes and an interpolation-based deformation field as architectural choices to address joint optimization issues and temporal redundancy in language-embedded 3DGS for FVV. These are presented as design decisions, with performance claims supported by experimental results on open-vocabulary segmentation and reconstruction quality (average 43 KB/frame). No equations, self-definitional reductions, fitted-input predictions, or load-bearing self-citations appear in the provided text that would make outputs equivalent to inputs by construction. The derivation chain is self-contained against external benchmarks and does not reduce the reported gains to tautologies.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; the method description implies standard neural-network hyperparameters and CLIP features but does not enumerate them.

pith-pipeline@v0.9.1-grok · 5806 in / 1043 out tokens · 26717 ms · 2026-06-30T10:09:26.564398+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

114 extracted references · 6 canonical work pages

[1]

In: Pro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recog- nition

Attal, B., Huang, J.B., Richardt, C., Zollhoefer, M., Kopf, J., O’Toole, M., Kim, C.: Hyperreel: High-fidelity 6-dof video with ray-conditioned sampling. In: Pro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recog- nition. pp. 16610–16620 (2023)

2023
[2]

In: European Conference on Computer Vision

Bae, J., Kim, S., Yun, Y., Lee, H., Bang, G., Uh, Y.: Per-gaussian embedding- based deformation for deformable 3d gaussian splatting. In: European Conference on Computer Vision. pp. 321–335. Springer (2025)

2025
[3]

In: Pro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recog- nition

Cao, A., Johnson, J.: Hexplane: A fast representation for dynamic scenes. In: Pro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recog- nition. pp. 130–141 (2023)

2023
[4]

Chen, J., Mao, Q., Bao, Y., Meng, X., Meng, F., Wang, R., Liang, Y.: Motion matters:Compactgaussianstreamingforfree-viewpointvideoreconstruction.Ad- vances in Neural Information Processing Systems38, 120385–120409 (2026)

2026
[5]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

Chen,J.,Hu,Z.,Wu,P.,Zhu,H.,Li,H.,Sun,X.:Dash:4dhashencodingwithself- supervised decomposition for real-time dynamic scene rendering. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 26349–26359 (2025)

2025
[6]

In: European Conference on Computer Vision

Chen, Y., Wu, Q., Lin, W., Harandi, M., Cai, J.: Hac: Hash-grid assisted context for 3d gaussian splatting compression. In: European Conference on Computer Vision. pp. 422–438. Springer (2024)

2024
[7]

In: 2021 IEEE/CVF International Con- ference on Computer Vision (ICCV)

Du, Y., Zhang, Y., Yu, H.X., Tenenbaum, J.B., Wu, J.: Neural radiance flow for 4d view synthesis and video processing. In: 2021 IEEE/CVF International Con- ference on Computer Vision (ICCV). pp. 14304–14314. IEEE Computer Society (2021)

2021
[8]

In: ACM SIG- GRAPH 2024 Conference Papers

Duan, Y., Wei, F., Dai, Q., He, Y., Chen, W., Chen, B.: 4d-rotor gaussian splat- ting: towards efficient novel view synthesis for dynamic scenes. In: ACM SIG- GRAPH 2024 Conference Papers. pp. 1–11 (2024)

2024
[9]

arXiv preprint arXiv:2312.00583 (2023)

Duisterhof, B.P., Mandi, Z., Yao, Y., Liu, J.W., Shou, M.Z., Song, S., Ichnowski, J.: Md-splatting: Learning metric deformation from 4d gaussians in highly de- formable scenes. arXiv preprint arXiv:2312.00583 (2023)

work page arXiv 2023
[10]

Advances in neural information processing systems37, 140138–140158 (2024)

Fan, Z., Wang, K., Wen, K., Zhu, Z., Xu, D., Wang, Z., et al.: Lightgaussian: Unbounded 3d gaussian compression with 15x reduction and 200+ fps. Advances in neural information processing systems37, 140138–140158 (2024)

2024
[11]

In: SIGGRAPH Asia 2022 Conference Papers

Fang, J., Yi, T., Wang, X., Xie, L., Zhang, X., Liu, W., Nießner, M., Tian, Q.: Fast dynamic radiance fields with time-aware neural voxels. In: SIGGRAPH Asia 2022 Conference Papers. pp. 1–9 (2022)

2022
[12]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Fridovich-Keil, S., Meanti, G., Warburg, F.R., Recht, B., Kanazawa, A.: K-planes: Explicit radiance fields in space, time, and appearance. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 12479– 12488 (2023)

2023
[13]

Advances in Neural Information Processing Systems38, 134751– 134778 (2026)

Fu, J., Gao, Q., Wen, C., Wu, Y., Ma, S., Zhang, J., Zhang, J.: Recon-gs: Continuum-preserved gaussian streaming for fast and compact reconstruction of dynamic scenes. Advances in Neural Information Processing Systems38, 134751– 134778 (2026)

2026
[14]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

Gao, C., Saraf, A., Kopf, J., Huang, J.B.: Dynamic view synthesis from dynamic monocular video. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 5712–5721 (2021) DLGStream 17

2021
[15]

Advances in Neural Information Processing Systems (2024)

Gao, Q., Meng, J., Wen, C., Chen, J., Zhang, J.: Hicom: Hierarchical coherent motion for streamable dynamic scene with 3d gaussian splatting. Advances in Neural Information Processing Systems (2024)

2024
[16]

In: European Conference on Computer Vision

Girish, S., Gupta, K., Shrivastava, A.: Eagles: Efficient accelerated 3d gaussians with lightweight encodings. In: European Conference on Computer Vision. pp. 54–71. Springer (2024)

2024
[17]

Advances in Neural Information Processing Systems (2024)

Girish,S.,Li,T.,Mazumdar,A.,Shrivastava,A.,Luebke,D.,DeMello,S.:Queen: Quantized efficient encoding of dynamic gaussians for streaming free-viewpoint videos. Advances in Neural Information Processing Systems (2024)

2024
[18]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

Guo, X., Sun, J., Dai, Y., Chen, G., Ye, X., Tan, X., Ding, E., Zhang, Y., Wang, J.: Forward flow for novel view synthesis of dynamic scenes. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 16022–16033 (2023)

2023
[19]

Advances in Neural Information Processing Systems38, 114028–114053 (2026)

He, B., Chen, Y., Lu, G., Wang, Q., Gu, Q., Xie, R., Song, L., Zhang, W.: H3d- dgs:Exploringheterogeneous3dmotionrepresentationfordeformable3dgaussian splatting. Advances in Neural Information Processing Systems38, 114028–114053 (2026)

2026
[20]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

He, J., Li, C., Wang, S., Kwong, S.: Joint semantic and rendering enhance- ments in 3d gaussian modeling with anisotropic local encoding. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 28354–28363 (2025)

2025
[21]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2025)

Hu, Q., Zheng, Z., Zhong, H., Fu, S., Song, L., Zhai, G., Wang, Y., et al.: 4dgc: Rate-aware 4d gaussian compression for efficient streamable free-viewpoint video. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2025)

2025
[22]

In: Proceedings of the AAAI Conference on Artificial Intelligence

Hu, Q., Zhong, H., Zheng, Z., Zhang, X., Cheng, Z., Song, L., Zhai, G., Wang, Y.: Vrvvc: Variable-rate nerf-based volumetric video compression. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 39, pp. 3563–3571 (2025)

2025
[23]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Huang, Y.H., Sun, Y.T., Yang, Z., Lyu, X., Cao, Y.P., Qi, X.: Sc-gs: Sparse- controlled gaussian splatting for editable dynamic scenes. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 4220– 4230 (2024)

2024
[24]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Jain, S., Watson, D., Tabellion, E., Poole, B., Kontkanen, J., et al.: Video inter- polation with diffusion models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 7341–7351 (2024)

2024
[25]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

Jiang, D., Hou, Z., Ke, Z., Yang, X., Zhou, X., Qiu, T.: Timeformer: Capturing temporal relationships of deformable 3d gaussians for robust reconstruction. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 8721–8732 (2025)

2025
[26]

In: Pro- ceedings of the IEEE/CVF International Conference on Computer Vision

Jiao, S., Dong, H., Yin, Y., Jie, Z., Qian, Y., Zhao, Y., Shi, H., Wei, Y.: Clip- gs: Unifying vision-language representation with 3d gaussian splatting. In: Pro- ceedings of the IEEE/CVF International Conference on Computer Vision. pp. 4670–4680 (2025)

2025
[27]

splat: Directly referring 3d gaussian splatting via direct language embedding registra- tion.In:ProceedingsoftheComputerVisionandPatternRecognitionConference

Jun-Seong, K., Kim, G., Yu-Ji, K., Wang, Y.C.F., Choe, J., Oh, T.H.: Dr. splat: Directly referring 3d gaussian splatting via direct language embedding registra- tion.In:ProceedingsoftheComputerVisionandPatternRecognitionConference. pp. 14137–14146 (2025)

2025
[28]

In: European Conference on Computer Vision

Katsumata, K., Vo, D.M., Nakayama, H.: A compact dynamic 3d gaussian rep- resentation for real-time dynamic view synthesis. In: European Conference on Computer Vision. pp. 394–412. Springer (2025) 18 Zhihui Ke et al

2025
[29]

Ke, Z., Liu, Y., Zhou, X., Qiu, T.: Streamstgs: Streaming spatial and temporal gaussian grids for real-time free-viewpoint video (2025)

2025
[30]

ACM transactions on graphics (TOG)42(4), 139–1 (2023)

Kerbl, B., Kopanas, G., Leimkühler, T., Drettakis, G.: 3d gaussian splatting for real-time radiance field rendering. ACM transactions on graphics (TOG)42(4), 139–1 (2023)

2023
[31]

In: Proceedings of the IEEE/CVF international con- ference on computer vision

Kerr, J., Kim, C.M., Goldberg, K., Kanazawa, A., Tancik, M.: Lerf: Language embedded radiance fields. In: Proceedings of the IEEE/CVF international con- ference on computer vision. pp. 19729–19739 (2023)

2023
[32]

Advances in Neural Information Processing Systems (2024)

Kim, M., Lim, J., Han, B.: 4d gaussian splatting in the wild with uncertainty- aware regularization. Advances in Neural Information Processing Systems (2024)

2024
[33]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A.C., Lo, W.Y., et al.: Segment anything. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 4015–4026 (2023)

2023
[34]

In: Proceedings of the AAAI Conference on Artificial Intelligence

Kong, H., Yang, X., Wang, X.: Efficient gaussian splatting for monocular dynamic scene rendering via sparse time-variant attribute modeling. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 39, pp. 4374–4382 (2025)

2025
[35]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Lee, J.C., Rho, D., Sun, X., Ko, J.H., Park, E.: Compact 3d gaussian representa- tion for radiance field. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 21719–21728 (2024)

2024
[36]

Advances in Neural Information Processing Systems (2024)

Lee, J., Won, C.Y., Jung, H., Bae, I., Jeon, H.G.: Fully explicit dynamic gaussian splatting. Advances in Neural Information Processing Systems (2024)

2024
[37]

In: Proceedings of the Computer Vision and Pattern Recognition Conference

Li, H., Li, S., Gao, X., Batuer, A., Yu, L., Liao, Y.: Gifstream: 4d gaussian-based immersive video with feature stream. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 21761–21770 (2025)

2025
[38]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

Li, J., Song, Z., Yang, B.: Trace: Learning 3d gaussian physical dynamics from multi-view videos. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 8820–8829 (2025)

2025
[39]

Advances in Neural Information Processing Systems35, 13485–13498 (2022)

Li, L., Shen, Z., Wang, Z., Shen, L., Tan, P.: Streaming radiance fields for 3d video synthesis. Advances in Neural Information Processing Systems35, 13485–13498 (2022)

2022
[40]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Li, T., Slavcheva, M., Zollhoefer, M., Green, S., Lassner, C., Kim, C., Schmidt, T., Lovegrove, S., Goesele, M., Newcombe, R., et al.: Neural 3d video synthesis from multi-view video. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 5521–5531 (2022)

2022
[41]

In: Proceedings of the Computer Vision and Pattern Recognition Conference

Li, W., Zhou, R., Zhou, J., Song, Y., Herter, J., Qin, M., Huang, G., Pfister, H.: 4d langsplat: 4d language gaussian splatting via multimodal large language models. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 22001–22011 (2025)

2025
[42]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Li, Z., Chen, Z., Li, Z., Xu, Y.: Spacetime gaussian feature splatting for real- time dynamic view synthesis. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 8508–8520 (2024)

2024
[43]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Li, Z., Niklaus, S., Snavely, N., Wang, O.: Neural scene flow fields for space-time view synthesis of dynamic scenes. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 6498–6508 (2021)

2021
[44]

In: Proceedings of the IEEE/CVF Conference on Com- puter Vision and Pattern Recognition

Li, Z., Wang, Q., Cole, F., Tucker, R., Snavely, N.: Dynibar: Neural dynamic image-based rendering. In: Proceedings of the IEEE/CVF Conference on Com- puter Vision and Pattern Recognition. pp. 4273–4284 (2023)

2023
[45]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2025) DLGStream 19

Liang, Y., Xu, T., Kikuchi, Y.: Himor: Monocular deformable gaussian recon- struction with hierarchical motion representation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2025) DLGStream 19

2025
[46]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Lin, Y., Dai, Z., Zhu, S., Yao, Y.: Gaussian-flow: 4d reconstruction with dynamic 3d gaussian particle. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 21136–21145 (2024)

2024
[47]

In: Proceedings of the 32nd ACM International Conference on Multimedia

Liu, X., Wu, X., Zhang, P., Wang, S., Li, Z., Kwong, S.: Compgs: Efficient 3d scene representation via compressed gaussian splatting. In: Proceedings of the 32nd ACM International Conference on Multimedia. pp. 2936–2944 (2024)

2024
[48]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Liu, Y.L., Gao, C., Meuleman, A., Tseng, H.Y., Saraf, A., Kim, C., Chuang, Y.Y., Kopf, J., Huang, J.B.: Robust dynamic radiance fields. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 13–23 (2023)

2023
[49]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Lou, A., Planche, B., Gao, Z., Li, Y., Luan, T., Ding, H., Chen, T., Noble, J., Wu, Z.: Darenerf: Direction-aware representation for dynamic scenes. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 5031–5042 (2024)

2024
[50]

Advances in Neural Information Processing Systems (2024)

Lu, J., Deng, J., Zhu, R., Liang, Y., Yang, W., Zhang, T., Zhou, X.: Dn-4dgs: De- noised deformable network with temporal-spatial aggregation for dynamic scene rendering. Advances in Neural Information Processing Systems (2024)

2024
[51]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Lu, T., Yu, M., Xu, L., Xiangli, Y., Wang, L., Lin, D., Dai, B.: Scaffold-gs: Struc- tured 3d gaussians for view-adaptive rendering. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 20654–20664 (2024)

2024
[52]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2025)

Lu,Y.,Zhou,Y.,Liu,D.,Liang,T.,Yin,Y.:Bard-gs:Blur-awarereconstructionof dynamic scenes via gaussian splatting. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2025)

2025
[53]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Lu, Z., Guo, X., Hui, L., Chen, T., Yang, M., Tang, X., Zhu, F., Dai, Y.: 3d geometry-aware deformable gaussian splatting for dynamic view synthesis. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 8900–8910 (2024)

2024
[54]

In: 2024 International Conference on 3D Vision (3DV)

Luiten, J., Kopanas, G., Leibe, B., Ramanan, D.: Dynamic 3d gaussians: Tracking by persistent dynamic view synthesis. In: 2024 International Conference on 3D Vision (3DV). pp. 800–809. IEEE (2024)

2024
[55]

In: SIGGRAPH Asia 2024 Conference Papers

Mallick,S.S.,Goel,R.,Kerbl,B.,Steinberger,M.,Carrasco,F.V.,DeLaTorre,F.: Taming 3dgs: High-quality radiance fields with limited resources. In: SIGGRAPH Asia 2024 Conference Papers. pp. 1–11 (2024)

2024
[56]

arXiv preprint arXiv:2309.03160 (2023)

Mihajlovic, M., Prokudin, S., Pollefeys, M., Tang, S.: Resfields: Residual neural fields for spatiotemporal signals. arXiv preprint arXiv:2309.03160 (2023)

work page arXiv 2023
[57]

In: Proceedings of the IEEE/CVF Con- ference on Computer Vision and Pattern Recognition

Niedermayr, S., Stumpfegger, J., Westermann, R.: Compressed 3d gaussian splat- ting for accelerated novel view synthesis. In: Proceedings of the IEEE/CVF Con- ference on Computer Vision and Pattern Recognition. pp. 10349–10358 (2024)

2024
[58]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

Park, K., Sinha, U., Barron, J.T., Bouaziz, S., Goldman, D.B., Seitz, S.M., Martin-Brualla, R.: Nerfies: Deformable neural radiance fields. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 5865–5874 (2021)

2021
[59]

ACM transactions on graphics (TOG)40(6) (dec 2021)

Park, K., Sinha, U., Hedman, P., Barron, J.T., Bouaziz, S., Goldman, D.B., Martin-Brualla, R., Seitz, S.M.: Hypernerf: A higher-dimensional representation for topologically varying neural radiance fields. ACM transactions on graphics (TOG)40(6) (dec 2021)

2021
[60]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Park, S., Son, M., Jang, S., Ahn, Y.C., Kim, J.Y., Kang, N.: Temporal inter- polation is all you need for dynamic neural radiance fields. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 4212–4221 (2023) 20 Zhihui Ke et al

2023
[61]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Peng, S., Yan, Y., Shuai, Q., Bao, H., Zhou, X.: Representing volumetric videos as dynamic mlp maps. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 4252–4262 (2023)

2023
[62]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Pumarola, A., Corona, E., Pons-Moll, G., Moreno-Noguer, F.: D-nerf: Neural radiance fields for dynamic scenes. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 10318–10327 (2021)

2021
[63]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Qin, M., Li, W., Zhou, J., Wang, H., Pfister, H.: Langsplat: 3d language gaussian splatting. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 20051–20060 (2024)

2024
[64]

In: International conference on machine learning

Radford,A.,Kim,J.W.,Hallacy,C.,Ramesh,A.,Goh,G.,Agarwal,S.,Sastry,G., Askell, A., Mishkin, P., Clark, J., et al.: Learning transferable visual models from natural language supervision. In: International conference on machine learning. pp. 8748–8763. PmLR (2021)

2021
[65]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Shao, R., Zheng, Z., Tu, H., Liu, B., Zhang, H., Liu, Y.: Tensor4d: Efficient neural 4d decomposition for high-fidelity dynamic reconstruction and rendering. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 16632–16642 (2023)

2023
[66]

In: European Conference on Computer Vision

Shaw, R., Nazarczuk, M., Song, J., Moreau, A., Catley-Chandar, S., Dhamo, H., Pérez-Pellitero, E.: Swings: sliding windows for dynamic 3d gaussian splatting. In: European Conference on Computer Vision. Springer (2024)

2024
[67]

Advances in neural information process- ing systems36, 55919–55931 (2023)

Shin, S., Park, J.: Binary radiance fields. Advances in neural information process- ing systems36, 55919–55931 (2023)

2023
[68]

IEEE Transactions on Visualization and Computer Graphics 29(5), 2732–2742 (2023)

Song, L., Chen, A., Li, Z., Chen, Z., Chen, L., Yuan, J., Xu, Y., Geiger, A.: Nerfplayer: A streamable dynamic scene representation with decomposed neu- ral radiance fields. IEEE Transactions on Visualization and Computer Graphics 29(5), 2732–2742 (2023)

2023
[69]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Sun, J., Jiao, H., Li, G., Zhang, Z., Zhao, L., Xing, W.: 3dgstream: On-the-fly training of 3d gaussians for efficient streaming of photo-realistic free-viewpoint videos. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 20675–20685 (2024)

2024
[70]

In: Proceedings of the 32nd ACM International Conference on Multimedia

Sun, X., Lee, J.C., Rho, D., Ko, J.H., Ali, U., Park, E.: F-3dgs: Factorized coor- dinates and representations for 3d gaussian splatting. In: Proceedings of the 32nd ACM International Conference on Multimedia. pp. 7957–7965 (2024)

2024
[71]

In: Proceedings of the AAAI Conference on Artificial Intelligence

Tang, L., Yang, J., Peng, R., Zhai, Y., Shen, S., Wang, R.: Compressing stream- able free-viewpoint videos to 0.1 mb per frame. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 39, pp. 7257–7265 (2025)

2025
[72]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

Tian, F., Du, S., Duan, Y.: Mononerf: Learning a generalizable dynamic radiance field from monocular videos. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 17903–17913 (2023)

2023
[73]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

Tian, L., Li, X., Ma, L., Yin, H., Zheng, Z., Huang, H., Li, T., Lu, H., Jia, X.: Ccl-lgs: Contrastive codebook learning for 3d language gaussian splatting. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 9855–9864 (2025)

2025
[74]

In: Proceedings of the IEEE/CVF Inter- national Conference on Computer Vision

Tretschk, E., Tewari, A., Golyanik, V., Zollhöfer, M., Lassner, C., Theobalt, C.: Non-rigid neural radiance fields: Reconstruction and novel view synthesis of a dynamic scene from monocular video. In: Proceedings of the IEEE/CVF Inter- national Conference on Computer Vision. pp. 12959–12970 (2021)

2021
[75]

In: Forty-first International Conference on Machine Learning (2024) DLGStream 21

Wan,D.,Lu,R.,Zeng,G.:Superpointgaussiansplattingforreal-timehigh-fidelity dynamic scene reconstruction. In: Forty-first International Conference on Machine Learning (2024) DLGStream 21

2024
[76]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Wang,C.,MacDonald,L.E.,Jeni,L.A.,Lucey,S.:Flowsupervisionfordeformable nerf. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 21128–21137 (2023)

2023
[77]

Advances in Neural Information Processing Systems36(2024)

Wang, F., Chen, Z., Wang, G., Song, Y., Liu, H.: Masked space-time hash encod- ing for efficient dynamic scene reconstruction. Advances in Neural Information Processing Systems36(2024)

2024
[78]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

Wang, F., Tan, S., Li, X., Tian, Z., Song, Y., Liu, H.: Mixed neural voxels for fast multi-view video synthesis. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 19706–19716 (2023)

2023
[79]

In: European Conference on Computer Vision

Wang, H., Zhu, H., He, T., Feng, R., Deng, J., Bian, J., Chen, Z.: End-to-end rate-distortion optimized 3d gaussian representation. In: European Conference on Computer Vision. pp. 76–92. Springer (2024)

2024
[80]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Wang, L., Hu, Q., He, Q., Wang, Z., Yu, J., Tuytelaars, T., Xu, L., Wu, M.: Neural residual radiance fields for streamably free-viewpoint videos. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 76–87 (2023)

2023

Showing first 80 references.

[1] [1]

In: Pro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recog- nition

Attal, B., Huang, J.B., Richardt, C., Zollhoefer, M., Kopf, J., O’Toole, M., Kim, C.: Hyperreel: High-fidelity 6-dof video with ray-conditioned sampling. In: Pro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recog- nition. pp. 16610–16620 (2023)

2023

[2] [2]

In: European Conference on Computer Vision

Bae, J., Kim, S., Yun, Y., Lee, H., Bang, G., Uh, Y.: Per-gaussian embedding- based deformation for deformable 3d gaussian splatting. In: European Conference on Computer Vision. pp. 321–335. Springer (2025)

2025

[3] [3]

In: Pro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recog- nition

Cao, A., Johnson, J.: Hexplane: A fast representation for dynamic scenes. In: Pro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recog- nition. pp. 130–141 (2023)

2023

[4] [4]

Chen, J., Mao, Q., Bao, Y., Meng, X., Meng, F., Wang, R., Liang, Y.: Motion matters:Compactgaussianstreamingforfree-viewpointvideoreconstruction.Ad- vances in Neural Information Processing Systems38, 120385–120409 (2026)

2026

[5] [5]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

Chen,J.,Hu,Z.,Wu,P.,Zhu,H.,Li,H.,Sun,X.:Dash:4dhashencodingwithself- supervised decomposition for real-time dynamic scene rendering. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 26349–26359 (2025)

2025

[6] [6]

In: European Conference on Computer Vision

Chen, Y., Wu, Q., Lin, W., Harandi, M., Cai, J.: Hac: Hash-grid assisted context for 3d gaussian splatting compression. In: European Conference on Computer Vision. pp. 422–438. Springer (2024)

2024

[7] [7]

In: 2021 IEEE/CVF International Con- ference on Computer Vision (ICCV)

Du, Y., Zhang, Y., Yu, H.X., Tenenbaum, J.B., Wu, J.: Neural radiance flow for 4d view synthesis and video processing. In: 2021 IEEE/CVF International Con- ference on Computer Vision (ICCV). pp. 14304–14314. IEEE Computer Society (2021)

2021

[8] [8]

In: ACM SIG- GRAPH 2024 Conference Papers

Duan, Y., Wei, F., Dai, Q., He, Y., Chen, W., Chen, B.: 4d-rotor gaussian splat- ting: towards efficient novel view synthesis for dynamic scenes. In: ACM SIG- GRAPH 2024 Conference Papers. pp. 1–11 (2024)

2024

[9] [9]

arXiv preprint arXiv:2312.00583 (2023)

Duisterhof, B.P., Mandi, Z., Yao, Y., Liu, J.W., Shou, M.Z., Song, S., Ichnowski, J.: Md-splatting: Learning metric deformation from 4d gaussians in highly de- formable scenes. arXiv preprint arXiv:2312.00583 (2023)

work page arXiv 2023

[10] [10]

Advances in neural information processing systems37, 140138–140158 (2024)

Fan, Z., Wang, K., Wen, K., Zhu, Z., Xu, D., Wang, Z., et al.: Lightgaussian: Unbounded 3d gaussian compression with 15x reduction and 200+ fps. Advances in neural information processing systems37, 140138–140158 (2024)

2024

[11] [11]

In: SIGGRAPH Asia 2022 Conference Papers

Fang, J., Yi, T., Wang, X., Xie, L., Zhang, X., Liu, W., Nießner, M., Tian, Q.: Fast dynamic radiance fields with time-aware neural voxels. In: SIGGRAPH Asia 2022 Conference Papers. pp. 1–9 (2022)

2022

[12] [12]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Fridovich-Keil, S., Meanti, G., Warburg, F.R., Recht, B., Kanazawa, A.: K-planes: Explicit radiance fields in space, time, and appearance. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 12479– 12488 (2023)

2023

[13] [13]

Advances in Neural Information Processing Systems38, 134751– 134778 (2026)

Fu, J., Gao, Q., Wen, C., Wu, Y., Ma, S., Zhang, J., Zhang, J.: Recon-gs: Continuum-preserved gaussian streaming for fast and compact reconstruction of dynamic scenes. Advances in Neural Information Processing Systems38, 134751– 134778 (2026)

2026

[14] [14]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

Gao, C., Saraf, A., Kopf, J., Huang, J.B.: Dynamic view synthesis from dynamic monocular video. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 5712–5721 (2021) DLGStream 17

2021

[15] [15]

Advances in Neural Information Processing Systems (2024)

Gao, Q., Meng, J., Wen, C., Chen, J., Zhang, J.: Hicom: Hierarchical coherent motion for streamable dynamic scene with 3d gaussian splatting. Advances in Neural Information Processing Systems (2024)

2024

[16] [16]

In: European Conference on Computer Vision

Girish, S., Gupta, K., Shrivastava, A.: Eagles: Efficient accelerated 3d gaussians with lightweight encodings. In: European Conference on Computer Vision. pp. 54–71. Springer (2024)

2024

[17] [17]

Advances in Neural Information Processing Systems (2024)

Girish,S.,Li,T.,Mazumdar,A.,Shrivastava,A.,Luebke,D.,DeMello,S.:Queen: Quantized efficient encoding of dynamic gaussians for streaming free-viewpoint videos. Advances in Neural Information Processing Systems (2024)

2024

[18] [18]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

Guo, X., Sun, J., Dai, Y., Chen, G., Ye, X., Tan, X., Ding, E., Zhang, Y., Wang, J.: Forward flow for novel view synthesis of dynamic scenes. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 16022–16033 (2023)

2023

[19] [19]

Advances in Neural Information Processing Systems38, 114028–114053 (2026)

He, B., Chen, Y., Lu, G., Wang, Q., Gu, Q., Xie, R., Song, L., Zhang, W.: H3d- dgs:Exploringheterogeneous3dmotionrepresentationfordeformable3dgaussian splatting. Advances in Neural Information Processing Systems38, 114028–114053 (2026)

2026

[20] [20]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

He, J., Li, C., Wang, S., Kwong, S.: Joint semantic and rendering enhance- ments in 3d gaussian modeling with anisotropic local encoding. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 28354–28363 (2025)

2025

[21] [21]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2025)

Hu, Q., Zheng, Z., Zhong, H., Fu, S., Song, L., Zhai, G., Wang, Y., et al.: 4dgc: Rate-aware 4d gaussian compression for efficient streamable free-viewpoint video. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2025)

2025

[22] [22]

In: Proceedings of the AAAI Conference on Artificial Intelligence

Hu, Q., Zhong, H., Zheng, Z., Zhang, X., Cheng, Z., Song, L., Zhai, G., Wang, Y.: Vrvvc: Variable-rate nerf-based volumetric video compression. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 39, pp. 3563–3571 (2025)

2025

[23] [23]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Huang, Y.H., Sun, Y.T., Yang, Z., Lyu, X., Cao, Y.P., Qi, X.: Sc-gs: Sparse- controlled gaussian splatting for editable dynamic scenes. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 4220– 4230 (2024)

2024

[24] [24]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Jain, S., Watson, D., Tabellion, E., Poole, B., Kontkanen, J., et al.: Video inter- polation with diffusion models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 7341–7351 (2024)

2024

[25] [25]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

Jiang, D., Hou, Z., Ke, Z., Yang, X., Zhou, X., Qiu, T.: Timeformer: Capturing temporal relationships of deformable 3d gaussians for robust reconstruction. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 8721–8732 (2025)

2025

[26] [26]

In: Pro- ceedings of the IEEE/CVF International Conference on Computer Vision

Jiao, S., Dong, H., Yin, Y., Jie, Z., Qian, Y., Zhao, Y., Shi, H., Wei, Y.: Clip- gs: Unifying vision-language representation with 3d gaussian splatting. In: Pro- ceedings of the IEEE/CVF International Conference on Computer Vision. pp. 4670–4680 (2025)

2025

[27] [27]

splat: Directly referring 3d gaussian splatting via direct language embedding registra- tion.In:ProceedingsoftheComputerVisionandPatternRecognitionConference

Jun-Seong, K., Kim, G., Yu-Ji, K., Wang, Y.C.F., Choe, J., Oh, T.H.: Dr. splat: Directly referring 3d gaussian splatting via direct language embedding registra- tion.In:ProceedingsoftheComputerVisionandPatternRecognitionConference. pp. 14137–14146 (2025)

2025

[28] [28]

In: European Conference on Computer Vision

Katsumata, K., Vo, D.M., Nakayama, H.: A compact dynamic 3d gaussian rep- resentation for real-time dynamic view synthesis. In: European Conference on Computer Vision. pp. 394–412. Springer (2025) 18 Zhihui Ke et al

2025

[29] [29]

Ke, Z., Liu, Y., Zhou, X., Qiu, T.: Streamstgs: Streaming spatial and temporal gaussian grids for real-time free-viewpoint video (2025)

2025

[30] [30]

ACM transactions on graphics (TOG)42(4), 139–1 (2023)

Kerbl, B., Kopanas, G., Leimkühler, T., Drettakis, G.: 3d gaussian splatting for real-time radiance field rendering. ACM transactions on graphics (TOG)42(4), 139–1 (2023)

2023

[31] [31]

In: Proceedings of the IEEE/CVF international con- ference on computer vision

Kerr, J., Kim, C.M., Goldberg, K., Kanazawa, A., Tancik, M.: Lerf: Language embedded radiance fields. In: Proceedings of the IEEE/CVF international con- ference on computer vision. pp. 19729–19739 (2023)

2023

[32] [32]

Advances in Neural Information Processing Systems (2024)

Kim, M., Lim, J., Han, B.: 4d gaussian splatting in the wild with uncertainty- aware regularization. Advances in Neural Information Processing Systems (2024)

2024

[33] [33]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A.C., Lo, W.Y., et al.: Segment anything. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 4015–4026 (2023)

2023

[34] [34]

In: Proceedings of the AAAI Conference on Artificial Intelligence

Kong, H., Yang, X., Wang, X.: Efficient gaussian splatting for monocular dynamic scene rendering via sparse time-variant attribute modeling. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 39, pp. 4374–4382 (2025)

2025

[35] [35]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Lee, J.C., Rho, D., Sun, X., Ko, J.H., Park, E.: Compact 3d gaussian representa- tion for radiance field. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 21719–21728 (2024)

2024

[36] [36]

Advances in Neural Information Processing Systems (2024)

Lee, J., Won, C.Y., Jung, H., Bae, I., Jeon, H.G.: Fully explicit dynamic gaussian splatting. Advances in Neural Information Processing Systems (2024)

2024

[37] [37]

In: Proceedings of the Computer Vision and Pattern Recognition Conference

Li, H., Li, S., Gao, X., Batuer, A., Yu, L., Liao, Y.: Gifstream: 4d gaussian-based immersive video with feature stream. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 21761–21770 (2025)

2025

[38] [38]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

Li, J., Song, Z., Yang, B.: Trace: Learning 3d gaussian physical dynamics from multi-view videos. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 8820–8829 (2025)

2025

[39] [39]

Advances in Neural Information Processing Systems35, 13485–13498 (2022)

Li, L., Shen, Z., Wang, Z., Shen, L., Tan, P.: Streaming radiance fields for 3d video synthesis. Advances in Neural Information Processing Systems35, 13485–13498 (2022)

2022

[40] [40]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Li, T., Slavcheva, M., Zollhoefer, M., Green, S., Lassner, C., Kim, C., Schmidt, T., Lovegrove, S., Goesele, M., Newcombe, R., et al.: Neural 3d video synthesis from multi-view video. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 5521–5531 (2022)

2022

[41] [41]

In: Proceedings of the Computer Vision and Pattern Recognition Conference

Li, W., Zhou, R., Zhou, J., Song, Y., Herter, J., Qin, M., Huang, G., Pfister, H.: 4d langsplat: 4d language gaussian splatting via multimodal large language models. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 22001–22011 (2025)

2025

[42] [42]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Li, Z., Chen, Z., Li, Z., Xu, Y.: Spacetime gaussian feature splatting for real- time dynamic view synthesis. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 8508–8520 (2024)

2024

[43] [43]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Li, Z., Niklaus, S., Snavely, N., Wang, O.: Neural scene flow fields for space-time view synthesis of dynamic scenes. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 6498–6508 (2021)

2021

[44] [44]

In: Proceedings of the IEEE/CVF Conference on Com- puter Vision and Pattern Recognition

Li, Z., Wang, Q., Cole, F., Tucker, R., Snavely, N.: Dynibar: Neural dynamic image-based rendering. In: Proceedings of the IEEE/CVF Conference on Com- puter Vision and Pattern Recognition. pp. 4273–4284 (2023)

2023

[45] [45]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2025) DLGStream 19

Liang, Y., Xu, T., Kikuchi, Y.: Himor: Monocular deformable gaussian recon- struction with hierarchical motion representation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2025) DLGStream 19

2025

[46] [46]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Lin, Y., Dai, Z., Zhu, S., Yao, Y.: Gaussian-flow: 4d reconstruction with dynamic 3d gaussian particle. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 21136–21145 (2024)

2024

[47] [47]

In: Proceedings of the 32nd ACM International Conference on Multimedia

Liu, X., Wu, X., Zhang, P., Wang, S., Li, Z., Kwong, S.: Compgs: Efficient 3d scene representation via compressed gaussian splatting. In: Proceedings of the 32nd ACM International Conference on Multimedia. pp. 2936–2944 (2024)

2024

[48] [48]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Liu, Y.L., Gao, C., Meuleman, A., Tseng, H.Y., Saraf, A., Kim, C., Chuang, Y.Y., Kopf, J., Huang, J.B.: Robust dynamic radiance fields. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 13–23 (2023)

2023

[49] [49]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Lou, A., Planche, B., Gao, Z., Li, Y., Luan, T., Ding, H., Chen, T., Noble, J., Wu, Z.: Darenerf: Direction-aware representation for dynamic scenes. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 5031–5042 (2024)

2024

[50] [50]

Advances in Neural Information Processing Systems (2024)

Lu, J., Deng, J., Zhu, R., Liang, Y., Yang, W., Zhang, T., Zhou, X.: Dn-4dgs: De- noised deformable network with temporal-spatial aggregation for dynamic scene rendering. Advances in Neural Information Processing Systems (2024)

2024

[51] [51]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Lu, T., Yu, M., Xu, L., Xiangli, Y., Wang, L., Lin, D., Dai, B.: Scaffold-gs: Struc- tured 3d gaussians for view-adaptive rendering. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 20654–20664 (2024)

2024

[52] [52]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2025)

Lu,Y.,Zhou,Y.,Liu,D.,Liang,T.,Yin,Y.:Bard-gs:Blur-awarereconstructionof dynamic scenes via gaussian splatting. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2025)

2025

[53] [53]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Lu, Z., Guo, X., Hui, L., Chen, T., Yang, M., Tang, X., Zhu, F., Dai, Y.: 3d geometry-aware deformable gaussian splatting for dynamic view synthesis. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 8900–8910 (2024)

2024

[54] [54]

In: 2024 International Conference on 3D Vision (3DV)

Luiten, J., Kopanas, G., Leibe, B., Ramanan, D.: Dynamic 3d gaussians: Tracking by persistent dynamic view synthesis. In: 2024 International Conference on 3D Vision (3DV). pp. 800–809. IEEE (2024)

2024

[55] [55]

In: SIGGRAPH Asia 2024 Conference Papers

Mallick,S.S.,Goel,R.,Kerbl,B.,Steinberger,M.,Carrasco,F.V.,DeLaTorre,F.: Taming 3dgs: High-quality radiance fields with limited resources. In: SIGGRAPH Asia 2024 Conference Papers. pp. 1–11 (2024)

2024

[56] [56]

arXiv preprint arXiv:2309.03160 (2023)

Mihajlovic, M., Prokudin, S., Pollefeys, M., Tang, S.: Resfields: Residual neural fields for spatiotemporal signals. arXiv preprint arXiv:2309.03160 (2023)

work page arXiv 2023

[57] [57]

In: Proceedings of the IEEE/CVF Con- ference on Computer Vision and Pattern Recognition

Niedermayr, S., Stumpfegger, J., Westermann, R.: Compressed 3d gaussian splat- ting for accelerated novel view synthesis. In: Proceedings of the IEEE/CVF Con- ference on Computer Vision and Pattern Recognition. pp. 10349–10358 (2024)

2024

[58] [58]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

Park, K., Sinha, U., Barron, J.T., Bouaziz, S., Goldman, D.B., Seitz, S.M., Martin-Brualla, R.: Nerfies: Deformable neural radiance fields. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 5865–5874 (2021)

2021

[59] [59]

ACM transactions on graphics (TOG)40(6) (dec 2021)

Park, K., Sinha, U., Hedman, P., Barron, J.T., Bouaziz, S., Goldman, D.B., Martin-Brualla, R., Seitz, S.M.: Hypernerf: A higher-dimensional representation for topologically varying neural radiance fields. ACM transactions on graphics (TOG)40(6) (dec 2021)

2021

[60] [60]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Park, S., Son, M., Jang, S., Ahn, Y.C., Kim, J.Y., Kang, N.: Temporal inter- polation is all you need for dynamic neural radiance fields. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 4212–4221 (2023) 20 Zhihui Ke et al

2023

[61] [61]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Peng, S., Yan, Y., Shuai, Q., Bao, H., Zhou, X.: Representing volumetric videos as dynamic mlp maps. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 4252–4262 (2023)

2023

[62] [62]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Pumarola, A., Corona, E., Pons-Moll, G., Moreno-Noguer, F.: D-nerf: Neural radiance fields for dynamic scenes. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 10318–10327 (2021)

2021

[63] [63]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Qin, M., Li, W., Zhou, J., Wang, H., Pfister, H.: Langsplat: 3d language gaussian splatting. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 20051–20060 (2024)

2024

[64] [64]

In: International conference on machine learning

Radford,A.,Kim,J.W.,Hallacy,C.,Ramesh,A.,Goh,G.,Agarwal,S.,Sastry,G., Askell, A., Mishkin, P., Clark, J., et al.: Learning transferable visual models from natural language supervision. In: International conference on machine learning. pp. 8748–8763. PmLR (2021)

2021

[65] [65]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Shao, R., Zheng, Z., Tu, H., Liu, B., Zhang, H., Liu, Y.: Tensor4d: Efficient neural 4d decomposition for high-fidelity dynamic reconstruction and rendering. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 16632–16642 (2023)

2023

[66] [66]

In: European Conference on Computer Vision

Shaw, R., Nazarczuk, M., Song, J., Moreau, A., Catley-Chandar, S., Dhamo, H., Pérez-Pellitero, E.: Swings: sliding windows for dynamic 3d gaussian splatting. In: European Conference on Computer Vision. Springer (2024)

2024

[67] [67]

Advances in neural information process- ing systems36, 55919–55931 (2023)

Shin, S., Park, J.: Binary radiance fields. Advances in neural information process- ing systems36, 55919–55931 (2023)

2023

[68] [68]

IEEE Transactions on Visualization and Computer Graphics 29(5), 2732–2742 (2023)

Song, L., Chen, A., Li, Z., Chen, Z., Chen, L., Yuan, J., Xu, Y., Geiger, A.: Nerfplayer: A streamable dynamic scene representation with decomposed neu- ral radiance fields. IEEE Transactions on Visualization and Computer Graphics 29(5), 2732–2742 (2023)

2023

[69] [69]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Sun, J., Jiao, H., Li, G., Zhang, Z., Zhao, L., Xing, W.: 3dgstream: On-the-fly training of 3d gaussians for efficient streaming of photo-realistic free-viewpoint videos. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 20675–20685 (2024)

2024

[70] [70]

In: Proceedings of the 32nd ACM International Conference on Multimedia

Sun, X., Lee, J.C., Rho, D., Ko, J.H., Ali, U., Park, E.: F-3dgs: Factorized coor- dinates and representations for 3d gaussian splatting. In: Proceedings of the 32nd ACM International Conference on Multimedia. pp. 7957–7965 (2024)

2024

[71] [71]

In: Proceedings of the AAAI Conference on Artificial Intelligence

Tang, L., Yang, J., Peng, R., Zhai, Y., Shen, S., Wang, R.: Compressing stream- able free-viewpoint videos to 0.1 mb per frame. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 39, pp. 7257–7265 (2025)

2025

[72] [72]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

Tian, F., Du, S., Duan, Y.: Mononerf: Learning a generalizable dynamic radiance field from monocular videos. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 17903–17913 (2023)

2023

[73] [73]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

Tian, L., Li, X., Ma, L., Yin, H., Zheng, Z., Huang, H., Li, T., Lu, H., Jia, X.: Ccl-lgs: Contrastive codebook learning for 3d language gaussian splatting. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 9855–9864 (2025)

2025

[74] [74]

In: Proceedings of the IEEE/CVF Inter- national Conference on Computer Vision

Tretschk, E., Tewari, A., Golyanik, V., Zollhöfer, M., Lassner, C., Theobalt, C.: Non-rigid neural radiance fields: Reconstruction and novel view synthesis of a dynamic scene from monocular video. In: Proceedings of the IEEE/CVF Inter- national Conference on Computer Vision. pp. 12959–12970 (2021)

2021

[75] [75]

In: Forty-first International Conference on Machine Learning (2024) DLGStream 21

Wan,D.,Lu,R.,Zeng,G.:Superpointgaussiansplattingforreal-timehigh-fidelity dynamic scene reconstruction. In: Forty-first International Conference on Machine Learning (2024) DLGStream 21

2024

[76] [76]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Wang,C.,MacDonald,L.E.,Jeni,L.A.,Lucey,S.:Flowsupervisionfordeformable nerf. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 21128–21137 (2023)

2023

[77] [77]

Advances in Neural Information Processing Systems36(2024)

Wang, F., Chen, Z., Wang, G., Song, Y., Liu, H.: Masked space-time hash encod- ing for efficient dynamic scene reconstruction. Advances in Neural Information Processing Systems36(2024)

2024

[78] [78]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

Wang, F., Tan, S., Li, X., Tian, Z., Song, Y., Liu, H.: Mixed neural voxels for fast multi-view video synthesis. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 19706–19716 (2023)

2023

[79] [79]

In: European Conference on Computer Vision

Wang, H., Zhu, H., He, T., Feng, R., Deng, J., Bian, J., Chen, Z.: End-to-end rate-distortion optimized 3d gaussian representation. In: European Conference on Computer Vision. pp. 76–92. Springer (2024)

2024

[80] [80]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Wang, L., Hu, Q., He, Q., Wang, Z., Yu, J., Tuytelaars, T., Xu, L., Wu, M.: Neural residual radiance fields for streamably free-viewpoint videos. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 76–87 (2023)

2023