Recognition: no theorem link
CWRNN-INVR: A Coupled WarpRNN based Implicit Neural Video Representation
Pith reviewed 2026-05-10 18:36 UTC · model grok-4.3
The pith
Separating video into regular structure captured by a Coupled WarpRNN and irregular residuals captured by a mixed grid yields higher-fidelity implicit neural representations than either approach alone.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A Coupled WarpRNN-based multi-scale motion representation and compensation module explicitly encodes the regular and structured information in video, while a mixed residual grid encodes the remaining irregular appearance and motion; the two components are fused through network reuse to form an INVR that outperforms prior grid-only or network-only baselines.
What carries the argument
The Coupled WarpRNN multi-scale motion representation and compensation module, which extracts and compensates regular structured video content so it can be reused across frames.
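The claimed division of labor can be pictured as a toy additive reconstruction: a motion-compensation step explains the regular structure, and a learned residual grid absorbs whatever warping misses. A minimal NumPy sketch, with an integer-shift "warp" standing in for the paper's multi-scale Coupled WarpRNN (all function names here are hypothetical, not from the paper):

```python
import numpy as np

def shift_warp(frame, flow):
    """Toy motion compensation: shift the frame by an integer (dy, dx).
    Stands in for the paper's multi-scale warp; real flows are sub-pixel."""
    dy, dx = flow
    return np.roll(frame, shift=(dy, dx), axis=(0, 1))

def reconstruct(reference, flow, residual_grid):
    """Hybrid INVR decoding sketch: regular structure via warping,
    irregular remainder added back from a residual grid."""
    return shift_warp(reference, flow) + residual_grid

# Tiny example: the current frame is the reference shifted by (1, 2)
# plus an irregular blob that no warp of the reference can explain.
rng = np.random.default_rng(0)
reference = rng.random((8, 8))
flow = (1, 2)
irregular = np.zeros((8, 8))
irregular[4, 4] = 0.5
target = shift_warp(reference, flow) + irregular

# If the grid stores exactly the warp residual, reconstruction is exact.
residual_grid = target - shift_warp(reference, flow)
recon = reconstruct(reference, flow, residual_grid)
assert np.allclose(recon, target)
```

The sketch also makes the premise's fragility visible: any motion the warp mis-estimates silently migrates into the residual grid, inflating the "irregular" component.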
If this is right
- The hybrid model can be used for video compression at lower bit rates than current INVR codecs while preserving the same reconstruction quality.
- Downstream tasks that rely on accurate motion fields, such as frame interpolation or video prediction, gain accuracy because the WarpRNN component supplies explicit multi-scale motion.
- The residual grid size can be scaled independently of the neural-network capacity, allowing flexible trade-offs between model size and fidelity for different video content types.
Where Pith is reading between the lines
- The same regular-versus-irregular split could be tested on other temporal signals such as audio or 3-D motion capture to see whether the architecture pattern generalizes.
- If the separation holds, future neural codecs might allocate fixed network capacity only to the structured component and let a lightweight grid handle content-specific residuals.
- A practical extension would be to learn the decision boundary between regular and irregular content on the fly rather than fixing it at design time.
Load-bearing premise
Video content can be cleanly partitioned into regular, structured parts that a neural network captures without loss and irregular residual parts that a grid likewise captures without loss, with the two parts recombined additively through simple network reuse.
What would settle it
On the UVG dataset, a pure-grid or pure-neural-network INVR with the same total parameter count (3 M) that exceeded the reported 33.73 dB average PSNR would falsify the claimed advantage of the separation.
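The falsification test above hinges on one number. For reference, PSNR is computed as 10·log10(MAX²/MSE); a stdlib-only sketch, assuming an 8-bit peak value of 255:

```python
import math

def psnr(reference, reconstruction, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return float("inf")  # identical signals: PSNR is unbounded
    return 10.0 * math.log10(peak ** 2 / mse)

# A reconstruction off by exactly 1 gray level everywhere gives MSE = 1,
# so PSNR = 20 * log10(255) ≈ 48.13 dB.
ref = [10, 20, 30, 40]
rec = [11, 21, 31, 41]
print(round(psnr(ref, rec), 2))  # → 48.13
```

Reported benchmark figures like the 33.73 dB above are averages of this per-frame quantity over all frames and sequences.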
Original abstract
Implicit Neural Video Representation (INVR) has emerged as a novel approach for video representation and compression, using learnable grids and neural networks. Existing methods focus on developing new grid structures efficient for latent representation and neural network architectures with large representation capability, lacking the study on their roles in video representation. In this paper, the difference between INVR based on neural network and INVR based on grid is first investigated from the perspective of video information composition to specify their own advantages, i.e., neural network for general structure while grid for specific detail. Accordingly, an INVR based on mixed neural network and residual grid framework is proposed, where the neural network is used to represent the regular and structured information and the residual grid is used to represent the remaining irregular information in a video. A Coupled WarpRNN-based multi-scale motion representation and compensation module is specifically designed to explicitly represent the regular and structured information, thus terming our method as CWRNN-INVR. For the irregular information, a mixed residual grid is learned where the irregular appearance and motion information are represented together. The mixed residual grid can be combined with the coupled WarpRNN in a way that allows for network reuse. Experiments show that our method achieves the best reconstruction results compared with the existing methods, with an average PSNR of 33.73 dB on the UVG dataset under the 3M model and outperforms existing INVR methods in other downstream tasks. The code can be found at https://github.com/yiyang-sdu/CWRNN-INVR.git.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes CWRNN-INVR, a mixed implicit neural video representation framework that assigns regular/structured video content to a Coupled WarpRNN module for multi-scale motion representation and compensation, while assigning irregular appearance and motion residuals to a learnable mixed residual grid. The two components are combined to permit network reuse. The authors first analyze differences between pure NN-based and grid-based INVR approaches, then report that the method achieves the highest reconstruction quality among compared INVR techniques, with an average PSNR of 33.73 dB on the UVG dataset at a 3M-parameter budget, and also improves performance on downstream tasks.
Significance. If the claimed separation of video information into orthogonal regular and irregular components is shown to hold and the reported gains are not simply due to increased capacity or training details, the work would offer a concrete design principle for allocating representational roles between neural networks and grids in INVR. This could improve parameter efficiency in video compression and support better generalization in downstream applications such as interpolation or editing. The public code release is a positive factor for reproducibility.
Major comments (3)
- [§3 (framework description)] The central premise that video content cleanly factors into regular structure best captured by the Coupled WarpRNN and irregular residuals best captured by the mixed grid (with recombination preserving fidelity) is load-bearing for the superiority claim, yet no quantitative validation—such as separate PSNR or motion-compensation error for each component, or an orthogonality metric—is provided in the method or experiments sections.
- [§4.1 and Table 1] Table 1 (UVG results) and the abstract report 33.73 dB average PSNR at 3M parameters as outperforming prior INVR methods, but the manuscript does not list the specific baseline numbers, model sizes, or training protocols for those methods, nor any ablation removing the coupling or reuse mechanism; without these, attribution of gains to the proposed roles versus capacity increases cannot be verified.
- [§4.2] The claim of improved performance on downstream tasks is stated without accompanying tables, metrics, or experimental protocols; this leaves the generalization benefit unsupported and prevents assessment of whether the mixed-grid reuse introduces artifacts in tasks such as frame interpolation or editing.
Minor comments (2)
- [Abstract] The GitHub URL in the abstract is duplicated with a stray closing brace, indicating a LaTeX formatting error.
- [§3.2] Notation for the multi-scale motion compensation inside the Coupled WarpRNN (e.g., how warp fields at different scales are aggregated) is introduced without an accompanying equation or diagram clarifying the reuse path with the residual grid.
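On the multi-scale aggregation the referee asks about, one common coarse-to-fine convention (an assumption about a plausible design, not taken from the paper) is to upsample the coarser flow, double its magnitude to match the finer pixel grid, and add the finer scale's residual flow. A NumPy sketch:

```python
import numpy as np

def upsample_flow(flow):
    """Nearest-neighbor 2x upsampling of a (2, H, W) flow field.
    Displacements are doubled because pixel units double with resolution."""
    return 2.0 * flow.repeat(2, axis=1).repeat(2, axis=2)

def aggregate_flows(coarse_to_fine):
    """Coarse-to-fine aggregation: each finer scale refines the upsampled
    coarser estimate with a residual flow. A generic convention, not
    necessarily the Coupled WarpRNN's exact scheme."""
    flow = coarse_to_fine[0]
    for residual in coarse_to_fine[1:]:
        flow = upsample_flow(flow) + residual
    return flow

coarse = np.ones((2, 2, 2))     # uniform (1, 1) motion at quarter resolution
residual = np.zeros((2, 4, 4))  # no correction at half resolution
agg = aggregate_flows([coarse, residual])
assert agg.shape == (2, 4, 4)
assert np.allclose(agg, 2.0)    # magnitude doubled by the upsampling step
```

An equation or diagram in §3.2 pinning down which of these steps the paper actually uses, and where the residual grid enters the reuse path, would resolve the ambiguity.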
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback, which highlights important areas for strengthening the manuscript's claims and experimental rigor. We address each major comment below and will incorporate revisions to provide the requested validations and details.
Point-by-point responses
Referee: [§3 (framework description)] The central premise that video content cleanly factors into regular structure best captured by the Coupled WarpRNN and irregular residuals best captured by the mixed grid (with recombination preserving fidelity) is load-bearing for the superiority claim, yet no quantitative validation—such as separate PSNR or motion-compensation error for each component, or an orthogonality metric—is provided in the method or experiments sections.
Authors: We acknowledge that Section 3 offers a qualitative investigation into the differing strengths of neural-network-based and grid-based INVR approaches for structured versus irregular video content, but does not include direct quantitative metrics such as component-wise PSNR, motion-compensation errors, or an orthogonality measure. In the revised manuscript we will add an ablation study that reports reconstruction PSNR and motion-compensation accuracy for the Coupled WarpRNN module alone, the mixed residual grid alone, and the full combined model. This will supply the empirical validation requested for the proposed factorization. revision: yes
Referee: [§4.1 and Table 1] Table 1 (UVG results) and the abstract report 33.73 dB average PSNR at 3M parameters as outperforming prior INVR methods, but the manuscript does not list the specific baseline numbers, model sizes, or training protocols for those methods, nor any ablation removing the coupling or reuse mechanism; without these, attribution of gains to the proposed roles versus capacity increases cannot be verified.
Authors: We appreciate this observation. While Table 1 presents our 33.73 dB result at the 3 M parameter budget, we will expand the table to list the exact PSNR values, parameter counts, and citations to the original training protocols of all compared INVR baselines. In addition, we will insert a new ablation subsection that disables the coupling within WarpRNN and the network-reuse mechanism, reporting the resulting performance drop to isolate the contribution of these design elements from any capacity differences. revision: yes
Referee: [§4.2] The claim of improved performance on downstream tasks is stated without accompanying tables, metrics, or experimental protocols; this leaves the generalization benefit unsupported and prevents assessment of whether the mixed-grid reuse introduces artifacts in tasks such as frame interpolation or editing.
Authors: We agree that the downstream-task claims require explicit experimental support. In the revision we will add a dedicated subsection to §4.2 that details the protocols for frame interpolation and editing tasks, presents quantitative tables (PSNR, perceptual metrics, and artifact analysis), and compares against the same baselines. This will substantiate the generalization benefit and allow evaluation of any potential artifacts arising from mixed-grid reuse. revision: yes
Circularity Check
No circularity: empirical architecture proposal validated by experiments
Full rationale
The paper introduces CWRNN-INVR as a mixed NN-plus-residual-grid INVR framework after investigating NN vs. grid roles in video content, but presents no mathematical derivation chain, equations, or predictions that reduce to fitted inputs or self-definitions by construction. Central performance claims (e.g., 33.73 dB PSNR on UVG) rest on direct empirical comparisons to prior INVR methods rather than any self-referential logic, uniqueness theorem, or ansatz smuggled via citation. The decomposition premise is stated as an investigative finding leading to design choices, not as a tautological input-output equivalence. This is a standard empirical ML paper with no load-bearing circular steps.
Axiom & Free-Parameter Ledger