RayFormer: Modeling Inter- and Intra-Ray Similarity for NeRF-Based Video Snapshot Compressive Imaging

Anqi Li; Danhua Liu; Yubo Dong; Zhenyuan Lin

arXiv: 2604.27702 · v1 · submitted 2026-04-30 · cs.CV

Pith reviewed 2026-05-07 07:20 UTC · model grok-4.3

keywords: video snapshot compressive imaging, NeRF, RayFormer, inter-ray similarity, intra-ray correlation, patch-level ray sampling, total variation prior, dynamic scene reconstruction

The pith

A transformer that models similarities between neighboring rays and along each ray improves NeRF reconstruction of videos from single-shot compressive measurements.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Video snapshot compressive imaging captures dynamic scenes in one measurement but demands accurate reconstruction of motion and detail. Existing NeRF-based methods rely on random ray sampling and therefore miss local structural patterns, which limits image quality. The paper replaces random sampling with a patch-level strategy and introduces RayFormer, a transformer that attends to inter-ray similarities among neighboring points at the same depth as well as intra-ray correlations along each viewing ray. A total variation term is added to the loss to promote spatial smoothness. Experiments on simulated and real scenes show this combination produces state-of-the-art results.

Core claim

We first propose a patch-level ray sampling strategy to enable the modeling of content structure. Then, we propose an Inter- and Intra-Ray Transformer (RayFormer) to capture the structural similarities, modeling both inter-ray similarities among spatially neighboring points at the same depth and intra-ray correlations between adjacent points along the viewing ray. Finally, benefiting from the patch-level sampling strategy, the total variation prior is incorporated into the objective function to enhance spatial smoothness and suppress artifacts.
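The contrast between random and patch-level ray sampling can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the image size, the patch size, and the choice of uniformly random patch locations are all assumptions made for the example.

```python
import numpy as np

def sample_random_rays(height, width, n_rays, rng):
    """Baseline: pick n_rays pixel coordinates uniformly at random (no spatial structure)."""
    flat = rng.choice(height * width, size=n_rays, replace=False)
    return np.stack(np.unravel_index(flat, (height, width)), axis=-1)  # (n_rays, 2)

def sample_patch_rays(height, width, patch, n_patches, rng):
    """Patch-level (hypothetical variant): pick top-left corners at random,
    then return every pixel of each patch so neighbours stay together."""
    tops = rng.integers(0, height - patch + 1, size=n_patches)
    lefts = rng.integers(0, width - patch + 1, size=n_patches)
    dy, dx = np.meshgrid(np.arange(patch), np.arange(patch), indexing="ij")
    rows = tops[:, None, None] + dy           # (n_patches, patch, patch)
    cols = lefts[:, None, None] + dx          # (n_patches, patch, patch)
    return np.stack([rows, cols], axis=-1)    # (n_patches, patch, patch, 2)

rng = np.random.default_rng(0)
random_px = sample_random_rays(64, 64, n_rays=128, rng=rng)           # 128 scattered rays
patch_px = sample_patch_rays(64, 64, patch=8, n_patches=2, rng=rng)   # 2 x 64 grouped rays
```

Both calls draw the same number of rays per batch; only the patch variant keeps the pixel grid intact, which is what neighbour-based operations need.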

What carries the argument

RayFormer, a transformer that jointly attends to inter-ray similarities among spatially neighboring points at the same depth and intra-ray correlations along individual viewing rays, made possible by patch-level rather than random ray sampling.
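The two attention directions can be sketched on a tensor of per-sample features. This is only an illustration of the axes involved: it uses plain scaled dot-product self-attention without learned projections, attends over all samples rather than only adjacent ones, and applies the two steps in sequence. The function names, tensor sizes, and ordering are assumptions, not details taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens):
    """Scaled dot-product self-attention over the second-to-last axis.
    tokens: (..., T, C). No learned Q/K/V projections (simplification)."""
    scale = np.sqrt(tokens.shape[-1])
    scores = tokens @ np.swapaxes(tokens, -1, -2) / scale   # (..., T, T)
    return softmax(scores, axis=-1) @ tokens                # (..., T, C)

def inter_ray_attention(feats):
    """feats: (R rays, D depth samples, C). Attend across rays at each depth index."""
    per_depth = np.swapaxes(feats, 0, 1)                    # (D, R, C)
    return np.swapaxes(self_attention(per_depth), 0, 1)     # back to (R, D, C)

def intra_ray_attention(feats):
    """Attend across the D samples along each individual ray."""
    return self_attention(feats)                            # token axis is D

rng = np.random.default_rng(0)
feats = rng.normal(size=(64, 16, 8))   # assumed: one 8x8 patch, 16 depth samples, 8 channels
out = intra_ray_attention(inter_ray_attention(feats))
```

The inter-ray step only makes sense when the rays in a batch are spatial neighbours, which is why it depends on patch-level sampling.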

If this is right

Patch-level sampling makes local structural patterns available for attention, enabling the model to exploit content correlations that random sampling ignores.
Modeling both inter-ray and intra-ray relations together produces higher-fidelity reconstructions of dynamic scenes than methods that treat rays independently.
Incorporating the total variation prior on the sampled patches reduces spatial artifacts while preserving motion detail.
The resulting pipeline reaches state-of-the-art performance on both simulated and real-world video snapshot compressive imaging benchmarks.
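The total variation prior mentioned in the third point can be sketched on a rendered patch. This is a minimal example assuming an anisotropic (L1) TV and a mean-squared data term on the snapshot measurement; the function names and the weight `tv_weight=0.01` are hypothetical, and the paper may use a different TV variant or weighting.

```python
import numpy as np

def total_variation(patch):
    """Anisotropic TV of a 2D patch: mean absolute difference between
    vertically and horizontally adjacent pixels."""
    dv = np.abs(patch[1:, :] - patch[:-1, :])
    dh = np.abs(patch[:, 1:] - patch[:, :-1])
    return dv.mean() + dh.mean()

def sci_loss_with_tv(pred_measurement, measurement, rendered_patch, tv_weight=0.01):
    """Data fidelity on the measurement plus a TV penalty on the rendered patch.
    tv_weight is an illustrative value, not one reported in the paper."""
    data_term = np.mean((pred_measurement - measurement) ** 2)
    return data_term + tv_weight * total_variation(rendered_patch)

flat = np.ones((4, 4))
ramp = np.arange(16.0).reshape(4, 4)
tv_flat = total_variation(flat)   # constant patch has no variation
tv_ramp = total_variation(ramp)   # vertical steps of 4 plus horizontal steps of 1
```

The penalty needs pixel neighbours, so it can only be evaluated when rays are sampled as contiguous patches; scattered random rays have no adjacent pixels to difference.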

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same inter- and intra-ray attention pattern could be applied to other ray-based rendering tasks, such as light-field or plenoptic video reconstruction, where neighboring rays share similar scene content.
Structured patch sampling may prove beneficial in additional compressive-sensing settings beyond SCI, suggesting that random sampling is often suboptimal when scene geometry is locally coherent.
Because the method is built on top of existing NeRF pipelines, it can be combined with future improvements in radiance-field representations without redesigning the core sampling and attention logic.

Load-bearing premise

That patch-level ray sampling combined with the specific inter- and intra-ray attention in RayFormer will reliably extract scene structure more effectively than random sampling, and that adding the total variation term will improve quality without introducing bias or new artifacts.

What would settle it

An ablation experiment in which random sampling plus a standard transformer replaces the patch sampling and RayFormer, yet still matches or exceeds the reported PSNR and SSIM on the same simulated and real test sets, would show the proposed similarity modeling is not required for the claimed gains.
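A comparison of this kind comes down to per-scene metrics for two configurations and a paired test on their differences. The sketch below shows PSNR and a paired t-statistic in plain NumPy; the function names are hypothetical, images are assumed to be scaled to [0, 1], and no numbers from the paper are used.

```python
import numpy as np

def psnr(pred, target, peak=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, peak]."""
    mse = np.mean((np.asarray(pred, float) - np.asarray(target, float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def paired_t_statistic(scores_a, scores_b):
    """t-statistic for paired per-scene scores of two methods
    (mean difference divided by its standard error)."""
    d = np.asarray(scores_a, float) - np.asarray(scores_b, float)
    return d.mean() / (d.std(ddof=1) / np.sqrt(d.size))
```

With one PSNR value per scene for each configuration, a t-statistic near zero would indicate that the two pipelines are not distinguishable on that test set.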

Original abstract

Video snapshot compressive imaging (SCI) enables the reconstruction of dynamic scenes from a single snapshot measurement. Recently, NeRF-based methods have shown promising reconstruction performance. However, such methods typically adopt random ray sampling strategies and fail to capture content structural similarities, resulting in limited reconstruction quality. To address these issues, we first propose a patch-level ray sampling strategy to enable the modeling of content structure. Then, we propose an Inter- and Intra-Ray Transformer (RayFormer) to capture the structural similarities, modeling both inter-ray similarities among spatially neighboring points at the same depth and intra-ray correlations between adjacent points along the viewing ray. Finally, benefiting from the patch-level sampling strategy, the total variation prior is incorporated into the objective function to enhance spatial smoothness and suppress artifacts. Experiments in both simulated and real-world scenes demonstrate that the proposed method achieves state-of-the-art (SOTA) reconstruction performance.
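For readers new to SCI, the single snapshot measurement is conventionally modeled as a mask-weighted sum of the video frames. The sketch below illustrates that standard forward model; the frame count, image size, random binary masks, and the function name are assumptions for illustration, and sensor noise is omitted.

```python
import numpy as np

def sci_measurement(frames, masks):
    """frames, masks: arrays of shape (T, H, W).
    Returns the (H, W) snapshot: the sum over t of masks[t] * frames[t]."""
    return np.sum(masks * frames, axis=0)

rng = np.random.default_rng(0)
frames = rng.random((8, 32, 32))                              # assumed: 8 frames, 32x32 pixels
masks = rng.integers(0, 2, size=(8, 32, 32)).astype(float)    # assumed: random binary masks
snapshot = sci_measurement(frames, masks)                     # one 2D image encodes all 8 frames
```

Reconstruction has to invert this many-to-one mapping, which is why priors on scene structure matter so much in this setting.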

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

RayFormer swaps random ray sampling for patches in NeRF-based video SCI, adds a split inter/intra-ray transformer, and tacks on TV regularization, but the evidence that this beats standard sampling without bias or over-smoothing is still thin.

The paper's main move is to replace random ray sampling with patch-level sampling so that a transformer can explicitly attend to neighboring rays at the same depth (inter-ray) and to points along each individual ray (intra-ray), then add a total-variation term to the loss. This is a direct response to the observation that prior NeRF-SCI methods ignore local structure and therefore leave reconstruction artifacts. The split attention and the patch strategy that makes it possible are the concrete novelties; the rest is standard NeRF volume rendering plus a common regularizer. The authors show results on both simulated and real captured data, which is the right test bed for this problem. That part is useful and worth having in the literature.

The soft spots sit where the stress-test note flags them. Patch sampling introduces spatial correlation that random sampling deliberately avoids; nothing in the abstract or the high-level description shows that convergence or coverage stays unbiased. The TV term can suppress compressive artifacts, but it can also remove high-frequency temporal detail in real dynamic scenes, and the paper does not appear to quantify that trade-off with targeted ablations. Without seeing error bars, per-scene breakdowns, or direct comparisons that isolate the sampling change from network capacity, the SOTA claim is hard to judge.

Readers working on snapshot compressive imaging or on NeRF adaptations for inverse problems will find the architecture description and the real-data experiments worth skimming. The work is coherent on its own terms and engages the relevant prior literature, so it clears the bar for peer review. I would send it out, but with explicit requests for ablations on the sampling strategy and for checks that the TV term does not trade one artifact for another.

Referee Report

3 major / 3 minor

Summary. The manuscript proposes RayFormer for NeRF-based video snapshot compressive imaging (SCI). It introduces a patch-level ray sampling strategy to enable modeling of content structural similarities, an Inter- and Intra-Ray Transformer (RayFormer) that captures inter-ray similarities among spatially neighboring points at the same depth and intra-ray correlations between adjacent points along each viewing ray, and incorporates a total variation prior into the objective function to enhance spatial smoothness and suppress artifacts. Experiments on simulated and real-world scenes are claimed to achieve state-of-the-art reconstruction performance.

Significance. If the empirical results hold, the work could advance NeRF-based SCI by replacing random ray sampling with structured patch sampling and geometry-aware attention, addressing a recognized limitation in prior methods that under-exploit ray similarities. The combination of transformer modeling with TV regularization is a plausible extension, but significance depends on whether gains are attributable to the proposed components rather than capacity or optimization details.

major comments (3)

Abstract and §4 (Experiments): The SOTA claim is central to the paper but the abstract supplies no quantitative metrics, error bars, ablation studies, or dataset details. The experiments section must include direct comparisons (e.g., PSNR/SSIM tables) against recent NeRF-SCI baselines with statistical significance to substantiate the claim.
§3.2 (Patch-level ray sampling): The strategy is presented as necessary to capture structural similarities, yet the manuscript must include an ablation comparing patch-level sampling directly to standard random sampling (the unbiased baseline in NeRF). Without this, it remains unclear whether the restriction improves structure capture or introduces spatial correlation that slows convergence or leaves regions under-sampled; this is load-bearing for the central claim.
§4.3 (Real-world experiments): The total variation prior is added to suppress compressive artifacts, but in real dynamic scenes lacking ground truth the paper should quantify risks of over-smoothing high-frequency or temporally varying detail. Visual inspection alone is insufficient to rule out bias introduced by the prior, which could undermine the reported SOTA gains.

minor comments (3)

§2 (Related work): Additional citations to recent transformer-based NeRF variants and SCI reconstruction methods would better position the contribution and avoid potential gaps in the literature review.
Figure 1 (Architecture diagram): The inter-ray and intra-ray attention blocks would benefit from explicit labeling of query/key/value definitions and how patch sampling feeds into the transformer to improve clarity.
§3.1 (Notation): The definitions of ray points, depth sampling, and the combined loss (including the TV term) could be made more precise with an additional equation or table summarizing the symbols.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and describe the revisions we will make to strengthen the manuscript.

Referee: Abstract and §4 (Experiments): The SOTA claim is central to the paper but the abstract supplies no quantitative metrics, error bars, ablation studies, or dataset details. The experiments section must include direct comparisons (e.g., PSNR/SSIM tables) against recent NeRF-SCI baselines with statistical significance to substantiate the claim.

Authors: We agree that the abstract should explicitly report key quantitative results to support the SOTA claim. We will revise the abstract to include average PSNR and SSIM improvements over the compared NeRF-SCI baselines. In §4, direct PSNR/SSIM tables against recent baselines are already present for both simulated and real scenes; we will add per-scene standard deviations (error bars) across multiple runs and include a statistical significance analysis (e.g., paired t-tests) to substantiate the reported gains. Dataset details appear in §4.1 but will be cross-referenced more clearly in the abstract and tables.
Revision: yes

Referee: §3.2 (Patch-level ray sampling): The strategy is presented as necessary to capture structural similarities, yet the manuscript must include an ablation comparing patch-level sampling directly to standard random sampling (the unbiased baseline in NeRF). Without this, it remains unclear whether the restriction improves structure capture or introduces spatial correlation that slows convergence or leaves regions under-sampled; this is load-bearing for the central claim.

Authors: We acknowledge that a direct ablation against the standard random ray sampling baseline is necessary to isolate the benefit of patch-level sampling. The current manuscript demonstrates the end-to-end gains of RayFormer but does not isolate this component. We will add a dedicated ablation study in the revised §4 that compares patch-level sampling versus random sampling under otherwise identical conditions, reporting reconstruction PSNR/SSIM, convergence behavior, and qualitative sampling coverage to address concerns about spatial correlation or under-sampling.
Revision: yes

Referee: §4.3 (Real-world experiments): The total variation prior is added to suppress compressive artifacts, but in real dynamic scenes lacking ground truth the paper should quantify risks of over-smoothing high-frequency or temporally varying detail. Visual inspection alone is insufficient to rule out bias introduced by the prior, which could undermine the reported SOTA gains.

Authors: We agree that visual inspection alone is limited for real scenes without ground truth. We will expand §4.3 with an ablation varying the TV weight, presenting side-by-side reconstructions with and without the prior to illustrate preservation of high-frequency and temporal details. We will also add a discussion of how the regularization strength is selected to balance artifact suppression against over-smoothing. While fully quantitative metrics for over-smoothing are not possible without ground truth, these controlled ablations and qualitative evidence will provide stronger substantiation that the prior does not introduce systematic bias.
Revision: partial

Circularity Check

0 steps flagged

No circularity: architectural proposal with independent empirical claims

The paper introduces a patch-level ray sampling strategy, a RayFormer transformer module for inter- and intra-ray attention, and a total-variation term enabled by the sampling choice. These are presented as design decisions whose value is assessed via reconstruction experiments on simulated and real scenes. No equations, uniqueness theorems, or first-principles derivations appear that reduce the claimed performance gain to a fitted parameter, self-definition, or self-citation chain. The central modeling claim (that the proposed attention captures structural similarities better than random sampling) is an empirical hypothesis, not a tautological restatement of inputs. Self-citations are absent from the provided text, and the method remains falsifiable against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that NeRF can represent dynamic scenes from compressive measurements and that structural similarities exist and can be exploited by the proposed attention patterns; no explicit free parameters or invented physical entities are named in the abstract.

axioms (1)

Domain assumption: NeRF-based methods can represent dynamic scenes from single snapshot compressive measurements.
Stated as the starting point for the recent promising performance mentioned in the abstract.

invented entities (1)

RayFormer (no independent evidence)
Purpose: transformer module to capture inter-ray and intra-ray structural similarities.
New module introduced to address the random sampling limitation.

Reference graph

Works this paper leans on

22 extracted references · 3 canonical work pages · 1 internal anchor

[1] INTRODUCTION. Video Snapshot Compressive Imaging (SCI) [1, 2] has emerged as a promising computational imaging paradigm that enables the acquisition of high-speed video through a single 2D measurement. By encoding temporal information into spatially multiplexed patterns via designed coded masks, such as modulated patterns across time, video SCI effectively...

[2] METHODOLOGY. 2.1. Preliminaries. 2.1.1. Imaging Model of Video SCI. In video SCI, the high-dimensional spatio-temporal information of a scene is compressed into a single two-dimensional measurement. This acquisition process is physically realized by employing programmable optical devices, most commonly Digital Micromirror Devices (DMD) or liquid crystal-b...

[3] EXPERIMENTS. 3.1. Experimental Settings. Datasets. Following [8], we evaluate on six synthetic scenes: Airplants [16], Hotdog [17], Cozy2room, Tanabata, Factory, and Vendor [18]. To assess generalization, we further test on real-world SCI data captured by the setup in [8]. Compared methods and evaluation metrics. We compare with several SOTA SCI reconstru...

[4] CONCLUSION. In this paper, we proposed patch-level ray sampling and the Inter- and Intra-Ray Transformer (RayFormer) to capture content structural similarities for NeRF-based Video SCI. Additionally, benefiting from the patch-level sampling strategy, we incorporated the total variation prior into the objective function to enhance spatial smoothness and r...

[5] Patrick Llull, Xuejun Liao, Xin Yuan, Jianbo Yang, David Kittle, Lawrence Carin, Guillermo Sapiro, and David J. Brady, “Coded aperture compressive temporal imaging,” Opt. Express, vol. 21, no. 9, pp. 10526–10545, May 2013.

[6] Xin Yuan, David J Brady, and Aggelos K Katsaggelos, “Snapshot compressive imaging: Theory, algorithms, and applications,” IEEE Signal Processing Magazine, vol. 38, no. 2, pp. 65–88, 2021.

[7] Xin Yuan, “Generalized alternating projection based total variation minimization for compressive sensing,” in 2016 IEEE International Conference on Image Processing (ICIP). IEEE, 2016, pp. 2539–2543.

[8] Yang Liu, Xin Yuan, Jinli Suo, David J Brady, and Qionghai Dai, “Rank minimization for snapshot compressive imaging,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 12, pp. 2990–3006, 2018.

[9] Xin Yuan, Yang Liu, Jinli Suo, and Qionghai Dai, “Plug-and-play algorithms for large-scale snapshot compressive imaging,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 1447–1457.

[10] Lishun Wang, Miao Cao, and Xin Yuan, “EfficientSCI: Densely connected network with space-time factorization for large-scale video snapshot compressive imaging,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 18477–18486.

[11] Lishun Wang, Miao Cao, Yong Zhong, and Xin Yuan, “Spatial-temporal transformer for video snapshot compressive imaging,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 7, pp. 9072–9089, 2022.

[12] Yunhao Li, Xiaodong Wang, Ping Wang, Xin Yuan, and Peidong Liu, “SCINeRF: Neural radiance fields from a snapshot compressive image,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 10542–10552.

[13] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin, “Attention is all you need,” Advances in Neural Information Processing Systems, vol. 30, 2017.

[14] Yubo Dong, Dahua Gao, Tian Qiu, Yuyan Li, Minxi Yang, and Guangming Shi, “Residual degradation learning unfolding framework with mixing priors across spectral and spatial for compressive spectral imaging,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 22262–22271.

[15] Tao Huang, Xin Yuan, Weisheng Dong, Jinjian Wu, and Guangming Shi, “Deep Gaussian scale mixture prior for image reconstruction,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 9, pp. 10778–10794, 2023.

[16] Yubo Dong, Dahua Gao, Danhua Liu, Yanli Liu, and Guangming Shi, “Alternating direction unfolding with a cross spectral attention prior for dual-camera compressive hyperspectral imaging,” IEEE Transactions on Image Processing, vol. 34, pp. 5325–5340, 2025.

[17] Chen-Hsuan Lin, Wei-Chiu Ma, Antonio Torralba, and Simon Lucey, “BARF: Bundle-adjusting neural radiance fields,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 5741–5751.

[18] Peng Wang, Lingzhe Zhao, Ruijie Ma, and Peidong Liu, “BAD-NeRF: Bundle adjusted deblur neural radiance fields,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 4170–4179.

[19] Zirui Wang, Shangzhe Wu, Weidi Xie, Min Chen, and Victor Adrian Prisacariu, “NeRF--: Neural radiance fields without known camera parameters,” arXiv preprint arXiv:2102.07064, 2021.

[20] Ben Mildenhall, Pratul P Srinivasan, Rodrigo Ortiz-Cayon, Nima Khademi Kalantari, Ravi Ramamoorthi, Ren Ng, and Abhishek Kar, “Local light field fusion: Practical view synthesis with prescriptive sampling guidelines,” ACM Transactions on Graphics (ToG), vol. 38, no. 4, pp. 1–14, 2019.

[21] Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng, “NeRF: Representing scenes as neural radiance fields for view synthesis,” Communications of the ACM, vol. 65, no. 1, pp. 99–106, 2021.

[22] Li Ma, Xiaoyu Li, Jing Liao, Qi Zhang, Xuan Wang, Jue Wang, and Pedro V Sander, “Deblur-NeRF: Neural radiance fields from blurry images,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 12861–12870.
