GTF: Omnidirectional EPI Transformer for Light Field Super-Resolution
Pith reviewed 2026-05-08 16:23 UTC · model grok-4.3
The pith
An omnidirectional Transformer that processes all four EPI directions improves light field image super-resolution.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
GTF combines directional EPI processing, MacPI-based prior injection, adaptive directional fusion, and a topology-preserving feed-forward network to explicitly model horizontal, vertical, 45-degree, and 135-degree EPIs in a unified framework for superior light field super-resolution.
What carries the argument
Omnidirectional EPI Transformer with adaptive directional fusion of four EPI orientations to capture full epipolar geometry.
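The directional decomposition described above can be sketched as follows. The slicing conventions, the function name, and the assumption of a square angular grid are illustrative choices on our part, not the paper's notation:

```python
import numpy as np

def extract_epis(lf):
    """Slice a 4D light field of shape (U, V, H, W) into the four EPI
    families that GTF reportedly processes (a sketch, not the paper's code):

    - horizontal EPIs: fix (u, h), vary (v, w) -> one (V, W) slice each
    - vertical EPIs:   fix (v, w), vary (u, h) -> one (U, H) slice each
    - diagonal EPIs:   views sampled along the 45-degree / 135-degree
      angular diagonals (assumes U == V), keeping full (H, W) per view
    """
    U, V, H, W = lf.shape
    assert U == V, "diagonal sampling here assumes a square angular grid"
    # (U, V, H, W) -> (U, H, V, W) -> stack of U*H horizontal EPIs
    horizontal = lf.transpose(0, 2, 1, 3).reshape(U * H, V, W)
    # (U, V, H, W) -> (V, W, U, H) -> stack of V*W vertical EPIs
    vertical = lf.transpose(1, 3, 0, 2).reshape(V * W, U, H)
    idx = np.arange(U)
    diag45 = lf[idx, idx]          # views on the main angular diagonal -> (U, H, W)
    diag135 = lf[idx, idx[::-1]]   # views on the anti-diagonal -> (U, H, W)
    return horizontal, vertical, diag45, diag135
```

A real model would then run attention within each EPI family before fusing them; this sketch only shows how the four directional views arise from one light field tensor.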
If this is right
- GTF achieves 32.78 dB PSNR on five standard benchmarks without additional inference enhancements.
- The lightweight GTF-Tiny reaches 32.57 dB using only 0.915 million parameters and 19.81 GFLOPs.
- The model secures 3rd place on two tracks and 4th on one in the NTIRE 2026 LF SR Challenge.
- Ablation studies validate the contribution of diagonal EPI modeling and the fusion strategy.
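The headline numbers above are PSNR values; for context, a minimal sketch of the metric (benchmark-specific details such as Y-channel evaluation and border cropping are not reproduced here):

```python
import numpy as np

def psnr(pred, target, peak=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE).
    This is the generic definition behind figures like 32.78 dB,
    not the exact benchmark evaluation protocol."""
    mse = np.mean((pred.astype(np.float64) - target.astype(np.float64)) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```

For example, a reconstruction that is off by a constant 0.1 on a [0, 1]-range image has MSE 0.01 and therefore a PSNR of exactly 20 dB.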
Where Pith is reading between the lines
- This directional approach could be adapted to other light field tasks such as depth estimation or novel view synthesis where diagonal disparities matter.
- The adaptive fusion might apply to multi-directional data in other domains like video processing or medical imaging.
- Testing the model on more diverse real-world LF datasets could reveal its robustness to noise and varying scene complexities.
Load-bearing premise
That modeling the diagonal 45-degree and 135-degree EPIs with adaptive fusion yields a meaningful improvement over horizontal-vertical-only Transformer designs.
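One plausible reading of "adaptive directional fusion" is a learned convex combination of the four directional branches, sketched below with softmax gating. The gating mechanism, shapes, and names are hypothetical, not the paper's exact design:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def adaptive_fusion(feats, logits):
    """Fuse four directional feature maps with per-direction weights.

    feats:  (4, C, H, W) features from the horizontal, vertical,
            45-degree, and 135-degree branches (our assumed layout).
    logits: (4,) gating scores; in a real model these would come from
            a small learned gating network rather than being fixed.
    """
    w = softmax(logits)                    # convex weights over the 4 directions
    return np.tensordot(w, feats, axes=1)  # weighted sum -> (C, H, W)
```

With equal logits this degenerates to a plain average of the branches, which is exactly the non-adaptive baseline an ablation would compare against.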
What would settle it
Running an ablation study that removes the diagonal EPI branches, holds everything else fixed, and measures whether performance drops on the standard benchmarks.
Original abstract
Light field (LF) image super-resolution benefits from Epipolar Plane Images (EPIs), whose line slopes explicitly encode disparity. However, existing Transformer-based LF SR methods mainly attend to horizontal and vertical EPIs, leaving diagonal epipolar geometry underexplored. We present GTF, an omnidirectional EPI Transformer that explicitly models horizontal, vertical, 45-degree, and 135-degree EPIs within a unified reconstruction framework. GTF combines directional EPI processing, MacPI-based prior injection, adaptive directional fusion, and a topology-preserving feed-forward network to better exploit LF geometry. For the NTIRE 2026 fidelity tracks, we use GTF as the main model, while a lightweight GTF-Tiny variant targets the efficiency track. On five standard LF SR benchmarks covering both real-captured and synthetic scenes, GTF reaches 32.78 dB without inference-time enhancement, and stronger inference settings with EPSW and test-time augmentation further improve performance. Under the NTIRE 2026 efficiency constraint, GTF-Tiny attains 32.57 dB with only 0.915M parameters and 19.81 GFLOPs. In the NTIRE 2026 Light Field Image Super-Resolution Challenge, our submissions rank 3rd on Track 1 and Track 3 and 4th on Track 2. Architecture-evolution, channel-width, and inference analyses further support the effectiveness of diagonal EPI modeling, directional fusion, and the lightweight design.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes GTF, an omnidirectional EPI Transformer for light field super-resolution that explicitly processes horizontal, vertical, 45°, and 135° epipolar plane images via directional EPI branches, MacPI prior injection, adaptive directional fusion, and a topology-preserving FFN. It reports 32.78 dB PSNR on five standard LF SR benchmarks (real and synthetic) without inference enhancements, with GTF-Tiny reaching 32.57 dB at 0.915M parameters and 19.81 GFLOPs for the NTIRE 2026 efficiency track; the challenge submissions rank 3rd on Tracks 1 and 3 and 4th on Track 2. Architecture-evolution, channel-width, and inference analyses are presented to support the design choices.
Significance. If the performance gains hold under controlled evaluation, the work would provide a concrete demonstration that incorporating diagonal epipolar geometry can improve Transformer-based LF SR beyond horizontal-vertical baselines, with added value from the efficiency variant and challenge results. The empirical focus on standard benchmarks and parameter/FLOP reporting strengthens its practical relevance for multi-view imaging tasks.
major comments (1)
- [Architecture-evolution analysis] The architecture-evolution analysis mentioned in the abstract does not support the headline attribution of the 32.78 dB result to omnidirectional (including 45°/135°) EPI modeling: there is no isolated ablation that enables only the diagonal branches while holding MacPI injection, adaptive fusion, the topology-preserving FFN, and all training settings identical to a strict horizontal-vertical baseline. Without this controlled comparison, the gains cannot be separated from capacity increases or fusion effects, which directly weakens the central claim.
minor comments (2)
- [Abstract] The abstract states concrete PSNR, parameter, and ranking numbers but does not name the five specific benchmarks or provide error bars/standard deviations; this should be added for reproducibility.
- [Methods] Notation for the four directional EPIs and the MacPI prior should be defined with explicit equations or diagrams in the methods section to clarify how 45°/135° slopes are discretized and fused.
Simulated Author's Rebuttal
We thank the referee for the thorough review and constructive feedback. We address the major comment below and will revise the manuscript to strengthen the evidence for our claims.
Point-by-point responses
Referee: [Architecture-evolution analysis] The architecture-evolution analysis mentioned in the abstract does not support the headline attribution of the 32.78 dB result to omnidirectional (including 45°/135°) EPI modeling: there is no isolated ablation that enables only the diagonal branches while holding MacPI injection, adaptive fusion, the topology-preserving FFN, and all training settings identical to a strict horizontal-vertical baseline. Without this controlled comparison, the gains cannot be separated from capacity increases or fusion effects, which directly weakens the central claim.
Authors: We appreciate the referee highlighting this point. Our architecture-evolution analysis shows incremental gains when diagonal EPI branches are added, but we acknowledge that the current presentation lacks a strictly isolated ablation: one that enables only the diagonal branches on top of a fixed horizontal-vertical baseline while holding MacPI injection, adaptive directional fusion, the topology-preserving FFN, and all training settings identical. To isolate and substantiate the contribution of omnidirectional (including 45°/135°) EPI modeling to the reported performance, we will perform and include this controlled ablation in the revised manuscript.
Revision: yes
Circularity Check
No circularity: empirical architecture evaluated on external benchmarks
full rationale
The paper presents GTF as a novel Transformer architecture for light-field super-resolution that processes omnidirectional EPIs, with performance measured directly on five standard external benchmarks (real and synthetic). No equations, derivations, or first-principles predictions are claimed; results are reported as empirical outcomes of training and inference on held-out data. Architecture-evolution and channel-width analyses are internal ablations supporting design choices but do not reduce the headline metrics to quantities defined by the inputs or self-citations. The central claim remains falsifiable against independent test sets and does not collapse by construction.
Axiom & Free-Parameter Ledger
free parameters (1)
- model hyperparameters and training settings
axioms (1)
- domain assumption: Transformer attention layers can capture directional epipolar features when applied to EPI slices
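This axiom can be made concrete with a minimal attention pass over one EPI slice treated as a token sequence. For brevity the sketch omits learned projections (Q = K = V = x), unlike a real Transformer layer:

```python
import numpy as np

def attention(x):
    """Single-head scaled dot-product attention over an EPI slice whose
    rows are treated as tokens (shape: tokens x features). A minimal
    illustration of the ledger's domain assumption, not GTF's layer."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                 # pairwise token similarities
    scores -= scores.max(axis=-1, keepdims=True)  # stabilize the softmax
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)            # rows sum to 1
    return w @ x                                  # convex mix of tokens
```

Because each output token is a convex combination of input tokens, attention can pool evidence along an EPI line; whether it actually recovers disparity slopes is exactly what the paper's benchmarks test empirically.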
Reference graph
Works this paper leans on
- [1] Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton. Layer normalization. arXiv preprint arXiv:1607.06450, 2016.
- [2] Yingqian Chen, Longguang Wang, Yingqian Wang, Jungang Yang, and Yulan Guo. Light field image super-resolution via angular and spatial interactive network. In ECCV, 2022.
- [3] Ruixuan Cong, Hao Sheng, Da Yang, Zhenglong Cui, and Rongshan Chen. Exploiting spatial and angular correlations with deep efficient transformers for light field image super-resolution. IEEE TMM, 26:1421–1435, 2024.
- [4] Vinh Van Duong, Thuc Nguyen Huu, Jonghoon Yim, and Byeungwoo Jeon. Light field image super-resolution network via joint spatial-angular and epipolar information. IEEE TIP, 32:1534–1545, 2023.
- [5] Albert Gu and Tri Dao. Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752, 2023.
- [6] Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861, 2017.
- [7] Jing Jin, Junhui Hou, Jie Chen, and Sam Kwong. Light field spatial super-resolution via deep combinatorial geometry embedding and structural consistency regularization. In CVPR, 2020.
- [8] Kai Jin, Zeqiang Wei, Angulia Yang, Di Wu, Mingzhi Gao, and Xiuzhuang Zhou. LFTransMamba: A hybrid mamba-transformer model for light field image super-resolution. In CVPRW, pages 1195–1204, 2025.
- [9] Nima Khademi Kalantari, Ting-Chun Wang, and Ravi Ramamoorthi. Learning-based view synthesis for light field cameras. ACM Transactions on Graphics (TOG), 35(6):193:1–193:10, 2016.
- [10] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In ICLR, 2015.
- [11] Kwanyoung Ko, Yoonjong Yoo, and Suk-Ju Kang. Light field image super-resolution with transformers. IEEE Signal Processing Letters, 30:310–314, 2023.
- [12] Marc Levoy and Pat Hanrahan. Light field rendering. In ACM SIGGRAPH, 1996.
- [13] Jingyun Liang, Jiezhang Cao, Guolei Sun, Kai Zhang, Luc Van Gool, and Radu Timofte. SwinIR: Image restoration using swin transformer. In ICCV Workshops, 2021.
- [14] Zhengyu Liang, Yingqian Wang, Longguang Wang, Jungang Yang, and Shilin Zhou. Light field image super-resolution with transformers. IEEE Signal Processing Letters, 29:563–567, 2022.
- [15] Zhengyu Liang, Yingqian Wang, Longguang Wang, Jungang Yang, Shilin Zhou, and Yulan Guo. Learning non-local spatial-angular correlation for light field image super-resolution. In ICCV, 2023.
- [16] Zhengyu Liang, Yingqian Wang, Longguang Wang, Jungang Yang, Yulan Guo, Li Liu, Shilin Zhou, and Wei An. Diving into epipolar transformers for light field super-resolution and disparity estimation. IEEE TPAMI, 2026. Early access, online ahead of print.
- [17] Bee Lim, Sanghyun Son, Heewon Kim, Seungjun Nah, and Kyoung Mu Lee. Enhanced deep residual networks for single image super-resolution. In CVPRW, 2017.
- [18] Gaosheng Liu, Huanjing Yue, Jiamin Wu, and Jingyu Yang. Intra-inter view interaction network for light field image super-resolution. IEEE TMM, 25:256–266, 2023.
- [19] Haosong Liu, Xiancheng Zhu, Huanqiang Zeng, Jianqing Zhu, Yifan Shi, Jing Chen, and Junhui Hou. LFTramba: Comprehensive information learning for light field image super-resolution via a hybrid transformer-mamba framework. In CVPRW, pages 1137–1147, 2025.
- [20] Yue Liu, Yunjie Tian, Yuzhong Zhao, Hongtian Yu, Lingxi Xie, Yaowei Wang, Qixiang Ye, and Yunfan Liu. VMamba: Visual state space model. In NeurIPS, 2024.
- [21] Ren Ng, Marc Levoy, Mathieu Brédif, Gene Duval, Mark Horowitz, and Pat Hanrahan. Light field photography with a hand-held plenoptic camera. Stanford Tech Report CTSR 2005-02, 2005.
- [22] Wenzhe Shi, Jose Caballero, Ferenc Huszár, Johannes Totz, Andrew P. Aitken, Rob Bishop, Daniel Rueckert, and Zehan Wang. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In CVPR, 2016.
- [23] Changha Shin, Hae-Gon Jeon, Youngjin Yoon, In So Kweon, and Seon Joo Kim. EPINET: A fully-convolutional neural network using epipolar geometry for depth from light field images. In CVPR, 2018.
- [24] Abhinav Shrivastava, Abhinav Gupta, and Ross Girshick. Training region-based object detectors with online hard example mining. In CVPR, 2016.
- [25] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In NeurIPS, 2017.
- [26] Shunzhou Wang, Tianfei Zhou, Yao Lu, and Huijun Di. Detail-preserving transformer for light field image super-resolution. Proceedings of the AAAI Conference on Artificial Intelligence, 36(3):2522–2530, 2022.
- [27] Yingqian Wang, Longguang Wang, Jungang Yang, Wei An, and Yulan Guo. Spatial-angular interaction for light field image super-resolution. In ECCV, 2020.
- [28] Yingqian Wang, Jungang Yang, Longguang Wang, Xinyi Ying, Tianhao Wu, Wei An, and Yulan Guo. Light field image super-resolution using deformable convolution. IEEE TIP, 30:1057–1071, 2021.
- [29] Yingqian Wang, Longguang Wang, Zhengyu Liang, Jungang Yang, Radu Timofte, and Yulan Guo. NTIRE 2023 challenge on light field image super-resolution: Dataset, methods and results. In CVPRW, 2023.
- [30] Yingqian Wang, Longguang Wang, Gaochang Wu, Jungang Yang, Wei An, Jingyi Yu, and Yulan Guo. Disentangling light fields for super-resolution and disparity estimation. IEEE TPAMI, 45(1):425–443, 2023.
- [31] Yingqian Wang, Zhengyu Liang, Fengyuan Zhang, Wending Zhao, Longguang Wang, Juncheng Li, Jungang Yang, Radu Timofte, Yulan Guo, et al. NTIRE 2026 challenge on light field image super-resolution: Methods and results. In CVPRW, 2026.
- [32] Henry Wing Fung Yeung, Junhui Hou, Xiaoming Chen, Jie Chen, Zhibo Chen, and Yuk Ying Chung. Light field spatial super-resolution using deep efficient spatial-angular separable convolution. IEEE TIP, 28(5):2319–2330, 2019.
- [33] Mingyang Yu, Zhijian Wu, and Dingjiang Huang. LFMix: A lightweight hybrid architecture for light field super-resolution. In CVPRW, pages 1450–1459, 2025.
- [34] Shuo Zhang, Youfang Lin, and Hao Sheng. Residual networks for light field image super-resolution. In CVPR, 2019.
- [35] Shuo Zhang, Song Chang, and Youfang Lin. End-to-end light field spatial super-resolution network using multiple epipolar geometry. IEEE TIP, 30:5956–5968, 2021.
- [36] Yulun Zhang, Kunpeng Li, Kai Li, Lichen Wang, Bineng Zhong, and Yun Fu. Image super-resolution using very deep residual channel attention networks. In ECCV, 2018.