arxiv: 2605.29260 · v2 · pith:IHTGAKCTnew · submitted 2026-05-28 · 💻 cs.CV

Deep Psychovisual Image Representations

Pith reviewed 2026-06-29 08:56 UTC · model grok-4.3

classification 💻 cs.CV
keywords psychovisual coding · frequency-domain representation · complex-valued representations · interpretable features · spectral filters · deep learning vision · salience analysis · object parts

The pith

Psychovisual frequency coding with learned spectral filters produces interpretable object parts and reduces depth dependence in vision models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that deep learning vision models can adopt psychovisual principles by using data-driven spectral filters in a frequency-domain representation combined with complex-valued images. This creates intermediate abstractions that separate low-level features in a way similar to human vision, leading to features that correspond to clear object parts rather than vague regions. A reader would care because it addresses the opacity and high computational demands of standard CNNs by offering a more efficient and transparent alternative through frequency sub-band encoding of semantic structures.

Core claim

Deep Visual Coding is a learned frequency-domain representation inspired by image codes that quantize perceptually salient frequencies. Together with complex-valued image representations, it produces psychovisual-style abstractions. Data-driven spectral filters encode task-relevant semantic structures within distinct frequency sub-bands. The reported results are salience analyses that show highly interpretable object parts, and models that depend less on depth for scaling than CNNs do.

What carries the argument

Data-driven spectral filters that learn to encode task-relevant semantic structures within distinct frequency sub-bands, in combination with complex-valued representations.
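
As a rough illustration of what encoding within frequency sub-bands can mean, the NumPy sketch below splits an image's 2D spectrum into radial bands and applies one element-wise (Hadamard) complex filter per band. It is not the authors' DVC module: the function names, the band edges, and the random filters standing in for learned weights are all assumptions made for this example.

```python
import numpy as np


def radial_band_masks(size, edges):
    """Boolean masks, one per (low, high) radial band, on a size x size FFT grid."""
    # Integer frequency indices in np.fft.fft2 layout: 0, 1, ..., -2, -1.
    freqs = np.fft.fftfreq(size) * size
    fy, fx = np.meshgrid(freqs, freqs, indexing="ij")
    radius = np.hypot(fy, fx)
    return [(radius >= low) & (radius < high) for low, high in edges]


def filter_sub_bands(image, filters, edges):
    """Return one spatial-domain (complex) map per radial band of a square image."""
    spectrum = np.fft.fft2(image)
    masks = radial_band_masks(image.shape[0], edges)
    # Element-wise product: each band is reweighted by its own filter.
    return [np.fft.ifft2(spectrum * mask * weights)
            for mask, weights in zip(masks, filters)]


# Hypothetical low / mid / high bands that together cover the whole spectrum.
EDGES = [(0, 4), (4, 8), (8, np.inf)]

rng = np.random.default_rng(0)
image = rng.standard_normal((16, 16))
# Random complex filters stand in for weights a model would learn.
filters = [rng.standard_normal((16, 16)) + 1j * rng.standard_normal((16, 16))
           for _ in EDGES]
bands = filter_sub_bands(image, filters, EDGES)
print([band.shape for band in bands])
```

With all-ones filters the bands sum back to the input, because the masks partition the spectrum; a trained model would instead reweight each band.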

If this is right

  • Psychovisual models extract highly interpretable object parts compared to amorphous regions in CNNs.
  • Models require less depth for scaling because complex-valued representations and learned abstractions subsume the role of deep spatial layers.
  • Psychovisual coding provides a promising path toward more efficient and transparent vision models.
  • The framework uses data-driven spectral filters to encode semantic structures in frequency sub-bands.
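
Complex-valued layers are often implemented with ordinary real arithmetic on separate real and imaginary channels, following the product rule (a + ib)(c + id) = (ac − bd) + i(ad + bc). The sketch below applies that rule to a tiny 1D sliding-window filter in plain Python. Whether the paper's complex convolutions are built this way is not stated here, so treat the function names and the 1D setting as illustrative assumptions.

```python
def correlate_valid(signal, kernel):
    """Real-valued sliding dot product with no padding ('valid' mode)."""
    steps = len(signal) - len(kernel) + 1
    return [sum(signal[i + j] * kernel[j] for j in range(len(kernel)))
            for i in range(steps)]


def complex_correlate(x_re, x_im, w_re, w_im):
    """Complex sliding dot product built from four real-valued ones."""
    re_re = correlate_valid(x_re, w_re)
    im_im = correlate_valid(x_im, w_im)
    re_im = correlate_valid(x_re, w_im)
    im_re = correlate_valid(x_im, w_re)
    out_re = [a - b for a, b in zip(re_re, im_im)]   # ac - bd
    out_im = [a + b for a, b in zip(re_im, im_re)]   # ad + bc
    return out_re, out_im


x = [1 + 2j, 3 - 1j, 0.5 + 0.5j, -2 + 1j]
w = [0.5 - 1j, 2 + 0.25j]
out_re, out_im = complex_correlate(
    [v.real for v in x], [v.imag for v in x],
    [v.real for v in w], [v.imag for v in w],
)
# Reference result using Python's built-in complex type.
direct = [sum(x[i + j] * w[j] for j in range(len(w)))
          for i in range(len(x) - len(w) + 1)]
print(out_re, out_im)
```

The four real-valued passes reproduce the direct complex result, which is why a complex layer can be built from standard real-valued operators.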

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Such frequency-based abstractions might generalize to other perception tasks like audio processing where spectral bands are natural.
  • Combining this with existing CNN backbones could create hybrid architectures that are both interpretable and accurate.
  • Reduced depth dependence suggests potential for deploying vision models on resource-constrained devices without accuracy loss.

Load-bearing premise

The observed interpretability and reduced depth dependence arise specifically from the psychovisual frequency coding rather than from the complex-valued representation or other implementation choices.

What would settle it

Compare salience maps and depth scaling curves between the proposed model and an otherwise identical model that uses complex values but standard spatial convolutions without the learned frequency sub-band filters.
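
One way to organize that comparison is as a small factorial design over the two ingredients. The sketch below only enumerates the configurations; the factor names are hypothetical labels chosen for this example, not identifiers from the paper.

```python
from itertools import product

# Hypothetical factor names; each ingredient is either present or absent.
FACTORS = {
    "complex_valued": (False, True),
    "learned_spectral_filters": (False, True),
}


def ablation_grid(factors):
    """All combinations of factor settings, one dict per model variant."""
    names = list(factors)
    return [dict(zip(names, values))
            for values in product(*(factors[name] for name in names))]


grid = ablation_grid(FACTORS)
# The control described above: complex values kept, spectral filters removed.
control = {"complex_valued": True, "learned_spectral_filters": False}
for config in grid:
    marker = "  <- proposed control" if config == control else ""
    print(config, marker)
```

Training each variant under the same budget and comparing salience maps and depth-scaling curves would then attribute any difference to a single factor.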

Figures

Figures reproduced from arXiv: 2605.29260 by Aryaman Sharma, Shekhar S. Chandra, Wei Dai, Wendi Ma.

Figure 1: (a) Standard CNNs use deep stacks of homogeneous spatial layers. (b) Psychovisual models suggest human vision uses explicit intermediate abstractions. (c) Our psychovisual deep learning framework first produces rich complex-valued representations of interpretable semantic regions. They are then encoded in the frequency domain by Deep Visual Coding (DVC), a data-driven adaptation of hand-crafted psychovisua…
Figure 2: (Left) Hand-crafted psychovisual coding from Saadane et al. [17], which quantizes perceptually salient radial frequencies determined by human vision experiments. (Right) Our DVC module, a data-driven adaptation of psychovisual coding. It uses (1) Spectral Branches for radial spectral decomposition (2) Hadamard Blocks to apply learnt element-wise filters and channel mixing. CConv/BN/GELU denote complex-valu…
Figure 3: Overall design of the PsychoNet framework (bottom) and Phasor Block architectures (top).
Figure 4: DVC filters learnt by Psycho-B trained on ImageNet-100 and ImageNet-1K, as well as for two ablation models on ImageNet-1K. Bilinear smoothing has been applied. ‘High/mid/low freq.’ refer to the [14, 8], [8, 4] and [4, 1] frequency sub-bands created by Spectral Branches. ‘No Spectral Branches’ removes Spectral Branches and uses a single Hadamard Block with global filters - we extract sub-bands only for the …
Figure 5: Assorted activation maps (via KPCA-CAM) for mid-level Phasor Blocks of Psycho-B. Real …
Figure 6: Psycho-B Phasor Block salience maps (via HiResCAM) conditioned on gradients …
Figure 7: (a) Comparison of layer depth between different ResNet models and our shallowest PsychoNet model of comparable size. (b) Comparison between activation maps (via KPCA-CAM) of Psycho-B and ResNet101 for a range of layer depths. Real and imaginary components are denoted by R and I.
Figure 8: Comparison between activation maps of Ablation Model A and C.

Original abstract

Psychovisual models suggest human vision decouples low-level feature extraction from higher cognition by first forming intermediate abstractions. In contrast, deep learning-based vision models routinely extract and aggregate features using homogeneous stacks of spatial layers, rendering their decision-making processes opaque. In this paper, we propose Deep Visual Coding, a learned frequency-domain representation inspired by 1990s image codes that quantised perceptually salient frequencies, which together with complex-valued image representations produces psychovisual-style abstractions. This approach enables the first psychovisual-based deep learning framework, utilizing data-driven spectral filters that learn to encode task-relevant semantic structures within distinct frequency sub-bands. Salience analyses reveal that our psychovisual models extract highly interpretable object parts compared to the amorphous regions produced by regular Convolutional Neural Networks (CNNs). Furthermore, we find that our models are less depth dependent than CNNs for model scaling, since our complex-valued representations and learned abstractions subsume the role of the deep spatial layers. Together, these findings demonstrate that psychovisual coding provides a promising path toward more efficient and transparent vision models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript proposes Deep Visual Coding, a learned frequency-domain representation using data-driven spectral filters and complex-valued image representations to produce psychovisual-style abstractions. It claims this yields salience maps with highly interpretable object parts (unlike amorphous regions from CNNs) and reduced depth dependence for model scaling, as the abstractions subsume the role of deep spatial layers.

Significance. If substantiated, the result would be significant for vision model design by offering a path to greater interpretability and efficiency through psychovisual frequency sub-band encoding of semantic structure. The approach could influence hybrid frequency-spatial architectures if the attribution to the psychovisual component is isolated and the claims are supported by controlled experiments.

major comments (3)
  1. [Abstract] The abstract asserts empirical outcomes (interpretable object parts, lower depth dependence) without any quantitative results, baselines, or experimental details; the claims rest on unspecified salience analyses.
  2. [Abstract] The central attribution of interpretability and depth-independence gains to psychovisual frequency coding is not isolated from the complex-valued representation: no ablation is described that holds the complex-valued path fixed while removing or replacing the learned spectral filters.
  3. [Abstract] No equations, derivations, or formal definitions of the data-driven spectral filters, frequency sub-bands, or complex-valued representations are shown, which renders the method non-reproducible.
minor comments (1)
  1. [Abstract] The reference to '1990s image codes that quantised perceptually salient frequencies' lacks specific citations.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment point by point below and indicate the revisions planned for the next version of the manuscript.

Point-by-point responses
  1. Referee: [Abstract] The abstract asserts empirical outcomes (interpretable object parts, lower depth dependence) without any quantitative results, baselines, or experimental details; the claims rest on unspecified salience analyses.

    Authors: We agree that the abstract summarizes outcomes at a high level without quantitative details. The full manuscript reports quantitative salience metrics, baselines, and experimental protocols in the results section. We will revise the abstract to include brief references to key quantitative improvements (e.g., interpretability scores and depth-scaling metrics) to better ground the claims.
    revision: yes

  2. Referee: [Abstract] The central attribution of interpretability and depth-independence gains to psychovisual frequency coding is not isolated from the complex-valued representation: no ablation is described that holds the complex-valued path fixed while removing or replacing the learned spectral filters.

    Authors: The referee correctly identifies that the abstract (and manuscript) does not present an ablation that holds the complex-valued path fixed while varying the learned spectral filters. Current comparisons are to standard CNNs. We will add this specific ablation experiment and its results to the revised manuscript to isolate the contribution of the psychovisual frequency coding.
    revision: yes

  3. Referee: [Abstract] No equations, derivations, or formal definitions of the data-driven spectral filters, frequency sub-bands, or complex-valued representations are shown, which renders the method non-reproducible.

    Authors: Abstracts conventionally omit equations for accessibility. The manuscript body contains the formal definitions, derivations, and equations for the spectral filters, sub-bands, and complex-valued representations. We will revise the abstract to explicitly reference the relevant method sections containing these elements.
    revision: partial

Circularity Check

0 steps flagged

No derivation chain or self-referential reductions present

The provided abstract and context contain no equations, derivations, fitted parameters renamed as predictions, or load-bearing self-citations. The paper proposes an empirical architecture (data-driven spectral filters plus complex-valued representations) and reports experimental outcomes on interpretability and depth dependence. These are presented as findings from salience analyses and scaling tests rather than as outputs of a closed mathematical chain that reduces to its own inputs by construction. No uniqueness theorems, ansatzes smuggled in via citation, or renamings of known results appear. The central claims therefore show none of the circularity patterns this audit checks for.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit parameters, axioms, or new entities are detailed. The method implicitly assumes, without further justification, that learned frequency sub-bands can isolate semantic structures.

pith-pipeline@v0.9.1-grok · 5718 in / 1118 out tokens · 19881 ms · 2026-06-29T08:56:35.089811+00:00 · methodology

Reference graph

Works this paper leans on

63 extracted references · 39 canonical work pages · 1 internal anchor

  1. [1]

    Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6):84–90, May 2017. doi: 10.1145/3065386

  2. [2]

    Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Fei-Fei Li. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255, June 2009. doi: 10.1109/cvpr.2009.5206848

  3. [3]

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep Residual Learning for Image Recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2016. doi: 10.1109/CVPR.2016.90

  4. [4]

    Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. Aggregated residual transformations for deep neural networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 5987–5995, 2017. doi: 10.1109/CVPR.2017.634

  5. [5]

    Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kilian Q Weinberger. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017

  6. [6]

    Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, and Saining Xie. A convnet for the 2020s. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 11966–11976, 2022. doi: 10.1109/CVPR52688.2022.01167

  7. [7]

    Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. In 9th International Conference on Learning Representations, ICLR 2021, V...

  8. [8]

    I. Tolstikhin, N. Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Thomas Unterthiner, Jessica Yung, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, and Alexey Dosovitskiy. MLP-Mixer: An all-MLP Architecture for Vision. In Neural Information Processing Systems, 2021

  9. [9]

    Peng Gao, Jiasen Lu, Hongsheng Li, Roozbeh Mottaghi, and Aniruddha Kembhavi. Container: Context Aggregation Network. Neural Information Processing Systems, 2021

  10. [10]

    Yongming Rao, Wenliang Zhao, Zheng Zhu, Jie Zhou, and Jiwen Lu. GFNet: Global filter networks for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(9):10960–10973, 2023. doi: 10.1109/TPAMI.2023.3263824

  12. [12]

    Rodrigo Quian Quiroga, Leila Reddy, Gabriel Kreiman, Christof Koch, and Itzhak Fried. Invariant visual representation by single neurons in the human brain. Nature, 2005. doi: 10.1038/nature03687

  13. [13]

    Nikolaus Kriegeskorte, Marieke Mur, Douglas A. Ruff, Roozbeh Kiani, Jerzy Bodurka, Hossein Esteky, Keiji Tanaka, and Peter A. Bandettini. Matching categorical object representations in inferior temporal cortex of man and monkey. Neuron, 2008. doi: 10.1016/j.neuron.2008.10.043

  14. [14]

    Rodrigo Quian Quiroga. Concept cells: the building blocks of declarative memory functions. Nature Reviews Neuroscience, 2012. doi: 10.1038/nrn3251

  15. [15]

    Lynn Le, Paolo Papale, K. Seeliger, Antonio Lozano, Thirza Dado, Feng Wang, Pieter R. Roelfsema, M. van Gerven, Yağmur Güçlütürk, and Umut Güçlü. Monkeysee: Space-time-resolved reconstructions of natural images from macaque multi-unit activity. Neural Information Processing Systems, 2024

  16. [16]

    Abdelhakim Saadane, Hakim Senane, and Dominique Barba. Design of psychovisual quantizers for a visual subband image coding. In Visual Communications and Image Processing ’94, volume 2308, pages 1446–1453. SPIE, September 1994. doi: 10.1117/12.185903

  17. [17]

    Jeanpierre V. Guedon, Dominique Barba, and Nicole Burger. Psychovisual image coding via an exact discrete Radon transform. In Proceedings Volume 2501, Visual Communications and Image Processing 1995, volume 2501, pages 562–572, 1995. doi: 10.1117/12.206765

  18. [18]

    A. Saadane, H. Sénane, and D. Barba. Visual Coding: Design of Psychovisual Quantizers. Journal of Visual Communication and Image Representation, 9(4):381–391, December 1998. ISSN 1047-3203. doi: 10.1006/jvci.1998.0393

  19. [19]

    Lu Chi, Borui Jiang, and Yadong Mu. Fast Fourier Convolution. In Advances in Neural Information Processing Systems, volume 33, pages 4479–4488. Curran Associates, Inc., 2020. URL https://papers.nips.cc/paper_files/paper/2020/hash/2fd5d41ec6cfab47e32164d5624269b1-Abstract.html

  20. [20]

    Oren Rippel, Jasper Snoek, and Ryan P. Adams. Spectral Representations for Convolutional Neural Networks. In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2, NIPS’15, pages 2449–2457, Cambridge, MA, USA, 2015. MIT Press

  21. [21]

    Saurabh Yadav and Koteswar Rao Jerripothula. FCCNs: Fully Complex-valued Convolutional Networks using Complex-valued Color Model and Loss Function. In 2023 IEEE/CVF International Conference on Computer Vision (ICCV), pages 10655–10664, October 2023. doi: 10.1109/ICCV51070.2023.00981. URL https://ieeexplore.ieee.org/document/10377516. ISSN: 2380-7504

  22. [22]

    Bruno A. Olshausen and David J. Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583):607, June 1996. ISSN 1476-4687. doi: 10.1038/381607a0. URL https://www.nature.com/articles/381607a0

  23. [23]

    D. H. Hubel and T. N. Wiesel. Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. The Journal of Physiology, 160(1):106–154.2, January 1962. ISSN 0022-3751. URL https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1359523/

  24. [24]

    Dario L. Ringach. Spatial structure and symmetry of simple-cell receptive fields in macaque primary visual cortex. Journal of Neurophysiology, 88(1):455–463, July 2002. doi: 10.1152/jn.2002.88.1.455

  25. [25]

    Mengkun Liu, Licheng Jiao, Xu Liu, Lingling Li, Fang Liu, Shuyuan Yang, and Xiangrong Zhang. Bio-Inspired Multi-scale Contourlet Attention Networks. IEEE Transactions on Multimedia, pages 1–16, January 2023. doi: 10.1109/tmm.2023.3304448

  26. [26]

    Mengkun Liu, Licheng Jiao, Xu Liu, Lingling Li, Fang Liu, and Shuyuan Yang. C-CNN: Contourlet Convolutional Neural Networks. IEEE Transactions on Neural Networks and Learning Systems, 32(6):2636–2649, June 2021. ISSN 2162-2388. doi: 10.1109/TNNLS.2020.3007412. URL https://ieeexplore.ieee.org/document/9145825

  27. [27]

    E. J. Candès and D. L. Donoho. Ridgelets: a key to higher-dimensional intermittency? Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 357(1760):2495–2509, 1999. doi: 10.1098/rsta.1999.0444

  28. [28]

    Jean-Luc Starck, E. J. Candès, and D. L. Donoho. The curvelet transform for image denoising. IEEE Transactions on Image Processing, 11(6):670–684, June 2002. ISSN 1941-0042. doi: 10.1109/TIP.2002.1014998

  29. [29]

    M. N. Do and M. Vetterli. The contourlet transform: an efficient directional multiresolution image representation. IEEE Transactions on Image Processing, 14(12):2091–2106, December 2005. ISSN 1941- doi: 10.1109/TIP.2005.859376. URL https://ieeexplore.ieee.org/document/1532309

  31. [31]

    Rafael C. Gonzalez and Richard E. Woods. Digital Image Processing 3rd Edition. Prentice Hall, January 2014

  32. [32]

    Shaohua Li, Kaiping Xue, Bin Zhu, Chenkai Ding, Xindi Gao, David Wei, and Tao Wan. Falcon: A fourier transform based approach for fast and secure convolutional neural network predictions. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8702–8711, 2020. doi: 10.1109/CVPR42600.2020.00873

  33. [33]

    Bochen Guan, Jinnian Zhang, William A. Sethares, Richard Kijowski, and Fang Liu. Spectral Domain Convolutional Neural Network. In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 2795–2799, June 2021. doi: 10.1109/ICASSP39728.2021.9413409. ISSN: 2379-190X

  34. [34]

    J. Lee-Thorp, J. Ainslie, Ilya Eckstein, and Santiago Ontañón. FNet: Mixing Tokens with Fourier Transforms. North American Chapter of the Association for Computational Linguistics, 2021. doi: 10.18653/v1/2022.naacl-main.319

  35. [35]

    Zhipeng Huang, Zhizheng Zhang, Cuiling Lan, Zheng-Jun Zha, Yan Lu, and Baining Guo. Adaptive frequency filters as efficient global token mixers. In 2023 IEEE/CVF International Conference on Computer Vision (ICCV), pages 6026–6036, 2023. doi: 10.1109/ICCV51070.2023.00556

  36. [36]

    Julia Grabinski, Janis Keuper, and Margret Keuper. As large as it gets: Learning infinitely large filters via neural implicit functions in the fourier domain. ArXiv, abs/2307.10001, 2023. URL https://api.semanticscholar.org/CorpusID:259982481

  37. [37]

    Samira Kabri, Tim Roith, Daniel Tenbrinck, and Martin Burger. Resolution-invariant image classification based on fourier neural operators. In Scale Space and Variational Methods in Computer Vision: 9th International Conference, SSVM 2023, Santa Margherita Di Pula, Italy, May 21–25, 2023, Proceedings, page 236–249, Berlin, Heidelberg, 2023. Springer-Verlag....

  38. [38]

    H. Senane, A. Saadane, and D. Barba. Image coding in the context of a psychovisual image representation with vector quantization. In Proceedings., International Conference on Image Processing, volume 1, pages 97–100 vol.1, October 1995. doi: 10.1109/ICIP.1995.529048

  39. [39]

    J. W. Cooley, P. Lewis, and P. Welch. The finite Fourier transform. Audio and Electroacoustics, IEEE Transactions on, 17(2):77–85, June 1969. ISSN 0018-9278

  40. [40]

    ChiYan Lee, Hideyuki Hasegawa, and Shangce Gao. Complex-Valued Neural Networks: A Comprehensive Survey. IEEE/CAA Journal of Automatica Sinica, 9(8):1406–1426, August 2022. doi: 10.1109/jas.2022.105743

  41. [41]

    Alex Krizhevsky. Learning multiple layers of features from tiny images. 2009. URL https://api.semanticscholar.org/CorpusID:18268744

  42. [42]

    Sachin Karmani, Thanushon Sivakaran, Gaurav Prasad, Mehmet Ali, Wenbo Yang, and Sheyang Tang. KPCA-CAM: Visual Explainability of Deep Computer Vision Models Using Kernel PCA. IEEE International Workshop on Multimedia Signal Processing, 2024. doi: 10.1109/mmsp61759.2024.10743968

  43. [43]

    Rachel Lea Draelos and Lawrence Carin. Use HiResCAM instead of Grad-CAM for faithful explanations of convolutional neural networks. arXiv preprint arXiv:2011.08891, November 2020. URL https://arxiv.org/abs/2011.08891

  44. [44]

    Julia Grabinski, Steffen Jung, Janis Keuper, and Margret Keuper. Frequencylowcut pooling - plug and play against catastrophic overfitting. In Computer Vision – ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XIV, page 36–57, Berlin, Heidelberg, 2022. Springer-Verlag. ISBN 978-3-031-19780-2. doi: 10.1007/978-3-0...

  45. [45]

    Jose Hanen and Dominique Barba. High-quality subband image coding of TV signals at 5 Mbit/s with motion compensation interpolation and visually optimized scalar quantization. In Visual Communications and Image Processing ’93, volume 2094, pages 1477–1485. SPIE, October 1993. doi: 10.1117/12.157907

  46. [46]

    Patrick Le Callet, Abdelhakim Saadane, and Dominique Barba. Interactions of chromatic components in the perceptual quantization of the achromatic component. In Human Vision and Electronic Imaging IV, volume 3644, pages 121–128. SPIE, May 1999. doi: 10.1117/12.348432

  47. [47]

    Abdelhakim Saadane, Nachida Bekkat, and Dominique Barba. On the masking effects in a perceptually based image quality metric. In Imaging and vision systems: theory, assessment and applications, pages 161–177. Nova Science Publishers, Inc., USA, January 2001. ISBN 978-1-59033-033-3

  48. [48]

    Nicolas Normand, Jean-Pierre Guédon, O. Philippe, and D. Barba. Controlled redundancy for image coding and high-speed transmission. Proc. of the SPIE - The International Society for Optical Engineering, 2727:1070–1081, 1996. ISSN 0277-786X

  49. [49]

    Andrew Kingston and Imants Svalbe. Generalised finite radon transform for N×N images. Image and Vision Computing, 25(10):1620–1630, October 2007. ISSN 0262-8856. doi: 10.1016/j.imavis.2006.03.002

  50. [50]

    S. S. Chandra, N. Normand, A. Kingston, J. Guédon, and I. Svalbe. Robust Digital Image Reconstruction via the Discrete Fourier Slice Theorem. Signal Processing Letters, IEEE, 21(6):682–686, June 2014. ISSN 1070-9908. doi: 10.1109/LSP.2014.2313341

  51. [51]

    Jean-Pierre V. Guédon, Nicolas Normand, Pierre Verbert, Benoit Parrein, and Florent Autrusseau. Load-balancing and scalable multimedia distribution using the Mojette transform. Internet Multimedia Management Systems II, 4519(1):226–234, 2001

  52. [52]

    Wen Hou and Cishen Zhang. Parallel-Beam CT Reconstruction Based on Mojette Transform and Compressed Sensing. International Journal of Computer and Electrical Engineering, pages 83–87, 2013. ISSN 17938163. doi: 10.7763/IJCEE.2013.V5.669

  53. [53]

    Benoit Parrein, Pierre Verbert, Nicolas Normand, and Jean-Pierre V. Guédon. Scalable multiple descriptions on packets networks via the n-dimensional Mojette transform. Quality of Service over Next-Generation Data Networks, 4524(1):243–252, 2001

  54. [54]

    Pierre Verbert, Jean-Pierre V. Guédon, and Benoit Parrein. Distributed and compressed multimedia transmission using a discrete backprojection operator. Internet Multimedia Management Systems III, 4862(1):315–325, 2002. doi: 10.1117/12.473047

  55. [55]

    Jeanpierre Guédon. The Mojette Transform: Theory and Applications. John Wiley & Sons, March 2013. ISBN 9781118622933

  56. [56]

    Simone Scardapane, Steven Van Vaerenbergh, Amir Hussain, and Aurelio Uncini. Complex-valued Neural Networks with Non-parametric Activation Functions, February 2018. URL http://arxiv.org/abs/1802.08026. arXiv:1802.08026 [cs]

  57. [57]

    Chiheb Trabelsi, Olexa Bilaniuk, Ying Zhang, Dmitriy Serdyuk, Sandeep Subramanian, Joao Felipe Santos, Soroush Mehri, Negar Rostamzadeh, Yoshua Bengio, and Christopher J. Pal. Deep Complex Networks. In International Conference on Learning Representations, February 2018. URL https://openreview.net/forum?id=H1T2hmZAb

  58. [58]

    Wenhan Li, Wenqing Xie, and Zhifang Wang. Complex-valued densely connected convolutional networks. In Jianchao Zeng, Weipeng Jing, Xianhua Song, and Zeguang Lu, editors, Data Science, pages 299–309, Singapore, 2020. Springer Singapore. ISBN 978-981-15-7981-3

  59. [59]

    Muneer Dedmari, Sailesh Conjeti, Santiago Estrada, Phillip Ehses, Tony Stocker, and Martin Reuter. Complex fully convolutional neural networks for mr image reconstruction. In Machine Learning for Medical Image Reconstruction: First International Workshop, MLMIR 2018, volume 1. Springer, 2018

  60. [60]

    Bhavya Vasudeva, Puneesh Deora, Saumik Bhattacharya, and Pyari Mohan Pradhan. Compressed sensing mri reconstruction with co-vegan: Complex-valued generative adversarial network. In 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 1779–1788, 2022. doi: 10.1109/WACV51458.2022.00184

  61. [61]

    Elizabeth Cole, Joseph Cheng, John Pauly, and Shreyas Vasanawala. Analysis of deep complex-valued convolutional neural networks for mri reconstruction and phase-focused applications. Magnetic Resonance in Medicine, 86(2):1093–1109, 2021. doi: 10.1002/mrm.28733. URL https://onlinelibrary.wiley.com/doi/abs/10.1002/mrm.28733

  62. [62]

    Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in pytorch. In NIPS-W, 2017

  63. [63]

    Irwan Bello, W. Fedus, Xianzhi Du, E. D. Cubuk, A. Srinivas, Tsung-Yi Lin, Jonathon Shlens, and Barret Zoph. Revisiting ResNets: Improved Training and Scaling Strategies. Neural Information Processing Systems, 2021