arxiv: 2602.07262 · v3 · submitted 2026-02-06 · 💻 cs.CV

TwistNet-2D: Learning Second-Order Channel Interactions via Spiral Twisting for Texture Recognition

Pith reviewed 2026-05-16 06:19 UTC · model grok-4.3

classification 💻 cs.CV
keywords texture recognition · second-order channel interactions · spiral twisting · convolutional networks · fine-grained classification · directional feature shifts

The pith

TwistNet-2D captures second-order channel co-occurrences by shifting feature maps along spiral directions before multiplication.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces TwistNet-2D to resolve a tension in texture recognition between methods that capture global channel correlations without spatial structure and those that model cross-position relations without explicit pairwise products. It does so through a lightweight module whose core operation shifts one feature map along a chosen direction, computes an L2-normalized product with the unshifted map, and aggregates four such directional heads via content-adaptive reweighting before a gated residual addition. All experiments train networks from scratch on four texture and fine-grained benchmarks, avoiding ImageNet pretraining to isolate the effect of the architectural choice. The module adds roughly 3.5 percent parameters and 2 percent FLOPs over a ResNet-18 baseline yet surpasses both parameter-matched networks and substantially larger ConvNeXt and Swin Transformer models. The multi-head outputs become interpretable, orientation-selective feature maps that align with classical descriptions of structured and periodic textures.

Core claim

TwistNet-2D computes local pairwise channel products under directional spatial displacement: one feature map is shifted along a prescribed direction, an L2-normalized channel multiplication is performed, four directional heads are aggregated through content-adaptive channel reweighting, and the result is injected via a sigmoid-gated residual path initialized near zero. This joint encoding of co-occurrence location and interaction strength improves recognition of textures whose characteristic patterns depend on both channel correlations and their relative spatial offsets.
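
The final step of that pipeline, the sigmoid-gated residual path, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the initial gate logit of -6.0 are hypothetical, since the paper says only that the gate is initialized near zero, and the aggregation of the four heads is left out.

```python
import math

def sigmoid(x):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))

def gated_residual(x, branch, gate_logit):
    """Return x + sigmoid(gate_logit) * branch, element by element."""
    g = sigmoid(gate_logit)
    return [xi + g * bi for xi, bi in zip(x, branch)]

# Assumed initial logit (hypothetical value): sigmoid(-6.0) is about 0.0025,
# so the block starts out close to the plain residual block.
INIT_GATE_LOGIT = -6.0

x = [1.0, 2.0]          # stand-in for the main-path activation
branch = [10.0, -10.0]  # stand-in for the aggregated multi-head output
at_init = gated_residual(x, branch, INIT_GATE_LOGIT)
half_open = gated_residual(x, branch, 0.0)  # gate value 0.5
```

With a gate this close to zero the second-order branch contributes almost nothing at the start of training, so the network initially behaves like its baseline and can open the gate only if the branch helps.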

What carries the argument

Spiral-Twisted Channel Interaction (STCI), which applies directional shifts to one feature map before L2-normalized channel multiplication to capture displaced co-occurrence patterns.
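
A single head of this kind might look roughly like the pure-Python sketch below. Several details are assumptions made for illustration and are not confirmed by the paper: the one-pixel offsets assigned to each direction, zero padding at the border, normalizing both the shifted and unshifted maps, and including the diagonal in the upper-triangular product. Channel reduction and the learnable per-channel scale are omitted.

```python
import math

# Assumed one-pixel offsets (dy, dx) for the four directions; hypothetical.
DIRECTIONS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}

def shift(fmap, dy, dx):
    """Shift a [C][H][W] nested list: out[c][y][x] = fmap[c][y+dy][x+dx], zero padded."""
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    out = [[[0.0] * W for _ in range(H)] for _ in range(C)]
    for c in range(C):
        for y in range(H):
            for x in range(W):
                sy, sx = y + dy, x + dx
                if 0 <= sy < H and 0 <= sx < W:
                    out[c][y][x] = fmap[c][sy][sx]
    return out

def l2_normalize_channels(fmap, eps=1e-6):
    """Divide each position's channel vector by its L2 norm (plus eps)."""
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    out = [[[0.0] * W for _ in range(H)] for _ in range(C)]
    for y in range(H):
        for x in range(W):
            norm = math.sqrt(sum(fmap[c][y][x] ** 2 for c in range(C))) + eps
            for c in range(C):
                out[c][y][x] = fmap[c][y][x] / norm
    return out

def stci_head(fmap, direction):
    """Return C*(C+1)/2 product maps z[i] * z_shifted[j] for i <= j."""
    dy, dx = direction
    z = l2_normalize_channels(fmap)
    z_shifted = l2_normalize_channels(shift(fmap, dy, dx))
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    field = []
    for i in range(C):
        for j in range(i, C):
            field.append([[z[i][y][x] * z_shifted[j][y][x] for x in range(W)]
                          for y in range(H)])
    return field

# Two channels that fire at interleaved positions along one row.
fmap = [[[1.0, 0.0, 1.0, 0.0]], [[0.0, 1.0, 0.0, 1.0]]]
field = stci_head(fmap, DIRECTIONS[0])
```

In this toy input the cross-channel map `field[1]` responds where channel 0 fires and channel 1 fires one pixel to the right, which is the displaced co-occurrence the module is meant to expose.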

If this is right

  • Texture and fine-grained classification accuracy rises because the network explicitly models how channel pairs co-occur at specific relative positions.
  • The small parameter and FLOP overhead makes the module practical for deployment on resource-limited devices without sacrificing performance.
  • The four heads produce orientation-selective representations that can be inspected to verify alignment with classical texture properties.
  • Training entirely from scratch becomes a viable protocol for comparing architectural inductive biases on these tasks.
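
To give the overhead percentages a rough scale, the arithmetic below converts them into absolute numbers. The baseline figures are assumptions, not values from the paper: about 11.7M parameters and about 1.8 GFLOPs are the commonly quoted numbers for a ResNet-18 with a 1000-class head at 224×224 input, and the paper's own baseline (with smaller classification heads) may differ.

```python
# Assumed baseline figures (not from the paper).
RESNET18_PARAMS = 11.7e6   # parameters, 1000-class head
RESNET18_FLOPS = 1.8e9     # multiply-accumulates at 224x224 input

def overhead(base, percent):
    """Absolute increase corresponding to a relative overhead in percent."""
    return base * percent / 100.0

extra_params = overhead(RESNET18_PARAMS, 3.5)  # roughly 0.41M parameters
extra_flops = overhead(RESNET18_FLOPS, 2.0)    # roughly 36M operations

print(f"extra parameters: {extra_params / 1e6:.2f}M")
print(f"extra FLOPs:      {extra_flops / 1e6:.0f}M")
```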

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same shift-and-multiply pattern could be tested on periodic pattern tasks outside standard texture benchmarks, such as defect detection in manufactured surfaces.
  • Replacing fixed directional shifts with learned shift amounts might allow the module to adapt to dataset-specific texture scales.
  • Because the operation is local and differentiable, it could be inserted into video or 3D models to capture spatio-temporal co-occurrences.

Load-bearing premise

The reported accuracy gains are produced by the directional shift and channel-multiplication mechanism rather than by other training choices or implementation details.

What would settle it

An ablation that removes the directional shifts while keeping every other component fixed and measures whether accuracy on the four benchmarks falls to the level of the plain baseline.
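
The contrast such an ablation targets can be previewed on a toy signal in the spirit of Figure 2. This is an illustrative sketch only: the two 1-D "detector" responses are made up, and the helper `mean_product` is a hypothetical name, not part of the paper's code.

```python
def mean_product(z1, z2, delta):
    """Mean of z1[x] * z2[x + delta] over positions where x + delta is in range."""
    n = len(z1)
    values = [z1[x] * z2[x + delta] for x in range(n) if 0 <= x + delta < n]
    return sum(values) / len(values)

# Two detectors that respond at interleaved positions (made-up data).
stripe = [1.0, 0.0] * 4   # fires on even positions
brown = [0.0, 1.0] * 4    # fires on odd positions

no_shift = mean_product(stripe, brown, 0)  # same-position product, the ablated case
shifted = mean_product(stripe, brown, 1)   # one-pixel displacement before the product
```

Here the same-position product is zero everywhere while the shifted product is not, so a real ablation would be testing whether this kind of recovered signal is what actually drives the benchmark gains.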

Figures

Figures reproduced from arXiv: 2602.07262 by Feng Xiong, Huiling Chen, Junbo Jacob Lian, Kaichen Ouyang, Mingyang Yu, Shengwei Fu, Yujun Sun, Zhang Yujun, Zhong Rui, Zong Ke.

Figure 1: TwistNet-2D architecture. Top: TwistNet-2D-18 follows a ResNet-like structure with four stages; Stages 3–4 use TwistBlocks that inject second-order channel interactions through a gated MH-STCI branch. Bottom: a TwistBlock augments the standard residual block with the gated MH-STCI branch operating on the intermediate activation $H$. The internal structure of MH-STCI and a single STCI head is detailed in Figs…
Figure 2: Why cross-position correlation? (a) Wood grain exhibits periodic stripe-brown alternation. (b)–(c) The CNN extracts a stripe detector $z_1$ and a brown-region detector $z_2$ that respond at interleaved positions. (d) The same-position product $z_1 z_2$ has a low response because peaks of $z_1$ and $z_2$ are misaligned. (e) Spiral Twist shifts $z_2$ by a small offset $\delta$ before the product; this re-aligns peaks and recovers the…
Figure 3: Single STCI head. Channel reduction $C \to C_r$, directional spiral twist via a fixed-pattern depthwise 3×3 convolution with a learnable per-channel scale, $\ell_2$-normalization along the channel dimension, upper-triangular pairwise product field $\phi_\theta \in \mathbb{R}^{P \times H \times W}$, and concatenation with the normalized first-order features $\bar{Z}$.
J. J. Lian et al.: Preprint submitted to Elsevier
Figure 4: Multi-Head STCI. Four directional heads (…
Figure 5: Accuracy versus parameters on DTD. TwistNet-2D-18 attains the highest accuracy among all models. Group-2 baselines (∼28M) degrade sharply without pretraining, illustrating that targeted inductive bias outweighs raw capacity in data-limited regimes.
Figure 6: Learned direction selectivity. Channel interaction matrices for three DTD textures. Columns: four directional heads (0°, 45°, 90°, 135°). Red borders and bold $\mu$ values indicate the strongest-responding direction per row. The values of $\mu$ are channel-pair-mean magnitudes after AIS reweighting, summarizing the post-attention mass routed through each head.
Abstract

Second-order feature statistics are central to texture recognition, yet existing mechanisms exhibit a structural tension: bilinear pooling and Gram matrices capture global channel correlations but discard spatial structure, whereas self-attention models capture cross-position relations through weighted sums rather than explicit pairwise products. We propose TwistNet-2D, a lightweight module that computes local pairwise channel products under directional spatial displacement, jointly encoding where features co-occur and how they interact. The core component, Spiral-Twisted Channel Interaction (STCI), shifts one feature map along a prescribed direction before L2-normalized channel multiplication, capturing cross-position co-occurrence patterns that characterize structured and periodic textures. Four directional heads are aggregated through content-adaptive channel reweighting, and the result is injected via a sigmoid-gated residual path with near-zero initialization. TwistNet-2D adds only approximately 3.5% parameters and approximately 2% FLOPs over ResNet-18. To isolate the contribution of architectural inductive bias from that of transfer learning, all models in this study are trained from scratch without ImageNet pretraining. Under this protocol, TwistNet-2D consistently surpasses parameter-matched baselines and substantially larger ConvNeXt and Swin Transformer backbones across four texture and fine-grained recognition benchmarks, while the multi-head structure produces interpretable, orientation-selective representations that align with classical texture analysis.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces TwistNet-2D, a lightweight plug-in module for CNNs that computes second-order channel interactions via Spiral-Twisted Channel Interaction (STCI). STCI shifts one feature map along a fixed directional spiral before performing L2-normalized channel-wise multiplication, aggregates four directional heads with content-adaptive reweighting, and injects the result through a sigmoid-gated residual connection initialized near zero. The central claim is that this architectural bias yields consistent gains over parameter-matched baselines and substantially larger ConvNeXt and Swin Transformer models on four texture/fine-grained benchmarks when every model is trained from scratch without ImageNet pre-training, while adding only ~3.5% parameters and ~2% FLOPs.

Significance. If the reported gains are robustly attributable to the STCI inductive bias rather than capacity or optimization artifacts, the work supplies a concrete, low-overhead mechanism for injecting spatially-aware second-order statistics into modern backbones. The from-scratch training protocol is a methodological strength that helps isolate architectural contribution. The multi-head directional design also offers a path toward interpretable, orientation-selective features that align with classical texture descriptors.

major comments (2)
  1. [§4] §4 (Experimental protocol and Table 2/3): The headline claim that TwistNet-2D surpasses substantially larger ConvNeXt and Swin backbones rests on comparisons performed on small texture datasets. Because higher-capacity models are known to overfit more readily without pre-training or heavy regularization, the observed gains could be driven by capacity mismatch rather than the directional channel-product bias. A control that applies capacity-matched regularization to the larger baselines or inserts the STCI module into ConvNeXt/Swin is required to make the attribution load-bearing.
  2. [§3.2] §3.2 (STCI definition and Eq. (3)–(5)): The four directional heads are described as “prescribed” yet the aggregation uses content-adaptive channel reweighting. It is unclear whether the spiral displacement vectors themselves are fixed hyperparameters or learned; if they are fixed, the method is not fully parameter-free in the sense claimed, and the interpretability argument needs quantitative support (e.g., orientation selectivity metrics) beyond qualitative visualizations.
minor comments (2)
  1. [Abstract] Abstract: The statement “consistently surpasses … while adding only approximately 3.5% parameters” would be strengthened by reporting the exact parameter and FLOP deltas alongside the accuracy deltas in the abstract itself.
  2. [Figure 6] Figure 6 (orientation-selective maps): The qualitative claim that the heads align with classical texture analysis would benefit from a quantitative measure (e.g., angular histogram correlation with ground-truth orientation labels) rather than visual inspection alone.
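
One simple quantitative measure of the kind the referee asks for is a resultant-vector orientation selectivity index computed from per-head response magnitudes. The sketch below is a hypothetical illustration, not a metric used in the paper: the function name is invented, the input values are placeholders, and angles are doubled because orientations repeat every 180°.

```python
import cmath
import math

def orientation_selectivity(mu_by_angle):
    """Return (index in [0, 1], preferred orientation in degrees or None).

    mu_by_angle maps a head's orientation in degrees to a non-negative
    response magnitude. The index is 0 for a flat profile and 1 when all
    response sits in a single orientation.
    """
    total = sum(mu_by_angle.values())
    if total == 0:
        return 0.0, None
    resultant = sum(mu * cmath.exp(2j * math.radians(angle))
                    for angle, mu in mu_by_angle.items())
    index = abs(resultant) / total
    preferred = (math.degrees(cmath.phase(resultant)) / 2.0) % 180.0
    return index, preferred

# Placeholder magnitudes for the four heads (not measured values).
flat = {0: 1.0, 45: 1.0, 90: 1.0, 135: 1.0}
peaked = {0: 0.0, 45: 0.0, 90: 1.0, 135: 0.0}

flat_index, _ = orientation_selectivity(flat)
peaked_index, peaked_angle = orientation_selectivity(peaked)
```

Reporting such an index per texture class, alongside the visual matrices, would let a reader check whether the strongest head really tracks the dominant orientation of the texture.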

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and insightful comments. We address each major point below with clarifications on the design and experimental protocol. We agree that additional controls will strengthen the attribution of performance gains to the STCI inductive bias and will revise the manuscript accordingly.

  1. Referee: [§4] §4 (Experimental protocol and Table 2/3): The headline claim that TwistNet-2D surpasses substantially larger ConvNeXt and Swin backbones rests on comparisons performed on small texture datasets. Because higher-capacity models are known to overfit more readily without pre-training or heavy regularization, the observed gains could be driven by capacity mismatch rather than the directional channel-product bias. A control that applies capacity-matched regularization to the larger baselines or inserts the STCI module into ConvNeXt/Swin is required to make the attribution load-bearing.

    Authors: We agree that inserting the STCI module into larger backbones such as ConvNeXt and Swin would provide stronger evidence that the gains stem from the directional second-order bias rather than capacity differences alone. In the revised manuscript we will add these experiments on the same four benchmarks under the identical from-scratch training protocol. We note that the current protocol already applies the same optimization settings and data augmentation to all models, and the observed overfitting of high-capacity models without pre-training is itself part of the motivation for a lightweight, bias-injecting module; nevertheless, the suggested controls will be included to make the attribution more robust.

    revision: yes

  2. Referee: [§3.2] §3.2 (STCI definition and Eq. (3)–(5)): The four directional heads are described as “prescribed” yet the aggregation uses content-adaptive channel reweighting. It is unclear whether the spiral displacement vectors themselves are fixed hyperparameters or learned; if they are fixed, the method is not fully parameter-free in the sense claimed, and the interpretability argument needs quantitative support (e.g., orientation selectivity metrics) beyond qualitative visualizations.

    Authors: The spiral displacement vectors are fixed, prescribed hyperparameters (explicitly stated as “prescribed direction” in Section 3.2 and Eq. (3)). This choice keeps the module lightweight; the only learned parameters are the content-adaptive reweighting vectors and the gating scalar, resulting in the reported ~3.5% parameter overhead. We will revise the text to remove any ambiguous phrasing around “parameter-free” and explicitly state that the displacements are fixed. For interpretability, we will add quantitative orientation-selectivity metrics (e.g., directional response variance on synthetically rotated texture patches) alongside the existing qualitative visualizations.

    revision: partial

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper introduces TwistNet-2D as an architectural module (STCI with directional shifts, L2-normalized channel products, four-head aggregation, and gated residual) whose operations are defined explicitly from first principles of texture co-occurrence rather than fitted to any target metric. All performance claims rest on external benchmark comparisons under a from-scratch training protocol; no equation, prediction, or uniqueness result reduces by construction to the paper's own inputs or self-citations. This is the standard non-circular case for an inductive-bias proposal validated empirically.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract contains no explicit mathematical derivation, so no free parameters, axioms, or invented entities are extractable; the module is presented as an empirical architectural addition.
