TwistNet-2D: Learning Second-Order Channel Interactions via Spiral Twisting for Texture Recognition

Feng Xiong; Huiling Chen; Junbo Jacob Lian; Kaichen Ouyang; Mingyang Yu; Shengwei Fu; Yujun Sun; Zhang Yujun; Zhong Rui; Zong Ke

arxiv: 2602.07262 · v3 · submitted 2026-02-06 · 💻 cs.CV

Pith reviewed 2026-05-16 06:19 UTC · model grok-4.3

classification 💻 cs.CV

keywords texture recognition · second-order channel interactions · spiral twisting · convolutional networks · fine-grained classification · directional feature shifts

The pith

TwistNet-2D captures second-order channel co-occurrences by shifting feature maps along spiral directions before multiplication.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces TwistNet-2D to resolve a tension in texture recognition between methods that capture global channel correlations without spatial structure and those that model cross-position relations without explicit pairwise products. It does so through a lightweight module whose core operation shifts one feature map along a chosen direction, computes an L2-normalized product with the unshifted map, and aggregates four such directional heads via content-adaptive reweighting before a gated residual addition. All experiments train networks from scratch on four texture and fine-grained benchmarks, avoiding ImageNet pretraining to isolate the effect of the architectural choice. The module adds roughly 3.5 percent parameters and 2 percent FLOPs over a ResNet-18 baseline yet surpasses both parameter-matched networks and substantially larger ConvNeXt and Swin Transformer models. The multi-head outputs become interpretable, orientation-selective feature maps that align with classical descriptions of structured and periodic textures.
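To put the relative overhead in absolute terms, the short sketch below multiplies the quoted fractions by a baseline size. The baseline figures are assumptions, not values from the paper: they correspond to a standard ResNet-18 with a 1000-class head (11,689,512 parameters) and roughly 1.8e9 multiply-accumulates at 224×224 input. The function name `module_overhead` is hypothetical, and the models in the paper use dataset-specific heads, so their exact counts will differ.

```python
def module_overhead(base_params, base_flops, param_frac=0.035, flop_frac=0.02):
    """Absolute overhead implied by relative parameter and FLOP increases."""
    return round(base_params * param_frac), round(base_flops * flop_frac)

# Assumed baseline (not from the paper): ResNet-18 with a 1000-class head.
extra_params, extra_flops = module_overhead(11_689_512, 1.8e9)
print(f"extra parameters: about {extra_params:,}")
print(f"extra multiply-accumulates: about {extra_flops:,}")
```

Under these assumptions the module would add on the order of 0.4M parameters, which is small next to the roughly 28M-parameter baselines mentioned in the Figure 5 caption.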

Core claim

TwistNet-2D computes local pairwise channel products under directional spatial displacement: one feature map is shifted along a prescribed direction, an L2-normalized channel multiplication is performed, four directional heads are aggregated through content-adaptive channel reweighting, and the result is injected via a sigmoid-gated residual path initialized near zero. This joint encoding of co-occurrence location and interaction strength improves recognition of textures whose characteristic patterns depend on both channel correlations and their relative spatial offsets.
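The aggregation and injection steps can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' implementation: the reweighting is written in a squeeze-and-excitation style (global average pooling followed by a sigmoid), the gate logit of -6 is a hypothetical choice for "near zero", and the projection that would map the concatenated heads back to the backbone width is omitted, so the toy input `x` simply has the same number of channels as the concatenated heads.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reweight_channels(feats, w, b):
    """SE-style reweighting (an assumption): scale each channel by a weight
    computed from the globally averaged content of the input itself."""
    pooled = feats.mean(axis=(1, 2))        # (C,) global average pool
    scale = sigmoid(w @ pooled + b)         # (C,) content-dependent weights in (0, 1)
    return feats * scale[:, None, None]

def gated_residual(x, branch, gate_logit):
    """x + sigmoid(gate_logit) * branch; a strongly negative logit keeps the
    branch almost silent at the start of training."""
    return x + sigmoid(gate_logit) * branch

rng = np.random.default_rng(0)
heads = [rng.standard_normal((8, 4, 4)) for _ in range(4)]  # four directional heads
stacked = np.concatenate(heads, axis=0)                     # (32, 4, 4)
w = rng.standard_normal((32, 32)) * 0.1                     # hypothetical learned weights
b = np.zeros(32)
reweighted = reweight_channels(stacked, w, b)

x = rng.standard_normal((32, 4, 4))                         # stand-in for the block input
out = gated_residual(x, reweighted, gate_logit=-6.0)        # sigmoid(-6) is about 0.0025
```

With the gate this close to zero, `out` starts out nearly identical to `x`, so the block initially behaves like the plain residual block and the second-order branch is phased in as the gate is learned.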

What carries the argument

Spiral-Twisted Channel Interaction (STCI), which applies directional shifts to one feature map before L2-normalized channel multiplication to capture displaced co-occurrence patterns.
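A minimal NumPy sketch of the shift-then-multiply idea follows. The function names (`shift2d`, `l2_normalize`, `stci_pair`), the zero fill at the border, the one-pixel offset, and the epsilon in the normalization are assumptions for illustration; the source states only that one feature map is shifted along a direction and multiplied with the unshifted map after L2 normalization.

```python
import numpy as np

def shift2d(x, dy, dx):
    """Shift a (C, H, W) array by (dy, dx) pixels, filling vacated cells with zeros."""
    _, h, w = x.shape
    out = np.zeros_like(x)
    # Destination and source slices for a shift without wrap-around.
    ys_dst, ys_src = slice(max(dy, 0), h + min(dy, 0)), slice(max(-dy, 0), h + min(-dy, 0))
    xs_dst, xs_src = slice(max(dx, 0), w + min(dx, 0)), slice(max(-dx, 0), w + min(-dx, 0))
    out[:, ys_dst, xs_dst] = x[:, ys_src, xs_src]
    return out

def l2_normalize(x, eps=1e-6):
    """Scale the channel vector at every spatial position to (at most) unit L2 norm."""
    norm = np.sqrt((x ** 2).sum(axis=0, keepdims=True))
    return x / (norm + eps)

def stci_pair(z, i, j, dy, dx):
    """Product of normalized channel i with normalized channel j displaced by (dy, dx)."""
    z_bar = l2_normalize(z)
    z_shift = l2_normalize(shift2d(z, dy, dx))
    return z_bar[i] * z_shift[j]

# Toy input: channel 0 fires at (1, 1), channel 1 fires one pixel to the right.
z = np.zeros((2, 3, 4))
z[0, 1, 1] = 1.0
z[1, 1, 2] = 1.0

same_position = stci_pair(z, 0, 1, dy=0, dx=0)   # peaks never overlap
displaced = stci_pair(z, 0, 1, dy=0, dx=-1)      # channel 1 moved left by one pixel
```

In this toy case the same-position product is zero everywhere, while the displaced product responds at (1, 1), which is the displaced co-occurrence the module is meant to pick up.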

If this is right

Texture and fine-grained classification accuracy rises because the network explicitly models how channel pairs co-occur at specific relative positions.
The small parameter and FLOP overhead makes the module practical for deployment on resource-limited devices without sacrificing performance.
The four heads produce orientation-selective representations that can be inspected to verify alignment with classical texture properties.
Training entirely from scratch becomes a viable protocol for comparing architectural inductive biases on these tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same shift-and-multiply pattern could be tested on periodic pattern tasks outside standard texture benchmarks, such as defect detection in manufactured surfaces.
Replacing fixed directional shifts with learned shift amounts might allow the module to adapt to dataset-specific texture scales.
Because the operation is local and differentiable, it could be inserted into video or 3D models to capture spatio-temporal co-occurrences.

Load-bearing premise

The reported accuracy gains are produced by the directional shift and channel-multiplication mechanism rather than by other training choices or implementation details.

What would settle it

An ablation that removes the directional shifts while keeping every other component fixed and measures whether accuracy on the four benchmarks falls to the level of the plain baseline.

Figures

Figures reproduced from arXiv: 2602.07262 by Feng Xiong, Huiling Chen, Junbo Jacob Lian, Kaichen Ouyang, Mingyang Yu, Shengwei Fu, Yujun Sun, Zhang Yujun, Zhong Rui, Zong Ke.

**Figure 1.** TwistNet-2D architecture. Top: TwistNet-2D-18 follows a ResNet-like structure with four stages; Stages 3–4 use TwistBlocks that inject second-order channel interactions through a gated MH-STCI branch. Bottom: a TwistBlock augments the standard residual block with the gated MH-STCI branch operating on the intermediate activation $H$. The internal structure of MH-STCI and a single STCI head is detailed in Figs…

**Figure 2.** Why cross-position correlation? (a) Wood grain exhibits periodic stripe-brown alternation. (b)–(c) The CNN extracts a stripe detector $z_1$ and a brown-region detector $z_2$ that respond at interleaved positions. (d) The same-position product $z_1 z_2$ has a low response because the peaks of $z_1$ and $z_2$ are misaligned. (e) Spiral Twist shifts $z_2$ by a small offset $\delta$ before the product; this re-aligns peaks and recovers the…
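The argument in this caption can be checked on a one-dimensional toy signal. The sketch below is illustrative only: the two response lists are made-up stand-ins for the stripe and brown-region detectors, and `shifted_product` is a hypothetical helper, not a function from the paper.

```python
def shifted_product(z1, z2, delta):
    """Mean of z1[x] * z2[x + delta] over positions where both indices are valid."""
    n = len(z1)
    pairs = [(z1[x], z2[x + delta]) for x in range(n) if 0 <= x + delta < n]
    return sum(a * b for a, b in pairs) / len(pairs)

# Toy interleaved responses: z1 peaks at even positions, z2 at odd positions.
z1 = [1.0, 0.0] * 4
z2 = [0.0, 1.0] * 4

same_position = shifted_product(z1, z2, delta=0)  # peaks never coincide
displaced = shifted_product(z1, z2, delta=1)      # peaks re-aligned by the offset
print(same_position, displaced)
```

With an offset of zero the product vanishes, while an offset of one position gives a clearly non-zero mean, mirroring panels (d) and (e).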

**Figure 3.** Single STCI head. Channel reduction $C \to C_r$, directional spiral twist via a fixed-pattern depthwise 3×3 convolution with a learnable per-channel scale, $\ell_2$-normalization along the channel dimension, upper-triangular pairwise product field $\phi_\theta \in \mathbb{R}^{P \times H \times W}$, and concatenation with the normalized first-order features $\bar{Z}$. (Source footer: J. J. Lian et al., preprint submitted to Elsevier.)
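The last two steps in this caption, the upper-triangular product field and the concatenation with first-order features, can be sketched in NumPy. Two points are assumptions rather than statements from the source: the diagonal pairs are included, which gives $P = C_r (C_r + 1) / 2$, and the inputs are taken as already reduced, twisted, and normalized. The function names are hypothetical.

```python
import numpy as np

def pairwise_product_field(z_bar, z_twist):
    """Stack the products z_bar[i] * z_twist[j] for all channel pairs with i <= j.

    Both inputs have shape (C_r, H, W); the result has shape (P, H, W) with
    P = C_r * (C_r + 1) / 2 under the include-the-diagonal assumption.
    """
    c_r = z_bar.shape[0]
    i_idx, j_idx = np.triu_indices(c_r)  # upper triangle, diagonal included
    return z_bar[i_idx] * z_twist[j_idx]

def stci_head_output(z_bar, z_twist):
    """Concatenate the second-order product field with the first-order features."""
    phi = pairwise_product_field(z_bar, z_twist)
    return np.concatenate([phi, z_bar], axis=0)

rng = np.random.default_rng(0)
z_bar = rng.standard_normal((4, 5, 5))    # stand-in for normalized features
z_twist = rng.standard_normal((4, 5, 5))  # stand-in for twisted, normalized features
out = stci_head_output(z_bar, z_twist)    # shape (10 + 4, 5, 5)
```

Keeping only the upper triangle roughly halves the number of product channels compared with the full $C_r \times C_r$ grid, which is one reason a channel reduction step before the product keeps the head inexpensive.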

**Figure 4.** Multi-Head STCI. Four directional heads (…

**Figure 5.** Accuracy versus parameters on DTD. TwistNet-2D-18 attains the highest accuracy among all models. Group-2 baselines (∼28M) degrade sharply without pretraining, illustrating that targeted inductive bias outweighs raw capacity in data-limited regimes.

**Figure 6.** Learned direction selectivity. Channel interaction matrices for three DTD textures. Columns: four directional heads (0°, 45°, 90°, 135°). Red borders and bold $\mu$ values indicate the strongest-responding direction per row. The values of $\mu$ are channel-pair-mean magnitudes after AIS reweighting, summarizing the post-attention mass routed through each head.
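One plausible reading of the per-head summary in this caption is a mean absolute value over each head's channel-interaction matrix, followed by picking the head with the largest value. The sketch below implements that reading; the exact definition of $\mu$ in the paper may differ, and `head_summary` and the toy matrices are hypothetical.

```python
import numpy as np

ANGLES = (0, 45, 90, 135)  # head directions in degrees, as listed in the caption

def head_summary(interactions):
    """interactions: array of shape (4, C, C), one channel-interaction matrix per head.

    Returns (mu, strongest_angle), where mu[k] is the mean absolute entry of head k.
    """
    mu = np.abs(interactions).mean(axis=(1, 2))
    return mu, ANGLES[int(np.argmax(mu))]

# Toy example: only the 90-degree head carries any interaction mass.
interactions = np.zeros((4, 3, 3))
interactions[2] = 0.5
mu, strongest = head_summary(interactions)
```

For the toy input the 90° head is selected, which corresponds to the red-bordered cell in a row of the figure.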

Original abstract

Second-order feature statistics are central to texture recognition, yet existing mechanisms exhibit a structural tension: bilinear pooling and Gram matrices capture global channel correlations but discard spatial structure, whereas self-attention models capture cross-position relations through weighted sums rather than explicit pairwise products. We propose TwistNet-2D, a lightweight module that computes local pairwise channel products under directional spatial displacement, jointly encoding where features co-occur and how they interact. The core component, Spiral-Twisted Channel Interaction (STCI), shifts one feature map along a prescribed direction before L2-normalized channel multiplication, capturing cross-position co-occurrence patterns that characterize structured and periodic textures. Four directional heads are aggregated through content-adaptive channel reweighting, and the result is injected via a sigmoid-gated residual path with near-zero initialization. TwistNet-2D adds only approximately 3.5% parameters and approximately 2% FLOPs over ResNet-18. To isolate the contribution of architectural inductive bias from that of transfer learning, all models in this study are trained from scratch without ImageNet pretraining. Under this protocol, TwistNet-2D consistently surpasses parameter-matched baselines and substantially larger ConvNeXt and Swin Transformer backbones across four texture and fine-grained recognition benchmarks, while the multi-head structure produces interpretable, orientation-selective representations that align with classical texture analysis.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

TwistNet-2D offers a lightweight directional channel-shift module that claims gains on texture benchmarks when all models train from scratch, but the wins over larger ConvNeXt and Swin backbones look vulnerable to capacity-overfitting confounds on small datasets.

The paper's main move is Spiral-Twisted Channel Interaction: shift one feature map along prescribed spiral directions, multiply channels after L2 normalization, aggregate four heads with content-adaptive reweighting, and add the result through a sigmoid-gated residual with near-zero initialization. The module sits inside a ResNet-18 and targets the gap between global bilinear statistics and position-weighted attention by keeping explicit local pairwise products plus directional displacement. The design is straightforward and the overhead stays low, at roughly 3.5 percent extra parameters and 2 percent extra FLOPs.

Training every model from scratch without ImageNet pretraining is a clean choice that removes transfer-learning effects and lets the architectural bias stand on its own. That protocol is worth crediting because it makes the comparison more direct than the usual fine-tuning setup. The multi-head structure also produces orientation-selective maps that line up with classical texture analysis, which is a nice side benefit.

The soft spot is the central comparison. The abstract states consistent outperformance over both parameter-matched baselines and substantially larger ConvNeXt and Swin models on four texture and fine-grained datasets, yet a capacity confound remains: small texture collections make high-capacity models prone to overfitting when trained from scratch, independent of the spiral shift or channel products. Without ablations that insert the module into the larger backbones, or capacity-matched regularization on the baselines, it is hard to know whether the reported edge comes from the inductive bias or simply from using a smaller net. The abstract supplies no numerical deltas, error bars, or statistical tests, so the full paper must show those tables before the claim can be taken as solid. The math itself contains no circular definitions or post-hoc fitting tricks.

This paper is aimed at researchers who build efficient CNN extensions for texture, material, or fine-grained recognition tasks, especially in inspection or medical imaging where parameter count matters. A reader already working on second-order pooling or directional operators would get the most out of it. I would send it for peer review so the experimental controls and numbers can be checked properly.

Referee Report

2 major / 2 minor

Summary. The paper introduces TwistNet-2D, a lightweight plug-in module for CNNs that computes second-order channel interactions via Spiral-Twisted Channel Interaction (STCI). STCI shifts one feature map along a fixed directional spiral before performing L2-normalized channel-wise multiplication, aggregates four directional heads with content-adaptive reweighting, and injects the result through a sigmoid-gated residual connection initialized near zero. The central claim is that this architectural bias yields consistent gains over parameter-matched baselines and substantially larger ConvNeXt and Swin Transformer models on four texture/fine-grained benchmarks when every model is trained from scratch without ImageNet pre-training, while adding only ~3.5% parameters and ~2% FLOPs.

Significance. If the reported gains are robustly attributable to the STCI inductive bias rather than capacity or optimization artifacts, the work supplies a concrete, low-overhead mechanism for injecting spatially-aware second-order statistics into modern backbones. The from-scratch training protocol is a methodological strength that helps isolate architectural contribution. The multi-head directional design also offers a path toward interpretable, orientation-selective features that align with classical texture descriptors.

major comments (2)

[§4] Experimental protocol and Table 2/3: The headline claim that TwistNet-2D surpasses substantially larger ConvNeXt and Swin backbones rests on comparisons performed on small texture datasets. Because higher-capacity models are known to overfit more readily without pre-training or heavy regularization, the observed gains could be driven by capacity mismatch rather than the directional channel-product bias. A control that applies capacity-matched regularization to the larger baselines or inserts the STCI module into ConvNeXt/Swin is required to make the attribution load-bearing.
[§3.2] STCI definition and Eqs. (3)–(5): The four directional heads are described as “prescribed” yet the aggregation uses content-adaptive channel reweighting. It is unclear whether the spiral displacement vectors themselves are fixed hyperparameters or learned; if they are fixed, the method is not fully parameter-free in the sense claimed, and the interpretability argument needs quantitative support (e.g., orientation selectivity metrics) beyond qualitative visualizations.

minor comments (2)

[Abstract] The statement “consistently surpasses … while adding only approximately 3.5% parameters” would be strengthened by reporting the exact parameter and FLOP deltas alongside the accuracy deltas in the abstract itself.
[Figure 4] Orientation-selective maps: The qualitative claim that the heads align with classical texture analysis would benefit from a quantitative measure (e.g., angular histogram correlation with ground-truth orientation labels) rather than visual inspection alone.
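As an illustration of the kind of quantitative measure requested in the last comment, the sketch below correlates a per-head response profile with an orientation histogram using a plain Pearson coefficient. Everything here is hypothetical: the choice of Pearson correlation, the helper name, and both four-bin vectors are toy values invented for the example, not data from the paper.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Bins correspond to the four head directions (0, 45, 90, 135 degrees).
head_response = [0.10, 0.15, 0.60, 0.15]     # toy mean response per head
orientation_hist = [0.05, 0.10, 0.75, 0.10]  # toy ground-truth orientation histogram

score = pearson(head_response, orientation_hist)
print(f"angular correlation: {score:.3f}")
```

A score near 1 would indicate that the heads respond most strongly along the orientations actually present in the texture; averaged over a labeled set, such a number could complement the visual inspection.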

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and insightful comments. We address each major point below with clarifications on the design and experimental protocol. We agree that additional controls will strengthen the attribution of performance gains to the STCI inductive bias and will revise the manuscript accordingly.

Referee: [§4] Experimental protocol and Table 2/3: The headline claim that TwistNet-2D surpasses substantially larger ConvNeXt and Swin backbones rests on comparisons performed on small texture datasets. Because higher-capacity models are known to overfit more readily without pre-training or heavy regularization, the observed gains could be driven by capacity mismatch rather than the directional channel-product bias. A control that applies capacity-matched regularization to the larger baselines or inserts the STCI module into ConvNeXt/Swin is required to make the attribution load-bearing.

Authors: We agree that inserting the STCI module into larger backbones such as ConvNeXt and Swin would provide stronger evidence that the gains stem from the directional second-order bias rather than capacity differences alone. In the revised manuscript we will add these experiments on the same four benchmarks under the identical from-scratch training protocol. We note that the current protocol already applies the same optimization settings and data augmentation to all models, and the observed overfitting of high-capacity models without pre-training is itself part of the motivation for a lightweight, bias-injecting module; nevertheless, the suggested controls will be included to make the attribution more robust.

revision: yes

Referee: [§3.2] STCI definition and Eqs. (3)–(5): The four directional heads are described as “prescribed” yet the aggregation uses content-adaptive channel reweighting. It is unclear whether the spiral displacement vectors themselves are fixed hyperparameters or learned; if they are fixed, the method is not fully parameter-free in the sense claimed, and the interpretability argument needs quantitative support (e.g., orientation selectivity metrics) beyond qualitative visualizations.

Authors: The spiral displacement vectors are fixed, prescribed hyperparameters (explicitly stated as “prescribed direction” in Section 3.2 and Eq. (3)). This choice keeps the module lightweight; the only learned parameters are the content-adaptive reweighting vectors and the gating scalar, resulting in the reported ~3.5% parameter overhead. We will revise the text to remove any ambiguous phrasing around “parameter-free” and explicitly state that the displacements are fixed. For interpretability, we will add quantitative orientation-selectivity metrics (e.g., directional response variance on synthetically rotated texture patches) alongside the existing qualitative visualizations.

revision: partial

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper introduces TwistNet-2D as an architectural module (STCI with directional shifts, L2-normalized channel products, four-head aggregation, and gated residual) whose operations are defined explicitly from first principles of texture co-occurrence rather than fitted to any target metric. All performance claims rest on external benchmark comparisons under a from-scratch training protocol; no equation, prediction, or uniqueness result reduces by construction to the paper's own inputs or self-citations. This is the standard non-circular case for an inductive-bias proposal validated empirically.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract contains no explicit mathematical derivation, so no free parameters, axioms, or invented entities are extractable; the module is presented as an empirical architectural addition.


[1] [1]

M.Haralick, K

R. M.Haralick, K. Shanmugam,I. Dinstein, Textural featuresfor image classification, IEEETransactions on Systems,Man, and Cybernetics SMC-3 (1973) 610–621

work page 1973

[2] [2]

T.-Y. Lin, A. RoyChowdhury, S. Maji, Bilinear CNN models for fine-grained visual recognition, in: Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2015, pp. 1449–1457. J. J. Lian et al.:Preprint submitted to ElsevierPage 10 of 12 Spiral-Twisted Channel Interactions for Texture Recognition

work page 2015

[3] [3]

Y.Gao,O.Beijbom,N.Zhang,T.Darrell, Compactbilinearpooling, in:ProceedingsoftheIEEEConferenceonComputerVisionandPattern Recognition (CVPR), 2016, pp. 317–326

work page 2016

[4] [4]

L. A. Gatys, A. S. Ecker, M. Bethge, Image style transfer using convolutional neural networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 2414–2423

work page 2016

[5] [5]

Vaswani, N

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin, Attention is all you need, in: Advances in Neural Information Processing Systems (NeurIPS), 2017, pp. 5998–6008

work page 2017

[6] [6]

7132–7141

J.Hu,L.Shen,G.Sun,Squeeze-and-excitationnetworks,in:ProceedingsoftheIEEEConferenceonComputerVisionandPatternRecognition (CVPR), 2018, pp. 7132–7141

work page 2018

[7] [7]

J.Xue,H.Zhang,K.Dana, Deeptexturemanifoldforgroundterrainrecognition, in:ProceedingsoftheIEEEConferenceonComputerVision and Pattern Recognition (CVPR), 2018, pp. 558–567

work page 2018

[8] [8]

Zhang, J

H. Zhang, J. Xue, K. Dana, Deep TEN: Texture encoding network, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 708–717

work page 2017

[9] [9]

W. Zhai, Y. Cao, Z.-J. Zha, H. Xie, F. Wu, Deep structure-revealed network for texture recognition, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 11010–11019

work page 2020

[10] [10]

W.Zhai,Y.Cao,J.Zhang,H.Xie,D.Tao,Z.-J.Zha, Onexploringmultiplicityofprimitivesandattributesfortexturerecognitioninthewild, IEEE Transactions on Pattern Analysis and Machine Intelligence 46 (2024) 403–420

work page 2024

[11] [11]

Evani, D

R. Evani, D. Rajan, S. Mao, Chebyshev attention depth permutation texture network with latent texture attribute loss, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025, pp. 23423–23432

work page 2025

[12] [12]

A.Sikdar,Y.Liu,S.Kedarisetty,Y.Zhao,A.Ahmed,A.Behera, Interweavinginsights:High-orderfeatureinteractionforfine-grainedvisual recognition, International Journal of Computer Vision 133 (2025) 1755–1779

[13] B. Julesz, Textons, the elements of texture perception, and their interactions, Nature 290 (1981) 91–97

[14] J. Portilla, E. P. Simoncelli, A parametric texture model based on joint statistics of complex wavelet coefficients, International Journal of Computer Vision 40 (2000) 49–70

[15] J. Johnson, A. Alahi, L. Fei-Fei, Perceptual losses for real-time style transfer and super-resolution, in: Proceedings of the European Conference on Computer Vision (ECCV), 2016, pp. 694–711

[16] Y. Li, N. Wang, J. Liu, X. Hou, Demystifying neural style transfer, in: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2017, pp. 2230–2236

[17] S. Kong, C. Fowlkes, Low-rank bilinear pooling for fine-grained classification, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 365–374

[18] P. Li, J. Xie, Q. Wang, W. Zuo, Is second-order information helpful for large-scale visual recognition?, in: Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017, pp. 2070–2078

[19] M. Cimpoi, S. Maji, I. Kokkinos, S. Mohamed, A. Vedaldi, Describing textures in the wild, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 3606–3613

[20] M. Cimpoi, S. Maji, A. Vedaldi, Deep filter banks for texture recognition and segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 3828–3836

[21] L. Scabini, K. M. Zielinski, L. C. Ribas, W. N. Gonçalves, B. De Baets, O. M. Bruno, RADAM: Texture recognition through randomized aggregated encoding of deep activation maps, Pattern Recognition 143 (2023) 109802

[22] Z. Chen, Y. Quan, R. Xu, L. Jin, Y. Xu, Enhancing texture representation with deep tracing pattern encoding, Pattern Recognition 146 (2024) 109959

[23] X. Shu, H. Pan, J. Shi, X. Song, X.-J. Wu, Using global information to refine local patterns for texture representation and classification, Pattern Recognition 131 (2022) 108843

[24] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, N. Houlsby, An image is worth 16×16 words: Transformers for image recognition at scale, in: Proceedings of the International Conference on Learning Representations (ICLR), 2021, pp. 1–21

[25] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, B. Guo, Swin transformer: Hierarchical vision transformer using shifted windows, in: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 10012–10022

[26] A. Katharopoulos, A. Vyas, N. Pappas, F. Fleuret, Transformers are RNNs: Fast autoregressive transformers with linear attention, in: Proceedings of the International Conference on Machine Learning (ICML), volume 119 of Proceedings of Machine Learning Research, 2020, pp. 5156–5165

[27] P. K. A. Vasu, J. Gabriel, J. Zhu, O. Tuzel, A. Ranjan, FastViT: A fast hybrid vision transformer using structural reparameterization, in: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 5785–5795

[28] Z. Tu, H. Talebi, H. Zhang, F. Yang, P. Milanfar, A. Bovik, Y. Li, MaxViT: Multi-axis vision transformer, in: Proceedings of the European Conference on Computer Vision (ECCV), volume 13684 of Lecture Notes in Computer Science, 2022, pp. 459–479

[29] S. Woo, J. Park, J.-Y. Lee, I. S. Kweon, CBAM: Convolutional block attention module, in: Proceedings of the European Conference on Computer Vision (ECCV), volume 11211 of Lecture Notes in Computer Science, 2018, pp. 3–19

[30] Q. Wang, B. Wu, P. Zhu, P. Li, W. Zuo, Q. Hu, ECA-net: Efficient channel attention for deep convolutional neural networks, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 11531–11539

[31] Q. Hou, D. Zhou, J. Feng, Coordinate attention for efficient mobile network design, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 13713–13722

[32] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778

[33] Z. Liu, H. Mao, C.-Y. Wu, C. Feichtenhofer, T. Darrell, S. Xie, A ConvNet for the 2020s, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 11976–11986

[34] M. Tan, Q. V. Le, EfficientNet: Rethinking model scaling for convolutional neural networks, in: Proceedings of the International Conference on Machine Learning (ICML), volume 97 of Proceedings of Machine Learning Research, 2019, pp. 6105–6114

[35] A. Wang, H. Chen, Z. Lin, J. Han, G. Ding, RepViT: Revisiting mobile CNN from ViT perspective, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 15909–15920

[36] G. G. Chrysos, S. Moschoglou, G. Bouritsas, J. Deng, Y. Panagakis, S. Zafeiriou, Deep polynomial neural networks, IEEE Transactions on Pattern Analysis and Machine Intelligence 44 (2022) 4021–4034

[37] J. J. Lian, H. Chen, K. Ouyang, Y. Zhang, R. Zhong, H. Chen, Twisted convolutional networks (TCNs): Enhancing feature interactions for non-spatial data classification, Neural Networks 197 (2026) 108451

[38] Y. Wu, K. He, Group normalization, in: Proceedings of the European Conference on Computer Vision (ECCV), volume 11217 of Lecture Notes in Computer Science, 2018, pp. 3–19

[39] L. Sharan, R. Rosenholtz, E. H. Adelson, Material perception: What can you see in a brief glance?, Journal of Vision 9 (2009) 784

[40] C. Wah, S. Branson, P. Welinder, P. Perona, S. Belongie, The Caltech-UCSD Birds-200-2011 Dataset, Technical Report CNS-TR-2011-001, California Institute of Technology, 2011

[41] M.-E. Nilsback, A. Zisserman, Automated flower classification over a large number of classes, in: Proceedings of the Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP), 2008, pp. 722–729

[42] S. Woo, S. Debnath, R. Hu, X. Chen, Z. Liu, I. S. Kweon, S. Xie, ConvNeXt V2: Co-designing and scaling ConvNets with masked autoencoders, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 16133–16142

[43] R. Wightman, PyTorch image models, https://github.com/rwightman/pytorch-image-models, 2019

[44] E. D. Cubuk, B. Zoph, J. Shlens, Q. V. Le, RandAugment: Practical automated data augmentation with a reduced search space, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2020, pp. 3008–3017

[45] H. Zhang, M. Cisse, Y. N. Dauphin, D. Lopez-Paz, mixup: Beyond empirical risk minimization, in: Proceedings of the International Conference on Learning Representations (ICLR), 2018, pp. 1–13

[46] S. Yun, D. Han, S. J. Oh, S. Chun, J. Choe, Y. Yoo, CutMix: Regularization strategy to train strong classifiers with localizable features, in: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 6022–6031
