
arxiv: 2604.20368 · v1 · submitted 2026-04-22 · 💻 cs.CV · cs.AI

LaplacianFormer: Rethinking Linear Attention with Laplacian Kernel

Pith reviewed 2026-05-10 00:25 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords linear attention · Laplacian kernel · vision transformer · Nyström approximation · Newton-Schulz iteration · attention mechanism · ImageNet · efficient transformer

The pith

A Laplacian kernel replaces softmax in attention to achieve linear complexity while retaining expressiveness in vision transformers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Vision transformers face quadratic costs in attention that limit high-resolution use. LaplacianFormer replaces softmax with a Laplacian kernel, motivated by observations that Gaussian kernels suppress mid-range token interactions and by supporting theory. The design adds a provably injective feature map to preserve token details under approximation, then applies Nyström kernel approximation solved via Newton-Schulz iteration with custom CUDA kernels for speed. If the approach holds, transformers can scale to larger images with competitive ImageNet accuracy and better efficiency than prior linear attention methods.
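
To make the decay argument concrete, the following minimal sketch compares the two kernel shapes on a scalar distance. The functional forms exp(-d/λ) and exp(-d²/(2σ²)), the function names, and the unit scale values are illustrative assumptions; the paper applies its Laplacian kernel to ℓ1 query-key distances and contrasts it with Gaussian decay in squared ℓ2 distance, and its exact parameterization is not reproduced here.

```python
import math


def laplacian_weight(dist: float, scale: float = 1.0) -> float:
    """Laplacian kernel value exp(-d / scale) for a distance d (assumed form)."""
    return math.exp(-dist / scale)


def gaussian_weight(dist: float, sigma: float = 1.0) -> float:
    """Gaussian kernel value exp(-d^2 / (2 sigma^2)) for the same distance d."""
    return math.exp(-dist * dist / (2.0 * sigma * sigma))


def decay_table(distances, scale=1.0, sigma=1.0):
    """Return (distance, laplacian, gaussian) rows for a side-by-side comparison."""
    return [
        (d, laplacian_weight(d, scale), gaussian_weight(d, sigma))
        for d in distances
    ]


if __name__ == "__main__":
    # Print unnormalized kernel weights at a few illustrative distances.
    for d, lap, gau in decay_table([0.5, 1.0, 2.0, 3.0, 4.0]):
        print(f"d={d:.1f}  laplacian={lap:.4f}  gaussian={gau:.4f}")
```

With both scales set to 1, the two curves cross at d = 2. Beyond that point the Laplacian weight is the larger of the two, which is the heavier-tail behaviour the mid-range argument relies on.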

Core claim

LaplacianFormer employs a Laplacian kernel as a principled alternative to softmax, motivated by empirical observations and theoretical analysis, together with a provably injective feature map, Nyström approximation, and Newton-Schulz solver, achieving strong performance-efficiency trade-offs on ImageNet while improving attention expressiveness.

What carries the argument

Laplacian kernel paired with a provably injective feature map, Nyström approximation, and Newton-Schulz solver for linear attention computation.
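
How these pieces can combine into a linear-cost attention step is sketched below in NumPy. This is a generic Nyström construction, not the authors' implementation: the kernel form exp(-‖a-b‖₁/scale), the landmark choice, the row normalization via an appended ones column, and every function name are assumptions, and `np.linalg.solve` stands in for the paper's Newton-Schulz solver.

```python
import numpy as np


def laplacian_kernel(a, b, scale=1.0):
    """Pairwise exp(-||a_i - b_j||_1 / scale) between the rows of a and b."""
    l1 = np.abs(a[:, None, :] - b[None, :, :]).sum(axis=-1)
    return np.exp(-l1 / scale)


def exact_attention(q, k, v, scale=1.0):
    """Reference kernel attention; forms the full (n, n) matrix."""
    A = laplacian_kernel(q, k, scale)
    return (A @ v) / A.sum(axis=1, keepdims=True)


def nystrom_attention(q, k, v, landmarks, scale=1.0, eps=1e-6):
    """Nystrom-approximated kernel attention using m landmark points."""
    K_ql = laplacian_kernel(q, landmarks, scale)        # (n, m)
    K_lk = laplacian_kernel(landmarks, k, scale)        # (m, n)
    W = laplacian_kernel(landmarks, landmarks, scale)   # (m, m)
    W = W + eps * np.eye(W.shape[0])                    # small diagonal shift
    # Append a ones column so the row normalizer is computed in the same pass.
    rhs = np.concatenate([v, np.ones((k.shape[0], 1))], axis=1)
    # Multiply right to left so that no (n, n) matrix is ever formed.
    core = np.linalg.solve(W, K_lk @ rhs)               # (m, d_v + 1)
    out = K_ql @ core                                   # (n, d_v + 1)
    return out[:, :-1] / out[:, -1:]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q, k = rng.normal(size=(64, 8)), rng.normal(size=(64, 8))
    v = rng.normal(size=(64, 16))
    approx = nystrom_attention(q, k, v, landmarks=k[::8])  # 8 strided landmarks
    print(approx.shape)
```

For a fixed number of landmarks m, the cost grows linearly in the token count n. When the landmarks are the full key set, the construction reduces to the exact kernel attention up to the `eps` shift, which gives a simple sanity check.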

If this is right

  • Attention computation scales linearly with token count, supporting higher-resolution inputs without quadratic blowup.
  • Mid-range token dependencies receive stronger weighting than under Gaussian kernels.
  • The injective feature map prevents loss of fine-grained token information during low-rank approximation.
  • Newton-Schulz iteration plus custom CUDA kernels deliver high-throughput forward and backward passes suitable for edge hardware.
  • Overall model accuracy on ImageNet remains competitive while efficiency improves over softmax baselines.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same kernel-plus-solver pattern could be tested in non-vision transformers where mid-range dependencies matter.
  • Newton-Schulz iteration might accelerate other kernel-matrix operations inside deep networks beyond attention.
  • Hybrid models could combine Laplacian attention layers with standard softmax layers for tasks needing both long-range and local focus.
  • The efficiency gains suggest practical deployment on resource-limited devices that current quadratic transformers cannot reach.

Load-bearing premise

That the Laplacian kernel, when paired with the proposed feature map and approximations, genuinely improves mid-range token interactions and overall expressiveness compared with Gaussian kernels, and that the claimed theoretical grounding and injectivity hold in the actual model implementation.

What would settle it

A side-by-side ImageNet experiment in which an equivalently approximated Gaussian-kernel linear attention model matches or exceeds LaplacianFormer accuracy at the same throughput would falsify the claim of superior expressiveness.

Figures

Figures reproduced from arXiv: 2604.20368 by Changwei Wang, Muyang Zhang, Rongtao Xu, Sen Lian, Tianlong Tan, Weiliang Meng, Xiaopeng Zhang, Zhe Feng.

Figure 1. Distributions of ℓ₁ and ℓ₂² Q-K distances in DeiT, PVT, and Swin Transformers. Theoretically, the Gaussian kernel presumes that query-key similarity should decay rapidly with increasing ℓ₂² distance. However, this assumption may not reflect the actual distribution of query-key interactions in vision Transformers. To investigate this issue, we analyze the empirical distribution of query-key distances in…
Figure 2. (a) Top-1 accuracy (%) over training epochs on ImageNet. The left plot shows results…
Figure 3. Accuracy and Memory Comparison. (a) Top-1 accuracy vs. FLOPs on ImageNet-1k Deng…
Figure 4. Comparison between Softmax Self-Attention (left) and Linear Self-Attention (right). The…
Figure 5. Execution time breakdown of custom CUDA kernels. Comparison of forward and backward execution time for Newton–Schulz iteration (left) and Laplacian kernel (right) across different matrix sizes (batch = 1, 2 heads, 32 channels). CUDA execution times (< 0.05 ms) are shown as 0.0 due to timing resolution limits.
    …definite, we apply a small diagonal perturbation W ← W + ϵI, with ϵ > 0, preserving the structure w…
Figure 6. Convergence behavior of Newton–Schulz and conjugate gradient methods under varying…
Figure 7. Visualization of attention maps under different Laplacian kernel scales λ. From left to right: λ = 0.5, 1, 2, 4, 8.
    6 CONCLUSIONS AND FUTURE WORK We propose LaplacianFormer, a Transformer variant that employs a Laplacian kernel to construct injective and normalized attention, enabling fine-grained token discrimination with linear complexity. To ensure scalability, we adopt the Nyström approximation and ac…
Original abstract

The quadratic complexity of softmax attention presents a major obstacle for scaling Transformers to high-resolution vision tasks. Existing linear attention variants often replace the softmax with Gaussian kernels to reduce complexity, but such approximations lack theoretical grounding and tend to oversuppress mid-range token interactions. We propose LaplacianFormer, a Transformer variant that employs a Laplacian kernel as a principled alternative to softmax, motivated by empirical observations and theoretical analysis. To address expressiveness degradation under low-rank approximations, we introduce a provably injective feature map that retains fine-grained token information. For efficient computation, we adopt a Nyström approximation of the kernel matrix and solve the resulting system using Newton-Schulz iteration, avoiding costly matrix inversion and SVD. We further develop custom CUDA implementations for both the kernel and solver, enabling high-throughput forward and backward passes suitable for edge deployment. Experiments on ImageNet show that LaplacianFormer achieves strong performance-efficiency trade-offs while improving attention expressiveness.
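
The solver step mentioned in the abstract can be illustrated with the classical Newton-Schulz iteration for a matrix inverse, which uses only matrix products. The sketch below is a textbook variant under stated assumptions: the update X ← X(2I − WX), the initial guess Wᵀ/(‖W‖₁‖W‖∞), the fixed iteration count, and the function name are not taken from the paper, whose exact scheme and CUDA kernels are not reproduced here. The diagonal shift W + εI follows the perturbation quoted alongside the Figure 5 caption.

```python
import numpy as np


def newton_schulz_inverse(W, num_iters=30, eps=1e-6):
    """Approximate inv(W + eps * I) using matrix multiplications only."""
    m = W.shape[0]
    I = np.eye(m)
    W = W + eps * I  # small diagonal shift; creates a new array
    # Classical safe start: X0 = W^T / (||W||_1 * ||W||_inf),
    # which keeps the spectral norm of (I - W X0) below 1.
    X = W.T / (np.linalg.norm(W, 1) * np.linalg.norm(W, np.inf))
    for _ in range(num_iters):
        X = X @ (2.0 * I - W @ X)  # X_{k+1} = X_k (2I - W X_k)
    return X


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.normal(size=(8, 4))
    # Laplacian kernel matrix on 8 hypothetical landmark points.
    W = np.exp(-np.abs(pts[:, None, :] - pts[None, :, :]).sum(axis=-1))
    X = newton_schulz_inverse(W)
    residual = np.abs(X @ (W + 1e-6 * np.eye(8)) - np.eye(8)).max()
    print(f"max residual: {residual:.2e}")
```

Once the iterate is close to the inverse, the residual roughly squares at every step, so a small fixed number of iterations is usually enough for a well-conditioned landmark matrix. Because each step is a pair of dense matrix products, the method maps naturally onto GPU kernels.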

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces LaplacianFormer, a Transformer variant for vision tasks that replaces softmax attention with a Laplacian kernel to achieve linear complexity. It motivates the choice via empirical observations and theoretical analysis, introduces a provably injective feature map to retain fine-grained token information under low-rank approximations, adopts Nyström approximation of the kernel matrix solved via Newton-Schulz iteration (with custom CUDA kernels), and reports strong performance-efficiency trade-offs on ImageNet while claiming improved attention expressiveness over Gaussian-kernel baselines.

Significance. If the claimed theoretical properties and experimental gains hold, the work could provide a more principled linear-attention alternative that better preserves mid-range token interactions than existing Gaussian-kernel methods, with potential benefits for high-resolution vision Transformers and edge deployment.

major comments (2)
  1. [Sections describing the feature map, Nyström approximation, and Newton-Schulz solver (likely §3)] The central claim requires that the provably injective feature map retains its properties (and thus mid-range expressiveness) after Nyström low-rank approximation plus Newton-Schulz iteration. The paper introduces the injective map specifically to counteract degradation from low-rank approximations, yet neither step is shown to commute with or preserve the injectivity property in the actual attention output. An explicit check (e.g., distance-dependent attention weight preservation on toy token sets before/after approximation) is needed.
  2. [Abstract and experimental results section] The abstract asserts theoretical analysis, a provable property, and experimental gains on ImageNet, but the provided text supplies no derivations, proofs, quantitative results, baselines, or error bars. Without these, the claims of improved expressiveness and strong trade-offs cannot be verified.
minor comments (1)
  1. Ensure all theoretical claims (injectivity proof, motivation for Laplacian over Gaussian) are accompanied by clear derivations or proof sketches in the main text or appendix, with explicit statements of assumptions.
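
The toy check requested in major comment 1 could be sketched along the following lines. This is a hypothetical diagnostic, not an experiment from the paper: the random token set, the tercile distance bins, the landmark selection, the kernel form, and all function names are assumptions, and a direct solve replaces the Newton-Schulz step.

```python
import numpy as np


def l1_dist(a, b):
    """Pairwise L1 distances between the rows of a and b."""
    return np.abs(a[:, None, :] - b[None, :, :]).sum(axis=-1)


def row_normalize(A):
    """Scale each row so that it sums to one."""
    return A / A.sum(axis=1, keepdims=True)


def nystrom_weights(x, landmark_idx, scale=1.0, eps=1e-8):
    """Row-normalized Nystrom approximation of the Laplacian attention matrix."""
    L = x[landmark_idx]
    K_xl = np.exp(-l1_dist(x, L) / scale)
    W = np.exp(-l1_dist(L, L) / scale) + eps * np.eye(len(landmark_idx))
    return row_normalize(K_xl @ np.linalg.solve(W, K_xl.T))


def error_by_distance(x, landmark_idx, scale=1.0, num_bins=3):
    """Max |exact - approx| attention weight per distance bin (near to far)."""
    d = l1_dist(x, x)
    exact = row_normalize(np.exp(-d / scale))
    approx = nystrom_weights(x, landmark_idx, scale)
    # Quantile edges give bins with roughly equal numbers of token pairs.
    edges = np.quantile(d, np.linspace(0.0, 1.0, num_bins + 1))
    bins = np.digitize(d, edges[1:-1])
    err = np.abs(exact - approx)
    return [float(err[bins == b].max()) for b in range(num_bins)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tokens = rng.normal(size=(48, 4))
    for m in (6, 12, 24, 48):
        idx = np.arange(0, 48, 48 // m)  # strided landmark selection
        print(m, error_by_distance(tokens, idx))
```

The middle bin is the one that bears on the mid-range claim. Running the same script with a Gaussian kernel in place of the Laplacian one would give the side-by-side comparison the referee asks for.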

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below, providing clarifications and committing to targeted revisions that strengthen the presentation of our theoretical and empirical contributions without altering the core claims.

Point-by-point responses
  1. Referee: [Sections describing the feature map, Nyström approximation, and Newton-Schulz solver (likely §3)] The central claim requires that the provably injective feature map retains its properties (and thus mid-range expressiveness) after Nyström low-rank approximation plus Newton-Schulz iteration. The paper introduces the injective map specifically to counteract degradation from low-rank approximations, yet neither step is shown to commute with or preserve the injectivity property in the actual attention output. An explicit check (e.g., distance-dependent attention weight preservation on toy token sets before/after approximation) is needed.

    Authors: We agree that an explicit verification of property preservation under the combined approximations is valuable for rigor. The injectivity proof holds for the exact Laplacian kernel, and our design of the feature map was intended to mitigate low-rank effects, but we did not include a direct before/after comparison on toy data. In the revision, we will add a new subsection (likely in §3.3) with a controlled toy experiment on synthetic token sets that measures distance-dependent attention weight preservation before and after Nyström + Newton-Schulz, confirming that mid-range interactions remain better retained than in Gaussian baselines. revision: yes

  2. Referee: [Abstract and experimental results section] The abstract asserts theoretical analysis, a provable property, and experimental gains on ImageNet, but the provided text supplies no derivations, proofs, quantitative results, baselines, or error bars. Without these, the claims of improved expressiveness and strong trade-offs cannot be verified.

    Authors: The full manuscript (Sections 3 and 4 plus appendix) contains the complete theoretical derivations, injectivity proof, Nyström/Newton-Schulz analysis, ImageNet results with multiple baselines (including Gaussian linear attention variants), quantitative metrics, and error bars from repeated runs. The abstract is intentionally concise; however, we will revise it to more explicitly reference the key theoretical guarantees and performance trade-offs while ensuring the main text highlights the supporting evidence. We will also add a brief summary paragraph at the end of the introduction that cross-references the proofs and tables. revision: partial

Circularity Check

0 steps flagged

No significant circularity in the derivation chain.

Full rationale

The paper introduces a Laplacian kernel as an alternative to softmax, motivated by empirical and theoretical considerations, along with a new provably injective feature map, Nyström approximation, and Newton-Schulz solver. These are presented as novel components rather than re-derivations of prior results. No equations, predictions, or claims reduce by construction to fitted parameters, self-definitions, or load-bearing self-citations; the central claims rest on independent theoretical grounding and standard approximation techniques applied to the proposed kernel. The derivation chain is self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the assumption that the Laplacian kernel is a principled and superior replacement for softmax or Gaussian kernels in attention, plus the existence of a provably injective feature map that preserves information. No explicit free parameters are described in the abstract.

axioms (1)
  • domain assumption: Laplacian kernel provides better mid-range token interactions than Gaussian kernels without oversuppression
    Stated as motivated by empirical observations and theoretical analysis in the abstract.
invented entities (1)
  • Provably injective feature map for the Laplacian kernel (no independent evidence)
    purpose: Retains fine-grained token information under low-rank approximations to avoid expressiveness degradation
    Introduced to solve a stated limitation of low-rank kernel approximations; no independent evidence outside the paper is given.

pith-pipeline@v0.9.0 · 5474 in / 1397 out tokens · 52181 ms · 2026-05-10T00:25:06.879736+00:00 · methodology

