Caracal: Causal Architecture via Spectral Mixing
Pith reviewed 2026-05-09 19:39 UTC · model grok-4.3
The pith
Caracal replaces attention with a Fourier module to model long sequences efficiently while supporting causal generation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Caracal introduces a causal architecture that performs sequence mixing spectrally via the Fast Fourier Transform, enforcing causality through frequency-domain masking to achieve efficient, portable long-context modeling.
What carries the argument
Multi-Head Fourier (MHF) module with frequency-domain causal masking via asymmetric padding and truncation.
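The abstract does not spell out the masking in code, but the standard FFT recipe it gestures at (pad the length-L signal into a 2L buffer, multiply spectra, keep only the first L outputs) can be sketched as below. This is a minimal reading of "asymmetric padding and truncation"; `causal_fft_mix` is a hypothetical name, not the paper's API.

```python
import numpy as np

def causal_fft_mix(x, k):
    """Causal convolution of a length-L sequence x with a length-L
    kernel k, computed in O(L log L) via the FFT.

    Zero-padding both signals to 2L turns the FFT's circular
    convolution into a linear one; truncating to the first L outputs
    keeps only y[t] = sum_{s<=t} k[t-s] * x[s], so no future token
    influences the present output."""
    L = len(x)
    n = 2 * L  # asymmetric padding: signal on [0, L), zeros on [L, 2L)
    y = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(k, n), n)
    return y[:L]  # truncation: discard the acausal tail

# A unit impulse at position 2 affects only outputs at t >= 2.
x = np.zeros(8); x[2] = 1.0
k = np.arange(1.0, 9.0)
y = causal_fft_mix(x, k)
assert np.allclose(y[:2], 0.0)  # nothing leaks to earlier positions
```

Without the padding, the FFT would compute a circular convolution and positions near the start of the sequence would see wrapped-around (future) tokens, which is exactly the barrier the paper claims to overcome.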
If this is right
- Sequence modeling becomes feasible at linearithmic cost in sequence length.
- Models can be implemented and deployed using ubiquitous FFT libraries without custom kernels.
- Generative capabilities are preserved for tasks like language modeling.
- Scalability to longer sequences is improved compared to quadratic attention methods.
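The linearithmic-versus-quadratic gap behind these points can be made concrete with a rough operation count. The constants below are generic textbook estimates (not figures from the paper), so only the growth trend is meaningful:

```python
import math

def attention_flops(L, d):
    # quadratic mixing: QK^T and AV each cost ~L^2 * d multiply-adds
    return 2 * L * L * d

def fourier_flops(L, d):
    # FFT-based mixing on a length n = 2L padded signal: roughly
    # 5 * n * log2(n) real ops per transform, three transforms per
    # channel (forward, pointwise modulate treated as one, inverse)
    n = 2 * L
    return d * 3 * 5 * n * math.log2(n)

for L in (1_024, 32_768, 1_048_576):
    ratio = attention_flops(L, 64) / fourier_flops(L, 64)
    print(f"L={L:>9,}: attention / fourier ~ {ratio:,.0f}x")
```

The ratio grows roughly as L / log L, which is why the savings only become decisive at long contexts.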
Where Pith is reading between the lines
- Other sequence tasks could benefit from similar spectral causal designs.
- The simplicity of standard operators may accelerate adoption in production systems.
- Potential for hybrid models combining Fourier mixing with other efficient techniques.
Load-bearing premise
The frequency-domain causal masking technique via asymmetric padding and truncation successfully enforces autoregressive capabilities for generative modeling without loss of expressiveness or performance.
What would settle it
Demonstrating that the causal masking either leaks future information or leads to significantly worse performance than baselines on long sequences would falsify the approach.
Original abstract
The scalability of Large Language Models to long sequences is hindered by the quadratic cost of attention and the limitations of positional encodings. To address these, we introduce Caracal, a novel architecture that replaces attention with a parameter-efficient, O(L log(L)) Multi-Head Fourier (MHF) module. Our contributions are threefold: (1) We leverage the Fast Fourier Transform (FFT) for sequence mixing, inherently addressing both bottlenecks mentioned above. (2) We apply a frequency-domain causal masking technique that enforces autoregressive capabilities via asymmetric padding and truncation, overcoming a critical barrier for Fourier-based generative models. (3) Unlike efficient models relying on hardware-specific implementations (e.g., Mamba), we uses standard library operators. This ensures robust portability, eliminating common deployment barriers. Evaluations demonstrate that Caracal performs competitively with Transformer and SSM baselines, offering a scalable and simple pathway for efficient long-sequence modeling. Code is available in Appendix.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes Caracal, a novel neural architecture for long-sequence modeling that substitutes the attention mechanism with a Multi-Head Fourier (MHF) module based on the Fast Fourier Transform (FFT). This achieves O(L log L) complexity for sequence mixing. The key innovation is a causal masking technique applied in the frequency domain using asymmetric padding and truncation to support autoregressive generation. The architecture is designed to be portable by relying on standard library operators rather than custom hardware-specific implementations. The paper claims that Caracal achieves competitive performance compared to Transformer and State-Space Model (SSM) baselines.
Significance. If the causal masking technique is proven to maintain strict causality without spectral artifacts and the performance claims are substantiated with rigorous experiments, this work could represent a significant advancement in efficient sequence modeling by providing a simple, FFT-based alternative that avoids the quadratic cost of attention and the implementation complexities of SSMs. The emphasis on portability using standard operators is a notable strength for practical deployment.
major comments (2)
- Abstract (contribution 2): The frequency-domain causal masking technique via asymmetric padding and truncation is presented as enforcing autoregressive capabilities. However, because the FFT has global support, it is not immediately clear that this operation results in a strictly causal (lower-triangular) effective transformation in the time domain. A mathematical derivation showing that the resulting time-domain kernel has no dependence on future inputs, or an empirical test (such as checking for zero leakage on future tokens), is necessary to support the claim that this overcomes the barrier for Fourier-based generative models.
- Abstract: The claim that 'Evaluations demonstrate that Caracal performs competitively with Transformer and SSM baselines' lacks supporting details such as specific metrics, datasets used, baseline models, ablation studies, or statistical significance. This omission makes it challenging to evaluate the strength of the empirical results. The full manuscript should include these in the experiments section with clear tables or figures.
minor comments (2)
- Abstract: Grammatical error: 'we uses standard library operators' should be 'we use standard library operators'.
- Abstract: The abstract mentions 'Code is available in Appendix' but does not provide a link or repository information, which would aid reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and will revise the paper to incorporate the suggested clarifications and additions.
Point-by-point responses
-
Referee: Abstract (contribution 2): The frequency-domain causal masking technique via asymmetric padding and truncation is presented as enforcing autoregressive capabilities. However, because the FFT has global support, it is not immediately clear that this operation results in a strictly causal (lower-triangular) effective transformation in the time domain. A mathematical derivation showing that the resulting time-domain kernel has no dependence on future inputs, or an empirical test (such as checking for zero leakage on future tokens), is necessary to support the claim that this overcomes the barrier for Fourier-based generative models.
Authors: We agree that explicit verification of strict causality is important given the global nature of the FFT. In the revised manuscript, we will add a dedicated subsection with a mathematical derivation showing that the asymmetric padding followed by truncation in the frequency domain produces an effective time-domain kernel that is strictly lower-triangular (i.e., no dependence on future inputs). We will also include an empirical check by constructing the equivalent time-domain transformation matrix and verifying that entries corresponding to future tokens are numerically zero (within floating-point precision). revision: yes
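The empirical check the authors commit to here is straightforward to sketch: probe any linear mixing operator with unit impulses to recover its effective time-domain matrix, then verify that the strictly upper-triangular (future-reading) entries vanish. The `causal_mix` stand-in below is an assumed pad-and-truncate FFT mixer, not Caracal's actual module:

```python
import numpy as np

def mixing_matrix(mix, L):
    """Recover the effective time-domain matrix M of a linear mixing
    operator by probing it with unit impulses: column j is mix(e_j)."""
    M = np.zeros((L, L))
    for j in range(L):
        e = np.zeros(L); e[j] = 1.0
        M[:, j] = mix(e)
    return M

def causal_mix(x):
    # stand-in causal FFT mixer: pad to 2L, modulate, truncate to L
    L = len(x)
    k = np.random.default_rng(0).standard_normal(L)  # fixed kernel
    n = 2 * L
    return np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(k, n), n)[:L]

M = mixing_matrix(causal_mix, 16)
leak = np.abs(np.triu(M, k=1)).max()  # entries that read future tokens
assert leak < 1e-10  # strictly lower-triangular up to float roundoff
```

The same probe applied to an unpadded (circular) FFT mixer would report large upper-triangular entries, so this test discriminates between the masked and unmasked variants rather than passing trivially.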
-
Referee: Abstract: The claim that 'Evaluations demonstrate that Caracal performs competitively with Transformer and SSM baselines' lacks supporting details such as specific metrics, datasets used, baseline models, ablation studies, or statistical significance. This omission makes it challenging to evaluate the strength of the empirical results. The full manuscript should include these in the experiments section with clear tables or figures.
Authors: The Experiments section of the full manuscript already reports results on long-sequence benchmarks (including perplexity and accuracy metrics) against Transformer and SSM baselines such as Mamba, with ablations on the Multi-Head Fourier module and multiple datasets. To directly address the concern, we will expand the abstract to briefly reference the key quantitative findings and add a summary table in the main text that highlights statistical significance (e.g., via standard deviations over multiple runs). Additional ablation figures will also be included if space permits. revision: partial
Circularity Check
No circularity; novel architecture validated empirically
Full rationale
The paper introduces Caracal as a new O(L log L) architecture based on Multi-Head Fourier mixing with an asymmetric padding/truncation masking technique for causality. It shows no derivation chain, fitted parameters, or equations that reduce by construction to the authors' prior results or self-citations. Central claims rest on competitive empirical evaluations against Transformer and SSM baselines using standard library operators, with no load-bearing self-referential steps or uniqueness theorems imported from the authors' prior work. The method is presented as self-contained and portable, without mathematical self-definition.