Recognition: 2 theorem links · Lean Theorem
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
Pith reviewed 2026-05-11 12:10 UTC · model grok-4.3
The pith
Transformers and state-space models share a common structure through decompositions of semiseparable matrices, allowing a faster Mamba-2 model.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Transformers and SSMs are closely related, connected through various decompositions of structured semiseparable matrices. The state space duality framework lets us design Mamba-2, whose core layer refines Mamba's selective SSM to be 2-8X faster while remaining competitive with Transformers on language modeling.
What carries the argument
State space duality (SSD) framework, which equates variants of attention and selective SSMs through structured semiseparable matrix decompositions.
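To make the duality concrete, here is a minimal numerical sketch (not the paper's implementation) of the scalar-decay case used by Mamba-2's SSD layer: the same output is computed once by the linear-time SSM recurrence and once by materializing the masked, attention-like semiseparable matrix. All sizes and variable names are illustrative toy choices.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 16, 4                               # toy sequence length and state size

# Input-dependent parameters: one scalar decay a_t per step, plus B_t and C_t
a = rng.uniform(0.5, 1.0, size=T)
B = rng.standard_normal((T, N))
C = rng.standard_normal((T, N))
x = rng.standard_normal(T)                 # a single input channel

# Linear-time recurrent (SSM) form: h_t = a_t h_{t-1} + B_t x_t,  y_t = C_t . h_t
h = np.zeros(N)
y_rec = np.empty(T)
for t in range(T):
    h = a[t] * h + B[t] * x[t]
    y_rec[t] = C[t] @ h

# Quadratic attention-like form: y = M x with
#   M[t, s] = (C_t . B_s) * a_{s+1} * ... * a_t   for s <= t, else 0
P = np.cumprod(a)                          # P[t] = a_0 * ... * a_t
decay = np.tril(P[:, None] / P[None, :])   # decay[t, s] = a_{s+1} * ... * a_t
M = (C @ B.T) * decay                      # a lower-triangular semiseparable matrix
y_mat = M @ x

assert np.allclose(y_rec, y_mat)           # both readings give the same output
```

The recurrence is linear in T while materializing M is quadratic; the duality is that both are evaluations of the same semiseparable matrix-vector product.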
If this is right
- Mamba-2's core layer runs 2-8X faster than Mamba's selective SSM, speeding up both training and inference.
- Mamba-2 maintains competitive performance with Transformers on language modeling tasks.
- The duality supplies efficient algorithms for both SSMs and attention variants (see the chunked sketch after this list).
- New architectures can be built by choosing different decompositions within the same semiseparable matrix family.
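As a rough illustration of the efficient-algorithm point above, the sketch below evaluates the same scalar-decay model in chunks: attention-style quadratic math inside each chunk and a single recurrent state handed between chunks. This mirrors the block-decomposition idea only schematically; the chunk size, single input channel, and function name are assumptions made for the toy example, not the paper's algorithm.

```python
import numpy as np

def ssd_chunked(a, B, C, x, Q=4):
    """Blocked evaluation of h_t = a_t h_{t-1} + B_t x_t, y_t = C_t . h_t:
    quadratic math within chunks of length Q, one state carried across chunks."""
    T, N = B.shape
    y = np.empty(T)
    h = np.zeros(N)
    for lo in range(0, T, Q):
        hi = min(lo + Q, T)
        a_c, B_c, C_c, x_c = a[lo:hi], B[lo:hi], C[lo:hi], x[lo:hi]
        Pc = np.cumprod(a_c)                            # cumulative decay inside the chunk
        decay = np.tril(Pc[:, None] / Pc[None, :])      # intra-chunk decay mask
        y_intra = ((C_c @ B_c.T) * decay) @ x_c         # within-chunk, attention-like block
        y_inter = (C_c @ h) * Pc                        # contribution of the carried-in state
        y[lo:hi] = y_intra + y_inter
        h = Pc[-1] * h + B_c.T @ ((Pc[-1] / Pc) * x_c)  # state handoff to the next chunk
    return y

# Sanity check against the plain recurrence on random input-dependent parameters.
rng = np.random.default_rng(1)
T, N = 32, 4
a = rng.uniform(0.5, 1.0, size=T)
B, C = rng.standard_normal((T, N)), rng.standard_normal((T, N))
x = rng.standard_normal(T)
h, y_rec = np.zeros(N), np.empty(T)
for t in range(T):
    h = a[t] * h + B[t] * x[t]
    y_rec[t] = C[t] @ h
assert np.allclose(ssd_chunked(a, B, C, x), y_rec)
```

The chunked form trades the quadratic cost of the full matrix for within-chunk blocks plus linear state handoffs, which is the rough shape of the claimed speedup.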
Where Pith is reading between the lines
- The shared matrix view suggests hardware kernels written for one architecture can be reused for the other with only a change of decomposition.
- Results from linear algebra on semiseparable matrices, such as fast inversion or low-rank updates, could be ported directly to improve long-sequence scaling in either model family.
- Hybrid layers that switch between attention-style and SSM-style decompositions within a single network become a natural design option rather than an ad-hoc combination.
Load-bearing premise
The decompositions of structured semiseparable matrices preserve the modeling capacity and training dynamics of the original selective SSM.
What would settle it
Running Mamba-2 on standard language modeling benchmarks and finding either no measurable speedup over Mamba or clearly worse perplexity than both Mamba and Transformers would show the decompositions fail to deliver the claimed benefits in practice.
Original abstract
While Transformers have been the main architecture behind deep learning's success in language modeling, state-space models (SSMs) such as Mamba have recently been shown to match or outperform Transformers at small to medium scale. We show that these families of models are actually quite closely related, and develop a rich framework of theoretical connections between SSMs and variants of attention, connected through various decompositions of a well-studied class of structured semiseparable matrices. Our state space duality (SSD) framework allows us to design a new architecture (Mamba-2) whose core layer is a refinement of Mamba's selective SSM that is 2-8X faster, while continuing to be competitive with Transformers on language modeling.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper establishes a theoretical framework called State Space Duality (SSD) that unifies Transformers and state space models (SSMs) like Mamba by showing connections through decompositions of structured semiseparable matrices. It proposes Mamba-2, a new architecture based on a refined selective SSM that achieves significant speedups (2-8X) over previous models while remaining competitive with Transformers on language modeling benchmarks.
Significance. If the SSD framework provides exact equivalences and the proposed algorithms deliver the claimed efficiency gains without sacrificing modeling capacity, this work has the potential to advance the field by offering a unified view of attention and SSMs, leading to more efficient and scalable sequence models. The development of generalized models and efficient algorithms is a strength, particularly if supported by rigorous derivations.
major comments (2)
- [§3] §3 (SSD framework and matrix decompositions): The claim that SSD yields an equivalent selective SSM layer must be shown to hold exactly for input-dependent A/B/C matrices. The manuscript should provide a formal derivation or proof that the structured semiseparable factorization introduces no hidden low-rank or block-diagonal approximations, as any such assumption would risk altering long-range, input-dependent recall dynamics.
- [§5] §5 (Mamba-2 architecture and experiments): To substantiate that modeling capacity is preserved, include direct comparisons of Mamba-2 against the original selective SSM on tasks emphasizing long-context input-dependent memory, with ablations isolating the SSD algorithm from implementation optimizations to confirm the reported 2-8X speedups.
minor comments (2)
- [Abstract] Abstract: The phrasing 'a refinement of Mamba's selective SSM' is vague; specify the precise modifications to the core layer.
- [Notation] Notation and figures: Ensure uniform notation for semiseparable matrices and improve clarity of any matrix decomposition diagrams.
Simulated Author's Rebuttal
We thank the referee for their thoughtful review and recognition of the potential impact of the SSD framework and Mamba-2. We address each major comment below and describe the revisions we will make to strengthen the manuscript.
Point-by-point responses
-
Referee: [§3] §3 (SSD framework and matrix decompositions): The claim that SSD yields an equivalent selective SSM layer must be shown to hold exactly for input-dependent A/B/C matrices. The manuscript should provide a formal derivation or proof that the structured semiseparable factorization introduces no hidden low-rank or block-diagonal approximations, as any such assumption would risk altering long-range, input-dependent recall dynamics.
Authors: We thank the referee for this important clarification request. Section 3 derives the SSD framework by expressing the selective SSM recurrence as a structured semiseparable matrix and showing its duality to an attention-like form. The construction incorporates input-dependent A, B, and C directly into the diagonal blocks and low-rank factors of the semiseparable decomposition, preserving the exact recurrence without additional low-rank or block-diagonal approximations. To address the concern rigorously, we will add a formal proof in the appendix of the revised manuscript that verifies the equivalence holds exactly for arbitrary input-dependent parameters, with explicit steps showing that no hidden assumptions are introduced that would alter long-range dynamics. revision: yes
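For reference, the unrolling step such a proof would formalize is short; the following is a sketch of the standard induction, not the authors' appendix derivation.

```latex
% Unrolling the selective SSM recurrence with h_0 = 0:
%   h_t = A_t h_{t-1} + B_t x_t, \qquad y_t = C_t^\top h_t,
% induction on t gives
\[
  h_t = \sum_{s=1}^{t} \Big(\prod_{k=s+1}^{t} A_k\Big) B_s\, x_s,
  \qquad
  y_t = \sum_{s=1}^{t} C_t^\top \Big(\prod_{k=s+1}^{t} A_k\Big) B_s\, x_s,
\]
% so y = Mx with the lower-triangular semiseparable matrix
\[
  M_{ts} =
  \begin{cases}
    C_t^\top \big(\prod_{k=s+1}^{t} A_k\big) B_s & s \le t,\\[2pt]
    0 & s > t.
  \end{cases}
\]
% Every entry is an exact expression in the input-dependent A_t, B_t, C_t,
% so writing the layer in this form introduces no low-rank or
% block-diagonal approximation of the recurrence itself.
```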
-
Referee: [§5] §5 (Mamba-2 architecture and experiments): To substantiate that modeling capacity is preserved, include direct comparisons of Mamba-2 against the original selective SSM on tasks emphasizing long-context input-dependent memory, with ablations isolating the SSD algorithm from implementation optimizations to confirm the reported 2-8X speedups.
Authors: We agree that targeted experiments would further substantiate the preservation of modeling capacity. The current manuscript shows Mamba-2 remains competitive with Transformers on language modeling while delivering the reported speedups over prior SSMs. In the revision, we will add direct comparisons of Mamba-2 against the original selective SSM (Mamba) on long-context tasks focused on input-dependent memory, such as associative recall and long-range dependency benchmarks. We will also include ablations that isolate the SSD algorithmic improvements from low-level implementation optimizations, thereby confirming that the 2-8X speedups arise from the structured duality rather than engineering alone. revision: yes
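For concreteness, the long-context input-dependent memory probes mentioned here can be as simple as multi-query associative recall: a stream of key-value pairs followed by queries whose answers are fully determined by earlier tokens. The generator below is a hypothetical toy version for illustration; it is not one of the paper's benchmarks, and all names and sizes are assumptions.

```python
import numpy as np

def make_recall_example(n_pairs=64, n_queries=8, vocab=256, seed=0):
    """Toy multi-query associative recall: interleaved (key, value) pairs,
    then queries that repeat earlier keys; targets are the paired values."""
    rng = np.random.default_rng(seed)
    keys = rng.choice(vocab, size=n_pairs, replace=False)     # distinct keys
    values = rng.integers(0, vocab, size=n_pairs)
    pairs = np.stack([keys, values], axis=1).reshape(-1)      # k1 v1 k2 v2 ...
    query_idx = rng.choice(n_pairs, size=n_queries, replace=False)
    queries, targets = keys[query_idx], values[query_idx]
    return np.concatenate([pairs, queries]), targets          # input tokens, labels

tokens, targets = make_recall_example()
print(tokens.shape, targets.shape)    # (136,) (8,)
```

Scoring a model on whether it emits the correct value after each query token isolates input-dependent recall, the capacity the load-bearing premise says the decomposition must preserve.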
Circularity Check
No significant circularity; derivation rests on independent matrix decompositions
full rationale
The paper establishes connections between SSMs and attention variants by decomposing structured semiseparable matrices, then uses this SSD framework to refine the selective SSM into Mamba-2 with claimed speedups. No step equates a prediction to its fitted input, renames a known result as novel unification, or reduces the central claim to a self-citation chain. Prior Mamba work is cited for context on selectivity, but the duality, decompositions, and algorithm derivations are presented as self-contained linear-algebra results with explicit matrix constructions that do not presuppose the target architecture or performance claims. The framework remains falsifiable via direct implementation and benchmarking against the original recurrence.
Axiom & Free-Parameter Ledger
Forward citations
Cited by 57 Pith papers
-
When Does Content-Based Routing Work? Representation Requirements for Selective Attention in Hybrid Sequence Models
Content-based routing succeeds only when models provide bidirectional context and perform pairwise comparisons, with bidirectional Mamba plus rank-1 projection reaching 99.7% precision at linear inference cost.
-
Learning to (Learn at Test Time): RNNs with Expressive Hidden States
TTT layers treat the hidden state as a trainable model updated at test time, allowing linear-complexity sequence models to scale perplexity reduction with context length unlike Mamba.
-
Enjoy Your Layer Normalization with the Computational Efficiency of RMSNorm
A framework to identify and convert foldable layer normalizations to RMSNorm for exact equivalence and faster inference in deep neural networks.
-
Selection, Not Fusion: Radar-Modulated State Space Models for Radar-Camera Depth Estimation
Radar-Modulated Selection perturbs only the step size Δ and readout C parameters inside Mamba's selective scan with radar data while keeping other components image-only, yielding state-of-the-art depth estimation on n...
-
TCP-SSM: Efficient Vision State Space Models with Token-Conditioned Poles
TCP-SSM conditions stable poles on visual tokens to explicitly control memory decay and oscillation in SSMs, cutting computation up to 44% while matching or exceeding accuracy on classification, segmentation, and detection.
-
TIDES: Implicit Time-Awareness in Selective State Space Models
TIDES reconciles selective SSM expressivity with continuous-time physical discretization by moving input dependence onto the state matrix, enabling native irregular time series handling and achieving SOTA on UEA and P...
-
FRACTAL: SSM with Fractional Recurrent Architecture for Computational Temporal Analysis of Long Sequences
FRACTAL integrates fractional recurrent architecture into SSMs using a tunable singularity index to capture multi-scale temporal features, reporting 87.11% average on Long Range Arena and outperforming S5.
-
Star Elastic: Many-in-One Reasoning LLMs with Efficient Budget Control
Star Elastic trains N nested submodels in a single post-training job on a parent reasoning LLM, supporting elastic budget control that matches or exceeds independent baselines while cutting training compute by up to 360x.
-
PairAlign: A Framework for Sequence Tokenization via Self-Alignment with Applications to Audio Tokenization
PairAlign learns compact audio token sequences via self-alignment of paired content views using an autoregressive decoder, achieving strong cross-view consistency and edit-distance preservation while reducing token co...
-
Rethink MAE with Linear Time-Invariant Dynamics
Token order in frozen visual representations is exploitable via SSM-based LTI probes, revealing pre-training-dependent heterogeneity that fixed pooling misses.
-
Sparse Prefix Caching for Hybrid and Recurrent LLM Serving
Sparse prefix caching via dynamic programming for optimal checkpoint placement under overlap distributions improves the Pareto frontier for recurrent and hybrid LLM serving on shared-prefix data.
-
The UNDO Flip-Flop: A Controlled Probe for Reversible Semantic State Management in State Space Model
Mamba-2 models fail to learn reversible state retrieval in the UNDO Flip-Flop task, defaulting to a toggle heuristic and achieving only 41% accuracy under adversarial conditions.
-
S0 Tuning: Zero-Overhead Adaptation of Hybrid Recurrent-Attention Models
S0 tuning optimizes initial recurrent states in hybrid models to outperform LoRA with zero inference cost on HumanEval and partial cross-domain transfer.
-
The Randomness Floor: Measuring Intrinsic Non-Randomness in Language Model Token Distributions
Language models have an intrinsic randomness floor: transformers show ~0.30 entropic deviation from uniform on neutral prompts, accounting for 88-93% of observed non-randomness, while state-space models exhibit twice ...
-
Freeze Deep, Train Shallow: Interpretable Layer Allocation for Continued Pre-Training
Freezing deep layers and training shallow layers during continued pre-training of LLMs outperforms full fine-tuning and the opposite allocation on C-Eval and CMMLU, guided by a new layer-sensitivity diagnostic.
-
MambaNetBurst: Direct Byte-level Network Traffic Classification without Tokenization or Pretraining
A compact Mamba-2 model performs end-to-end byte-level network traffic classification without tokenization or pre-training and remains competitive with substantially larger pre-trained systems.
-
RT-Transformer: The Transformer Block as a Spherical State Estimator
Transformer components arise as the natural solution to precision-weighted directional state estimation on the hypersphere.
-
Structured Recurrent Mixers for Massively Parallelized Sequence Generation
Structured Recurrent Mixers enable algebraic switching between parallel training and recurrent inference representations, delivering higher efficiency, information capacity, and throughput than other linear-complexity models.
-
Echo: KV-Cache-Free Associative Recall with Spectral Koopman Operators
Spectral Koopman operators let SSMs achieve 100% accuracy on long-gap multi-query associative recall with fixed memory, where pure Mamba fails.
-
Cubit: Token Mixer with Kernel Ridge Regression
Cubit replaces Transformer attention with Kernel Ridge Regression token mixing and shows potential gains on longer sequences.
-
The Impossibility Triangle of Long-Context Modeling
No model can achieve efficiency, compactness, and recall capacity scaling with sequence length at once, as any two imply a strict bound of O(poly(d)/log V) on recallable facts.
-
Long-Context Aware Upcycling: A New Frontier for Hybrid LLM Scaling
HyLo upcycles Transformer LLMs into hybrids with MLA and Mamba2/Gated DeltaNet blocks via staged training and distillation, extending context to 2M tokens and outperforming prior upcycled hybrids on long-context benchmarks.
-
NAKUL-Med: Spectral-Graph State Space Models with Dynamics Kernels for Medical Signals
NAKUL achieves 91.7% accuracy on motor imagery EEG with 28% fewer parameters than EEG-Conformer by using dynamic kernel generation, spectral context modeling, and graph-guided spatial attention.
-
HubRouter: A Pluggable Sub-Quadratic Routing Primitive for Hybrid Sequence Models
HubRouter is a sub-quadratic routing primitive using learned hubs that replaces attention layers in hybrid models while delivering competitive perplexity and large throughput gains.
-
HypEHR: Hyperbolic Modeling of Electronic Health Records for Efficient Question Answering
HypEHR is a hyperbolic embedding model for EHR data that uses Lorentzian geometry and hierarchy-aware pretraining to answer clinical questions nearly as well as large language models but with much smaller size.
-
M$^{2}$GRPO: Mamba-based Multi-Agent Group Relative Policy Optimization for Biomimetic Underwater Robots Pursuit
M²GRPO uses a Mamba-based policy and normalized group-relative advantages under CTDE to achieve higher pursuit success and capture efficiency than MAPPO and recurrent baselines in simulations and pool tests.
-
Sonata: A Hybrid World Model for Inertial Kinematics under Clinical Data Scarcity
Sonata is a small hybrid world model pre-trained to predict future IMU states that outperforms autoregressive baselines on clinical discrimination, fall-risk prediction, and cross-cohort transfer while fitting on-devi...
-
MambaBack: Bridging Local Features and Global Contexts in Whole Slide Image Analysis
MambaBack is a hybrid Mamba-CNN model with Hilbert sampling and chunked inference that reports better performance than seven prior methods on five whole-slide image datasets.
-
Feed-Forward 3D Scene Modeling: A Problem-Driven Perspective
The paper proposes a problem-driven taxonomy for feed-forward 3D scene modeling that groups methods by five core challenges: feature enhancement, geometry awareness, model efficiency, augmentation strategies, and temp...
-
Parcae: Scaling Laws For Stable Looped Language Models
Parcae stabilizes looped LLMs via spectral norm constraints on injection parameters, enabling power-law scaling for training FLOPs and saturating exponential scaling at test time that improves quality over fixed-depth...
-
Scal3R: Scalable Test-Time Training for Large-Scale 3D Reconstruction
Scal3R achieves better accuracy and consistency in large-scale 3D scene reconstruction by maintaining a compressed global context through test-time adaptation of lightweight neural networks on long video sequences.
-
Optimal Decay Spectra for Linear Recurrences
PoST reparameterizes decay spectra in linear recurrences with geometric log-spacing and position-adaptive scaling to achieve O(exp(-cN/log t)) decay, improving zero-shot language modeling and long-context retrieval ac...
-
In-Place Test-Time Training
In-Place TTT adapts LLM MLP projection matrices at test time with a next-token-aligned objective and chunk-wise updates, enabling better long-context performance as a drop-in enhancement.
-
Phase-Associative Memory: Sequence Modeling in Complex Hilbert Space
PAM, a complex-valued associative memory model, exhibits steeper power-law scaling in loss and perplexity than a matched real-valued baseline when trained on WikiText-103 from 5M to 100M parameters.
-
Stochastic KV Routing: Enabling Adaptive Depth-Wise Cache Sharing
Stochastic training with random cross-layer KV attention enables depth-wise cache sharing in transformers, cutting memory footprint while preserving or improving performance.
-
Attention to Mamba: A Recipe for Cross-Architecture Distillation
A two-stage distillation recipe converts a Pythia-1B Transformer into a Mamba model that preserves performance with perplexity 14.11 versus the teacher's 13.86.
-
Computer Architecture's AlphaZero Moment: Automated Discovery in an Encircled World
Automated architectural discovery engines can outperform human design teams by exploring massive design spaces and compressing development cycles from months to weeks.
-
M$^2$RNN: Non-Linear RNNs with Matrix-Valued States for Scalable Language Modeling
M²RNN achieves perfect state tracking at unseen lengths and outperforms Gated DeltaNet hybrids by 0.4-0.5 perplexity on 7B models with 3x smaller recurrent states.
-
LPC-SM: Local Predictive Coding and Sparse Memory for Long-Context Language Modeling
LPC-SM is a hybrid architecture separating local attention, persistent memory, predictive correction, and control with ONT for memory writes, showing loss reductions on 158M-parameter models up to 4096-token contexts.
-
Kimi Linear: An Expressive, Efficient Attention Architecture
Kimi Linear hybridizes linear attention with a new KDA module to beat full attention on tasks while slashing KV cache by 75% and speeding decoding up to 6x.
-
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention
MiniMax-M1 is a 456B parameter hybrid-attention MoE model trained with CISPO RL that achieves performance comparable or superior to DeepSeek-R1 and Qwen3-235B on reasoning and software engineering tasks while training...
-
Titans: Learning to Memorize at Test Time
Titans combine attention for current context with a learnable neural memory for long-term history, achieving better performance and scaling to over 2M-token contexts on language, reasoning, genomics, and time-series tasks.
-
SANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer
SANA-WM is a 2.6B-parameter efficient world model that synthesizes minute-scale 720p videos with 6-DoF camera control, trained on 213K public clips in 15 days on 64 H100s and runnable on single GPUs at 36x higher thro...
-
Mela: Test-Time Memory Consolidation based on Transformation Hypothesis
Mela is a Transformer variant with a dual-frequency Hierarchical Memory Module and MemStack that performs test-time memory consolidation, outperforming baselines on long contexts.
-
Kaczmarz Linear Attention
Kaczmarz Linear Attention replaces the empirical coefficient in Gated DeltaNet with a key-norm-normalized step size derived from the online regression objective, yielding lower perplexity and better needle-in-haystack...
-
Irminsul: MLA-Native Position-Independent Caching for Agentic LLM Serving
Irminsul recovers up to 83% of prompt tokens above exact-prefix matching and delivers 63% prefill energy savings per cache hit on MLA-MoE models by content-hashing CDC chunks and applying closed-form kr correction.
-
Toeplitz MLP Mixers are Low Complexity, Information-Rich Sequence Models
Toeplitz MLP Mixers replace attention with masked Toeplitz multiplications for sub-quadratic complexity while retaining more sequence information and outperforming on copying and in-context tasks.
-
Reasoning Primitives in Hybrid and Non-Hybrid LLMs
Reasoning augmentation extends the difficulty range for both architectures, but hybrid models stay robust longer than transformers as sequential dependence increases in state-based recall tasks.
-
LayerTracer: A Joint Task-Particle and Vulnerable-Layer Analysis framework for Arbitrary Large Language Model Architectures
LayerTracer defines task particles as the first layer where target token probability rises sharply and vulnerable layers via maximum JS divergence after masking, showing task particles in deep layers and greater robus...
-
FG$^2$-GDN: Enhancing Long-Context Gated Delta Networks with Doubly Fine-Grained Control
FG²-GDN replaces the scalar beta in the delta update with a channel-wise vector and decouples key/value scaling to improve recall over prior GDN and KDA models.
-
Sessa: Selective State Space Attention
Sessa integrates attention within recurrent paths to achieve power-law memory tails and flexible non-decaying selective retrieval, outperforming baselines on long-context tasks.
-
Hypergraph-State Collaborative Reasoning for Multi-Object Tracking
HyperSSM integrates hypergraphs and state space models to let correlated objects mutually refine motion estimates, stabilizing trajectories under noise and occlusion for state-of-the-art multi-object tracking.
-
COREY: Entropy-Guided Runtime Chunk Scheduling for Selective Scan Kernels
COREY maps activation entropy to chunk sizes for SSM kernels, matching static-oracle latency at kernel level with 3.9-4.4x speedups over baselines but adding overhead that prevents end-to-end gains while preserving ex...
-
CARE-ECG: Causal Agent-based Reasoning for Explainable and Counterfactual ECG Interpretation
CARE-ECG unifies ECG representation learning, causal graph-based diagnosis, and counterfactual assessment in an agentic LLM pipeline to improve accuracy and explanation faithfulness.
-
Efficient Spatial-Temporal Focal Adapter with SSM for Temporal Action Detection
A new adapter module combining boundary-aware state space modeling with spatial processing boosts localization and robustness in temporal action detection.
-
Hierarchical Reasoning Model
HRM is a recurrent architecture with high-level planning and low-level execution modules that reaches near-perfect accuracy on complex Sudoku, maze navigation, and ARC benchmarks using 27M parameters and 1000 samples ...
-
The Hyperscale Lottery: How State-Space Models Have Sacrificed Edge Efficiency
Mamba-3 architectural changes optimized for hyperscale GPUs cause 28% higher edge latency at 880M parameters and 48% at 15M parameters compared to earlier versions.