MinMax RNCs are recurrent neural models using min-max recurrence that achieve full regular-language expressivity, logarithmic parallel evaluation, uniformly bounded states, and constant state gradients independent of time distance.
hub
Simard, and Paolo Frasconi
11 Pith papers cite this work, alongside 6,592 external citations. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
roles
background 3polarities
background 3representative citing papers
A 1B-parameter hierarchical recurrent model pretrained on 40B instruction-response tokens achieves 60.7% MMLU and strong results on ARC-C, DROP, GSM8K, and MATH while using 100-900x fewer tokens than standard baselines.
BMRUs enable a direct one-to-one mapping from learned parameters to current-mode analog circuit elements, with discrete hysteretic outputs suppressing noise by at least 20x and supporting sub-microwatt RNN inference in 180 nm CMOS simulations.
Neural operators progressively forget domain geometry with depth due to Markovian layers and global mixing; a geometry memory injection mechanism mitigates this forgetting.
OGPO is a sample-efficient off-policy method for full finetuning of generative control policies that reaches SOTA on robotic manipulation tasks and can recover from poor behavior-cloning initializations without expert data.
M²RNN achieves perfect state tracking at unseen lengths and outperforms Gated DeltaNet hybrids by 0.4-0.5 perplexity on 7B models with 3x smaller recurrent states.
mGRADE uses learnable-spaced convolutions shown to be equivalent to delay embeddings plus a lightweight gated recurrent component to achieve low-memory multi-timescale sequence modeling.
The serial scaling hypothesis formalizes inherently serial problems in complexity theory and demonstrates that diffusion models cannot solve them.
Systematic review of 145 papers on LLM-based log analysis, providing a unified taxonomy, common design patterns, evaluation practices, and challenges for deployment under drift and limited labels.
Sequence models on EHR data from a Swedish heart failure cohort achieve AUPRCs of 0.555 to 0.854 for one-year instability and mortality predictions and support four care pathways.
A tutorial framing deep learning as a complement to optimization for sequential decision-making under uncertainty, with applications in supply chains, healthcare, and energy.
citing papers explorer
-
MinMax Recurrent Neural Cascades
MinMax RNCs are recurrent neural models using min-max recurrence that achieve full regular-language expressivity, logarithmic parallel evaluation, uniformly bounded states, and constant state gradients independent of time distance.
-
HRM-Text: Efficient Pretraining Beyond Scaling
A 1B-parameter hierarchical recurrent model pretrained on 40B instruction-response tokens achieves 60.7% MMLU and strong results on ARC-C, DROP, GSM8K, and MATH while using 100-900x fewer tokens than standard baselines.
-
Hardware-Software Co-Design of Scalable, Energy-Efficient Analog Recurrent Computations
BMRUs enable a direct one-to-one mapping from learned parameters to current-mode analog circuit elements, with discrete hysteretic outputs suppressing noise by at least 20x and supporting sub-microwatt RNN inference in 180 nm CMOS simulations.
-
Do Neural Operators Forget Geometry? The Forgetting Hypothesis in Deep Operator Learning
Neural operators progressively forget domain geometry with depth due to Markovian layers and global mixing; a geometry memory injection mechanism mitigates this forgetting.
-
OGPO: Sample Efficient Full-Finetuning of Generative Control Policies
OGPO is a sample-efficient off-policy method for full finetuning of generative control policies that reaches SOTA on robotic manipulation tasks and can recover from poor behavior-cloning initializations without expert data.
-
M$^2$RNN: Non-Linear RNNs with Matrix-Valued States for Scalable Language Modeling
M²RNN achieves perfect state tracking at unseen lengths and outperforms Gated DeltaNet hybrids by 0.4-0.5 perplexity on 7B models with 3x smaller recurrent states.
-
mGRADE: Minimal Recurrent Gating Meets Delay Convolutions for Lightweight Sequence Modeling
mGRADE uses learnable-spaced convolutions shown to be equivalent to delay embeddings plus a lightweight gated recurrent component to achieve low-memory multi-timescale sequence modeling.
-
The Serial Scaling Hypothesis
The serial scaling hypothesis formalizes inherently serial problems in complexity theory and demonstrates that diffusion models cannot solve them.
-
LLM4Log: A Systematic Review of Large Language Model-based Log Analysis
Systematic review of 145 papers on LLM-based log analysis, providing a unified taxonomy, common design patterns, evaluation practices, and challenges for deployment under drift and limited labels.
-
Predicting one-year clinical instability and mortality in heart failure patients using sequence modeling
Sequence models on EHR data from a Swedish heart failure cohort achieve AUPRCs of 0.555 to 0.854 for one-year instability and mortality predictions and support four care pathways.
-
Deep Learning for Sequential Decision Making under Uncertainty: Foundations, Frameworks, and Frontiers
A tutorial framing deep learning as a complement to optimization for sequential decision-making under uncertainty, with applications in supply chains, healthcare, and energy.