Long short-term memory. Neural Computation, 9(8):1735–1780
22 Pith papers cite this work. Polarity classification is still indexing.
citing papers explorer
-
Any-Dimensional Invariant Universality
A systematic approach maps any-dimensional invariant functions to a unique function on an infinite-dimensional limit space admitting a topology with compact sets where universality holds, with examples of non-universal architectures and fixes.
-
MaxSketch: Robust Distinct Counting in Streams via Random Projections
MaxSketch achieves O~(log n / ε²) memory for (1+ε)-approximate distinct counting in streams with geometric structure via max-linear random projections.
-
TailedTS: Benchmark Dataset for Heavy-Tailed Time Series Prediction and Periodicity Quantification
TailedTS supplies 24.69 billion Wikipedia page-view records as a public benchmark for heavy-tailed time series forecasting and periodicity analysis, revealing weaker periodic structure in high-traffic pages.
-
Residual-Corrected Equivalent-Circuit Model with Universal Differential Equations for Robust Battery Voltage Prediction under Operating-Condition Shift
A residual-corrected ECM-UDE hybrid model outperforms standalone ECM and LSTM baselines in battery terminal voltage prediction, with the largest gains under temperature and drive-cycle distribution shifts.
-
How Long Does Infinite Width Last? Signal Propagation in Long-Range Linear Recurrences
In linear recurrent models, infinite-width signal propagation remains accurate only for depths t much smaller than sqrt(width n), with a critical regime at t ~ c sqrt(n) where finite-width effects emerge and dominate for larger t.
-
FLUID: Continuous-Time Hyperconnected Sparse Transformer for Sink-Free Learning
FLUID is a continuous-time transformer using Liquid Attention Networks to model attention as stable ODE solutions that interpolate between discrete SDPA and CT-RNNs, with an explicit sink gate and liquid hyper-connections for better information flow.
-
Cognitive Alpha Mining via LLM-Driven Code-Based Evolution
CogAlpha combines LLM reasoning with code-level evolutionary search to discover financial alphas that show higher predictive accuracy and generalization than prior methods on five stock datasets.
-
VACE: Learning Geometrically Structured Representations for Time Series Anomaly Detection
VACE learns compact directionally coherent representations for multivariate time series anomaly detection via velocity-consistency training and reports state-of-the-art results on TSB-AD-M.
-
DEL: Digit Entropy Loss for Numerical Learning of Large Language Models
DEL is a new loss for LLM numerical learning that applies supervised digit entropy optimization and extends to floating-point numbers, showing improved accuracy and distance metrics over prior methods on math benchmarks.
-
DAD4TS: Data-Augmentation-Oriented Diffusion Model for Time-Series Forecasting with Small-Scale Data
DAD4TS augments small time-series datasets with a diffusion model trained via mathematical geometric projections and guided by reinforcement learning to improve forecasting accuracy.
-
CTF4Nuclear: Common Task Framework for Nuclear Fission and Fusion Models
CTF4Nuclear proposes a common task framework for benchmarking ML methods on nuclear engineering datasets using 12 metrics and a new sparse-measurement system monitoring paradigm.
-
Clin-JEPA: A Multi-Phase Co-Training Framework for Joint-Embedding Predictive Pretraining on EHR Patient Trajectories
A five-phase co-training framework enables stable JEPA pretraining on EHR trajectories, producing converging latent rollouts and higher multi-task AUROC than baselines on MIMIC-IV ICU data.
-
Learning to Test: Physics-Informed Representation for Dynamical Instability Detection
A physics-informed neural representation is learned from safe data to support distributional hypothesis testing for dynamical instability in stochastic DAE systems without repeated simulations.
-
A Benchmark of Classical and Deep Learning Models for Agricultural Commodity Price Forecasting on A Novel Bangladeshi Market Price Dataset
The AgriPriceBD dataset of 1779 daily prices is released; naive persistence outperforms deep models like Informer and Time2Vec-Transformer on heterogeneous Bangladeshi commodity series, with statistical validation.
-
MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent
MemAgent uses multi-conversation RL to train a memory agent that reads text in segments and overwrites memory, extrapolating from 8K training to 3.5M token QA with under 5% loss and 95%+ on 512K RULER.
-
EHR-RAGp: Retrieval-Augmented Prototype-Guided Foundation Model for Electronic Health Records
EHR-RAGp is a retrieval-augmented EHR foundation model that employs prototype-guided retrieval to dynamically integrate relevant historical patient context, outperforming prior models on clinical prediction tasks.
-
Scaling Laws and Tradeoffs in Recurrent Networks of Expressive Neurons
Recurrent networks built from tunable expressive neurons reveal scaling laws with an optimal parameter split that shifts toward higher per-neuron complexity at larger scales.
-
Ensuring Reliability in Programming Knowledge Tracing: A Re-evaluation of Attention-augmented Models and Experimental Protocols
Re-evaluation under controlled protocols shows that attention-enhanced PKT models do not consistently outperform standard DKT on the CodeWorkout dataset.
-
Common Task Framework For a Critical Evaluation of Scientific Machine Learning Algorithms
The paper introduces a Common Task Framework for scientific ML, benchmarks it on Kuramoto-Sivashinsky and Lorenz systems, and launches a competition on a global sea surface temperature dataset with holdout data.
-
Detecting Malicious Intents in Smart Contracts with Pre-trained Programming Language Models
SmartIntentV2 uses a pre-trained BERT model on smart contracts to achieve an F1 score of 0.9279 for detecting malicious intents, outperforming previous models and GPT-4.1.
-
A Comparative Analysis of Machine Learning and Deep Learning Models for Tweet Sentiment Classification: A Case Study on the Sentiment140 Dataset
Logistic Regression with TF-IDF achieved 73.5% accuracy and outperformed BiLSTM at 69.17% on a 10k-tweet subset of Sentiment140, with the deep model showing mild overfitting.
-
Benchmarking PyCaret AutoML Against BiLSTM for Fine-Grained Emotion Classification: A Comparative Study on 20-Class Emotion Detection
BiLSTM achieves 89% accuracy and 0.89 weighted F1 on 20-class emotion detection, marginally outperforming SVM at 88.11% on a 79,595-sentence dataset.