Long short-term memory. Neural Computation, 9(8):1735–1780

Sepp Hochreiter, Jürgen Schmidhuber · 1997

22 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2 · baseline 1

citation-polarity summary

background 2 · baseline 1

representative citing papers

Any-Dimensional Invariant Universality

cs.LG · 2026-05-22 · unverdicted · novelty 8.0

A systematic approach maps any-dimensional invariant functions to a unique function on an infinite-dimensional limit space admitting a topology with compact sets where universality holds, with examples of non-universal architectures and fixes.

MaxSketch: Robust Distinct Counting in Streams via Random Projections

stat.ML · 2026-05-15 · unverdicted · novelty 7.0

MaxSketch achieves O~(log n / ε²) memory for (1+ε)-approximate distinct counting in streams with geometric structure via max-linear random projections.

TailedTS: Benchmark Dataset for Heavy-Tailed Time Series Prediction and Periodicity Quantification

cs.LG · 2026-05-09 · unverdicted · novelty 7.0

TailedTS supplies 24.69 billion Wikipedia page-view records as a public benchmark for heavy-tailed time series forecasting and periodicity analysis, revealing weaker periodic structure in high-traffic pages.

Residual-Corrected Equivalent-Circuit Model with Universal Differential Equations for Robust Battery Voltage Prediction under Operating-Condition Shift

eess.SY · 2026-05-07 · unverdicted · novelty 7.0

A residual-corrected ECM-UDE hybrid model outperforms standalone ECM and LSTM baselines in battery terminal voltage prediction, with the largest gains under temperature and drive-cycle distribution shifts.

How Long Does Infinite Width Last? Signal Propagation in Long-Range Linear Recurrences

cs.LG · 2026-05-06 · unverdicted · novelty 7.0

In linear recurrent models, infinite-width signal propagation remains accurate only for depths t much smaller than sqrt(width n), with a critical regime at t ~ c sqrt(n) where finite-width effects emerge and dominate for larger t.

FLUID: Continuous-Time Hyperconnected Sparse Transformer for Sink-Free Learning

cs.LG · 2026-05-06 · unverdicted · novelty 7.0

FLUID is a continuous-time transformer using Liquid Attention Networks to model attention as stable ODE solutions that interpolate between discrete SDPA and CT-RNNs, with an explicit sink gate and liquid hyper-connections for better information flow.

Cognitive Alpha Mining via LLM-Driven Code-Based Evolution

cs.CL · 2025-11-24 · unverdicted · novelty 7.0

CogAlpha combines LLM reasoning with code-level evolutionary search to discover financial alphas that show higher predictive accuracy and generalization than prior methods on five stock datasets.

VACE: Learning Geometrically Structured Representations for Time Series Anomaly Detection

cs.LG · 2026-05-22 · unverdicted · novelty 6.0

VACE learns compact directionally coherent representations for multivariate time series anomaly detection via velocity-consistency training and reports state-of-the-art results on TSB-AD-M.

DEL: Digit Entropy Loss for Numerical Learning of Large Language Models

cs.CL · 2026-05-19 · conditional · novelty 6.0

DEL is a new loss for LLM numerical learning that applies supervised digit entropy optimization and extends to floating-point numbers, showing improved accuracy and distance metrics over prior methods on math benchmarks.

DAD4TS: Data-Augmentation-Oriented Diffusion Model for Time-Series Forecasting with Small-Scale Data

cs.LG · 2026-05-18 · unverdicted · novelty 6.0

DAD4TS augments small time-series datasets with a diffusion model trained via mathematical geometric projections and guided by reinforcement learning to improve forecasting accuracy.

CTF4Nuclear: Common Task Framework for Nuclear Fission and Fusion Models

cs.LG · 2026-05-15 · unverdicted · novelty 6.0

CTF4Nuclear proposes a common task framework for benchmarking ML methods on nuclear engineering datasets using 12 metrics and a new sparse-measurement system monitoring paradigm.

Clin-JEPA: A Multi-Phase Co-Training Framework for Joint-Embedding Predictive Pretraining on EHR Patient Trajectories

cs.LG · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

A five-phase co-training framework enables stable JEPA pretraining on EHR trajectories, producing converging latent rollouts and higher multi-task AUROC than baselines on MIMIC-IV ICU data.

Learning to Test: Physics-Informed Representation for Dynamical Instability Detection

cs.LG · 2026-04-13 · unverdicted · novelty 6.0

A physics-informed neural representation is learned from safe data to support distributional hypothesis testing for dynamical instability in stochastic DAE systems without repeated simulations.

A Benchmark of Classical and Deep Learning Models for Agricultural Commodity Price Forecasting on A Novel Bangladeshi Market Price Dataset

cs.LG · 2026-03-27 · accept · novelty 6.0

AgriPriceBD dataset of 1779 daily prices released; naive persistence outperforms deep models like Informer and Time2Vec-Transformer on heterogeneous Bangladeshi commodity series with statistical validation.

MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent

cs.CL · 2025-07-03 · unverdicted · novelty 6.0

MemAgent uses multi-conversation RL to train a memory agent that reads text in segments and overwrites memory, extrapolating from 8K training to 3.5M token QA with under 5% loss and 95%+ on 512K RULER.

EHR-RAGp: Retrieval-Augmented Prototype-Guided Foundation Model for Electronic Health Records

cs.IR · 2026-05-12 · unverdicted · novelty 5.0

EHR-RAGp is a retrieval-augmented EHR foundation model that employs prototype-guided retrieval to dynamically integrate relevant historical patient context, outperforming prior models on clinical prediction tasks.

Scaling Laws and Tradeoffs in Recurrent Networks of Expressive Neurons

cs.LG · 2026-05-12 · unverdicted · novelty 5.0

Recurrent networks built from tunable expressive neurons reveal scaling laws with an optimal parameter split that shifts toward higher per-neuron complexity at larger scales.

Ensuring Reliability in Programming Knowledge Tracing: A Re-evaluation of Attention-augmented Models and Experimental Protocols

cs.LG · 2026-05-06 · conditional · novelty 5.0

Re-evaluation under controlled protocols shows that attention-enhanced PKT models do not consistently outperform standard DKT on the CodeWorkout dataset.

Common Task Framework For a Critical Evaluation of Scientific Machine Learning Algorithms

cs.CE · 2025-10-27 · conditional · novelty 5.0

The paper introduces a Common Task Framework for scientific ML, benchmarks it on Kuramoto-Sivashinsky and Lorenz systems, and launches a competition on a global sea surface temperature dataset with holdout data.

Detecting Malicious Intents in Smart Contracts with Pre-trained Programming Language Models

cs.SE · 2025-08-27 · unverdicted · novelty 5.0

SmartIntentV2 uses a pre-trained BERT model on smart contracts to achieve an F1 score of 0.9279 for detecting malicious intents, outperforming previous models and GPT-4.1.

A Comparative Analysis of Machine Learning and Deep Learning Models for Tweet Sentiment Classification: A Case Study on the Sentiment140 Dataset

cs.CL · 2026-05-06 · unverdicted · novelty 2.0

Logistic Regression with TF-IDF achieved 73.5% accuracy and outperformed BiLSTM at 69.17% on a 10k-tweet subset of Sentiment140, with the deep model showing mild overfitting.

Benchmarking PyCaret AutoML Against BiLSTM for Fine-Grained Emotion Classification: A Comparative Study on 20-Class Emotion Detection

cs.CL · 2026-04-29 · unverdicted · novelty 2.0

BiLSTM achieves 89% accuracy and 0.89 weighted F1 on 20-class emotion detection, marginally outperforming SVM at 88.11% on a 79,595-sentence dataset.

citing papers explorer

Showing 22 of 22 citing papers.

Any-Dimensional Invariant Universality · cs.LG · 2026-05-22 · unverdicted · none · ref 10
MaxSketch: Robust Distinct Counting in Streams via Random Projections · stat.ML · 2026-05-15 · unverdicted · none · ref 14
TailedTS: Benchmark Dataset for Heavy-Tailed Time Series Prediction and Periodicity Quantification · cs.LG · 2026-05-09 · unverdicted · none · ref 14
Residual-Corrected Equivalent-Circuit Model with Universal Differential Equations for Robust Battery Voltage Prediction under Operating-Condition Shift · eess.SY · 2026-05-07 · unverdicted · none · ref 37
How Long Does Infinite Width Last? Signal Propagation in Long-Range Linear Recurrences · cs.LG · 2026-05-06 · unverdicted · none · ref 25
FLUID: Continuous-Time Hyperconnected Sparse Transformer for Sink-Free Learning · cs.LG · 2026-05-06 · unverdicted · none · ref 7
Cognitive Alpha Mining via LLM-Driven Code-Based Evolution · cs.CL · 2025-11-24 · unverdicted · none · ref 39
VACE: Learning Geometrically Structured Representations for Time Series Anomaly Detection · cs.LG · 2026-05-22 · unverdicted · none · ref 18
DEL: Digit Entropy Loss for Numerical Learning of Large Language Models · cs.CL · 2026-05-19 · conditional · none · ref 21
DAD4TS: Data-Augmentation-Oriented Diffusion Model for Time-Series Forecasting with Small-Scale Data · cs.LG · 2026-05-18 · unverdicted · none · ref 41
CTF4Nuclear: Common Task Framework for Nuclear Fission and Fusion Models · cs.LG · 2026-05-15 · unverdicted · none · ref 36
Clin-JEPA: A Multi-Phase Co-Training Framework for Joint-Embedding Predictive Pretraining on EHR Patient Trajectories · cs.LG · 2026-05-11 · unverdicted · none · ref 30 · 2 links
Learning to Test: Physics-Informed Representation for Dynamical Instability Detection · cs.LG · 2026-04-13 · unverdicted · none · ref 27
A Benchmark of Classical and Deep Learning Models for Agricultural Commodity Price Forecasting on A Novel Bangladeshi Market Price Dataset · cs.LG · 2026-03-27 · accept · none · ref 15
MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent · cs.CL · 2025-07-03 · unverdicted · none · ref 29
EHR-RAGp: Retrieval-Augmented Prototype-Guided Foundation Model for Electronic Health Records · cs.IR · 2026-05-12 · unverdicted · none · ref 22
Scaling Laws and Tradeoffs in Recurrent Networks of Expressive Neurons · cs.LG · 2026-05-12 · unverdicted · none · ref 21
Ensuring Reliability in Programming Knowledge Tracing: A Re-evaluation of Attention-augmented Models and Experimental Protocols · cs.LG · 2026-05-06 · conditional · none · ref 5
Common Task Framework For a Critical Evaluation of Scientific Machine Learning Algorithms · cs.CE · 2025-10-27 · conditional · none · ref 30
Detecting Malicious Intents in Smart Contracts with Pre-trained Programming Language Models · cs.SE · 2025-08-27 · unverdicted · none · ref 16
A Comparative Analysis of Machine Learning and Deep Learning Models for Tweet Sentiment Classification: A Case Study on the Sentiment140 Dataset · cs.CL · 2026-05-06 · unverdicted · none · ref 6
Benchmarking PyCaret AutoML Against BiLSTM for Fine-Grained Emotion Classification: A Comparative Study on 20-Class Emotion Detection · cs.CL · 2026-04-29 · unverdicted · none · ref 4
