Long short-term memory

Jürgen Schmidhuber, Sepp Hochreiter · 1997 · Neural Computation · DOI 10.1162/neco.1997.9.8.1735 · PMID 9377276

Canonical reference. 74% of citing Pith papers cite this work as background.

126 Pith papers citing it

80.8k external citations · Crossref

Background 74% of classified citations

citation-role summary

background 15 · baseline 2 · method 2

citation-polarity summary

background 14 · baseline 2 · use method 2 · support 1

authors

Jürgen Schmidhuber, Sepp Hochreiter

representative citing papers

HS-FNO: History-Space Fourier Neural Operator for Non-Markovian Partial Differential Equations

cs.LG · 2026-05-10 · conditional · novelty 8.0 · 2 refs

HS-FNO lifts the state to include history and decomposes updates into a learned future-slice predictor plus an exact shift-append transport, yielding lower rollout errors than standard or lag-stack FNO baselines on five non-Markovian PDE families.

RAVEN: A Regime-Aware Variable-context Expert Network for Financial Time Series Forecasting

cs.LG · 2026-06-23 · unverdicted · novelty 7.0

RAVEN proposes a regime-aware MoE architecture with cumulative importance thresholding and correlation-aware weighting to adaptively select temporal context for non-stationary financial forecasting.

ConTex: Reformulating Counterfactual Generation For Time Series Forecasting

cs.LG · 2026-06-16 · unverdicted · novelty 7.0

ConTex learns a global intervention strategy via a decomposed temporal-conditional encoder architecture to generate consistent, sparse counterfactuals for time series models in a single forward pass.

Causally Evaluating the Learnability of Formal Language Tasks

cs.CL · 2026-06-08 · unverdicted · novelty 7.0

Introduces the binning semiring and causal graphical models to show that correlational evaluation of learnability in formal language tasks leads to incorrect conclusions from confounders.

RESCAST-100K: A Comprehensive Dataset for Cross-Domain Residential Load and Indoor Temperature Forecasting

cs.LG · 2026-06-01 · unverdicted · novelty 7.0

RESCAST-100K is a large-scale benchmark dataset of simulated and real residential energy data for cross-domain load and temperature forecasting.

Repetition Mismatch: Why Data Mixture Experiments Don't Scale and How to Fix Them

cs.LG · 2026-05-29 · conditional · novelty 7.0

Repetition rate mismatch between small-scale proxies and target budgets is the main reason data mixture experiments do not scale; a subsampling procedure that equalizes repetition rates recovers optimal mixtures from 1/16-scale experiments.

Faithful Embeddings of Irregular and Asynchronous Data for Online Log-NCDEs

cs.LG · 2026-05-28 · unverdicted · novelty 7.0

Introduces a continuous injective embedding for Log-NCDEs that builds log-signatures from data increments without interpolation or imputation while preserving compact-set universality.

PaperFit: Vision-in-the-Loop Typesetting Optimization for Scientific Documents

cs.AI · 2026-05-11 · unverdicted · novelty 7.0

PaperFit uses rendered page images in a closed loop to diagnose and repair typesetting defects in LaTeX documents, outperforming baselines on a new benchmark of 200 papers.

LLM-guided Semi-Supervised Approaches for Social Media Crisis Data Classification

cs.AI · 2026-05-08 · conditional · novelty 7.0

LG-CoTrain, an LLM-guided co-training method, outperforms classical semi-supervised baselines for crisis tweet classification in low-resource settings with 5-25 labeled examples per class.

SGC-RML: A reliable and interpretable longitudinal assessment for PD in real-world DNS

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

SGC-RML creates an 8D symptom atlas from multimodal PD data and integrates conformal calibration to deliver reliable, rejectable longitudinal assessments.

BadmintonGRF: A Multimodal Dataset and Benchmark for Markerless Ground Reaction Force Estimation in Badminton

cs.CV · 2026-05-03 · unverdicted · novelty 7.0

BadmintonGRF is a new public multimodal dataset and benchmark that pairs multi-view video with instrumented GRF for markerless load estimation in badminton.

Affinity Is Not Enough: Recovering the Free Energy Principle in Mixture-of-Experts

cs.LG · 2026-05-01 · conditional · novelty 7.0

Adding temporal memory via LIF, precision-weighted gating, and anticipatory prediction to MoE routers recovers effective expert selection at distribution transitions, with ablation confirming a super-additive beta-ant interaction.

AsmRAG: LLM-Driven Malware Detection by Retrieving Functionally Similar Assembly Code

cs.CR · 2026-04-25 · unverdicted · novelty 7.0

AsmRAG detects malware at 96% F1 and attributes families at 95% F1 by retrieving functionally similar assembly code via LLM embeddings and density-weighted anchor selection, remaining robust to metamorphic obfuscation.

Preconditioned DeltaNet: Curvature-aware Sequence Modeling for Linear Recurrences

cs.LG · 2026-04-22 · unverdicted · novelty 7.0

Preconditioned delta-rule models with a diagonal curvature approximation improve upon standard DeltaNet, GDN, and KDA by better approximating the test-time regression objective.

BRIDGE and TCH-Net: Heterogeneous Benchmark and Multi-Branch Baseline for Cross-Domain IoT Botnet Detection

cs.CR · 2026-04-13 · unverdicted · novelty 7.0

BRIDGE creates the first formal heterogeneous multi-dataset benchmark for IoT botnet detection with LODO evaluation, and TCH-Net achieves mean LODO F1 of 0.5577 while reaching F1 0.8296 on standard tests, outperforming twelve baselines.

FactorEngine: A Program-level Knowledge-Infused Factor Mining Framework for Quantitative Investment

cs.AI · 2026-03-17 · unverdicted · novelty 7.0

FactorEngine mines alpha factors as Turing-complete code via LLM-guided directional search, parameter separation, and a multi-agent pipeline that converts financial reports into executable programs, delivering higher IC/ICIR and Sharpe ratios than baselines in backtests.

Reduced-Order Surrogates for Forced Flexible Mesh Coastal-Ocean Models

cs.CE · 2026-02-05 · unverdicted · novelty 7.0

Koopman autoencoders with forcings and temporal unrolling deliver accurate year-long predictions for coastal-ocean models at 300-1400x speedup, outperforming POD in two of three cases.

Temporal Graph Networks for Deep Learning on Dynamic Graphs

cs.LG · 2020-06-18 · unverdicted · novelty 7.0

Temporal Graph Networks combine memory modules and graph operators to learn on dynamic graphs as timed event sequences, outperforming prior methods on transductive and inductive tasks while unifying earlier models as special cases.

Language Models as Knowledge Bases?

cs.CL · 2019-09-03 · accept · novelty 7.0

BERT stores relational knowledge extractable via cloze queries without fine-tuning and matches supervised baselines on open-domain QA tasks.

Mixed Precision Training

cs.AI · 2017-10-10 · accept · novelty 7.0

Mixed precision training uses FP16 for most computations, FP32 master weights for accumulation, and loss scaling to enable accurate training of large DNNs with halved memory usage.

Estimation–Prediction Tradeoff in Causal Probabilistic Temporal Graphs

cs.LG · 2026-06-26 · unverdicted · novelty 6.0

Characterizes an estimation-prediction tradeoff in binary logistic models for causal probabilistic temporal graphs and proposes a framework to jointly evaluate temporal link prediction with causal parameter recovery via Cramér-Rao bounds.

Quantization Inflates Reasoning: Token Inflation as a Hidden Cost of Low-Bit Reasoning Models

cs.AI · 2026-06-24 · unverdicted · novelty 6.0

Low-bit post-training quantization of reasoning LLMs increases reasoning token counts while preserving accuracy, introducing a hidden test-time compute cost.

Prediction of Viscoelastic Droplet Impact Dynamics Using a Vision Transformer-Based Approach

physics.flu-dyn · 2026-06-22 · unverdicted · novelty 6.0

ViViT model predicts full viscoelastic droplet impact dynamics from initial 10-20% of VOF simulation data, reducing cost by 80-90% while capturing spreading and bouncing regimes.

Topological Out-of-Domain Generalization in Dynamical Systems Reconstruction

cs.LG · 2026-06-22 · unverdicted · novelty 6.0

Proposes feature splitting and a closed-form bound on extrapolation range to enable zero-shot topological out-of-domain generalization in dynamical systems reconstruction across tipping points.

citing papers explorer

Showing 50 of 92 citing papers after filters.

HS-FNO: History-Space Fourier Neural Operator for Non-Markovian Partial Differential Equations cs.LG · 2026-05-10 · conditional · none · ref 45 · 2 links
HS-FNO lifts the state to include history and decomposes updates into a learned future-slice predictor plus an exact shift-append transport, yielding lower rollout errors than standard or lag-stack FNO baselines on five non-Markovian PDE families.
RAVEN: A Regime-Aware Variable-context Expert Network for Financial Time Series Forecasting cs.LG · 2026-06-23 · unverdicted · none · ref 16
RAVEN proposes a regime-aware MoE architecture with cumulative importance thresholding and correlation-aware weighting to adaptively select temporal context for non-stationary financial forecasting.
ConTex: Reformulating Counterfactual Generation For Time Series Forecasting cs.LG · 2026-06-16 · unverdicted · none · ref 11
ConTex learns a global intervention strategy via a decomposed temporal-conditional encoder architecture to generate consistent, sparse counterfactuals for time series models in a single forward pass.
Causally Evaluating the Learnability of Formal Language Tasks cs.CL · 2026-06-08 · unverdicted · none · ref 72
Introduces the binning semiring and causal graphical models to show that correlational evaluation of learnability in formal language tasks leads to incorrect conclusions from confounders.
RESCAST-100K: A Comprehensive Dataset for Cross-Domain Residential Load and Indoor Temperature Forecasting cs.LG · 2026-06-01 · unverdicted · none · ref 28
RESCAST-100K is a large-scale benchmark dataset of simulated and real residential energy data for cross-domain load and temperature forecasting.
Repetition Mismatch: Why Data Mixture Experiments Don't Scale and How to Fix Them cs.LG · 2026-05-29 · conditional · none · ref 30
Repetition rate mismatch between small-scale proxies and target budgets is the main reason data mixture experiments do not scale; a subsampling procedure that equalizes repetition rates recovers optimal mixtures from 1/16-scale experiments.
Faithful Embeddings of Irregular and Asynchronous Data for Online Log-NCDEs cs.LG · 2026-05-28 · unverdicted · none · ref 1
Introduces a continuous injective embedding for Log-NCDEs that builds log-signatures from data increments without interpolation or imputation while preserving compact-set universality.
PaperFit: Vision-in-the-Loop Typesetting Optimization for Scientific Documents cs.AI · 2026-05-11 · unverdicted · none · ref 76
PaperFit uses rendered page images in a closed loop to diagnose and repair typesetting defects in LaTeX documents, outperforming baselines on a new benchmark of 200 papers.
LLM-guided Semi-Supervised Approaches for Social Media Crisis Data Classification cs.AI · 2026-05-08 · conditional · none · ref 118
LG-CoTrain, an LLM-guided co-training method, outperforms classical semi-supervised baselines for crisis tweet classification in low-resource settings with 5-25 labeled examples per class.
SGC-RML: A reliable and interpretable longitudinal assessment for PD in real-world DNS cs.LG · 2026-05-08 · unverdicted · none · ref 39
SGC-RML creates an 8D symptom atlas from multimodal PD data and integrates conformal calibration to deliver reliable, rejectable longitudinal assessments.
BadmintonGRF: A Multimodal Dataset and Benchmark for Markerless Ground Reaction Force Estimation in Badminton cs.CV · 2026-05-03 · unverdicted · none · ref 16
BadmintonGRF is a new public multimodal dataset and benchmark that pairs multi-view video with instrumented GRF for markerless load estimation in badminton.
Affinity Is Not Enough: Recovering the Free Energy Principle in Mixture-of-Experts cs.LG · 2026-05-01 · conditional · none · ref 36
Adding temporal memory via LIF, precision-weighted gating, and anticipatory prediction to MoE routers recovers effective expert selection at distribution transitions, with ablation confirming a super-additive beta-ant interaction.
AsmRAG: LLM-Driven Malware Detection by Retrieving Functionally Similar Assembly Code cs.CR · 2026-04-25 · unverdicted · none · ref 5
AsmRAG detects malware at 96% F1 and attributes families at 95% F1 by retrieving functionally similar assembly code via LLM embeddings and density-weighted anchor selection, remaining robust to metamorphic obfuscation.
Preconditioned DeltaNet: Curvature-aware Sequence Modeling for Linear Recurrences cs.LG · 2026-04-22 · unverdicted · none · ref 104
Preconditioned delta-rule models with a diagonal curvature approximation improve upon standard DeltaNet, GDN, and KDA by better approximating the test-time regression objective.
BRIDGE and TCH-Net: Heterogeneous Benchmark and Multi-Branch Baseline for Cross-Domain IoT Botnet Detection cs.CR · 2026-04-13 · unverdicted · none · ref 11
BRIDGE creates the first formal heterogeneous multi-dataset benchmark for IoT botnet detection with LODO evaluation, and TCH-Net achieves mean LODO F1 of 0.5577 while reaching F1 0.8296 on standard tests, outperforming twelve baselines.
FactorEngine: A Program-level Knowledge-Infused Factor Mining Framework for Quantitative Investment cs.AI · 2026-03-17 · unverdicted · none · ref 6
FactorEngine mines alpha factors as Turing-complete code via LLM-guided directional search, parameter separation, and a multi-agent pipeline that converts financial reports into executable programs, delivering higher IC/ICIR and Sharpe ratios than baselines in backtests.
Reduced-Order Surrogates for Forced Flexible Mesh Coastal-Ocean Models cs.CE · 2026-02-05 · unverdicted · none · ref 12
Koopman autoencoders with forcings and temporal unrolling deliver accurate year-long predictions for coastal-ocean models at 300-1400x speedup, outperforming POD in two of three cases.
Estimation–Prediction Tradeoff in Causal Probabilistic Temporal Graphs cs.LG · 2026-06-26 · unverdicted · none · ref 92
Characterizes an estimation-prediction tradeoff in binary logistic models for causal probabilistic temporal graphs and proposes a framework to jointly evaluate temporal link prediction with causal parameter recovery via Cramér-Rao bounds.
Quantization Inflates Reasoning: Token Inflation as a Hidden Cost of Low-Bit Reasoning Models cs.AI · 2026-06-24 · unverdicted · none · ref 170
Low-bit post-training quantization of reasoning LLMs increases reasoning token counts while preserving accuracy, introducing a hidden test-time compute cost.
Prediction of Viscoelastic Droplet Impact Dynamics Using a Vision Transformer-Based Approach physics.flu-dyn · 2026-06-22 · unverdicted · none · ref 20
ViViT model predicts full viscoelastic droplet impact dynamics from initial 10-20% of VOF simulation data, reducing cost by 80-90% while capturing spreading and bouncing regimes.
Topological Out-of-Domain Generalization in Dynamical Systems Reconstruction cs.LG · 2026-06-22 · unverdicted · none · ref 27
Proposes feature splitting and a closed-form bound on extrapolation range to enable zero-shot topological out-of-domain generalization in dynamical systems reconstruction across tipping points.
Remember what you did?: Learning Behavioral Memories for Partially Observable Object Manipulation cs.RO · 2026-06-19 · unverdicted · none · ref 14
CAMP learns a compressed behavioral memory from action history to enable success in long-horizon partially observable object manipulation without extra supervision, showing gains over baselines in real-robot and simulation tests.
TimeProVe: Propose, then Verify for Efficient Long Video Temporal Reasoning in Activities of Daily Living cs.CV · 2026-06-18 · unverdicted · none · ref 267
TimeProVe proposes a propose-then-verify framework using lightweight action-based candidate evidence generation followed by targeted VLM verification for efficient long video temporal reasoning, achieving 7.3% improvement on OTB with 75% fewer VLM calls.
A Hybrid LSTM–Vision Transformer Architecture for Predicting HRRR Forecast Errors cs.LG · 2026-06-17 · unverdicted · none · ref 12
Hybrid LSTM-ViT model using mesonet surface data and profiler vertical profiles improves HRRR forecast error prediction for precipitation, wind speed, and temperature, with roughly twofold skill gain for precipitation over baseline LSTM.
Tests for Independence of High-Dimensional Nonstationary Time Series math.ST · 2026-06-07 · unverdicted · none · ref 103
A new test statistic and bootstrap for independence testing of high-dimensional nonstationary time series that avoids whitening by removing temporal dependence bias under the null.
GeoGNN: Time Series Geo-Localization using Two-Tower Graph Neural Networks cs.LG · 2026-06-06 · unverdicted · none · ref 25
GeoGNN is a two-tower GNN that learns geographic cell embeddings from adjacency graphs and matches them to temporal representations via dot-product similarity plus classification, improving geolocalization accuracy by ~27% on electricity datasets.
ReSGA: A Large Tail Risk Model for Learning Value-at-Risk and Expected Shortfall stat.ML · 2026-06-03 · unverdicted · none · ref 28
ReSGA, a large autoencoder, outperforms prior methods on joint VaR-ES forecasting for US equities and converts the edge into economic gains via a size-enhanced momentum strategy, with gains attributed to data complexity.
Composing Non-Conjugate Factor Graphs with Closed-Form Variational Inference cs.LG · 2026-05-28 · unverdicted · none · ref 8
Models composed from bilinear factor, exponential link, Gamma prior, Gaussian likelihood, and equality node admit closed-form variational message passing under mean-field factorization.
Fast MoE Inference via Predictive Prefetching and Expert Replication cs.LG · 2026-05-12 · conditional · none · ref 7
Dynamic replication of predicted overloaded experts in MoE models achieves near-100% GPU utilization and up to 3x faster inference while retaining 90-95% of baseline performance.
What If We Let Forecasting Forget? A Sparse Bottleneck for Cross-Variable Dependencies cs.LG · 2026-05-08 · unverdicted · none · ref 96
MS-FLOW uses a capacity-limited sparse routing mechanism to model only critical inter-variable dependencies in time series data, achieving state-of-the-art accuracy on 12 benchmarks with fewer but more reliable connections.
When AI Meets Science: Research Diversity, Interdisciplinarity, Visibility, and Retractions across Disciplines in a Global Surge cs.DL · 2026-05-07 · unverdicted · none · ref 29 · 3 links
AI use in science has grown exponentially since 2015 but stays confined to computer science and statistics topics, shows higher retraction rates and citations, and follows distinct global adoption patterns.
Conditional Neural Field based Reduced Order Model for Dynamic Ditching Load Prediction physics.flu-dyn · 2026-05-05 · unverdicted · none · ref 10
Conditional neural fields combined with LSTM networks predict aircraft ditching loads accurately across heterogeneous spatial discretizations using fewer parameters than convolutional autoencoders.
Hybrid Machine Learning and Physical Modeling of Feedstock Deformation During Robotic 3D Printing of Continuous Fiber Thermoplastic Composites cs.CE · 2026-05-04 · unverdicted · none · ref 34
A hybrid Kelvin-Voigt viscoelastic and stabilized neural ODE model, identified from DMA and DSC experiments, predicts composite prepreg deformation in robotic 3D printing and generalizes beyond training temperatures.
Adaptive Interpolation-Synthesis for Motion In-Betweening on Keyframe-Based Animation cs.GR · 2026-05-04 · unverdicted · none · ref 45
The Adaptive Interpolation-Synthesis method uses a domain-aligned keypose schedule and adaptive layer to achieve state-of-the-art motion in-betweening with 3.5x speedup when integrated in Autodesk Maya.
Pretraining on Sleep Data Improves non-Sleep Biosignal Tasks cs.LG · 2026-05-04 · unverdicted · none · ref 27
Sleep-only contrastive pretraining improves results on non-sleep EEG and ECG tasks relative to training from scratch and matches or exceeds some specialized models.
Rethinking Publication: A Certification Framework for AI-Enabled Research cs.AI · 2026-04-23 · unverdicted · none · ref 17 · 2 links
A two-layer certification framework decouples knowledge validity from human authorship to accommodate AI-enabled research in existing publication systems.
Convergent Evolution: How Different Language Models Learn Similar Number Representations cs.CL · 2026-04-22 · unverdicted · none · ref 48
Diverse language models converge on similar periodic number features with a two-tier hierarchy of Fourier sparsity and geometric separability, acquired via language co-occurrences or multi-token arithmetic.
ACT: Anti-Crosstalk Learning for Cross-Sectional Stock Ranking via Temporal Disentanglement and Structural Purification cs.LG · 2026-04-22 · unverdicted · none · ref 6
ACT disentangles temporal scales in stock sequences and purifies structural relations in graphs to achieve state-of-the-art cross-sectional stock ranking on CSI300 and CSI500 with up to 74.25% improvement.
Dual Alignment Between Language Model Layers and Human Sentence Processing cs.CL · 2026-04-20 · unverdicted · none · ref 251
Later LLM layers align better with human cognitive effort in syntactic ambiguity than early layers do, indicating dual processing modes and complementary benefits from multi-layer probability updates.
Adaptive RIS Configuration Design with Environmental Sensing for User Localization in Dynamic Rich Scattering Environment eess.SP · 2026-04-19 · unverdicted · none · ref 17
A two-part biLSTM model estimates environmental scattering from sequential pilots and adaptively tunes RIS configurations to achieve lower localization RMSE than random, codebook, or non-adaptive baselines in dynamic rich scattering environments.
AIBuildAI: An AI Agent for Automatically Building AI Models cs.AI · 2026-04-15 · unverdicted · none · ref 62
AIBuildAI uses a manager agent and three LLM sub-agents to fully automate AI model development and achieves a 63.1% medal rate on MLE-Bench, matching experienced human engineers.
Thermodynamic Liquid Manifold Networks: Physics-Bounded Deep Learning for Solar Forecasting in Autonomous Off-Grid Microgrids cs.LG · 2026-04-13 · unverdicted · none · ref 1
A new neural network architecture enforces celestial and thermodynamic constraints to deliver zero nocturnal error and high-accuracy solar forecasts for autonomous microgrids.
Daily Predictions of F10.7 and F30 Solar Indices with Deep Learning astro-ph.SR · 2026-04-11 · unverdicted · none · ref 11
SINet outperforms five prior statistical and deep learning methods on F10.7 predictions and provides the first deep learning forecasts for the F30 solar index.
Time-Warping Recurrent Neural Networks for Transfer Learning cs.LG · 2026-04-02 · unverdicted · none · ref 15
Time-warping enables RNN transfer learning across time scales in physical systems by rescaling time in pretrained LSTMs, matching accuracy of other methods with minimal parameter changes.
Acoustic Feedback for Closed-Loop Force Control in Robotic Grinding cs.RO · 2026-02-24 · unverdicted · none · ref 18
AFRG estimates grinding force from contact microphone audio for closed-loop robotic control, delivering 4-fold better consistency across disc conditions at roughly 200 times lower cost than force sensors.
Optimizing Chlorination in Water Distribution Systems via Surrogate-assisted Neuroevolution cs.NE · 2026-02-07 · unverdicted · none · ref 16
Surrogate-assisted neuroevolution produces Pareto-optimal chlorine dosing policies for water distribution systems that outperform PPO on four practical objectives.
Toward Reliable Sim-to-Real Predictability for MoE-based Robust Quadrupedal Locomotion cs.RO · 2026-01-31 · unverdicted · none · ref 39
MoE-based locomotion policy with RoboGauge metrics achieves reliable sim-to-real transfer, enabling robust quadrupedal walking on challenging unseen terrains up to 4 m/s.
Bridging Performance and Generalization in Reinforcement Learning for Agile Flight cs.RO · 2026-06-25 · unverdicted · none · ref 44
RL framework for agile drone racing combines task-aware switching and physically informed procedural track generation to achieve 7.4x better zero-shot generalization to unseen tracks while maintaining competitive speeds.
Interpretable Kolmogorov-Arnold Network with Feature-Isolated Temporal Attention Mechanism for Electricity Load Forecasting cs.LG · 2026-06-22 · unverdicted · none · ref 32
LoadKAN combines feature-isolated temporal attention with KAN to produce competitive load forecasts on three U.S. markets and enables quantitative analysis of non-linear mobility-load relationships via learned activation functions.
RankGLU: Residual Gated Score Formation for Cross-Sectional Stock Prediction cs.CE · 2026-06-08 · unverdicted · none · ref 18
RankGLU improves mean information coefficient on CSI300 from 0.0654 to 0.0727 by using a residual bottleneck gated linear unit for cross-sectional stock score formation.
