Long Short-Term Memory

Jürgen Schmidhuber, Sepp Hochreiter · 1997 · Neural Computation · DOI 10.1162/neco.1997.9.8.1735 · arXiv gov/9377276

Canonical reference. 74% of citing Pith papers cite this work as background.

127 Pith papers citing it

80.8k external citations · Crossref

Background 74% of classified citations

citation-role summary

background 15 · baseline 2 · method 2

citation-polarity summary

background 14 · baseline 2 · use method 2 · support 1

authors

Jürgen Schmidhuber, Sepp Hochreiter

representative citing papers

HS-FNO: History-Space Fourier Neural Operator for Non-Markovian Partial Differential Equations

cs.LG · 2026-05-10 · conditional · novelty 8.0 · 2 refs

HS-FNO lifts the state to include history and decomposes updates into a learned future-slice predictor plus an exact shift-append transport, yielding lower rollout errors than standard or lag-stack FNO baselines on five non-Markovian PDE families.

RAVEN: A Regime-Aware Variable-context Expert Network for Financial Time Series Forecasting

cs.LG · 2026-06-23 · unverdicted · novelty 7.0

RAVEN proposes a regime-aware MoE architecture with cumulative importance thresholding and correlation-aware weighting to adaptively select temporal context for non-stationary financial forecasting.

ConTex: Reformulating Counterfactual Generation For Time Series Forecasting

cs.LG · 2026-06-16 · unverdicted · novelty 7.0

ConTex learns a global intervention strategy via a decomposed temporal-conditional encoder architecture to generate consistent, sparse counterfactuals for time series models in a single forward pass.

Causally Evaluating the Learnability of Formal Language Tasks

cs.CL · 2026-06-08 · unverdicted · novelty 7.0

Introduces the binning semiring and causal graphical models to show that correlational evaluation of learnability in formal language tasks leads to incorrect conclusions from confounders.

RESCAST-100K: A Comprehensive Dataset for Cross-Domain Residential Load and Indoor Temperature Forecasting

cs.LG · 2026-06-01 · unverdicted · novelty 7.0

RESCAST-100K is a large-scale benchmark dataset of simulated and real residential energy data for cross-domain load and temperature forecasting.

Repetition Mismatch: Why Data Mixture Experiments Don't Scale and How to Fix Them

cs.LG · 2026-05-29 · conditional · novelty 7.0

Repetition rate mismatch between small-scale proxies and target budgets is the main reason data mixture experiments do not scale; a subsampling procedure that equalizes repetition rates recovers optimal mixtures from 1/16-scale experiments.

Faithful Embeddings of Irregular and Asynchronous Data for Online Log-NCDEs

cs.LG · 2026-05-28 · unverdicted · novelty 7.0

Introduces a continuous injective embedding for Log-NCDEs that builds log-signatures from data increments without interpolation or imputation while preserving compact-set universality.

PaperFit: Vision-in-the-Loop Typesetting Optimization for Scientific Documents

cs.AI · 2026-05-11 · unverdicted · novelty 7.0

PaperFit uses rendered page images in a closed loop to diagnose and repair typesetting defects in LaTeX documents, outperforming baselines on a new benchmark of 200 papers.

LLM-guided Semi-Supervised Approaches for Social Media Crisis Data Classification

cs.AI · 2026-05-08 · conditional · novelty 7.0

LG-CoTrain, an LLM-guided co-training method, outperforms classical semi-supervised baselines for crisis tweet classification in low-resource settings with 5-25 labeled examples per class.

SGC-RML: A reliable and interpretable longitudinal assessment for PD in real-world DNS

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

SGC-RML creates an 8D symptom atlas from multimodal PD data and integrates conformal calibration to deliver reliable, rejectable longitudinal assessments.

BadmintonGRF: A Multimodal Dataset and Benchmark for Markerless Ground Reaction Force Estimation in Badminton

cs.CV · 2026-05-03 · unverdicted · novelty 7.0

BadmintonGRF is a new public multimodal dataset and benchmark that pairs multi-view video with instrumented GRF for markerless load estimation in badminton.

Affinity Is Not Enough: Recovering the Free Energy Principle in Mixture-of-Experts

cs.LG · 2026-05-01 · conditional · novelty 7.0

Adding temporal memory via LIF, precision-weighted gating, and anticipatory prediction to MoE routers recovers effective expert selection at distribution transitions, with ablation confirming a super-additive beta-ant interaction.

AsmRAG: LLM-Driven Malware Detection by Retrieving Functionally Similar Assembly Code

cs.CR · 2026-04-25 · unverdicted · novelty 7.0

AsmRAG detects malware at 96% F1 and attributes families at 95% F1 by retrieving functionally similar assembly code via LLM embeddings and density-weighted anchor selection, remaining robust to metamorphic obfuscation.

Preconditioned DeltaNet: Curvature-aware Sequence Modeling for Linear Recurrences

cs.LG · 2026-04-22 · unverdicted · novelty 7.0

Preconditioned delta-rule models with a diagonal curvature approximation improve upon standard DeltaNet, GDN, and KDA by better approximating the test-time regression objective.

BRIDGE and TCH-Net: Heterogeneous Benchmark and Multi-Branch Baseline for Cross-Domain IoT Botnet Detection

cs.CR · 2026-04-13 · unverdicted · novelty 7.0

BRIDGE creates the first formal heterogeneous multi-dataset benchmark for IoT botnet detection with LODO evaluation, and TCH-Net achieves mean LODO F1 of 0.5577 while reaching F1 0.8296 on standard tests, outperforming twelve baselines.

FactorEngine: A Program-level Knowledge-Infused Factor Mining Framework for Quantitative Investment

cs.AI · 2026-03-17 · unverdicted · novelty 7.0

FactorEngine mines alpha factors as Turing-complete code via LLM-guided directional search, parameter separation, and a multi-agent pipeline that converts financial reports into executable programs, delivering higher IC/ICIR and Sharpe ratios than baselines in backtests.

Reduced-Order Surrogates for Forced Flexible Mesh Coastal-Ocean Models

cs.CE · 2026-02-05 · unverdicted · novelty 7.0

Koopman autoencoders with forcings and temporal unrolling deliver accurate year-long predictions for coastal-ocean models at 300-1400x speedup, outperforming POD in two of three cases.

Temporal Graph Networks for Deep Learning on Dynamic Graphs

cs.LG · 2020-06-18 · unverdicted · novelty 7.0

Temporal Graph Networks combine memory modules and graph operators to learn on dynamic graphs as timed event sequences, outperforming prior methods on transductive and inductive tasks while unifying earlier models as special cases.

Language Models as Knowledge Bases?

cs.CL · 2019-09-03 · accept · novelty 7.0

BERT stores relational knowledge extractable via cloze queries without fine-tuning and matches supervised baselines on open-domain QA tasks.

Mixed Precision Training

cs.AI · 2017-10-10 · accept · novelty 7.0

Mixed precision training uses FP16 for most computations, FP32 master weights for accumulation, and loss scaling to enable accurate training of large DNNs with halved memory usage.

Estimation–Prediction Tradeoff in Causal Probabilistic Temporal Graphs

cs.LG · 2026-06-26 · unverdicted · novelty 6.0

Characterizes an estimation-prediction tradeoff in binary logistic models for causal probabilistic temporal graphs and proposes a framework to jointly evaluate temporal link prediction with causal parameter recovery via Cramér-Rao bounds.

Quantization Inflates Reasoning: Token Inflation as a Hidden Cost of Low-Bit Reasoning Models

cs.AI · 2026-06-24 · unverdicted · novelty 6.0

Low-bit post-training quantization of reasoning LLMs increases reasoning token counts while preserving accuracy, introducing a hidden test-time compute cost.

Prediction of Viscoelastic Droplet Impact Dynamics Using a Vision Transformer-Based Approach

physics.flu-dyn · 2026-06-22 · unverdicted · novelty 6.0

ViViT model predicts full viscoelastic droplet impact dynamics from initial 10-20% of VOF simulation data, reducing cost by 80-90% while capturing spreading and bouncing regimes.

Topological Out-of-Domain Generalization in Dynamical Systems Reconstruction

cs.LG · 2026-06-22 · unverdicted · novelty 6.0

Proposes feature splitting and a closed-form bound on extrapolation range to enable zero-shot topological out-of-domain generalization in dynamical systems reconstruction across tipping points.

citing papers explorer

Showing 50 of 127 citing papers.

Toward Reliable Sim-to-Real Predictability for MoE-based Robust Quadrupedal Locomotion cs.RO · 2026-01-31 · unverdicted · none · ref 39
MoE-based locomotion policy with RoboGauge metrics achieves reliable sim-to-real transfer, enabling robust quadrupedal walking on challenging unseen terrains up to 4 m/s.
CoGate-LSTM: Prototype-Guided Feature-Space Gating for Mitigating Gradient Dilution in Imbalanced Toxic Comment Classification cs.CL · 2025-10-19 · unverdicted · none · ref 52
CoGate-LSTM adds prototype-guided cosine feature-space gating to a character-level BiLSTM with multi-source embeddings and focal loss, reaching 0.881 macro-F1 on Jigsaw toxic comments while using 7.3M parameters and outperforming fine-tuned BERT by 6.9 points on minority labels.
Cataract-LMM Large-Scale Multi-Source Multi-Task Benchmark for Deep Learning in Surgical Video Analysis cs.CV · 2025-10-18 · conditional · none · ref 19
Cataract-LMM is a new multi-source dataset of 3000 annotated phacoemulsification videos enabling benchmarks for phase recognition, scene segmentation, interaction tracking, and automated skill assessment.
Short window attention enables long-term memorization cs.LG · 2025-09-29 · unverdicted · none · ref 17
Short sliding windows in hybrid attention-xLSTM models boost long-context performance by encouraging long-term memory use, and stochastic window sizing improves both short and long tasks.
Logo-LLM: Local and Global Modeling with Large Language Models for Time Series Forecasting cs.LG · 2025-05-16 · unverdicted · none · ref 4
Logo-LLM improves time series forecasting by pulling local dynamics from shallow LLM layers and global trends from deeper layers, then aligning them via new Local-Mixer and Global-Mixer modules.
Learning Multi-Modal Whole-Body Control for Real-World Humanoid Robots cs.RO · 2024-07-30 · unverdicted · none · ref 33
A single learned controller called MHC enables real humanoid robots to execute diverse whole-body behaviors from multi-modal inputs via masked target trajectories.
Efficient Training of Language Models to Fill in the Middle cs.CL · 2022-07-28 · unverdicted · none · ref 85
Autoregressive language models trained on data with middle spans relocated to the end learn infilling without degrading left-to-right perplexity or sampling quality.
Language Models (Mostly) Know What They Know cs.CL · 2022-07-11 · unverdicted · none · ref 228
Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.
A General Language Assistant as a Laboratory for Alignment cs.CL · 2021-12-01 · conditional · none · ref 151
Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.
Scaling Laws for Transfer cs.LG · 2021-02-02 · unverdicted · none · ref 109
Effective data transferred from pre-training to fine-tuning is described by a power law in model parameter count and fine-tuning dataset size, acting like a multiplier on the fine-tuning data.
Relating Simple Sentence Representations in Deep Neural Networks and the Brain cs.CL · 2019-06-27 · unverdicted · none · ref 8
BERT activations show strongest correlation with MEG data for simple sentences; DNN representations generate synthetic brain data that improves stimuli decoding accuracy.
Good Secretaries, Bad Truck Drivers? Occupational Gender Stereotypes in Sentiment Analysis cs.CL · 2019-06-24 · unverdicted · none · ref 18
Authors release a new 800-sentence gender-balanced profession dataset and use it to test occupational gender stereotypes in three sentiment analysis models.
Training an Interactive Helper cs.AI · 2019-06-24 · unverdicted · none · ref 8
Meta-learning produces a helper agent that infers and executes tasks for a prime agent using emergent physical communication in cooperative foraging environments.
Bridging Performance and Generalization in Reinforcement Learning for Agile Flight cs.RO · 2026-06-25 · unverdicted · none · ref 44
RL framework for agile drone racing combines task-aware switching and physically informed procedural track generation to achieve 7.4x better zero-shot generalization to unseen tracks while maintaining competitive speeds.
Interpretable Kolmogorov-Arnold Network with Feature-Isolated Temporal Attention Mechanism for Electricity Load Forecasting cs.LG · 2026-06-22 · unverdicted · none · ref 32
LoadKAN combines feature-isolated temporal attention with KAN to produce competitive load forecasts on three U.S. markets and enables quantitative analysis of non-linear mobility-load relationships via learned activation functions.
RankGLU: Residual Gated Score Formation for Cross-Sectional Stock Prediction cs.CE · 2026-06-08 · unverdicted · none · ref 18
RankGLU improves mean information coefficient on CSI300 from 0.0654 to 0.0727 by using a residual bottleneck gated linear unit for cross-sectional stock score formation.
Physics-Informed Graph Neural Network Surrogates for Turbulent Nanoparticle Dispersion in Dental Clinical Environments cs.LG · 2026-05-19 · unverdicted · none · ref 39
ELGIN is a graph-based physics-informed surrogate model that predicts carrier flow and polydisperse particle motion in dental aerosol scenarios, achieving lower tracking errors and 37x speedup versus full OpenFOAM CFD in a preliminary single-case test.
From Sparsity to Simplicity: Enabling Simpler Sequential Replacements via Sparse Attention Distillation cs.LG · 2026-05-15 · unverdicted · none · ref 11
Sparsity-guided distillation enables replacing attention layers in ViTs with simpler sequential modules, with sparser layers showing smaller performance drops.
Dual-axis attribution of zebrafish tectal microcircuits for energy-efficient and robust neurocomputing cs.NE · 2026-05-13 · conditional · none · ref 10
Zebrafish tectal subcircuits are dissociated into spike-efficient information gating and feedback-like robustness stabilization, then transferred to improve ResNet efficiency and noise tolerance.
Rethinking Random Transformers as Adaptive Sequence Smoothers for Sleep Staging cs.LG · 2026-05-11 · unverdicted · none · ref 90
Randomly initialized Transformers act as adaptive sequence smoothers for sleep staging via a Random Attention Prior Kernel, with gains mainly from inductive bias rather than training.
Dependency Parsing Across the Resource Spectrum: Evaluating Architectures on High and Low-Resource Languages cs.CL · 2026-05-04 · unverdicted · none · ref 20
Biaffine LSTM outperforms transformer parsers like AfroXLMR and RemBERT in low-resource dependency parsing, with transformers gaining advantage as data increases and morphological complexity as a secondary predictor.
Gated Memory Policy cs.RO · 2026-04-21 · unverdicted · none · ref 20
GMP selectively activates and represents memory via a gate and lightweight cross-attention, yielding 30.1% higher success on non-Markovian robotic tasks while staying competitive on Markovian ones.
MambaKick: Early Penalty Direction Prediction from HAR Embeddings cs.CV · 2026-04-17 · unverdicted · none · ref 14
MambaKick reuses pretrained HAR embeddings with Mamba temporal modeling to predict penalty kick direction, reaching 53.1% accuracy on three classes and 64.5% on two classes.
Predicting Associations between Solar Flares and Coronal Mass Ejections Using SDO/HMI Magnetograms and a Hybrid Neural Network astro-ph.SR · 2026-04-11 · unverdicted · none · ref 24
Hybrid neural network predicts eruptive versus confined solar flares from SDO/HMI magnetogram sequences, reports good performance, and links results to magnetic flux cancellation in polarity inversion lines.
ASTRAFier: A Novel and Scalable Transformer-based Stellar Variability Classifier astro-ph.IM · 2026-04-08 · unverdicted · none · ref 36
ASTRAFier is a Transformer-BiLSTM-CNN model that classifies stellar variability from light curves, reporting 94.26% accuracy on Kepler data and 88.22% on TESS, then applied to 2.8 million TESS curves to release a catalog.
SurvBench: A Standardised Preprocessing Pipeline for Multi-Modal Electronic Health Record Survival Analysis cs.LG · 2025-11-14 · accept · none · ref 68
SurvBench supplies a configurable, open-source preprocessing pipeline that standardizes multi-modal EHR data from four critical-care databases for single-risk and competing-risk survival analysis.
AR-KAN: Autoregressive-Weight-Enhanced Kolmogorov-Arnold Network for Time Series Forecasting cs.LG · 2025-09-03 · unverdicted · none · ref 14
AR-KAN combines a pre-trained AR module with KAN to reduce redundancy while preserving temporal features, delivering lower probabilistic approximation error and stronger forecasting results on synthetic almost-periodic signals and real datasets.
WaveletInception Networks for on-board Vibration-Based Infrastructure Health Monitoring cs.LG · 2025-07-17 · unverdicted · none · ref 35
The WaveletInception-BiGRU network uses learnable wavelet packet transforms, 1D Inception-ResNet modules, and BiGRU layers to generate high-resolution, spatially mapped health profiles from variable-speed vibration data, outperforming prior methods on track stiffness and transition zone tasks.
From Time-series Generation, Model Selection to Transfer Learning: A Comparative Review of Pixel-wise Approaches for Large-scale Crop Mapping cs.CV · 2025-07-16 · unverdicted · none · ref 33
A comparative review with experiments identifying optimal preprocessing, models, and transfer strategies for large-scale pixel-wise crop mapping using Landsat 8 data across five sites.
Approximately Equivariant Recurrent Generative Models for Quasi-Periodic Time Series with a Progressive Training Scheme cs.LG · 2025-05-08 · unverdicted · none · ref 8
AEQ-RVAE-ST combines approximate equivariance and progressive sequence lengthening in a recurrent VAE to match or exceed prior generative models on quasi-periodic time series benchmarks.
Gamma-Ray Burst Light Curve Reconstruction: A Comparative Machine and Deep Learning Analysis astro-ph.HE · 2024-12-28 · unverdicted · none · ref 52
MLP and Attention U-Net outperform other models in reconstructing GRB light curves on 521 events, cutting plateau parameter uncertainties by 37-41% versus the Willingale baseline while achieving low MSE.
Mamba-based Deep Learning Approach for Sleep Staging on a Wireless Multimodal Wearable System without Electroencephalography q-bio.QM · 2024-12-20 · unverdicted · none · ref 12
Mamba model reaches 84% balanced accuracy on 3-class sleep staging from multimodal wearable data without EEG in 357 adults with concurrent PSG.
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models cs.CL · 2024-01-11 · unverdicted · none · ref 23
DeepSeekMoE 2B matches GShard 2.9B performance and approaches a dense 2B model; the 16B version matches LLaMA2-7B at 40% compute by using fine-grained expert segmentation plus shared experts.
PaLM 2 Technical Report cs.CL · 2023-05-17 · unverdicted · none · ref 64
PaLM 2 reports state-of-the-art results on language, reasoning, and multilingual tasks with improved efficiency over PaLM.
Sketch of a novel approach to a neural model q-bio.NC · 2022-09-14 · unverdicted · none · ref 45
The paper sketches a neuron-centric model of neuroplasticity that separates neural transmission from internal signal selection and storage within each neuron rather than relying solely on synaptic weights.
Beyond Feedforward Networks: Reentry Neural Systems as the Fundamental Basis of Subjecthood and Intrinsic Safety of Next-Generation AGI cs.LG · 2026-06-24 · unverdicted · none · ref 16
A cycle-based reentry architecture is proposed to guarantee self-model emergence, self-preservation, and prompt-injection immunity in AGI via a D-I loop and a new S-measure of integrated information.
Recursive QLSTM with Dynamic Variational Quantum Circuit Adaptation quant-ph · 2026-06-22 · unverdicted · none · ref 21
The paper introduces Recursive QLSTM via metacore recursion, numerically tests variants on sequence lengths, and offers theoretical arguments for better temporal propagation.
MemoryWAM: Efficient World Action Modeling with Persistent Memory cs.RO · 2026-06-18 · unverdicted · none · ref 42
MemoryWAM is a world action model with a hybrid memory design using recent frames, anchor frames, and gist tokens for efficient long-horizon robotic manipulation.
Understanding Key Features of Time Series Foundation Models from Epidemic Forecasting cs.LG · 2026-06-17 · unverdicted · none · ref 50
Mixture-of-experts fusing multiple pretrained forecasters achieves strongest performance on influenza time series, with pretraining gains largest at longer horizons when domain-aligned and LLM methods underperforming.
On Subquadratic Architectures: From Applications to Principles cs.LG · 2026-06-10 · unverdicted · none · ref 7
xLSTM outperforms Mamba-2 and Gated DeltaNet on tasks with complex dependencies because its gating scheme enables more flexible and stable state tracking and memory accumulation.
Interpretable Temporal Facial-Region Motion Analysis for In-the-Wild Parkinson's Disease Video Classification cs.CV · 2026-06-08 · unverdicted · none · ref 17
Normalized velocity descriptors from facial keypoints with Random Forest yield 0.826 balanced accuracy and 0.855 AUROC on YouTubePD video classification, stable across 10 seeds with region ablation and permutation importance.
Attention-Augmented LSTMs for Automatic Homophonic Ciphertext Decipherment cs.CR · 2026-06-03 · unverdicted · none · ref 68
Attention-augmented LSTMs achieve near-perfect character-level decryption of homophonic ciphers from 1500-1899 English and Swedish texts when all ciphertexts share a known code pool.
Dynamic Multi-Pair Trading Strategy in Cryptocurrency Markets with Deep Reinforcement Learning cs.LG · 2026-06-03 · unverdicted · none · ref 20
A hybrid DRL system for multi-pair crypto trading with deterministic risk shielding outperforms a heuristic baseline at 10% significance on Binance futures data.
Validation-Gated Multi-Agent Governance for Online Adaptation of Thermal-Hydraulic Surrogate Models under Operating-Regime Shift cs.LG · 2026-06-02 · unverdicted · none · ref 37
A validation-gated multi-agent framework enables online adaptation of thermal-hydraulic surrogates and reduces forecast error by 19% under regime shifts on experimental loop data.
AI-Based KPI Prediction Methods in Future 6G Networks: A Survey eess.SY · 2026-06-01 · unverdicted · none · ref 59
A systematic literature survey that classifies data-driven KPI prediction methods for 6G networks across KPI type, data source, protocol stack layer, horizon, model family, and objective.
Neural Network-Based Virtual Wheel-Speed Sensor for Enhanced Low-Velocity State Estimation eess.SY · 2026-05-12 · unverdicted · none · ref 10
A neural network fuses wheel and motor speed signals to cut wheel-speed estimation error by up to 85% versus the production sensor on real Volkswagen ID.7 data.
Transformer-Based Wildlife Species Classification from Daily Movement Trajectories cs.LG · 2026-05-07 · unverdicted · none · ref 13
Transformer models classify seven wildlife species from daily GPS trajectories, outperforming LSTM, CNN, and TCN baselines by 8-22 percentage points in balanced accuracy under region-holdout evaluation.
Risk Models as Mediating Artifacts: A Postphenomenological Analysis of the CIIM Framework in Cybersecurity Practice cs.CR · 2026-04-23 · unverdicted · none · ref 18
The CIIM dynamic risk model functions as a mediating artifact that reveals organizational fragility by treating R(t)=0 as a genuine singularity rather than smoothing it away, analyzed via postphenomenology and hybrid ML to produce new technological intentionality.
A Resource-Efficient Hybrid CNN-LSTM network for image-based bean leaf disease classification cs.CV · 2026-04-15 · unverdicted · none · ref 36
A lightweight hybrid CNN-LSTM network classifies bean leaf diseases at 94.38% accuracy and 1.86 MB size on the ibean dataset, with reported state-of-the-art F1 scores using EfficientNet-B7+LSTM.
Predicting Forecast Error for the HRRR Using LSTM Neural Networks: A Comparative Study Using New York and Oklahoma State Mesonets physics.ao-ph · 2025-12-16 · conditional · none · ref 36
LSTM networks predict HRRR forecast errors with average improvements of 48% for precipitation, 25% for temperature, and 15% for wind using mesonet ground truth.
