
Long Short-Term Memory

Sepp Hochreiter, Jürgen Schmidhuber · 1997 · Neural Computation · DOI 10.1162/neco.1997.9.8.1735 · arXiv gov/9377276

Canonical reference. 74% of citing Pith papers cite this work as background.

128 Pith papers citing it

80.8k external citations · Crossref

Background: 74% of classified citations


citation-role summary

background 15 · baseline 2 · method 2

citation-polarity summary

background 14 · baseline 2 · use method 2 · support 1

authors

Sepp Hochreiter, Jürgen Schmidhuber

representative citing papers

HS-FNO: History-Space Fourier Neural Operator for Non-Markovian Partial Differential Equations

cs.LG · 2026-05-10 · conditional · novelty 8.0 · 2 refs

HS-FNO lifts the state to include history and decomposes updates into a learned future-slice predictor plus an exact shift-append transport, yielding lower rollout errors than standard or lag-stack FNO baselines on five non-Markovian PDE families.

RAVEN: A Regime-Aware Variable-context Expert Network for Financial Time Series Forecasting

cs.LG · 2026-06-23 · unverdicted · novelty 7.0

RAVEN proposes a regime-aware MoE architecture with cumulative importance thresholding and correlation-aware weighting to adaptively select temporal context for non-stationary financial forecasting.

ConTex: Reformulating Counterfactual Generation For Time Series Forecasting

cs.LG · 2026-06-16 · unverdicted · novelty 7.0

ConTex learns a global intervention strategy via a decomposed temporal-conditional encoder architecture to generate consistent, sparse counterfactuals for time series models in a single forward pass.

Causally Evaluating the Learnability of Formal Language Tasks

cs.CL · 2026-06-08 · unverdicted · novelty 7.0

Introduces the binning semiring and causal graphical models to show that correlational evaluation of learnability in formal language tasks can lead to incorrect conclusions because of confounders.

RESCAST-100K: A Comprehensive Dataset for Cross-Domain Residential Load and Indoor Temperature Forecasting

cs.LG · 2026-06-01 · unverdicted · novelty 7.0

RESCAST-100K is a large-scale benchmark dataset of simulated and real residential energy data for cross-domain load and temperature forecasting.

Repetition Mismatch: Why Data Mixture Experiments Don't Scale and How to Fix Them

cs.LG · 2026-05-29 · conditional · novelty 7.0

Repetition rate mismatch between small-scale proxies and target budgets is the main reason data mixture experiments do not scale; a subsampling procedure that equalizes repetition rates recovers optimal mixtures from 1/16-scale experiments.

Faithful Embeddings of Irregular and Asynchronous Data for Online Log-NCDEs

cs.LG · 2026-05-28 · unverdicted · novelty 7.0

Introduces a continuous injective embedding for Log-NCDEs that builds log-signatures from data increments without interpolation or imputation while preserving compact-set universality.

PaperFit: Vision-in-the-Loop Typesetting Optimization for Scientific Documents

cs.AI · 2026-05-11 · unverdicted · novelty 7.0

PaperFit uses rendered page images in a closed loop to diagnose and repair typesetting defects in LaTeX documents, outperforming baselines on a new benchmark of 200 papers.

LLM-guided Semi-Supervised Approaches for Social Media Crisis Data Classification

cs.AI · 2026-05-08 · conditional · novelty 7.0

LG-CoTrain, an LLM-guided co-training method, outperforms classical semi-supervised baselines for crisis tweet classification in low-resource settings with 5-25 labeled examples per class.

SGC-RML: A reliable and interpretable longitudinal assessment for PD in real-world DNS

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

SGC-RML creates an 8D symptom atlas from multimodal PD data and integrates conformal calibration to deliver reliable, rejectable longitudinal assessments.

BadmintonGRF: A Multimodal Dataset and Benchmark for Markerless Ground Reaction Force Estimation in Badminton

cs.CV · 2026-05-03 · unverdicted · novelty 7.0

BadmintonGRF is a new public multimodal dataset and benchmark that pairs multi-view video with instrumented GRF for markerless load estimation in badminton.

Affinity Is Not Enough: Recovering the Free Energy Principle in Mixture-of-Experts

cs.LG · 2026-05-01 · conditional · novelty 7.0

Adding temporal memory via LIF, precision-weighted gating, and anticipatory prediction to MoE routers recovers effective expert selection at distribution transitions, with ablation confirming a super-additive beta-ant interaction.

AsmRAG: LLM-Driven Malware Detection by Retrieving Functionally Similar Assembly Code

cs.CR · 2026-04-25 · unverdicted · novelty 7.0

AsmRAG detects malware at 96% F1 and attributes families at 95% F1 by retrieving functionally similar assembly code via LLM embeddings and density-weighted anchor selection, remaining robust to metamorphic obfuscation.

Preconditioned DeltaNet: Curvature-aware Sequence Modeling for Linear Recurrences

cs.LG · 2026-04-22 · unverdicted · novelty 7.0

Preconditioned delta-rule models with a diagonal curvature approximation improve upon standard DeltaNet, GDN, and KDA by better approximating the test-time regression objective.

BRIDGE and TCH-Net: Heterogeneous Benchmark and Multi-Branch Baseline for Cross-Domain IoT Botnet Detection

cs.CR · 2026-04-13 · unverdicted · novelty 7.0

BRIDGE creates the first formal heterogeneous multi-dataset benchmark for IoT botnet detection with LODO evaluation, and TCH-Net achieves mean LODO F1 of 0.5577 while reaching F1 0.8296 on standard tests, outperforming twelve baselines.

FactorEngine: A Program-level Knowledge-Infused Factor Mining Framework for Quantitative Investment

cs.AI · 2026-03-17 · unverdicted · novelty 7.0

FactorEngine mines alpha factors as Turing-complete code via LLM-guided directional search, parameter separation, and a multi-agent pipeline that converts financial reports into executable programs, delivering higher IC/ICIR and Sharpe ratios than baselines in backtests.

Reduced-Order Surrogates for Forced Flexible Mesh Coastal-Ocean Models

cs.CE · 2026-02-05 · unverdicted · novelty 7.0

Koopman autoencoders with forcings and temporal unrolling deliver accurate year-long predictions for coastal-ocean models at 300-1400x speedup, outperforming POD in two of three cases.

Temporal Graph Networks for Deep Learning on Dynamic Graphs

cs.LG · 2020-06-18 · unverdicted · novelty 7.0

Temporal Graph Networks combine memory modules and graph operators to learn on dynamic graphs as timed event sequences, outperforming prior methods on transductive and inductive tasks while unifying earlier models as special cases.

Language Models as Knowledge Bases?

cs.CL · 2019-09-03 · accept · novelty 7.0

BERT stores relational knowledge extractable via cloze queries without fine-tuning and matches supervised baselines on open-domain QA tasks.

Mixed Precision Training

cs.AI · 2017-10-10 · accept · novelty 7.0

Mixed precision training uses FP16 for most computations, FP32 master weights for accumulation, and loss scaling to train large DNNs accurately while roughly halving memory usage.

Estimation–Prediction Tradeoff in Causal Probabilistic Temporal Graphs

cs.LG · 2026-06-26 · unverdicted · novelty 6.0

Characterizes an estimation-prediction tradeoff in binary logistic models for causal probabilistic temporal graphs and proposes a framework to jointly evaluate temporal link prediction with causal parameter recovery via Cramér-Rao bounds.

Quantization Inflates Reasoning: Token Inflation as a Hidden Cost of Low-Bit Reasoning Models

cs.AI · 2026-06-24 · unverdicted · novelty 6.0

Low-bit post-training quantization of reasoning LLMs increases reasoning token counts while preserving accuracy, introducing a hidden test-time compute cost.

Prediction of Viscoelastic Droplet Impact Dynamics Using a Vision Transformer-Based Approach

physics.flu-dyn · 2026-06-22 · unverdicted · novelty 6.0

A ViViT model predicts full viscoelastic droplet impact dynamics from the initial 10–20% of VOF simulation data, reducing computational cost by 80–90% while capturing both spreading and bouncing regimes.

Topological Out-of-Domain Generalization in Dynamical Systems Reconstruction

cs.LG · 2026-06-22 · unverdicted · novelty 6.0

Proposes feature splitting and a closed-form bound on extrapolation range to enable zero-shot topological out-of-domain generalization in dynamical systems reconstruction across tipping points.

citing papers explorer

Showing 28 of 128 citing papers.

Predicting Forecast Error for the HRRR Using LSTM Neural Networks: A Comparative Study Using New York and Oklahoma State Mesonets physics.ao-ph · 2025-12-16 · conditional · none · ref 36
LSTM networks predict HRRR forecast errors with average improvements of 48% for precipitation, 25% for temperature, and 15% for wind using mesonet ground truth.
Time Series Forecasting Through the Lens of Dynamics cs.LG · 2025-07-21 · unverdicted · none · ref 16
Proposes dynamics-based analysis of time series models showing partial dynamics learning and end-positioning as key to performance, plus a plug-and-play improvement method.
Tabular GANs for uneven distribution cs.LG · 2020-10-01 · unverdicted · none · ref 3
A modular framework for tabular data generation across GANs, diffusion models, and LLMs is introduced and tested on seven benchmarks, with GAN augmentation shown to boost performance under distribution shift.
Automatically Learning Construction Injury Precursors from Text cs.CL · 2019-07-26 · unverdicted · none · ref 45
Standard NLP classifiers can surface valid injury precursors from raw construction safety reports.
Autoencoder Architectures for Athlete Performance Scoring from Wearable Telemetry cs.LG · 2026-06-26 · unverdicted · none · ref 18
Deep autoencoders outperform PCA and VAE variants on a composite of reconstruction MSE and interpretability metrics when reducing runner wearable data to a single latent performance score.
Lifelong In-Context Learning with Transformers Requires Parametric Forms of Attention cs.LG · 2026-06-24 · unverdicted · none · ref 21
Argues that parametric forms of attention are necessary for lifelong in-context learning in transformers, so that the memory footprint stays constant over arbitrarily long sequences.
Machine Learning Approaches for Improved Scalability of Metallic Magnetic Calorimeters physics.ins-det · 2026-06-23 · unverdicted · none · ref 79
Machine learning methods are explored for pulse classification, artifact rejection, and shape analysis in metallic magnetic calorimeters to improve scalability over traditional signal processing.
An AI Security Agent for Banking: Multi-Vector Fraud and AML Detection Across Retail and Corporate Accounts cs.CR · 2026-06-16 · unverdicted · none · ref 1
A three-component fusion architecture of LSTM, statistical, and graph modules detects fraud and AML on synthetic banking data with F1 scores of 0.787 (transactions) and 0.867 (sessions), outperforming rule-based and LSTM-only baselines.
Towards Robust Arabic Speech Emotion Recognition with Deep Learning cs.SD · 2026-06-09 · unverdicted · none · ref 20
A CNN-Transformer hybrid reaches 98.1% accuracy on Arabic SER using the EYASE and BAVED datasets, outperforming CNN-LSTM and fine-tuned wav2vec 2.0.
A Proof-of-Concept Simulation-Driven Digital Twin Framework for Decision-Aware Diabetes Modeling cs.LG · 2026-05-11 · unverdicted · none · ref 31
A simulation-driven digital twin framework is shown to generate interpretable diabetes trajectories for decision-aware analysis by combining benchmark data with controlled synthetic scenarios.
A Machine Learning Framework for EEG-Based Prediction of Treatment Efficacy in Chronic Neck Pain q-bio.QM · 2026-05-05 · unverdicted · none · ref 21
A preprocessing pipeline for resting-state and motor-task EEG is described to support future machine learning models that predict treatment efficacy in chronic neck pain.
Multilevel neural networks with dual-stage feature fusion for human activity recognition cs.CV · 2026-04-17 · unverdicted · none · ref 25
Multilevel CNN-LSTM architectures using both late and intermediate feature fusion achieve higher accuracy in human activity recognition than late fusion alone on two benchmark datasets.
Bridging the Linguistic Divide: A Survey on Leveraging Large Language Models for Machine Translation cs.CL · 2025-04-02 · unverdicted · none · ref 4
A literature survey that organizes prompting, fine-tuning, preference optimization, and context-aware techniques for LLM-based machine translation with emphasis on low-resource languages.
System Misuse Detection via Informed Behavior Clustering and Modeling cs.CR · 2019-07-01 · unverdicted · none · ref 15
An informed machine learning approach using LSTM networks and expert-driven visual clustering to model normal behavior and detect misuse in system logs.
Is It Worth the Attention? A Comparative Evaluation of Attention Layers for Argument Unit Segmentation cs.CL · 2019-06-24 · unverdicted · none · ref 12
Attention layers do not improve BiLSTM performance on argument unit segmentation, and contextualized embeddings show little benefit.
Artificial Intelligence for Power-Converter-Rich Electrical Systems: A Review eess.SY · 2026-06-14 · unverdicted · none · ref 31
Review of AI applications in power-converter-rich systems across design, control, operations, and governance, highlighting deployment gaps.
cantnlp@DravidianLangTech 2026: organic domain adaptation improves multi-class hope speech detection in Tulu cs.CL · 2026-05-10 · unverdicted · none · ref 13
Organic domain adaptation of XLM-RoBERTa on Tulu social media text improves multi-class hope speech detection in code-mixed Tulu on the development set.
Sentiment Analysis of Indonesian Spotify Reviews Using Machine Learning and BiLSTM cs.CL · 2026-05-05 · unverdicted · none · ref 4
BiLSTM achieves the highest weighted F1-score for three-class sentiment classification of Indonesian Spotify reviews while Decision Tree with SMOTE delivers more balanced performance across classes.
Speech Emotion Recognition Using MFCC Features and LSTM-Based Deep Learning Model cs.SD · 2026-04-16 · unverdicted · none · ref 19
MFCC features with an LSTM classifier reach 99% accuracy on emotion recognition from the TESS speech dataset, marginally above an SVM baseline.
Deep Learning for Sequential Decision Making under Uncertainty: Foundations, Frameworks, and Frontiers math.OC · 2026-04-13 · unverdicted · none · ref 61
A tutorial framing deep learning as a complement to optimization for sequential decision-making under uncertainty, with applications in supply chains, healthcare, and energy.
Deep Learning in the Automotive Industry: Recent Advances and Application Examples cs.LG · 2019-06-20 · unverdicted · none · ref 29
An overview of deep learning applications and challenges in the automotive industry, covering ADAS, automated driving, virtual sensing, and data-driven development.
Spectra as Language: Large Language Models for Scalable Stellar Parameter and Abundance Inference astro-ph.IM · 2026-05-21 · unreviewed · ref 12 · 2 links
Hardware-Software Co-Design of Scalable, Energy-Efficient Analog Recurrent Computations cs.AR · 2026-05-12 · unreviewed · ref 27
Quantifying Rodda and Graham Gait Classification from 3D Markerless Kinematics derived from a Single-view Video in a Heterogeneous Pediatric Clinical Cohort cs.CV · 2026-05-11 · unreviewed · ref 43 · 2 links
Structured Recurrent Mixers for Massively Parallelized Sequence Generation cs.CL · 2026-05-09 · unreviewed · ref 50 · 2 links
MinMax Recurrent Neural Cascades cs.LG · 2026-05-07 · unreviewed · ref 4 · 2 links
Learning to Emulate Chaos: Adversarial Optimal Transport Regularization stat.ML · 2026-04-22 · unreviewed · ref 5
MSTN: A Lightweight and Fast Model for General TimeSeries Analysis cs.LG · 2025-11-25 · unreviewed · ref 13 · 2 links