Diederik P. Kingma and Jimmy Ba · 2015

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 2 · dataset: 1

citation-polarity summary

background: 2 · use dataset: 1

representative citing papers

Bridging Sequence and Graph Structure for Epigenetic Age Prediction

cs.AI · 2026-05-11 · unverdicted · novelty 7.0

A sequence-graph model using gated modulation of methylation signals by eight handcrafted DNA sequence features achieves 3.149 years MAE on 3707 samples, a 12.8% gain over graph baselines.

Fitting Multilinear Polynomials for Logic Gate Networks

cs.LG · 2026-05-09 · unverdicted · novelty 7.0

Fitting logic gates as 4D multilinear polynomials with covariance Jacobian selection matches or beats 16D softmax baselines on seven datasets and remains stable at 12-layer depth where the baseline drops 37 points on CIFAR-10.

OSDN: Improving Delta Rule with Provable Online Preconditioning in Linear Attention

cs.LG · 2026-05-13 · unverdicted · novelty 6.0

OSDN adds online diagonal preconditioning to the Delta Rule, preserving chunkwise parallelism while proving super-geometric convergence and delivering 32-39% recall gains at 340M-1.3B scales.

Learn-by-Wire Training Control Governance: Bounded Autonomous Training Under Stress for Stability and Efficiency

cs.AI · 2026-05-18 · unverdicted · novelty 4.0

LBW-Guard is a bounded autonomous control layer above AdamW that improves stability, reduces perplexity, and speeds up training for Qwen2.5 models under learning-rate stress on WikiText-103.

citing papers explorer

Bridging Sequence and Graph Structure for Epigenetic Age Prediction · cs.AI · 2026-05-11 · unverdicted · none · ref 32
Fitting Multilinear Polynomials for Logic Gate Networks · cs.LG · 2026-05-09 · unverdicted · none · ref 33
OSDN: Improving Delta Rule with Provable Online Preconditioning in Linear Attention · cs.LG · 2026-05-13 · unverdicted · none · ref 29
Learn-by-Wire Training Control Governance: Bounded Autonomous Training Under Stress for Stability and Efficiency · cs.AI · 2026-05-18 · unverdicted · none · ref 4
