http://yann

The MNIST database of handwritten digits

7 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Dataset Distillation

cs.LG · 2018-11-27 · unverdicted · novelty 8.0

Dataset distillation creates a tiny synthetic training set that, when used with a fixed network initialization, produces models whose performance approximates that of models trained on the full original dataset.

Continual Learning of Domain-Invariant Representations

cs.LG · 2026-05-15 · unverdicted · novelty 7.0

Introduces replay-based continual learning with sequential invariance alignment to learn domain-invariant representations, outperforming baselines on generalization to unseen domains across six datasets in vision, medicine, manufacturing, and ecology.

Implicit Neural Optimal Transport via Fixed-Point Optimization

math.OC · 2026-05-11 · unverdicted · novelty 7.0

A single-network implicit neural optimal transport method that solves the c-transform via proximal fixed-point iteration for stable, non-adversarial training.

On the Stability and Generalization of First-order Bilevel Minimax Optimization

cs.LG · 2026-04-22 · unverdicted · novelty 7.0

Provides the first systematic generalization analysis via algorithmic stability for single-timescale and two-timescale stochastic gradient descent-ascent in bilevel minimax problems.

Response-Conditioned Parallel-to-Sequential Orchestration for Multi-Agent Systems

cs.CL · 2026-05-15 · unverdicted · novelty 6.0

Nexa learns a response-conditioned policy that starts with parallel agent execution and adds at most one round of sequential message passing via a predicted sparse DAG, strictly subsuming pure parallel mode.

Learning Dynamics of Zeroth-Order Optimization: A Kernel Perspective

cs.LG · 2026-05-05 · unverdicted · novelty 6.0

Zeroth-order SGD learning dynamics are governed by a random low-dimensional projection of the empirical NTK whose approximation error scales with model output dimension, not parameter count.

Possibilistic Predictive Uncertainty for Deep Learning

cs.LG · 2026-05-01 · unverdicted · novelty 6.0

DAPPr projects a possibilistic posterior over network parameters to predictions using supremum operators and approximates it with learnable Dirichlet functions to yield an efficient training objective for epistemic uncertainty.

citing papers explorer

Showing 7 of 7 citing papers after filters.

Dataset Distillation · cs.LG · 2018-11-27 · unverdicted · none · ref 51
Continual Learning of Domain-Invariant Representations · cs.LG · 2026-05-15 · unverdicted · none · ref 81
Implicit Neural Optimal Transport via Fixed-Point Optimization · math.OC · 2026-05-11 · unverdicted · none · ref 168
On the Stability and Generalization of First-order Bilevel Minimax Optimization · cs.LG · 2026-04-22 · unverdicted · none · ref 95
Response-Conditioned Parallel-to-Sequential Orchestration for Multi-Agent Systems · cs.CL · 2026-05-15 · unverdicted · none · ref 75
Learning Dynamics of Zeroth-Order Optimization: A Kernel Perspective · cs.LG · 2026-05-05 · unverdicted · none · ref 47
Possibilistic Predictive Uncertainty for Deep Learning · cs.LG · 2026-05-01 · unverdicted · none · ref 6
