Emmanuel Dupoux, Yann LeCun, and Jitendra Malik

2026 · arXiv 2603.15381

5 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

EVA-0: Test-Time Model Evolution with Only Two Forward Passes per Sample

cs.LG · 2026-05-15 · unverdicted · novelty 7.0

EVA-0 is a zeroth-order test-time adaptation method that uses scale-invariant loss, anchor-guided optimization, and symmetric two-sided perturbations to enable inference and adaptation in two forward passes, outperforming prior methods on ImageNet-C with ViT-Base.

Springdrift: An Auditable Persistent Runtime for LLM Agents with Case-Based Memory, Normative Safety, and Ambient Self-Perception

cs.AI · 2026-04-06 · unverdicted · novelty 7.0

Springdrift provides an auditable persistent runtime for long-lived LLM agents with case-based memory, normative safety gating, and ambient self-perception, shown in a 23-day single-instance deployment where the agent self-diagnosed bugs and maintained cross-channel context.

TFGN: Task-Free, Replay-Free Continual Pre-Training Without Catastrophic Forgetting at LLM Scale

cs.LG · 2026-05-14 · unverdicted · novelty 6.0

TFGN is an architectural overlay for transformers enabling task-free, replay-free continual pre-training across heterogeneous domains at LLM scale with near-zero backward transfer and high gradient orthogonality.

The Evaluation Trap: Benchmark Design as Theoretical Commitment

cs.AI · 2026-05-13 · unverdicted · novelty 6.0

AI benchmarks trap progress by operationalizing assumptions that redefine capabilities around the benchmarks themselves, and Epistematics provides an audit procedure to detect when evaluations cannot discriminate claimed capabilities from proxy behaviors.

MEDLEY-BENCH: Scale Buys Evaluation but Not Control in AI Metacognition

cs.AI · 2026-04-17 · unverdicted · novelty 6.0

MEDLEY-BENCH reveals an evaluation/control dissociation in AI metacognition where scale improves reflective scoring but not proportional belief revision, with a consistent knowing/doing gap across 35 models.

citing papers explorer

EVA-0: Test-Time Model Evolution with Only Two Forward Passes per Sample
cs.LG · 2026-05-15 · unverdicted · none · ref 12
Springdrift: An Auditable Persistent Runtime for LLM Agents with Case-Based Memory, Normative Safety, and Ambient Self-Perception
cs.AI · 2026-04-06 · unverdicted · none · ref 2
TFGN: Task-Free, Replay-Free Continual Pre-Training Without Catastrophic Forgetting at LLM Scale
cs.LG · 2026-05-14 · unverdicted · none · ref 3
The Evaluation Trap: Benchmark Design as Theoretical Commitment
cs.AI · 2026-05-13 · unverdicted · none · ref 6
MEDLEY-BENCH: Scale Buys Evaluation but Not Control in AI Metacognition
cs.AI · 2026-04-17 · unverdicted · none · ref 20
