EVA-0 is a zeroth-order test-time adaptation method that uses scale-invariant loss, anchor-guided optimization, and symmetric two-sided perturbations to enable inference and adaptation in two forward passes, outperforming prior methods on ImageNet-C with ViT-Base.
Emmanuel Dupoux, Yann LeCun, and Jitendra Malik
5 Pith papers cite this work. Polarity classification is still indexing.
years
2026 5verdicts
UNVERDICTED 5representative citing papers
Springdrift provides an auditable persistent runtime for long-lived LLM agents with case-based memory, normative safety gating, and ambient self-perception, shown in a 23-day single-instance deployment where the agent self-diagnosed bugs and maintained cross-channel context.
TFGN is an architectural overlay for transformers enabling task-free, replay-free continual pre-training across heterogeneous domains at LLM scale with near-zero backward transfer and high gradient orthogonality.
AI benchmarks trap progress by operationalizing assumptions that redefine capabilities around the benchmarks themselves, and Epistematics provides an audit procedure to detect when evaluations cannot discriminate claimed capabilities from proxy behaviors.
MEDLEY-BENCH reveals an evaluation/control dissociation in AI metacognition where scale improves reflective scoring but not proportional belief revision, with a consistent knowing/doing gap across 35 models.
citing papers explorer
-
EVA-0: Test-Time Model Evolution with Only Two Forward Passes per Sample
EVA-0 is a zeroth-order test-time adaptation method that uses scale-invariant loss, anchor-guided optimization, and symmetric two-sided perturbations to enable inference and adaptation in two forward passes, outperforming prior methods on ImageNet-C with ViT-Base.
-
Springdrift: An Auditable Persistent Runtime for LLM Agents with Case-Based Memory, Normative Safety, and Ambient Self-Perception
Springdrift provides an auditable persistent runtime for long-lived LLM agents with case-based memory, normative safety gating, and ambient self-perception, shown in a 23-day single-instance deployment where the agent self-diagnosed bugs and maintained cross-channel context.
-
TFGN: Task-Free, Replay-Free Continual Pre-Training Without Catastrophic Forgetting at LLM Scale
TFGN is an architectural overlay for transformers enabling task-free, replay-free continual pre-training across heterogeneous domains at LLM scale with near-zero backward transfer and high gradient orthogonality.
-
The Evaluation Trap: Benchmark Design as Theoretical Commitment
AI benchmarks trap progress by operationalizing assumptions that redefine capabilities around the benchmarks themselves, and Epistematics provides an audit procedure to detect when evaluations cannot discriminate claimed capabilities from proxy behaviors.
-
MEDLEY-BENCH: Scale Buys Evaluation but Not Control in AI Metacognition
MEDLEY-BENCH reveals an evaluation/control dissociation in AI metacognition where scale improves reflective scoring but not proportional belief revision, with a consistent knowing/doing gap across 35 models.