Test-time learning for large language models

[HZC+25] Jinwu Hu, Zhitian Zhang, Guohao Chen, Xutao Wen, Chao Shuai, Wei Luo, Bin Xiao, Yuanqing Li, Mingkui Tan · 2025 · arXiv 2505.20633

14 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2 · baseline 1

citation-polarity summary

background 2 · baseline 1

representative citing papers

The Power of Test-Time Training for Approximate Sampling

cs.DS · 2026-06-09 · unverdicted · novelty 7.0

Establishes a quadratic lower bound on the query complexity of sampling from large classes of distributions given approximate density oracles, answers an open question on the optimality of random walks, and shows, as an abstraction of TTT, that the bound can be circumvented for bounded classes.

Query-Conditioned Test-Time Self-Training for Large Language Models

cs.CL · 2026-05-13 · conditional · novelty 7.0 · 2 refs

QueST adapts LLMs at test time by generating query-specific problem-solution pairs for self-supervised fine-tuning, improving reasoning performance without external data.

Evidence-Informed LLM Beliefs for Continual Scientific Discovery

cs.AI · 2026-06-28 · unverdicted · novelty 6.0

Evidence-informed belief updates make Bayesian surprise non-stationary in LLM hypothesis search, with embedding-based RAG identifying 37.5% spurious static surprisals and modified search (filtering plus diversity) yielding 30.62% higher accumulated non-stationary surprisal across five domains.

EpiEvolve: Self-Evolving Agents for Streaming Pandemic Forecasting under Regime Shifts

cs.AI · 2026-06-03 · unverdicted · novelty 6.0

EpiEvolve achieves 0.629 accuracy in streaming COVID-19 forecasting by using episodic memory, reflection on delayed labels, and regime-aware retrieval, outperforming static LLMs (0.561) and CDC ensembles (0.325) while halving recovery lag after regime shifts.

Scaling Self-Evolving Agents via Parametric Memory

cs.AI · 2026-06-03 · unverdicted · novelty 6.0

TMEM lets LLM agents evolve their policy mid-episode by absorbing distilled supervision into online LoRA updates, outperforming summary and retrieval baselines on several long-context benchmarks.

HMARS: A Hierarchical Multi-Agent Memory System for Long-Context Reasoning

cs.IR · 2026-06-03 · unverdicted · novelty 6.0

HMARS introduces a hierarchical multi-agent memory system that outperforms standard retrieval and other baselines on long-document and multi-turn reasoning tasks through improved evidence coverage.

DeliCIR: Deliberative Test-Time Evolutionary Hierarchical Multi-Agents for Composed Image Retrieval

cs.CV · 2026-05-21 · unverdicted · novelty 6.0 · 2 refs

Proposes PDF, a hierarchical multi-agent Perception-to-Deliberation Framework that adds experience self-evolution and test-time scaling to composed image retrieval, claiming SOTA on CIRR, CIRCO, and FashionIQ.

Epistemic Uncertainty for Test-Time Discovery

cs.LG · 2026-05-11 · unverdicted · novelty 6.0

UG-TTT adds epistemic uncertainty measured by adapter disagreement as an exploration bonus in RL for LLMs, raising maximum reward and diversity on scientific discovery benchmarks.

BOLT: Online Lightweight Adaptation for Preparation-Free Heterogeneous Cooperative Perception

cs.CV · 2026-05-01 · unverdicted · novelty 6.0 · 2 refs

BOLT is a 0.9M-parameter plug-and-play module that uses ego-as-teacher distillation on high-confidence predictions to align neighbor features online, raising AP@50 by up to 32.3 points over unadapted fusion while beating ego-only baselines on DAIR-V2X and OPV2V.

Training LLM Agents for Spontaneous, Reward-Free Self-Evolution via World Knowledge Exploration

cs.AI · 2026-04-20 · unverdicted · novelty 6.0

LLM agents trained with a task-success reward on self-generated knowledge can spontaneously explore and adapt to new environments without any rewards or instructions at inference, yielding 20% gains on web tasks and allowing a 14B model to beat Gemini-2.5-Flash.

From $P(y|x)$ to $P(y)$: Investigating Reinforcement Learning in Pre-train Space

cs.LG · 2026-04-15 · unverdicted · novelty 6.0

PreRL applies reward-driven updates to P(y) in pre-train space, uses Negative Sample Reinforcement to prune bad reasoning paths and boost reflection, and combines with standard RL in Dual Space RL to outperform baselines on reasoning tasks.

In-Place Test-Time Training

cs.LG · 2026-04-07 · conditional · novelty 6.0

In-Place TTT adapts LLM MLP projection matrices at test time with a next-token-aligned objective and chunk-wise updates, enabling better long-context performance as a drop-in enhancement.

EASE-TTT: Evidence-Aligned Selective Test-Time Training for Long-Context Question Answering

cs.CL · 2026-06-05 · unverdicted · novelty 5.0

EASE-TTT creates a soft attention target from evidence chunks to guide query-side test-time adaptation, yielding higher macro-average scores than full-context, retrieval-only, and standard qTTT baselines on six LongBench QA tasks.

SOLAR: A Self-Optimizing Open-Ended Autonomous Agent for Lifelong Learning and Continual Adaptation

cs.AI · 2026-03-23 · unverdicted · novelty 5.0

SOLAR introduces a self-optimizing agent using meta-learning on model weights and RL-driven strategy discovery for lifelong adaptation in LLMs, claiming superior performance on reasoning tasks across domains.

citing papers explorer

Showing 1 of 1 citing paper after filters.

Training LLM Agents for Spontaneous, Reward-Free Self-Evolution via World Knowledge Exploration

cs.AI · 2026-04-20 · unverdicted · none · ref 44

LLM agents trained with a task-success reward on self-generated knowledge can spontaneously explore and adapt to new environments without any rewards or instructions at inference, yielding 20% gains on web tasks and allowing a 14B model to beat Gemini-2.5-Flash.
