QueST adapts LLMs at test time by generating query-specific problem-solution pairs for self-supervised fine-tuning, improving reasoning performance without external data.
Test-time learning for large language models
10 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 10representative citing papers
Evidence-informed belief updates make Bayesian surprise non-stationary in LLM hypothesis search, with embedding-based RAG identifying 37.5% spurious static surprisals and modified search (filtering plus diversity) yielding 30.62% higher accumulated non-stationary surprisal across five domains.
HMARS introduces a hierarchical multi-agent memory system that outperforms standard retrieval and other baselines on long-document and multi-turn reasoning tasks through improved evidence coverage.
Proposes PDF, a hierarchical multi-agent Perception-to-Deliberation Framework that adds experience self-evolution and test-time scaling to composed image retrieval, claiming SOTA on CIRR, CIRCO, and FashionIQ.
UG-TTT adds epistemic uncertainty measured by adapter disagreement as an exploration bonus in RL for LLMs, raising maximum reward and diversity on scientific discovery benchmarks.
BOLT is a 0.9M-parameter plug-and-play module that uses ego-as-teacher distillation on high-confidence predictions to align neighbor features online, raising AP@50 by up to 32.3 points over unadapted fusion while beating ego-only baselines on DAIR-V2X and OPV2V.
LLM agents trained with a task-success reward on self-generated knowledge can spontaneously explore and adapt to new environments without any rewards or instructions at inference, yielding 20% gains on web tasks and allowing a 14B model to beat Gemini-2.5-Flash.
PreRL applies reward-driven updates to P(y) in pre-train space, uses Negative Sample Reinforcement to prune bad reasoning paths and boost reflection, and combines with standard RL in Dual Space RL to outperform baselines on reasoning tasks.
In-Place TTT adapts LLM MLP projection matrices at test time with a next-token-aligned objective and chunk-wise updates, enabling better long-context performance as a drop-in enhancement.
SOLAR introduces a self-optimizing agent using meta-learning on model weights and RL-driven strategy discovery for lifelong adaptation in LLMs, claiming superior performance on reasoning tasks across domains.
citing papers explorer
-
HMARS: A Hierarchical Multi-Agent Memory System for Long-Context Reasoning
HMARS introduces a hierarchical multi-agent memory system that outperforms standard retrieval and other baselines on long-document and multi-turn reasoning tasks through improved evidence coverage.