EvoLib enables LLMs to accumulate, reuse, and evolve knowledge abstractions from inference trajectories at test time, yielding substantial gains on math reasoning, code generation, and agentic benchmarks without parameter updates or supervision.
How to grow a mind: Statistics, structure, and abstraction.science, 331(6022):1279–1285
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 2polarities
background 2representative citing papers
Frontier LRMs match human game-learning behavior and predict fMRI signals an order of magnitude better than RL or Bayesian agents because of their in-context game-state representations.
A Bayesian framework disentangles topic, agreement, and anchoring biases from interaction effects in LLM multi-turn dialogues, revealing convergence to attractors that shift with fine-tuning.
The paper consolidates risks of overreliance on LLMs, identifies gaps in current measurement approaches, and proposes mitigation strategies to keep AI as a human-compatible thought partner.
citing papers explorer
-
Test-Time Learning with an Evolving Library
EvoLib enables LLMs to accumulate, reuse, and evolve knowledge abstractions from inference trajectories at test time, yielding substantial gains on math reasoning, code generation, and agentic benchmarks without parameter updates or supervision.
-
Reason to Play: Behavioral and Brain Alignment Between Frontier LRMs and Human Game Learners
Frontier LRMs match human game-learning behavior and predict fMRI signals an order of magnitude better than RL or Bayesian agents because of their in-context game-state representations.
-
Disentangling Interaction and Bias Effects in Opinion Dynamics of Large Language Models
A Bayesian framework disentangles topic, agreement, and anchoring biases from interaction effects in LLM multi-turn dialogues, revealing convergence to attractors that shift with fine-tuning.
-
Measuring and mitigating overreliance to build human-compatible AI
The paper consolidates risks of overreliance on LLMs, identifies gaps in current measurement approaches, and proposes mitigation strategies to keep AI as a human-compatible thought partner.