Figure 12. Response profiles of a ‘state exploration’, ‘parameter exploration’ and ‘random exploration’ agent in a task that requires learning or inference.

Context consumption feedback: After each tool call, the agent receives feedback about remaining context budget, approximate context occupancy · DOI 10.7554/elife.41703.013

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

FutureSim: Replaying World Events to Evaluate Adaptive Agents

cs.LG · 2026-05-14 · unverdicted · novelty 7.0

FutureSim is a benchmark that replays real news from January to March 2026 for AI agents to forecast events, with top accuracy at 25% and some agents worse than no-prediction baselines on Brier skill score.

citing papers explorer

Showing 1 of 1 citing paper.

FutureSim: Replaying World Events to Evaluate Adaptive Agents

cs.LG · 2026-05-14 · unverdicted · none · ref 4
