BehaviorBench reconstructs 2,000 real wallets into 141k belief and 1.4M trade prediction tasks to test if personalization from history improves model performance over non-personalized baselines.
How far are LLMs from being our digital twins? a benchmark for persona-based behavior chain simulation
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.AI 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
BehaviorBench: Modeling Real-World User Decisions from Behavioral Traces
BehaviorBench reconstructs 2,000 real wallets into 141k belief and 1.4M trade prediction tasks to test if personalization from history improves model performance over non-personalized baselines.