Review history

arxiv: 2604.08362 · 2 revisions

Towards Real-world Human Behavior Simulation: Benchmarking Large Language Models on Long-horizon, Cross-scenario, Heterogeneous Behavior Traces

  1. 2026-05-22 · UNVERDICTED · LOW · v0.9.0 · novelty 7.0
    66873 ms · 5785 in · 1357 out · 2026-05-22T10:26:50.614032+00:00
  2. 2026-05-10 · UNVERDICTED · LOW · v0.9.0 · novelty 7.0
    82018 ms · 5555 in · 1219 out · 2026-05-10T18:16:31.449091+00:00