{"paper":{"title":"OPeRA: A Dataset of Observation, Persona, Rationale, and Action for Evaluating LLMs on Human Online Shopping Behavior Simulation","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"The OPERA dataset records real shoppers' observations, personas, rationales, and actions to test LLMs' ability to simulate individual human online behavior.","cross_cats":["cs.HC"],"primary_cat":"cs.CL","authors_text":"Amirali Amini, Bo Sun, Dakuo Wang, Jing Huang, Jiri Gesi, Lydia Chilton, Malihe Alikhani, Tian Wang, Toby Jia-Jun Li, Upol Ehsan, Weimin Lyu, Wenbo Li, Yakov Bart, Yu Su, Yuxuan Lu, Ziyi Wang","submitted_at":"2025-06-05T21:37:49Z","abstract_excerpt":"Can large language models (LLMs) accurately simulate the next web action of a specific user? While LLMs have shown promising capabilities in generating “believable” human behaviors, evaluating their ability to mimic real user behaviors remains an open challenge, largely due to the lack of high-quality, publicly available datasets that capture both the observable actions and the internal reasoning of an actual human user. 
To address this gap, we introduce OPERA, a novel dataset of Observation, Persona, Rationale, and Action collected from real human participants during online shopping session"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"OPERA is the first public dataset that comprehensively captures user personas, browser observations, fine-grained web actions, and self-reported just-in-time rationales from real human participants during online shopping sessions.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That participants' self-reported rationales collected via the questionnaire accurately reflect their internal reasoning at the moment of action without distortion from the act of reporting or the presence of the browser plugin.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"OPERA is the first public dataset pairing user personas, live browser observations, fine-grained web actions, and immediate self-reported rationales from real online shopping sessions to benchmark LLM prediction of individual human behavior.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"The OPERA dataset records real shoppers' observations, personas, rationales, and actions to test LLMs' ability to simulate individual human online behavior.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"1169d52b2ab5537ba1428073c769f699424333b10bf8e4783425a6e74b595c68"},"source":{"id":"2506.05606","kind":"arxiv","version":7},"verdict":{"id":"f055a0f6-1c0a-4e13-8581-7963e9ff7936","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-19T10:27:13.606975Z","strongest_claim":"OPERA is the first public dataset that comprehensively captures 
user personas, browser observations, fine-grained web actions, and self-reported just-in-time rationales from real human participants during online shopping sessions.","one_line_summary":"OPERA is the first public dataset pairing user personas, live browser observations, fine-grained web actions, and immediate self-reported rationales from real online shopping sessions to benchmark LLM prediction of individual human behavior.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That participants' self-reported rationales collected via the questionnaire accurately reflect their internal reasoning at the moment of action without distortion from the act of reporting or the presence of the browser plugin.","pith_extraction_headline":"The OPERA dataset records real shoppers' observations, personas, rationales, and actions to test LLMs' ability to simulate individual human online behavior."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2506.05606/integrity.json","findings":[],"available":true,"detectors_run":[],"snapshot_sha256":"c28c3603d3b5d939e8dc4c7e95fa8dfce3d595e45f758748cecf8e644a296938"},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":1,"snapshot_sha256":"897cf45dfd9a9098578b47c28d0421a46037e8f1774061baf328059aa4746b84"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}