pith. sign in

arxiv: 2605.14205 · v2 · pith:N5GYWTFUnew · submitted 2026-05-14 · 💻 cs.AI

SimPersona: Learning Discrete Buyer Personas from Raw Clickstreams for Grounded E-Commerce Agents

Pith reviewed 2026-05-19 17:15 UTC · model grok-4.3

classification 💻 cs.AI
keywords buyer personasclickstreamsLLM web agentse-commerce simulationVQ-VAEdiscrete representationspersonalized agentsbehavior modeling
0
0 comments X

The pith

SimPersona learns discrete buyer types from raw clickstreams and maps them to tokens that guide LLM agents to simulate varied real buyers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

LLM-based agents for online shopping tend to collapse into a single average policy instead of reflecting the range of actual customer behaviors. SimPersona extracts distinct buyer types directly from large volumes of historical click data by training a specialized autoencoder that respects behavioral sequences. These types are then linked to dedicated tokens in the agent's vocabulary so that fine-tuning teaches the model to respond differently for each type. At test time a quick encoder pass assigns the right type to each synthetic buyer and population simulations draw types from each store's observed distribution to keep the mix realistic. Tests on millions of buyers across dozens of live stores produce conversion rates that line up closely with real outcomes and show clear differences between the learned types.

Core claim

A behavior-aware VQ-VAE compresses raw clickstreams into a discrete codebook of buyer types that captures both universal shopping patterns and the specific customer mix at each merchant. Each code is mapped to a persona token inserted into the LLM vocabulary; the agent is then fine-tuned on real browsing traces so that the token steers its actions toward the corresponding type. At inference a single forward pass through the encoder selects the type for any new buyer, and population-level rollouts sample types from the merchant's empirical distribution over the codebook to reproduce observed heterogeneity without per-store prompt engineering.

What carries the argument

Behavior-aware VQ-VAE that turns clickstream sequences into discrete buyer-type codes later mapped to dedicated persona tokens for LLM guidance.

If this is right

  • Simulated buyers reach 78 percent conversion-rate alignment with real buyers across 42 held-out live stores.
  • Distinct buyer types produce interpretable and varied behavioral patterns in shopping sessions.
  • The method outperforms a baseline agent that has eight times more parameters on goal-oriented tasks.
  • Merchant-specific population distributions are preserved when sampling buyer types for large-scale simulations.
  • An open data pipeline converts raw event logs into buyer representations and training traces.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same discrete types could serve as lightweight conditioning signals for testing how store layout changes affect different customer segments.
  • Extending the codes to capture session-level state changes might allow agents to model evolving intent within a single visit.
  • The persona tokens could transfer to other web-agent domains such as content recommendation or support chat to add population-level realism.

Load-bearing premise

The discrete codes learned from historical clickstreams represent stable buyer types that transfer to new stores and give the LLM effective non-overfitting guidance during fine-tuning and inference.

What would settle it

Running SimPersona agents on additional held-out storefronts and measuring a large gap between their simulated conversion rates and the actual rates recorded by real buyers on those stores.

Figures

Figures reproduced from arXiv: 2605.14205 by Alberto Castelo, Han Li, Lingyun Wang, Shuang Xie, Ted Chaiwachirasak, Zahra Zanjani Foumani.

Figure 1
Figure 1. Figure 1: SIMPERSONA framework overview. Top-left: behavioral features and product embeddings are extracted from raw clickstreams. Top-right: a behavior-aware VQ-VAE maps each buyer to one of K persona tokens. Bottom-right: two-stage SFT grounds the tokens in the LLM; first token warm-up (backbone frozen), then full fine-tuning. Bottom-left: evaluation on unseen storefronts across behavioral alignment, conversion al… view at source ↗
Figure 2
Figure 2. Figure 2: Data pipeline overview. A single enrichment pass over raw clickstream logs produces [PITH_FULL_IMAGE:figures/full_fig_p013_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Data enrichment. Raw event-level tables are joined with the product catalog, collection [PITH_FULL_IMAGE:figures/full_fig_p013_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: VQ-VAE input construction for a single buyer–shop pair. [PITH_FULL_IMAGE:figures/full_fig_p014_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: SFT trace generation from enriched clickstreams. [PITH_FULL_IMAGE:figures/full_fig_p015_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Stratum distribution recovery across all [PITH_FULL_IMAGE:figures/full_fig_p016_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Store-level behavioral reconstruction from persona token distributions. The codebook [PITH_FULL_IMAGE:figures/full_fig_p017_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Per-shop error-rate comparison between two-stage and single-stage SFT (sorted by two [PITH_FULL_IMAGE:figures/full_fig_p020_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Two-stage persona-grounding SFT examples. Each training example consists of a system [PITH_FULL_IMAGE:figures/full_fig_p021_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Persona token ablation under neutral intents. [PITH_FULL_IMAGE:figures/full_fig_p023_10.png] view at source ↗
read the original abstract

LLM-based web agents can navigate live storefronts, yet they often collapse to a single "average buyer" policy, failing to capture the heterogeneous and distributional nature of real buyer populations. Existing personalization methods rely on hand-crafted prompt-based personas that are brittle, difficult to scale, context-inefficient, and unable to faithfully represent population-level behavior. We introduce SimPersona, a novel framework that learns discrete buyer types from historical traffic and exposes them to LLM-based web agents as compact persona tokens. Given raw clickstreams, a behavior-aware VQ-VAE induces a discrete buyer-type space that captures the statistical structure of real buyer behavior and merchant-specific buyer population distributions. To provide behavior-specific guidance to LLM-based web agents, SimPersona maps each learned buyer type to a dedicated persona token in the LLM agent vocabulary and fine-tunes the agent with these tokens on real browsing traces. At inference, each synthetic buyer is assigned to a learned buyer type with a single encoder forward pass, requiring no retraining or store-specific prompt engineering. For population-level simulation, SimPersona samples buyer types from each merchant's empirical distribution over the learned VQ-VAE codebook and instantiates agents with the corresponding persona tokens, preserving merchant-specific buyer population distributions. Evaluated on $8.37$M buyers across $42$ held-out live storefronts, SimPersona achieves $78\%$ conversion-rate alignment with real buyers, exhibits interpretable behavioral variation across buyer types, and outperforms a baseline with $8\times$ more parameters on goal-oriented shopping tasks. We further release an open-source data pipeline that converts raw e-commerce event logs into buyer representations and agent-training traces.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper presents SimPersona, a framework that learns discrete buyer types from raw e-commerce clickstreams via a behavior-aware VQ-VAE, maps these types to compact persona tokens in an LLM agent's vocabulary, fine-tunes the agent on real browsing traces, and at inference assigns synthetic buyers to types via a single encoder pass. For population simulation it samples from each merchant's empirical distribution over the learned codebook. The central empirical claim is that, when evaluated on 8.37M buyers across 42 held-out live storefronts, the resulting agents achieve 78% conversion-rate alignment with real buyers, display interpretable behavioral variation, and outperform an 8× larger baseline on goal-oriented shopping tasks. An open-source data pipeline converting event logs to buyer representations is also released.

Significance. If the generalization claims hold, the work offers a scalable, non-hand-crafted alternative to prompt-based personas for grounding LLM web agents in heterogeneous buyer populations. The combination of a learned discrete codebook with token-level fine-tuning and merchant-specific distribution sampling could materially improve simulation fidelity for e-commerce applications while remaining parameter-efficient. The released data pipeline is a concrete positive contribution that lowers the barrier for follow-on research.

major comments (3)
  1. [Abstract and §4] Abstract and §4 (evaluation protocol): the 78% conversion-rate alignment and transfer claims rest on the assumption that the VQ-VAE codebook and empirical distributions were learned from a merchant-disjoint training set. The manuscript must explicitly state the merchant split used for VQ-VAE training versus the 42 held-out storefronts; without this, the alignment metric risks reflecting merchant-specific memorization rather than merchant-agnostic buyer-type generalization.
  2. [§3.2 and §5.1] §3.2 and §5.1: the behavior-aware VQ-VAE is described as capturing both statistical structure and merchant-specific distributions, yet no ablation or sensitivity analysis is reported on codebook size, commitment loss weight, or encoder architecture. These are free parameters that directly affect the induced buyer-type space; their impact on downstream alignment and interpretability should be quantified.
  3. [Table 2 / §5.2] Table 2 / §5.2: the reported outperformance versus the 8× larger baseline lacks error bars, statistical significance tests, and a precise definition of the goal-oriented shopping task success metric. Without these, it is difficult to assess whether the persona-token guidance is the load-bearing factor or whether other training differences explain the gap.
minor comments (2)
  1. [§3.3] Notation: the mapping from VQ-VAE code indices to LLM persona tokens should be given an explicit equation or algorithm box for reproducibility.
  2. [Figure 3] Figure 3 (behavioral variation): axis labels and legend entries are too small for print; increase font size and add a short caption explaining how the plotted trajectories were generated.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments, which help clarify the generalization claims and strengthen the empirical analysis. We address each major point below and indicate the revisions we will make.

read point-by-point responses
  1. Referee: [Abstract and §4] Abstract and §4 (evaluation protocol): the 78% conversion-rate alignment and transfer claims rest on the assumption that the VQ-VAE codebook and empirical distributions were learned from a merchant-disjoint training set. The manuscript must explicitly state the merchant split used for VQ-VAE training versus the 42 held-out storefronts; without this, the alignment metric risks reflecting merchant-specific memorization rather than merchant-agnostic buyer-type generalization.

    Authors: We agree that explicit clarification is necessary. The VQ-VAE codebook was trained on clickstreams from a merchant-disjoint collection of 87 storefronts, with the 42 evaluation storefronts held out entirely (no overlap in merchants or sessions). We will add this detail to the abstract, §4 (evaluation protocol), and a new paragraph in §3.2 describing the data splits. This ensures the reported 78% alignment measures cross-merchant generalization. revision: yes

  2. Referee: [§3.2 and §5.1] §3.2 and §5.1: the behavior-aware VQ-VAE is described as capturing both statistical structure and merchant-specific distributions, yet no ablation or sensitivity analysis is reported on codebook size, commitment loss weight, or encoder architecture. These are free parameters that directly affect the induced buyer-type space; their impact on downstream alignment and interpretability should be quantified.

    Authors: We acknowledge the value of such analysis. In the revision we will add a sensitivity study in §5.1 (and an accompanying table) varying codebook size (K=32, 64, 128, 256), commitment loss coefficient (0.1–1.0), and encoder depth, reporting effects on conversion-rate alignment, codebook utilization, and qualitative interpretability of the resulting buyer types. This will be computed on a fixed validation split to avoid additional compute overhead. revision: yes

  3. Referee: [Table 2 / §5.2] Table 2 / §5.2: the reported outperformance versus the 8× larger baseline lacks error bars, statistical significance tests, and a precise definition of the goal-oriented shopping task success metric. Without these, it is difficult to assess whether the persona-token guidance is the load-bearing factor or whether other training differences explain the gap.

    Authors: The success metric is the fraction of episodes in which the agent completes a purchase of the target item within a 20-step budget; this definition appears in §5.2 but will be restated more precisely. We will augment Table 2 with standard-deviation error bars computed over 5 independent fine-tuning seeds and add paired t-test p-values comparing SimPersona against the baseline. These additions will be included in the revised §5.2 and Table 2 caption. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected; derivation is self-contained

full rationale

The paper trains a behavior-aware VQ-VAE on historical clickstreams to induce discrete buyer-type codes and merchant-specific distributions, then maps codes to persona tokens for fine-tuning LLM agents and evaluates conversion-rate alignment on 42 explicitly held-out live storefronts. The hold-out of storefronts separates the VQ-VAE training data from the evaluation merchants, so the reported 78% alignment and outperformance are measured against independent real-buyer traces rather than reducing to the fitted inputs by construction. No self-definitional equations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the derivation chain. The framework remains empirically testable and does not collapse to tautology.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 2 invented entities

The central claim rests on the VQ-VAE successfully extracting a discrete space that reflects real behavioral distributions and on the persona tokens integrating cleanly into the LLM without introducing training artifacts; these choices are fitted to data rather than derived from first principles.

free parameters (2)
  • VQ-VAE codebook size (number of discrete buyer types)
    The size of the discrete codebook is a modeling choice that determines how many buyer types are induced and must be selected to balance coverage and interpretability.
  • VQ-VAE training hyperparameters (e.g., commitment loss weight, encoder architecture)
    These control how the behavior-aware VQ-VAE compresses clickstreams and are tuned on historical traffic.
axioms (2)
  • domain assumption Raw clickstream sequences contain sufficient statistical structure to induce meaningful discrete buyer types that generalize across merchants.
    The framework assumes historical traffic logs faithfully represent buyer population distributions and behavior patterns.
  • domain assumption Mapping learned types to dedicated persona tokens in the LLM vocabulary allows effective behavior-specific guidance without retraining the base model.
    The approach depends on the tokens providing stable conditioning during fine-tuning and inference.
invented entities (2)
  • discrete buyer-type space induced by behavior-aware VQ-VAE no independent evidence
    purpose: To capture the statistical structure of real buyer behavior and merchant-specific distributions in a compact, discrete form.
    This is a new postulated representation learned from data rather than observed directly.
  • persona tokens in LLM agent vocabulary no independent evidence
    purpose: To expose learned buyer types to the LLM for behavior-specific guidance during fine-tuning and inference.
    These tokens are introduced as a bridge between the VQ-VAE output and the agent.

pith-pipeline@v0.9.0 · 5853 in / 1869 out tokens · 62096 ms · 2026-05-19T17:15:29.950906+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

35 extracted references · 35 canonical work pages · 6 internal anchors

  1. [1]

    k-means++: The advantages of careful seeding

    David Arthur and Sergei Vassilvitskii. k-means++: The advantages of careful seeding. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1027–1035, 2007

  2. [2]

    A dendrite method for cluster analysis.Communications in Statistics – Theory and Methods, 3(1):1–27, 1974

    Tadeusz Cali´nski and Jerzy Harabasz. A dendrite method for cluster analysis.Communications in Statistics – Theory and Methods, 3(1):1–27, 1974

  3. [3]

    Beyond demographics: Aligning role-playing llm-based agents using human belief networks

    Yun-Shiuan Chuang, Krirk Nirunwiroj, Zach Studdiford, Agam Goyal, Vincent V Frigo, Sijia Yang, Dhavan V Shah, Junjie Hu, and Timothy T Rogers. Beyond demographics: Aligning role-playing llm-based agents using human belief networks. InFindings of the Association for Computational Linguistics: EMNLP 2024, pages 14010–14026, 2024

  4. [4]

    Lawrence Erlbaum Associates, 2 edition, 1988

    Jacob Cohen.Statistical Power Analysis for the Behavioral Sciences. Lawrence Erlbaum Associates, 2 edition, 1988

  5. [5]

    Mind2web: Towards a generalist agent for the web

    Xiang Deng, Yu Gu, Boyuan Zheng, Shijie Chen, Sam Stevens, Boshi Wang, Huan Sun, and Yu Su. Mind2web: Towards a generalist agent for the web. InAdvances in Neural Information Processing Systems, volume 36, 2023

  6. [6]

    Fisher.The Design of Experiments

    Ronald A. Fisher.The Design of Experiments. Oliver and Boyd, 1935

  7. [7]

    The behavioral fabric of llm-powered gui agents: Human values and interaction outcomes

    Simret Araya Gebreegziabher, Yukun Yang, Charles Chiang, Hojun Yoo, Chaoran Chen, Hyo Jin Do, Zahra Ashktorab, Werner Geyer, Diego Gómez-Zará, and Toby Jia-Jun Li. The behavioral fabric of llm-powered gui agents: Human values and interaction outcomes. InProceedings of the 31st International Conference on Intelligent User Interfaces, pages 909–927, 2026

  8. [8]

    A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis

    Izzeddin Gur, Hiroki Furuta, Austin Huang, Mustafa Safdari, Yutaka Matsuo, Douglas Eck, and Aleksandra Faust. A real-world WebAgent with planning, long context understanding, and program synthesis.arXiv preprint arXiv:2307.12856, 2023

  9. [9]

    Detecting user exits from online behavior: A duration- dependent latent state model.arXiv preprint arXiv:2208.03937, 2022

    Tobias Hatt and Stefan Feuerriegel. Detecting user exits from online behavior: A duration- dependent latent state model.arXiv preprint arXiv:2208.03937, 2022

  10. [10]

    Kruskal and W

    William H. Kruskal and W. Allen Wallis. Use of ranks in one-criterion variance analysis. Journal of the American Statistical Association, 47(260):583–621, 1952

  11. [11]

    Divergence measures based on the Shannon entropy.IEEE Transactions on Information Theory, 37(1):145–151, 1991

    Jianhua Lin. Divergence measures based on the Shannon entropy.IEEE Transactions on Information Theory, 37(1):145–151, 1991

  12. [12]

    Can LLM Agents Simulate Multi-Turn Human Behavior? Evidence from Real Online Customer Behavior Data

    Yuxuan Lu, Jing Huang, Yan Han, Bingsheng Yao, Sisong Bei, Jiri Gesi, Yaochen Xie, Zheshen Wang, Qi He, and Dakuo Wang. Can llm agents simulate multi-turn human behavior? evidence from real online customer behavior data.arXiv preprint arXiv:2503.20749, 2025

  13. [13]

    Uxagent: An llm agent-based usability testing framework for web design

    Yuxuan Lu, Bingsheng Yao, Hansu Gu, Jing Huang, Zheshen Jessie Wang, Yang Li, Jiri Gesi, Qi He, Toby Jia-Jun Li, and Dakuo Wang. Uxagent: An llm agent-based usability testing framework for web design. InProceedings of the Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, pages 1–12, 2025

  14. [14]

    Sunnie S. Y . Lutz et al. The prompt makes the person(a): A systematic evaluation of sociode- mographic persona prompting for large language models. InFindings of the Association for Computational Linguistics: EMNLP 2025, 2025

  15. [15]

    Perceive your users in depth: Learning universal user representations from multiple e-commerce tasks

    Jianmo Ni et al. Perceive your users in depth: Learning universal user representations from multiple e-commerce tasks. InProceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2018. 10

  16. [16]

    O’Brien, Carrie J

    Joon Sung Park, Joseph C. O’Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, and Michael S. Bernstein. Generative agents: Interactive simulacra of human behavior. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, 2023

  17. [17]

    LLM Agents Grounded in Self-Reports Enable General-Purpose Simulation of Individuals

    Joon Sung Park et al. Generative agent simulations of 1,000 people.arXiv preprint arXiv:2411.10109, 2024

  18. [18]

    Generating diverse high-fidelity images with VQ-V AE-2

    Ali Razavi, Aaron van den Oord, and Oriol Vinyals. Generating diverse high-fidelity images with VQ-V AE-2. InAdvances in Neural Information Processing Systems, 2019

  19. [19]

    Character-llm: A trainable agent for role-playing

    Yunfan Shao et al. Character-llm: A trainable agent for role-playing. InProceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

  20. [20]

    You are what you bought: Generating customer personas for e-commerce applications

    Yimin Shi, Yang Fei, Shiqi Zhang, Haixun Wang, and Xiaokui Xiao. You are what you bought: Generating customer personas for e-commerce applications. InProceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 1810–1819, 2025

  21. [21]

    In: Christodoulopoulos, C., Chakraborty, T., Rose, C., Peng, V

    Yunxiao Shi, Wujiang Xu, Zeqi Zhang, Xing Zi, Qiang Wu, and Min Xu. Personax: A recommendation agent-oriented user modeling framework for long behavior sequence. In Findings of the Association for Computational Linguistics: ACL 2025, pages 5764–5787, Vienna, Austria, 2025. Association for Computational Linguistics. doi: 10.18653/v1/2025. findings-acl.300

  22. [22]

    Neural discrete representation learning

    Aaron van den Oord, Oriol Vinyals, and Koray Kavukcuoglu. Neural discrete representation learning. InAdvances in Neural Information Processing Systems, 2017

  23. [23]

    Representation Learning with Contrastive Predictive Coding

    Aaron van den Oord, Yazhe Li, and Oriol Vinyals. Representation learning with contrastive predictive coding.arXiv preprint arXiv:1807.03748, 2018

  24. [24]

    Agenta/b: Automated and scalable web a/btesting with interactive llm agents.arXiv preprint arXiv:2504.09723, 2025

    Dakuo Wang, Ting-Yao Hsu, Yuxuan Lu, Limeng Cui, Yaochen Xie, William Headean, Bing- sheng Yao, Akash Veeragouni, Jiapeng Liu, Sreyashi Nag, and Jessie Wang. Agenta/b: Auto- mated and scalable web a/b testing with interactive llm agents.arXiv preprint arXiv:2504.09723, 2025

  25. [25]

    Gang Wang, Xinyi Zhang, Shiliang Tang, Haitao Zheng, and Ben Y . Zhao. Unsupervised clickstream clustering for user behavior analysis. InProceedings of the 2016 CHI Conference on Human Factors in Computing Systems, pages 225–236. ACM, 2016. doi: 10.1145/2858036. 2858107

  26. [26]

    OPeRA: A Dataset of Observation, Persona, Rationale, and Action for Evaluating LLMs on Human Online Shopping Behavior Simulation

    Ziyi Wang, Yuxuan Lu, Wenbo Li, Amirali Amini, Bo Sun, Yakov Bart, Weimin Lyu, Jiri Gesi, Tian Wang, Jing Huang, Yu Su, Upol Ehsan, Malihe Alikhani, Toby Jia-Jun Li, Lydia Chilton, and Dakuo Wang. Opera: A dataset of observation, persona, rationale, and action for evaluating llms on human online shopping behavior simulation.arXiv preprint arXiv:2506.05606...

  27. [27]

    Customer-r1: Personal- ized simulation of human behaviors via rl-based llm agent in online shopping.arXiv preprint arXiv:2510.07230, 2025

    Ziyi Wang, Yuxuan Lu, Yimeng Zhang, Jing Huang, and Dakuo Wang. Customer-r1: Personal- ized simulation of human behaviors via rl-based llm agent in online shopping.arXiv preprint arXiv:2510.07230, 2025

  28. [28]

    B. L. Welch. The generalization of ‘student’s’ problem when several different population variances are involved.Biometrika, 34(1/2):28–35, 1947

  29. [29]

    Qwen3 Technical Report

    An Yang, Baosong Yang, Beichen Zhang, Binyuan Wang, Bo Li, Bowen Liu, et al. Qwen3 technical report.arXiv preprint arXiv:2505.09388, 2025

  30. [30]

    TRACE: Transformer-based user representations from attributed clickstream event sequences

    Dale Yang et al. TRACE: Transformer-based user representations from attributed clickstream event sequences. InProceedings of the ACM Web Conference, 2023

  31. [31]

    Webshop: Towards scalable real-world web interaction with grounded language agents

    Shunyu Yao, Howard Chen, John Yang, and Karthik Narasimhan. Webshop: Towards scalable real-world web interaction with grounded language agents. InAdvances in Neural Information Processing Systems, volume 35, 2022. 11

  32. [32]

    Shop-r1: Rewarding llms to simulate human behavior in online shopping via reinforcement learning.arXiv preprint arXiv:2507.17842, 2025

    Yimeng Zhang, Tian Wang, Jiri Gesi, Ziyi Wang, Yuxuan Lu, Jiacheng Lin, Sinong Zhan, Vianne Gao, Ruochen Jiao, Junze Liu, et al. Shop-r1: Rewarding llms to simulate human behavior in online shopping via reinforcement learning.arXiv preprint arXiv:2507.17842, 2025

  33. [33]

    A deep Markov model for clickstream analytics in online shopping

    Wen Zheng et al. A deep Markov model for clickstream analytics in online shopping. In Proceedings of The Web Conference 2020, 2020

  34. [34]

    Xu, Hao Zhu, Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng, Yonatan Bisk, Daniel Fried, Uri Alon, et al

    Shuyan Zhou, Frank F. Xu, Hao Zhu, Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng, Yonatan Bisk, Daniel Fried, Uri Alon, et al. Webarena: A realistic web environment for building autonomous agents. InInternational Conference on Learning Representations, 2024. 12 A Data Pipeline Figure 2 illustrates our end-to-end data pipeline described in Section 2...

  35. [35]

    you are interested in product X

    over encoder outputs from a full pass through the training set. During training, entries are updated via exponential moving averages rather than gradient descent: ek ←γe k + (1−γ) ¯zk,(7) where ¯zk is the mean of encoder outputs assigned to entry k in the current batch and γ∈[0,1) controls the memory of past assignments. To prevent codebook collapse Razav...