arxiv: 2605.14205 · v1 · submitted 2026-05-14 · 💻 cs.AI

Recognition: no theorem link

SimPersona: Learning Discrete Buyer Personas from Raw Clickstreams for Grounded E-Commerce Agents

Zahra Zanjani Foumani, Alberto Castelo, Shuang Xie, Ted Chaiwachirasak, Han Li, Lingyun Wang


Pith reviewed 2026-05-15 02:52 UTC · model grok-4.3

classification 💻 cs.AI

keywords: buyer personas, clickstreams, LLM agents, e-commerce, VQ-VAE, personalization, simulation


The pith

SimPersona learns discrete buyer types from clickstreams to let LLM agents simulate diverse real buyer populations in e-commerce.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces SimPersona to make LLM web agents behave like diverse real buyers instead of collapsing to a single average-buyer policy. It uses a VQ-VAE to learn discrete buyer types directly from raw clickstream data, capturing how different customers browse and buy across stores. Each type is mapped to a dedicated persona token that conditions the LLM agent, which is then fine-tuned on real browsing traces. For any merchant, buyer populations are simulated by sampling types from that merchant's observed type distribution, yielding 78 percent conversion-rate alignment with real buyers without store-specific prompts.

Core claim

By training a behavior-aware VQ-VAE on historical e-commerce clickstreams, SimPersona extracts a compact set of discrete buyer types that reflect the statistical structure of real buyer populations. Each type is assigned a unique persona token in the LLM agent's vocabulary, enabling fine-tuning that teaches type-specific navigation and purchase behaviors. At inference, agents are instantiated by mapping new or simulated buyers to these tokens, preserving merchant-specific distributions and achieving strong alignment with observed real-world outcomes.

What carries the argument

The behavior-aware VQ-VAE inducing the discrete buyer-type codebook from clickstreams, along with the mapping of types to persona tokens for LLM conditioning.
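The quantization step that turns an encoder output into a buyer type is, in a standard VQ-VAE, a nearest-neighbor lookup in the codebook. A minimal sketch, assuming encoder outputs and codebook entries are plain vectors; `assign_buyer_types`, `persona_token`, and the `<persona_k>` token format are hypothetical names, since the review does not give the paper's encoder or token strings.

```python
import numpy as np

def assign_buyer_types(z, codebook):
    """Return the index of the nearest codebook entry for each encoder output.

    z: (B, D) encoder outputs; codebook: (K, D) learned entries.
    """
    # Squared Euclidean distance between every output and every entry, shape (B, K).
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

def persona_token(k):
    # Hypothetical naming scheme for the dedicated persona tokens.
    return f"<persona_{k}>"

# Toy codebook with K=3 types in a 2-D embedding space (illustrative values only).
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 0.0]])
z = np.array([[0.1, -0.2], [4.0, 0.5], [0.9, 1.2]])
types = assign_buyer_types(z, codebook)
# Prefix each agent prompt with its persona token (the instruction text is made up).
prompts = [persona_token(k) + " Shop for a gift under $50." for k in types]
```

Here the three buyers land on types 0, 2 and 1, so the second prompt starts with `<persona_2>`. Because assignment is a single distance computation after one encoder pass, no retraining is needed to place a new buyer.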

If this is right

Population-level simulations become possible by sampling buyer types according to each store's empirical distribution.
Assigning an agent to a persona requires only one encoder forward pass and no retraining.
The agent outperforms a baseline with eight times more parameters on goal-oriented shopping tasks.
Distinct behavioral patterns emerge across the learned buyer types, making them interpretable from click data.
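Population-level simulation, as described above, amounts to estimating each store's distribution over codebook indices and sampling agent types from it. A minimal sketch, assuming every historical buyer of a store already has a codebook index; `empirical_type_distribution` and `sample_population` are hypothetical helper names, and the toy assignments are illustrative.

```python
import numpy as np

def empirical_type_distribution(codes, num_types):
    """Fraction of a store's historical buyers assigned to each codebook entry."""
    counts = np.bincount(codes, minlength=num_types)
    return counts / counts.sum()

def sample_population(codes, num_types, n_agents, seed=0):
    """Draw buyer types for n_agents simulated buyers from the store's distribution."""
    probs = empirical_type_distribution(codes, num_types)
    rng = np.random.default_rng(seed)
    return rng.choice(num_types, size=n_agents, p=probs)

# Toy per-buyer codebook assignments for one store (codebook size 4).
store_codes = np.array([0, 0, 0, 1, 3, 3, 0, 1])
probs = empirical_type_distribution(store_codes, num_types=4)
agents = sample_population(store_codes, num_types=4, n_agents=10_000)
```

Types never observed at a store (type 2 here) get zero probability, so the simulated population keeps that merchant's mix rather than a global average.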

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Store designers could use these simulated populations to test layout changes before deployment.
Updating the codebook periodically with new clickstreams might allow the system to track shifts in buyer behavior over time.
The discrete representation could reduce computational costs when running large-scale agent evaluations compared to fully prompt-based personalization.

Load-bearing premise

The buyer types discovered from past clickstreams will accurately predict behavior in ongoing live interactions without substantial changes in customer preferences or platform features.

What would settle it

The claim would be undermined if agents using the learned personas produced conversion rates that deviate significantly from real buyer data on additional unseen storefronts, or if the types failed to differentiate between customers with measurably different purchase histories.

Figures

Figures reproduced from arXiv: 2605.14205 by Alberto Castelo, Han Li, Lingyun Wang, Shuang Xie, Ted Chaiwachirasak, Zahra Zanjani Foumani.

**Figure 1.** SimPersona framework overview. Top-left: behavioral features and product embeddings are extracted from raw clickstreams. Top-right: a behavior-aware VQ-VAE maps each buyer to one of K persona tokens. Bottom-right: two-stage SFT grounds the tokens in the LLM; first token warm-up (backbone frozen), then full fine-tuning. Bottom-left: evaluation on unseen storefronts across behavioral alignment, conversion al…

**Figure 2.** Data pipeline overview. A single enrichment pass over raw clickstream logs produces…

**Figure 3.** Data enrichment. Raw event-level tables are joined with the product catalog, collection…

**Figure 4.** VQ-VAE input construction for a single buyer–shop pair.

**Figure 5.** SFT trace generation from enriched clickstreams.

**Figure 6.** Stratum distribution recovery across all…

**Figure 7.** Store-level behavioral reconstruction from persona token distributions. The codebook…

**Figure 8.** Per-shop error-rate comparison between two-stage and single-stage SFT (sorted by two…

**Figure 9.** Two-stage persona-grounding SFT examples. Each training example consists of a system…

**Figure 10.** Persona token ablation under neutral intents.
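The two-stage SFT in Figure 1 first trains only the new persona tokens with the backbone frozen, then fine-tunes everything. A toy NumPy sketch of that schedule, assuming the persona tokens are appended as extra rows of the embedding table; it models only the embedding table (in a real framework one would instead mark backbone parameters as non-trainable, for example with PyTorch's `requires_grad_(False)`), and `warmup_step` and `full_step` are hypothetical names.

```python
import numpy as np

def warmup_step(embeddings, grad, persona_ids, lr=0.1):
    """Stage 1: only the persona-token rows of the embedding table are updated."""
    mask = np.zeros((embeddings.shape[0], 1))
    mask[persona_ids] = 1.0  # 1 for persona rows, 0 for the pretrained vocabulary
    return embeddings - lr * grad * mask

def full_step(embeddings, grad, lr=0.1):
    """Stage 2: every row is trainable."""
    return embeddings - lr * grad

rng = np.random.default_rng(0)
vocab_size, dim, num_personas = 10, 4, 3
# Pretrained rows followed by newly added persona rows.
emb = rng.normal(size=(vocab_size + num_personas, dim))
persona_ids = np.arange(vocab_size, vocab_size + num_personas)
grad = rng.normal(size=emb.shape)  # stand-in for a real loss gradient
after_warmup = warmup_step(emb, grad, persona_ids)
after_full = full_step(after_warmup, grad)
```

After the warm-up step the pretrained rows are bit-for-bit unchanged while the persona rows have moved; the second stage then updates all rows.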

Original abstract

LLM-based web agents can navigate live storefronts, yet they often collapse to a single "average buyer" policy, failing to capture the heterogeneous and distributional nature of real buyer populations. Existing personalization methods rely on hand-crafted prompt-based personas that are brittle, difficult to scale, context-inefficient, and unable to faithfully represent population-level behavior. We introduce SimPersona, a novel framework that learns discrete buyer types from historical traffic and exposes them to LLM-based web agents as compact persona tokens. Given raw clickstreams, a behavior-aware VQ-VAE induces a discrete buyer-type space that captures the statistical structure of real buyer behavior and merchant-specific buyer population distributions. To provide behavior-specific guidance to LLM-based web agents, SimPersona maps each learned buyer type to a dedicated persona token in the LLM agent vocabulary and fine-tunes the agent with these tokens on real browsing traces. At inference, each synthetic buyer is assigned to a learned buyer type with a single encoder forward pass, requiring no retraining or store-specific prompt engineering. For population-level simulation, SimPersona samples buyer types from each merchant's empirical distribution over the learned VQ-VAE codebook and instantiates agents with the corresponding persona tokens, preserving merchant-specific buyer population distributions. Evaluated on $8.37$M buyers across $42$ held-out live storefronts, SimPersona achieves $78\%$ conversion-rate alignment with real buyers, exhibits interpretable behavioral variation across buyer types, and outperforms a baseline with $8\times$ more parameters on goal-oriented shopping tasks. We further release an open-source data pipeline that converts raw e-commerce event logs into buyer representations and agent-training traces.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note private letter to a colleague

SimPersona turns clickstreams into discrete LLM persona tokens via VQ-VAE for e-commerce agent simulation at scale, but live behavioral fidelity remains lightly tested.


The core move here is training a VQ-VAE on raw clickstreams to produce a codebook of discrete buyer types, then mapping those codes to dedicated tokens in the LLM vocabulary and fine-tuning the agent on the same traces. At inference the encoder assigns a type in one pass and population sampling draws from each merchant's empirical distribution over the codebook. This replaces brittle prompt personas with something that can be learned directly from traffic and reused across stores without per-merchant engineering.

Referee Report

3 major / 2 minor

Summary. The manuscript presents SimPersona, a framework that trains a behavior-aware VQ-VAE on raw clickstreams to induce a discrete codebook of buyer types, maps each code to a dedicated persona token in an LLM agent's vocabulary, and fine-tunes the agent on the same traces. At inference, buyer types are assigned via a single encoder pass and agents are instantiated by sampling from each merchant's empirical distribution over the codebook, enabling population-level simulation without store-specific prompt engineering. On 8.37M buyers across 42 held-out live storefronts the method reports 78% conversion-rate alignment with real buyers, interpretable behavioral variation across types, and outperformance versus an 8× larger baseline on goal-oriented tasks; an open-source pipeline converting event logs to buyer representations is also released.

Significance. If the transfer from offline VQ-VAE codes to live LLM policies holds, the work supplies a scalable, data-driven alternative to hand-crafted personas for grounding e-commerce agents in real population distributions. The large-scale held-out evaluation and open-source pipeline are concrete strengths that would support reproducibility and further research in sequential behavior modeling.

major comments (3)

[Results section] The 78% conversion-rate alignment is presented without the precise definition of the metric, the exact baseline architecture, or explicit controls for store-specific confounders (e.g., UI differences or traffic seasonality), making it difficult to assess whether the reported outperformance is robust.
[Method and Evaluation] No quantitative comparison of simulated versus real session trajectories on the 42 held-out stores is reported (e.g., KL divergence or Wasserstein distance on next-action distributions conditioned on state and buyer type), so aggregate conversion alignment may mask per-type policy deviations under live site dynamics.
[Inference procedure] The claim that a single encoder forward pass plus a persona token suffices for faithful transfer to new live interactions rests on the untested assumption that historical clickstream statistics remain representative under actual site-response feedback loops; this load-bearing transfer step lacks direct validation.

minor comments (2)

[Abstract] Adding the VQ-VAE codebook size (number of discrete types) used in the main experiments would give readers immediate context for the scale of the learned persona space.
[Pipeline release] The main text should include a short usage example or a pointer to the exact repository contents so that the open-source contribution can be reproduced immediately.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. Below we provide detailed responses to each major comment, indicating the revisions made to address them.


Referee: [Results section] The 78% conversion-rate alignment is presented without the precise definition of the metric, the exact baseline architecture, or explicit controls for store-specific confounders (e.g., UI differences or traffic seasonality), making it difficult to assess whether the reported outperformance is robust.

Authors: We agree that greater clarity on the evaluation metric and controls is needed. In the revised manuscript, we have added the precise definition of the conversion-rate alignment metric as the percentage of stores where the absolute difference between simulated and real conversion rates is below 5%. We have also detailed the baseline architecture as an 8× larger LLM agent fine-tuned on the same traces without persona tokens, and incorporated explicit controls by aligning evaluation periods to account for seasonality and using the same storefront interfaces to mitigate UI confounders. revision: yes
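The alignment definition proposed in this simulated response (share of stores whose simulated and real conversion rates differ by less than 5%) can be sketched as below. It is the rebuttal's proposed definition, not one confirmed from the paper; reading "5%" as an absolute tolerance of 0.05 on rates in [0, 1] is an assumption, and a relative tolerance is an equally plausible reading. `conversion_rate_alignment` is a hypothetical name and the rates are toy values.

```python
import numpy as np

def conversion_rate_alignment(sim_rates, real_rates, tol=0.05):
    """Share of stores whose simulated and real conversion rates differ by less than tol.

    Rates are fractions in [0, 1]; tol=0.05 treats '5%' as five percentage points.
    """
    sim = np.asarray(sim_rates, dtype=float)
    real = np.asarray(real_rates, dtype=float)
    return float(np.mean(np.abs(sim - real) < tol))

# Toy per-store conversion rates for four stores.
sim = [0.021, 0.034, 0.150, 0.012]
real = [0.025, 0.030, 0.080, 0.015]
score = conversion_rate_alignment(sim, real)
```

Three of the four toy stores fall within the tolerance, giving 0.75. Since typical storefront conversion rates are only a few percent, an absolute five-point tolerance is loose, which is one reason the exact definition matters for interpreting the 78% figure.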
Referee: [Method and Evaluation] No quantitative comparison of simulated versus real session trajectories on the 42 held-out stores is reported (e.g., KL divergence or Wasserstein distance on next-action distributions conditioned on state and buyer type), so aggregate conversion alignment may mask per-type policy deviations under live site dynamics.

Authors: We concur that trajectory-level distributional comparisons would strengthen the claims. Although the current evaluation focuses on conversion alignment and interpretable type variations, we have now computed and added Wasserstein distances on the next-action distributions (conditioned on state and buyer type) between simulated and real sessions across the 42 held-out stores in the revised Evaluation section. revision: yes
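A per-type comparison of next-action distributions, of the kind the referee asks for, could look like the sketch below. It swaps in Jensen-Shannon divergence for the referee's KL or Wasserstein suggestion: JSD stays finite when one side puts zero mass on an action, and unlike Wasserstein it needs no ground distance between action categories. It conditions on buyer type only; conditioning on page state as well would use (type, state) keys in the same way. The action ids and helper names are hypothetical.

```python
import numpy as np

def action_distribution(actions, num_actions):
    """Empirical distribution over next-action ids."""
    counts = np.bincount(actions, minlength=num_actions).astype(float)
    return counts / counts.sum()

def jensen_shannon(p, q):
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint support)."""
    m = 0.5 * (p + q)

    def kl(a, b):
        nz = a > 0  # terms with a == 0 contribute nothing; b > 0 wherever a > 0
        return np.sum(a[nz] * np.log2(a[nz] / b[nz]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def per_type_divergence(real, sim, num_actions):
    """real/sim: dicts mapping buyer type -> array of observed next-action ids."""
    return {
        t: jensen_shannon(action_distribution(real[t], num_actions),
                          action_distribution(sim[t], num_actions))
        for t in real.keys() & sim.keys()
    }

# Toy action ids: 0=view, 1=add_to_cart, 2=checkout, 3=exit (hypothetical).
real = {0: np.array([0, 0, 1, 3]), 1: np.array([0, 1, 2, 2])}
sim = {0: np.array([0, 0, 1, 3]), 1: np.array([3, 3, 3, 3])}
div = per_type_divergence(real, sim, num_actions=4)
```

Type 0 matches exactly (divergence 0), while simulated type-1 agents only ever exit, giving the maximum of 1 bit; an aggregate conversion metric could hide exactly this kind of per-type gap.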
Referee: [Inference procedure] The claim that a single encoder forward pass plus a persona token suffices for faithful transfer to new live interactions rests on the untested assumption that historical clickstream statistics remain representative under actual site-response feedback loops; this load-bearing transfer step lacks direct validation.

Authors: The inference procedure is supported by the overall performance on live held-out interactions. However, we accept that the assumption regarding the representativeness of historical statistics under feedback loops lacks isolated direct validation. We have added a discussion of this assumption, including its potential limitations, to the revised manuscript. revision: partial

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper's core pipeline trains a behavior-aware VQ-VAE on historical clickstreams to induce discrete buyer-type codes, maps those codes to LLM persona tokens, and fine-tunes the agent on the same traces before evaluating conversion-rate alignment on 42 held-out live storefronts. This is a standard empirical training-plus-held-out-evaluation workflow; the reported 78% alignment is measured against external real-buyer distributions rather than being a quantity that equals its own fitted inputs by construction. No self-definitional equations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the abstract or described method. The derivation therefore remains self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The framework depends on the VQ-VAE successfully capturing statistically meaningful buyer clusters from clickstreams and on those clusters remaining useful when injected as tokens into an LLM.

free parameters (1)

VQ-VAE codebook size (number of discrete buyer types)
Hyperparameter that determines how many distinct personas are induced; its value is chosen to balance coverage and interpretability.
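The review does not say how K was chosen. One common recipe, sketched here purely as an assumption, is to cluster encoder outputs for several candidate K and keep the value that maximizes the Calinski-Harabasz index. Both k-means++ seeding and Calinski-Harabasz appear in the paper's reference list ([1], [2]), but that does not confirm they were used for this purpose. Plain k-means stands in for the VQ-VAE codebook, synthetic blobs stand in for buyer embeddings, and all function names are hypothetical.

```python
import numpy as np

def kmeans(X, k, rng, n_init=5, iters=50):
    """Lloyd's k-means with k-means++ seeding and restarts; returns the best labels."""
    best_labels, best_inertia = None, np.inf
    for _restart in range(n_init):
        # k-means++: pick each new center with probability proportional to squared distance.
        centers = [X[rng.integers(len(X))]]
        for _c in range(1, k):
            d2 = np.min(((X[:, None] - np.array(centers)[None]) ** 2).sum(-1), axis=1)
            centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
        centers = np.array(centers, dtype=float)
        for _it in range(iters):
            labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
            for j in range(k):
                if np.any(labels == j):
                    centers[j] = X[labels == j].mean(0)
        inertia = ((X - centers[labels]) ** 2).sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels

def calinski_harabasz(X, labels):
    """Between-cluster over within-cluster dispersion, each scaled by its degrees of freedom."""
    ks = np.unique(labels)
    n, k = len(X), len(ks)
    mean = X.mean(0)
    between = sum((labels == j).sum() * ((X[labels == j].mean(0) - mean) ** 2).sum() for j in ks)
    within = sum(((X[labels == j] - X[labels == j].mean(0)) ** 2).sum() for j in ks)
    return (between / (k - 1)) / (within / (n - k))

def pick_codebook_size(X, candidates, seed=0):
    rng = np.random.default_rng(seed)
    scores = {k: calinski_harabasz(X, kmeans(X, k, rng)) for k in candidates}
    return max(scores, key=scores.get), scores

# Three well-separated synthetic "buyer types" in 2-D.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.5, size=(50, 2)) for c in [(0, 0), (10, 0), (0, 10)]])
best_k, scores = pick_codebook_size(X, candidates=[2, 3, 4, 5])
```

On this toy data the index peaks at K=3. Real clickstream embeddings are far less separable, so in practice such a score would be one input alongside the interpretability and coverage considerations the ledger mentions.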

axioms (1)

Domain assumption: behavior embeddings from clickstreams can be discretized into a finite codebook that preserves population-level statistical structure.
Invoked when the VQ-VAE is trained on raw traffic to produce merchant-specific distributions.

pith-pipeline@v0.9.0 · 5622 in / 1248 out tokens · 39175 ms · 2026-05-15T02:52:59.932218+00:00 · methodology


Reference graph

Works this paper leans on

35 extracted references · 35 canonical work pages · 5 internal anchors

[1]

k-means++: The advantages of careful seeding

David Arthur and Sergei Vassilvitskii. k-means++: The advantages of careful seeding. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1027–1035, 2007.

[2]

A dendrite method for cluster analysis

Tadeusz Caliński and Jerzy Harabasz. A dendrite method for cluster analysis. Communications in Statistics – Theory and Methods, 3(1):1–27, 1974.

[3]

Beyond demographics: Aligning role-playing LLM-based agents using human belief networks

Yun-Shiuan Chuang, Krirk Nirunwiroj, Zach Studdiford, Agam Goyal, Vincent V Frigo, Sijia Yang, Dhavan V Shah, Junjie Hu, and Timothy T Rogers. Beyond demographics: Aligning role-playing LLM-based agents using human belief networks. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 14010–14026, 2024.

[4]

Statistical Power Analysis for the Behavioral Sciences

Jacob Cohen. Statistical Power Analysis for the Behavioral Sciences. Lawrence Erlbaum Associates, 2nd edition, 1988.

[5]

Mind2Web: Towards a generalist agent for the web

Xiang Deng, Yu Gu, Boyuan Zheng, Shijie Chen, Sam Stevens, Boshi Wang, Huan Sun, and Yu Su. Mind2Web: Towards a generalist agent for the web. In Advances in Neural Information Processing Systems, volume 36, 2023.

[6]

The Design of Experiments

Ronald A. Fisher. The Design of Experiments. Oliver and Boyd, 1935.

[7]

The behavioral fabric of LLM-powered GUI agents: Human values and interaction outcomes

Simret Araya Gebreegziabher, Yukun Yang, Charles Chiang, Hojun Yoo, Chaoran Chen, Hyo Jin Do, Zahra Ashktorab, Werner Geyer, Diego Gómez-Zará, and Toby Jia-Jun Li. The behavioral fabric of LLM-powered GUI agents: Human values and interaction outcomes. In Proceedings of the 31st International Conference on Intelligent User Interfaces, pages 909–927, 2026.

[8]

A real-world WebAgent with planning, long context understanding, and program synthesis

Izzeddin Gur, Hiroki Furuta, Austin Huang, Mustafa Safdari, Yutaka Matsuo, Douglas Eck, and Aleksandra Faust. A real-world WebAgent with planning, long context understanding, and program synthesis. arXiv preprint arXiv:2307.12856, 2023.

[9]

Detecting user exits from online behavior: A duration-dependent latent state model

Tobias Hatt and Stefan Feuerriegel. Detecting user exits from online behavior: A duration-dependent latent state model. arXiv preprint arXiv:2208.03937, 2022.

[10]

Use of ranks in one-criterion variance analysis

William H. Kruskal and W. Allen Wallis. Use of ranks in one-criterion variance analysis. Journal of the American Statistical Association, 47(260):583–621, 1952.

[11]

Divergence measures based on the Shannon entropy

Jianhua Lin. Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 37(1):145–151, 1991.

[12]

Can LLM Agents Simulate Multi-Turn Human Behavior? Evidence from Real Online Customer Behavior Data

Yuxuan Lu, Jing Huang, Yan Han, Bingsheng Yao, Sisong Bei, Jiri Gesi, Yaochen Xie, Zheshen Wang, Qi He, and Dakuo Wang. Can LLM agents simulate multi-turn human behavior? Evidence from real online customer behavior data. arXiv preprint arXiv:2503.20749, 2025.

[13]

UXAgent: An LLM agent-based usability testing framework for web design

Yuxuan Lu, Bingsheng Yao, Hansu Gu, Jing Huang, Zheshen Jessie Wang, Yang Li, Jiri Gesi, Qi He, Toby Jia-Jun Li, and Dakuo Wang. UXAgent: An LLM agent-based usability testing framework for web design. In Proceedings of the Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, pages 1–12, 2025.

[14]

The prompt makes the person(a): A systematic evaluation of sociodemographic persona prompting for large language models

Sunnie S. Y. Lutz et al. The prompt makes the person(a): A systematic evaluation of sociodemographic persona prompting for large language models. In Findings of the Association for Computational Linguistics: EMNLP 2025, 2025.

[15]

Perceive your users in depth: Learning universal user representations from multiple e-commerce tasks

Jianmo Ni et al. Perceive your users in depth: Learning universal user representations from multiple e-commerce tasks. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2018.

[16]

Generative agents: Interactive simulacra of human behavior

Joon Sung Park, Joseph C. O'Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, and Michael S. Bernstein. Generative agents: Interactive simulacra of human behavior. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, 2023.

[17]

LLM Agents Grounded in Self-Reports Enable General-Purpose Simulation of Individuals

Joon Sung Park et al. Generative agent simulations of 1,000 people. arXiv preprint arXiv:2411.10109, 2024.

[18]

Generating diverse high-fidelity images with VQ-VAE-2

Ali Razavi, Aaron van den Oord, and Oriol Vinyals. Generating diverse high-fidelity images with VQ-VAE-2. In Advances in Neural Information Processing Systems, 2019.

[19]

Character-LLM: A trainable agent for role-playing

Yunfan Shao et al. Character-LLM: A trainable agent for role-playing. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023.

[20]

You are what you bought: Generating customer personas for e-commerce applications

Yimin Shi, Yang Fei, Shiqi Zhang, Haixun Wang, and Xiaokui Xiao. You are what you bought: Generating customer personas for e-commerce applications. In Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 1810–1819, 2025.

[21]

PersonaX: A recommendation agent-oriented user modeling framework for long behavior sequence

Yunxiao Shi, Wujiang Xu, Zeqi Zhang, Xing Zi, Qiang Wu, and Min Xu. PersonaX: A recommendation agent-oriented user modeling framework for long behavior sequence. In Findings of the Association for Computational Linguistics: ACL 2025, pages 5764–5787, Vienna, Austria, 2025. Association for Computational Linguistics. doi: 10.18653/v1/2025.findings-acl.300.

[22]

Neural discrete representation learning

Aaron van den Oord, Oriol Vinyals, and Koray Kavukcuoglu. Neural discrete representation learning. In Advances in Neural Information Processing Systems, 2017.

[23]

Representation Learning with Contrastive Predictive Coding

Aaron van den Oord, Yazhe Li, and Oriol Vinyals. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748, 2018.

[24]

AgentA/B: Automated and scalable web A/B testing with interactive LLM agents

Dakuo Wang, Ting-Yao Hsu, Yuxuan Lu, Limeng Cui, Yaochen Xie, William Headean, Bingsheng Yao, Akash Veeragouni, Jiapeng Liu, Sreyashi Nag, and Jessie Wang. AgentA/B: Automated and scalable web A/B testing with interactive LLM agents. arXiv preprint arXiv:2504.09723, 2025.

[25]

Unsupervised clickstream clustering for user behavior analysis

Gang Wang, Xinyi Zhang, Shiliang Tang, Haitao Zheng, and Ben Y. Zhao. Unsupervised clickstream clustering for user behavior analysis. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, pages 225–236. ACM, 2016. doi: 10.1145/2858036.2858107.

[26]

OPeRA: A Dataset of Observation, Persona, Rationale, and Action for Evaluating LLMs on Human Online Shopping Behavior Simulation

Ziyi Wang, Yuxuan Lu, Wenbo Li, Amirali Amini, Bo Sun, Yakov Bart, Weimin Lyu, Jiri Gesi, Tian Wang, Jing Huang, Yu Su, Upol Ehsan, Malihe Alikhani, Toby Jia-Jun Li, Lydia Chilton, and Dakuo Wang. OPeRA: A dataset of observation, persona, rationale, and action for evaluating LLMs on human online shopping behavior simulation. arXiv preprint arXiv:2506.05606...

[27]

Customer-R1: Personalized simulation of human behaviors via RL-based LLM agent in online shopping

Ziyi Wang, Yuxuan Lu, Yimeng Zhang, Jing Huang, and Dakuo Wang. Customer-R1: Personalized simulation of human behaviors via RL-based LLM agent in online shopping. arXiv preprint arXiv:2510.07230, 2025.

[28]

The generalization of "Student's" problem when several different population variances are involved

B. L. Welch. The generalization of "Student's" problem when several different population variances are involved. Biometrika, 34(1/2):28–35, 1947.

[29]

Qwen3 Technical Report

An Yang, Baosong Yang, Beichen Zhang, Binyuan Wang, Bo Li, Bowen Liu, et al. Qwen3 technical report. arXiv preprint arXiv:2505.09388, 2025.

[30]

TRACE: Transformer-based user representations from attributed clickstream event sequences

Dale Yang et al. TRACE: Transformer-based user representations from attributed clickstream event sequences. In Proceedings of the ACM Web Conference, 2023.

[31]

WebShop: Towards scalable real-world web interaction with grounded language agents

Shunyu Yao, Howard Chen, John Yang, and Karthik Narasimhan. WebShop: Towards scalable real-world web interaction with grounded language agents. In Advances in Neural Information Processing Systems, volume 35, 2022.

[32]

Shop-R1: Rewarding LLMs to simulate human behavior in online shopping via reinforcement learning

Yimeng Zhang, Tian Wang, Jiri Gesi, Ziyi Wang, Yuxuan Lu, Jiacheng Lin, Sinong Zhan, Vianne Gao, Ruochen Jiao, Junze Liu, et al. Shop-R1: Rewarding LLMs to simulate human behavior in online shopping via reinforcement learning. arXiv preprint arXiv:2507.17842, 2025.

[33]

A deep Markov model for clickstream analytics in online shopping

Wen Zheng et al. A deep Markov model for clickstream analytics in online shopping. In Proceedings of The Web Conference 2020, 2020.

[34]

WebArena: A realistic web environment for building autonomous agents

Shuyan Zhou, Frank F. Xu, Hao Zhu, Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng, Yonatan Bisk, Daniel Fried, Uri Alon, et al. WebArena: A realistic web environment for building autonomous agents. In International Conference on Learning Representations, 2024.

A Data Pipeline. Figure 2 illustrates our end-to-end data pipeline described in Section 2...

[35]

you are interested in product X

over encoder outputs from a full pass through the training set. During training, entries are updated via exponential moving averages rather than gradient descent: $e_k \leftarrow \gamma\, e_k + (1-\gamma)\, \bar{z}_k$ (7), where $\bar{z}_k$ is the mean of encoder outputs assigned to entry $k$ in the current batch and $\gamma \in [0,1)$ controls the memory of past assignments. To prevent codebook collapse Razav...
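Equation (7) above can be read as a per-entry moving average toward the batch mean of the encoder outputs assigned to that entry. A minimal NumPy sketch of exactly that rule, assuming nearest-neighbor assignment; leaving entries with no assignments in the batch unchanged is an assumption, since the excerpt is cut off before its codebook-collapse remedy. The original VQ-VAE paper's EMA variant instead keeps separate running counts and sums per entry, which this sketch does not do. `ema_codebook_update` is a hypothetical name.

```python
import numpy as np

def ema_codebook_update(codebook, z, gamma=0.99):
    """Apply e_k <- gamma * e_k + (1 - gamma) * z_bar_k to every entry used in this batch.

    codebook: (K, D) entries; z: (B, D) encoder outputs for the current batch.
    Entries that receive no assignments are left unchanged (an assumption).
    """
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    assign = d2.argmin(axis=1)  # nearest entry for each encoder output
    new = codebook.copy()
    for k in np.unique(assign):
        z_bar = z[assign == k].mean(axis=0)  # batch mean of outputs assigned to entry k
        new[k] = gamma * codebook[k] + (1.0 - gamma) * z_bar
    return new, assign

codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
z = np.array([[1.0, 1.0], [3.0, 1.0]])
new, assign = ema_codebook_update(codebook, z, gamma=0.5)
```

Both toy outputs map to entry 0, whose batch mean is (2, 1), so with γ = 0.5 the entry moves halfway to (1, 0.5) while the unused entry stays at (10, 10). Larger γ gives the longer memory of past assignments that the text describes.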