Weasel: Out-of-Domain Generalization for Web Agents via Importance-Diversity Data Selection

Fatemeh Pesaran zadeh; Gunhee Kim; Seyeon Choi; Siva Reddy; Xing Han Lù

arxiv: 2605.20291 · v1 · pith:AKPME5T6new · submitted 2026-05-19 · 💻 cs.LG

Pith reviewed 2026-05-21 08:04 UTC · model grok-4.3

keywords: web agents, out-of-domain generalization, trajectory selection, data efficiency, importance and diversity, greedy algorithm, AXTree pruning, LLM agents

The pith

Selecting important and diverse trajectories lets web agents generalize out of domain while cutting training costs by an order of magnitude.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper argues that web agents trained offline generalize better to new websites when fine-tuned on a carefully chosen subset of the training data rather than on the full trajectory dataset. The core idea is to pick trajectory steps that are individually important and mutually diverse in the states, websites, and interaction patterns they cover, using a greedy algorithm to solve the selection problem under a fixed budget. Two further steps improve efficiency and reduce mismatch: pruning accessibility trees (AXTrees) to the content around each action's target, and replacing expert traces with rationales generated in the model's own style. A sympathetic reader would care because current approaches spend compute on redundant or noisy data and still fail when the agent encounters unfamiliar sites or tasks.

Core claim

The central discovery is that a greedy optimization of an objective combining unary importance scores with pairwise diversity measures across states, websites, and interaction patterns can identify a compact set of trajectories that, when used for fine-tuning, yields superior out-of-domain performance on web agent benchmarks compared to using the entire dataset, while delivering training speedups of approximately 9.7 to 12.5 times.

What carries the argument

The importance-diversity objective solved greedily to select trajectory steps, combined with target-centered AXTree pruning and model-generated rationales.
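A minimal sketch of how such a greedy selection could look, assuming the objective is a sum of unary importance scores plus a weighted sum of pairwise distances. The function name `greedy_select`, the weight `lam`, and the toy scores and distance below are illustrative assumptions, not the authors' implementation; the paper's exact scoring and weighting are not given on this page.

```python
from typing import Callable, List, Sequence


def greedy_select(
    importance: Sequence[float],
    distance: Callable[[int, int], float],
    budget: int,
    lam: float = 1.0,
) -> List[int]:
    """Greedily pick up to `budget` indices by importance plus diversity.

    Each round adds the candidate i with the largest marginal gain
    importance[i] + lam * sum(distance(i, j) for j already selected).
    `lam` is a hypothetical trade-off weight.
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    n = len(importance)
    selected: List[int] = []
    remaining = set(range(n))
    # div_gain[i] caches the summed distance from i to all selected items.
    div_gain = [0.0] * n
    for _ in range(min(budget, n)):
        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(remaining), key=lambda i: importance[i] + lam * div_gain[i])
        selected.append(best)
        remaining.remove(best)
        for i in remaining:
            div_gain[i] += distance(best, i)
    return selected


# Toy example: four steps placed on a line; distance is absolute difference.
positions = [0.0, 0.1, 1.0, 0.5]
scores = [0.9, 0.8, 0.1, 0.5]


def toy_distance(i: int, j: int) -> float:
    return abs(positions[i] - positions[j])


print(greedy_select(scores, toy_distance, budget=2, lam=1.0))  # [0, 2]
print(greedy_select(scores, toy_distance, budget=2, lam=0.0))  # [0, 1]
```

With `lam=0.0` the selection reduces to ranking by importance alone, which is the shape of the importance-only baseline the referee report asks for. With a positive `lam`, a low-scoring but distant step can displace a high-scoring near-duplicate.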

If this is right

Out-of-domain success rates increase on WebArena, WorkArena, and MiniWob when training with the selected data.
Training time is reduced by factors of 9.7 to 12.5 across Qwen2.5-7B, Gemma3-4B, and Qwen3-8B models.
The method applies to both AgentTrek and NNetNav training datasets.
Style-consistent rationales help reasoning-native models adapt better.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar selection criteria could improve efficiency in training agents for other environments like mobile apps or games.
The focus on diversity over interaction patterns may help address long-tail behaviors in agent tasks.
Reducing data volume this way might lower the barrier to iterating on web agent designs.

Load-bearing premise

A greedy solution to balancing importance and diversity will reliably choose trajectories from which the model learns generalizable behaviors for unseen websites and tasks.

What would settle it

The claim would be undermined if experiments on held-out websites showed that models fine-tuned on Weasel-selected trajectories achieve lower task success rates than models trained on the full dataset or on randomly sampled trajectories of equal size.

Figures

Figures reproduced from arXiv: 2605.20291 by Fatemeh Pesaran zadeh, Gunhee Kim, Seyeon Choi, Siva Reddy, Xing Han Lù.

**Figure 1.** Overview of WEASEL. Conventional trained web agents show a sharp performance drop under out-of-domain shifts to unseen websites and interaction patterns. WEASEL tackles this challenge via novel trajectory selection: it scores offline demonstration steps for goal relevance and diversity, then applies greedy subset selection under a fixed budget. Agents trained with WEASEL generalize better to unseen test…

**Figure 2.** (Left): An example of a curated trajectory after applying WEASEL. Although the original collected data contain noisy steps ($t = 4$), and erroneous actions ($t = 0$), WEASEL selects a compact subset that retains only the most informative steps (in red) for the goal. (Right): Overview of WEASEL. We first perform element-wise score calculation using unary importance and pairwise diversity. WEASEL then applies a …

**Figure 3.** Token distribution of 10K subsamples of AgentTrek (Xu et al., 2024) before pruning (green) and after target-centered pruning (blue). Pruning substantially reduces long-tail states, making the resulting sequences more manageable for training.

… quality term plus a sum of pairwise distances under a cardinality constraint (Borodin et al., 2017). For metric distances, a greedy algorithm achieves a constant-fa…

**Figure 4.** An illustration of Target-centered Pruning. Given a state $s_t$ in the form of AXTree and gold action $a_t$, we retain only the AXTree elements within a fixed window of size $w$ centered at the target index $k_t^*$, producing the pruned state $\tilde{s}_t$. The $k$-th node in the linearized AXTree at step $t$ is denoted $v_{t,k}$ (e.g., $v_{t,1}$, $v_{t,2}$), and $v_{t,k_t^*}$ is the gold target node.

2.4. Target-centered Pruning. Web states can be p…

**Figure 5.** Success rate decreases as the pruning offset increases. Results are reported on WebArena-Lite.
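The target-centered pruning described in the Figure 4 caption can be sketched as a slice over the linearized AXTree. This is an illustrative reading, not the authors' code: the function name `prune_axtree`, the representation of nodes as strings, and the choice to place any extra node of an even-sized window after the target are all assumptions.

```python
from typing import List


def prune_axtree(nodes: List[str], target_index: int, window: int) -> List[str]:
    """Keep the linearized AXTree nodes inside a window centered on the target.

    nodes:        linearized AXTree, one entry per node (hypothetical format)
    target_index: position of the gold target node in `nodes`
    window:       number of nodes to keep; the window is clipped at the
                  start and end of the tree
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    if not 0 <= target_index < len(nodes):
        raise IndexError("target_index must point at a node in the tree")
    lo = target_index - (window - 1) // 2  # nodes kept before the target
    hi = lo + window                       # exclusive upper bound
    return nodes[max(0, lo):min(len(nodes), hi)]


# Toy tree of ten nodes; the gold action targets node 5.
tree = [f"[{k}] node" for k in range(10)]
print(prune_axtree(tree, target_index=5, window=3))
```

Because the window is clipped rather than shifted, a target near the start or end of the tree yields fewer than `window` nodes; the target itself is always retained.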

Original abstract

Large language models (LLMs) have enabled web agents that follow natural language goals through multi-step browser interactions. However, agents fine-tuned on specific trajectories and domains often struggle to generalize out of domain, and offline training can be compute-inefficient due to noisy, redundant trajectories and long accessibility-tree (AXTree) states. To address both issues, we propose Weasel, a trajectory selection method for offline training of web agents. Weasel selects a fixed-budget subset of trajectory steps by optimizing an objective that balances unary importance with pairwise diversity over states, websites, and interaction patterns, solving it efficiently with a greedy algorithm. We further improve efficiency with target-centered AXTree pruning that keeps only content around the ground-truth action target, and we mitigate style mismatch for reasoning-native models by replacing expert traces with model-generated, style-consistent rationales. Across AgentTrek and NNetNav training datasets, evaluations in WebArena, WorkArena, and MiniWob, and experiments with Qwen2.5-7B, Gemma3-4B, and Qwen3-8B, Weasel improves out-of-domain performance while reducing training cost, producing roughly 9.7-12.5$\times$ training speedups over standard fine-tuning. We make the code available at https://github.com/fatemehpesaran310/weasel.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Weasel shows a practical greedy selection method that cuts training cost and lifts OOD scores on web agents, but the diversity term's contribution to generalization is not clearly isolated.

Weasel picks a fixed-budget subset of trajectory steps by balancing unary importance with pairwise diversity over states, websites, and interaction patterns, then adds target-centered AXTree pruning and swaps expert traces for model-generated rationales. The main result is that this produces better out-of-domain performance on WebArena, WorkArena, and MiniWob while delivering roughly 10x training speedups over full fine-tuning across a few models and two training datasets.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces Weasel, a trajectory selection method for offline training of web agents. It selects a fixed-budget subset of trajectory steps by optimizing an objective that balances unary importance with pairwise diversity over states, websites, and interaction patterns, solved via a greedy algorithm. Additional components include target-centered AXTree pruning and replacement of expert traces with model-generated rationales for style consistency. Experiments on AgentTrek and NNetNav training data, evaluated on WebArena, WorkArena, and MiniWob with Qwen2.5-7B, Gemma3-4B, and Qwen3-8B models, report improved out-of-domain performance together with 9.7-12.5× training speedups relative to standard fine-tuning. Code is released at the cited GitHub repository.

Significance. If the reported gains prove robust, the approach could meaningfully advance efficient offline training of generalizable web agents by addressing redundancy and noise in trajectory data. The public code release supports reproducibility and is a clear strength.

major comments (2)

[Abstract] Abstract: the reported OOD gains and speedups are presented without error bars, exact baseline implementation details, or an ablation isolating the diversity term from AXTree pruning and rationale replacement; these omissions are load-bearing because they prevent determining whether the central selection procedure, rather than the auxiliary efficiency steps, drives the claimed improvements.
[Method (objective and greedy algorithm)] Method section describing the objective and greedy algorithm: the claim that optimizing unary importance plus pairwise diversity over states/websites/patterns produces trajectories whose induced policies transfer to unseen websites and tasks rests on the untested assumption that the diversity term captures cross-domain interaction patterns rather than merely reducing in-domain redundancy. Without targeted ablations (e.g., diversity term removed, random selection baseline, or correlation analysis between marginal gains and OOD robustness), the observed gains on WebArena/WorkArena/MiniWob could be explained by the other modifications instead.

minor comments (1)

The abstract states a selection budget but does not report its concrete value or sensitivity analysis; adding this would improve clarity without altering the central claim.
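The error bars requested in the first major comment could be produced in several ways; one common option is a percentile bootstrap over per-task outcomes. The sketch below is a generic illustration, not the authors' evaluation protocol. The function name `bootstrap_ci`, the resample count, and the toy outcomes are assumptions.

```python
import random
from typing import Sequence, Tuple


def bootstrap_ci(
    outcomes: Sequence[int],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (success_rate, lower, upper) using a percentile bootstrap.

    outcomes: one 0/1 entry per evaluated task (1 = task solved).
    """
    if not outcomes:
        raise ValueError("outcomes must be non-empty")
    rng = random.Random(seed)
    n = len(outcomes)
    means = []
    for _ in range(n_resamples):
        # Resample tasks with replacement and record the success rate.
        sample = rng.choices(outcomes, k=n)
        means.append(sum(sample) / n)
    means.sort()
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return sum(outcomes) / n, lower, upper


# Toy data: 30 solved tasks out of 100 (made-up numbers for illustration).
toy_outcomes = [1] * 30 + [0] * 70
rate, lower, upper = bootstrap_ci(toy_outcomes)
print(f"success rate {rate:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

Reporting intervals like these for the full-data, random-subset, and selected-subset runs would let a reader judge whether the differences exceed evaluation noise.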

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and describe the revisions that will be incorporated to strengthen the manuscript.

Referee: [Abstract] Abstract: the reported OOD gains and speedups are presented without error bars, exact baseline implementation details, or an ablation isolating the diversity term from AXTree pruning and rationale replacement; these omissions are load-bearing because they prevent determining whether the central selection procedure, rather than the auxiliary efficiency steps, drives the claimed improvements.

Authors: We agree that error bars, precise baseline details, and an isolating ablation are necessary to attribute gains clearly. In the revised manuscript we will add error bars to all reported OOD and speedup results, expand the experimental section with exact baseline implementation details (including training hyperparameters, data preprocessing, and model versions), and insert a dedicated ablation that holds AXTree pruning and rationale replacement fixed while varying only the selection objective (full importance-diversity vs. importance-only vs. random). These changes will isolate the contribution of the core selection procedure.

Revision: yes

Referee: [Method (objective and greedy algorithm)] Method section describing the objective and greedy algorithm: the claim that optimizing unary importance plus pairwise diversity over states/websites/patterns produces trajectories whose induced policies transfer to unseen websites and tasks rests on the untested assumption that the diversity term captures cross-domain interaction patterns rather than merely reducing in-domain redundancy. Without targeted ablations (e.g., diversity term removed, random selection baseline, or correlation analysis between marginal gains and OOD robustness), the observed gains on WebArena/WorkArena/MiniWob could be explained by the other modifications instead.

Authors: We acknowledge that the current manuscript does not contain an explicit ablation removing the diversity term or a correlation analysis linking diversity metrics to OOD gains. To address this directly, the revision will add (i) an ablation that removes the pairwise diversity component while retaining importance scoring, AXTree pruning, and rationale replacement, (ii) a random-selection baseline matched for budget, and (iii) a supplementary analysis correlating per-trajectory diversity scores with observed OOD performance deltas across the three evaluation suites. While we continue to hold that the multi-aspect diversity objective (states, websites, interaction patterns) is motivated by the goal of broader coverage, the requested ablations will provide the empirical evidence needed to substantiate its role in OOD transfer.

Revision: yes

Circularity Check

0 steps flagged

Empirical selection procedure with no definitional circularity

The paper describes Weasel as a practical trajectory selection algorithm that optimizes a unary-importance-plus-pairwise-diversity objective via a stated greedy procedure, followed by AXTree pruning and rationale replacement. Reported OOD gains and 9.7-12.5× speedups are obtained from direct experimental comparisons on AgentTrek, NNetNav, WebArena, WorkArena, and MiniWob with multiple base models; these outcomes are not algebraically forced by any fitted parameter, self-referential normalization, or uniqueness theorem internal to the paper. The method is self-contained against external benchmarks and contains no load-bearing self-citations or ansatzes that reduce the central claim to its own inputs by construction.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

Abstract-only view limits visibility; inferred elements are the fixed selection budget (hyperparameter) and the claim that the greedy algorithm sufficiently approximates the combinatorial objective. No new physical entities or ad-hoc constants are introduced.

free parameters (1)

selection budget
Fixed number of trajectory steps retained; chosen to control training cost.

axioms (1)

domain assumption: Greedy algorithm yields a good approximation to the joint importance-diversity objective
Invoked to make selection tractable for large trajectory pools.
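For orientation, the joint objective this ledger refers to can be written in the max-sum diversification form hinted at by the paper excerpt next to Figure 3 (a quality term plus a sum of pairwise distances under a cardinality constraint). The notation is an assumption made for illustration: $\mathcal{T}$ is the pool of candidate trajectory steps, $u(i)$ the unary importance of step $i$, $D(i, j)$ the pairwise distance, $B$ the selection budget, and $\lambda$ a hypothetical trade-off weight that this page does not report.

```latex
\max_{S \subseteq \mathcal{T},\; |S| = B}
\quad \sum_{i \in S} u(i)
\;+\; \lambda \sum_{\substack{\{i, j\} \subseteq S \\ i \neq j}} D(i, j)
```

Under this reading, the budget $B$ is the one free parameter listed above, and the axiom amounts to trusting that a greedy pass gets close enough to the maximizer of this expression.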

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: We formulate a fixed-budget subset selection problem with a quadratic objective that balances unary importance with pairwise diversity over states, websites, and interaction patterns, solving efficiently with a greedy algorithm.

IndisputableMonolith/Foundation/BranchSelection.lean · RCLCombiner_isCoupling_iff · unclear

Paper passage: $D(i, j) = \max(\delta(s_i, s_j), \delta(y_i, y_j))$ with $\delta = 1 - \mathrm{BERTScore}$
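The pairwise distance quoted above takes the larger of two dissimilarities, so two steps count as close only when they are similar on both components. The sketch below illustrates that structure under stated assumptions: it reads $s$ as the state text and $y$ as the output text, which this page does not confirm, and it substitutes a simple token-overlap F1 for BERTScore so the example runs without model downloads. The names `token_f1` and `pair_distance` are hypothetical.

```python
from collections import Counter
from typing import Callable


def token_f1(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]: token-overlap F1 (NOT BERTScore)."""
    tokens_a, tokens_b = a.split(), b.split()
    if not tokens_a or not tokens_b:
        return 1.0 if tokens_a == tokens_b else 0.0
    overlap = sum((Counter(tokens_a) & Counter(tokens_b)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(tokens_a)
    recall = overlap / len(tokens_b)
    return 2 * precision * recall / (precision + recall)


def pair_distance(
    s_i: str,
    y_i: str,
    s_j: str,
    y_j: str,
    similarity: Callable[[str, str], float] = token_f1,
) -> float:
    """D(i, j) = max(delta(s_i, s_j), delta(y_i, y_j)), delta = 1 - similarity."""
    delta_s = 1.0 - similarity(s_i, s_j)
    delta_y = 1.0 - similarity(y_i, y_j)
    return max(delta_s, delta_y)


# Same state text but different outputs: the max keeps the pair far apart.
print(pair_distance("button Submit", "click", "button Submit", "type"))  # 1.0
```

To move closer to the quoted formula, a BERTScore-based function could be passed as `similarity`; the max structure stays the same.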

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

59 extracted references · 59 canonical work pages · 3 internal anchors

[1] DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining. 2023.
[2] Learning Transferable Visual Models From Natural Language Supervision. 2021.
[3] SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features. 2025.
[4] Android in the Wild: A Large-Scale Dataset for Android Device Control. 2023.
[5] Bukharin, Alexander and Li, Shiyang and Wang, Zhengyang and Yang, Jingfeng and Yin, Bing and Li, Xian and Zhang, Chao and Zhao, Tuo and Jiang, Haoming. Data Diversity Matters for Robust Instruction Tuning. Findings of the Association for Computational Linguistics: EMNLP 2024. 2024. doi:10.18653/v1/2024.findings-emnlp.195
[6] Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models. arXiv preprint arXiv:2506.05176.
[7] Scaling Instruction-Finetuned Language Models. 2022. doi:10.48550/ARXIV.2210.11416
[8] VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents. 2024.
[9] LoRA: Low-Rank Adaptation of Large Language Models. 2021.
[10] LineRetriever: Planning-Aware Observation Reduction for Web Agents. 2025.
[11] FocusAgent: Simple Yet Effective Ways of Trimming the Large Context of Web Agents. 2025.
[12] Learning to Contextualize Web Pages for Enhanced Decision Making by LLM Agents. 2025.
[13] Less is More: Improving LLM Alignment via Preference Data Selection. 2025.
[14] GLISTER: Generalization based Data Subset Selection for Efficient and Robust Learning. 2021.
[15] Coresets for Data-efficient Training of Machine Learning Models. 2020.
[16] Active Learning for Convolutional Neural Networks: A Core-Set Approach. 2018.
[17] Yinhan Liu and Myle Ott and Naman Goyal and Jingfei Du and Mandar Joshi and Danqi Chen and Omer Levy and Mike Lewis and Luke Zettlemoyer and Veselin Stoyanov. RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR. 2019. arXiv:1907.11692
[18] Retrieval-augmented GUI Agents with Generative Guidelines. 2025.
[19] Just-in-time Episodic Feedback Hinter: Leveraging Offline Knowledge to Improve LLM Agents Adaptation. 2025.
[20] World of Bits: An Open-Domain Platform for Web-Based Agents. Proceedings of the 34th International Conference on Machine Learning. 2017.
[21] Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration. 2018.
[22] Drouin, Alexandre and Gasse, Maxime and Caccia, Massimo and Laradji, Issam H. and Del Verme, Manuel and Marty, Tom and Vazquez, David and Chapados, Nicolas and Lacoste, Alexandre. 2024.
[23] The BrowserGym Ecosystem for Web Agent Research. Transactions on Machine Learning Research. 2025.
[24] Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena. 2023.
[25] Gemma 3 Technical Report. 2025.
[26] Max-sum diversification, monotone submodular functions, and dynamic updates. ACM Transactions on Algorithms (TALG). 2017.
[27] BERTScore: Evaluating Text Generation with BERT. 2020.
[28] Thil, Lucas-Andrei and Popa, Mirela and Spanakis, Gerasimos. Navigating WebAI: Training Agents to Complete Web Tasks with Large Language Models and Reinforcement Learning. Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing.
[29] Wepo: Web element preference optimization for llm-based web navigation. Proceedings of the AAAI Conference on Artificial Intelligence.
[30] Htmlrag: Html is better than plain text for modeling retrieved knowledge in rag systems. Proceedings of the ACM on Web Conference 2025.
[31] Computers and intractability. 2002.
[32] Zelikman, Eric and Wu, Yuhuai and Mu, Jesse and Goodman, Noah. STaR: Bootstrapping Reasoning With Reasoning.
[33] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. 2025.
[34] Qwen3 Technical Report. 2025.
[35] Qwen2.5 Technical Report. 2025.
[36] Mind2Web: Towards a Generalist Agent for the Web. 2023.
[37] Proposer-Agent-Evaluator(PAE): Autonomous Skill Discovery For Foundation Model Internet Agents. 2024.
[38] AutoWebGLM: A Large Language Model-based Web Navigating Agent. 2024.
[39] Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction. 2025.
[40] WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning. 2025.
[41] WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning. 2025.
[42] Synatra: Turning Indirect Knowledge into Direct Demonstrations for Digital Agents at Scale. 2024.
[43] AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials. 2025.
[44] NNetNav: Unsupervised Learning of Browser Agents Through Environment Interaction in the Wild. 2025.
[45] AgentBench: Evaluating LLMs as Agents. 2023.
[46] WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks? 2024.
[47] WebArena: A Realistic Web Environment for Building Autonomous Agents. 2024.
[48] Webrl: Training llm web agents via self-evolving online curriculum reinforcement learning. arXiv preprint arXiv:2411.02337. 2025.
[49] Voyager: An Open-Ended Embodied Agent with Large Language Models. 2023.
[50] P. Langley. Proceedings of the 17th International Conference on Machine Learning (ICML 2000). 2000.
[51] T. M. Mitchell. The Need for Biases in Learning Generalizations. 1980.
[52] M. J. Kearns.
[53] Machine Learning: An Artificial Intelligence Approach, Vol. I. 1983.
[54] R. O. Duda and P. E. Hart and D. G. Stork. Pattern Classification. 2000.
[55] Suppressed for Anonymity.
[56] A. Newell and P. S. Rosenbloom. Mechanisms of Skill Acquisition and the Law of Practice. Cognitive Skills and Their Acquisition. 1981.
[57] A. L. Samuel. Some Studies in Machine Learning Using the Game of Checkers. IBM Journal of Research and Development. 1959.
[58] Agenttrek: Agent trajectory synthesis via guiding replay with web tutorials. arXiv preprint arXiv:2412.09605.
[59] Weblinx: Real-world website navigation with multi-turn dialogue. arXiv preprint arXiv:2402.05930.

[1] [1]

2023 , eprint=

DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining , author=. 2023 , eprint=

work page 2023

[2] [2]

2021 , eprint=

Learning Transferable Visual Models From Natural Language Supervision , author=. 2021 , eprint=

work page 2021

[3] [3]

2025 , eprint=

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features , author=. 2025 , eprint=

work page 2025

[4] [4]

2023 , eprint=

Android in the Wild: A Large-Scale Dataset for Android Device Control , author=. 2023 , eprint=

work page 2023

[5] [5]

Data Diversity Matters for Robust Instruction Tuning

Bukharin, Alexander and Li, Shiyang and Wang, Zhengyang and Yang, Jingfeng and Yin, Bing and Li, Xian and Zhang, Chao and Zhao, Tuo and Jiang, Haoming. Data Diversity Matters for Robust Instruction Tuning. Findings of the Association for Computational Linguistics: EMNLP 2024. 2024. doi:10.18653/v1/2024.findings-emnlp.195

work page doi:10.18653/v1/2024.findings-emnlp.195 2024

[6] [6]

Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models

Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models , author=. arXiv preprint arXiv:2506.05176 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[7] [7]

Scaling Instruction-Finetuned Language Models

Scaling Instruction-Finetuned Language Models , publisher =. 2022 , copyright =. doi:10.48550/ARXIV.2210.11416 , author =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2210.11416 2022

[8] [8]

2024 , eprint=

VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents , author=. 2024 , eprint=

work page 2024

[9] [9]

2021 , eprint=

LoRA: Low-Rank Adaptation of Large Language Models , author=. 2021 , eprint=

work page 2021

[10] [10]

2025 , eprint=

LineRetriever: Planning-Aware Observation Reduction for Web Agents , author=. 2025 , eprint=

work page 2025

[11] [11]

2025 , eprint=

FocusAgent: Simple Yet Effective Ways of Trimming the Large Context of Web Agents , author=. 2025 , eprint=

work page 2025

[12] [12]

2025 , eprint=

Learning to Contextualize Web Pages for Enhanced Decision Making by LLM Agents , author=. 2025 , eprint=

work page 2025

[13] [13]

2025 , eprint=

Less is More: Improving LLM Alignment via Preference Data Selection , author=. 2025 , eprint=

work page 2025

[14] [14]

2021 , eprint=

GLISTER: Generalization based Data Subset Selection for Efficient and Robust Learning , author=. 2021 , eprint=

work page 2021

[15] [15]

2020 , eprint=

Coresets for Data-efficient Training of Machine Learning Models , author=. 2020 , eprint=

work page 2020

[16] [16]

2018 , eprint=

Active Learning for Convolutional Neural Networks: A Core-Set Approach , author=. 2018 , eprint=

work page 2018

[17] [17]

RoBERTa: A Robustly Optimized BERT Pretraining Approach

Yinhan Liu and Myle Ott and Naman Goyal and Jingfei Du and Mandar Joshi and Danqi Chen and Omer Levy and Mike Lewis and Luke Zettlemoyer and Veselin Stoyanov , title =. CoRR , volume =. 2019 , archivePrefix =. 1907.11692 , timestamp =

work page internal anchor Pith review Pith/arXiv arXiv 2019

[18] [18]

2025 , eprint=

Retrieval-augmented GUI Agents with Generative Guidelines , author=. 2025 , eprint=

work page 2025

[19] [19]

2025 , eprint=

Just-in-time Episodic Feedback Hinter: Leveraging Offline Knowledge to Improve LLM Agents Adaptation , author=. 2025 , eprint=

work page 2025

[20] [20]

Proceedings of the 34th International Conference on Machine Learning , pages =

World of Bits: An Open-Domain Platform for Web-Based Agents , author =. Proceedings of the 34th International Conference on Machine Learning , pages =. 2017 , editor =

work page 2017

[21] [21]

2018 , eprint=

Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration , author=. 2018 , eprint=

work page 2018

[22] [22]

and Del Verme, Manuel and Marty, Tom and Vazquez, David and Chapados, Nicolas and Lacoste, Alexandre , booktitle =

Drouin, Alexandre and Gasse, Maxime and Caccia, Massimo and Laradji, Issam H. and Del Verme, Manuel and Marty, Tom and Vazquez, David and Chapados, Nicolas and Lacoste, Alexandre , booktitle =. 2024 , editor =

work page 2024

[23] [23]

Transactions on Machine Learning Research , issn=

The BrowserGym Ecosystem for Web Agent Research , author=. Transactions on Machine Learning Research , issn=. 2025 , note=

work page 2025

[24] [24]

2023 , eprint=

Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena , author=. 2023 , eprint=

work page 2023

[25] [25]

2025 , eprint=

Gemma 3 Technical Report , author=. 2025 , eprint=

work page 2025

[26] [26]

ACM Transactions on Algorithms (TALG) , volume=

Max-sum diversification, monotone submodular functions, and dynamic updates , author=. ACM Transactions on Algorithms (TALG) , volume=. 2017 , publisher=

work page 2017

[27] [27]

2020 , eprint=

BERTScore: Evaluating Text Generation with BERT , author=. 2020 , eprint=

work page 2020

[28] [28]

Navigating WebAI: Training Agents to Complete Web Tasks with Large Language Models and Reinforcement Learning , DOI=

Thil, Lucas-Andrei and Popa, Mirela and Spanakis, Gerasimos , year=. Navigating WebAI: Training Agents to Complete Web Tasks with Large Language Models and Reinforcement Learning , DOI=. Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing , publisher=

work page

[29] Wepo: Web element preference optimization for LLM-based web navigation. Proceedings of the AAAI Conference on Artificial Intelligence.

[30] HtmlRAG: HTML is better than plain text for modeling retrieved knowledge in RAG systems. Proceedings of the ACM on Web Conference 2025.

[31] Computers and intractability. 2002.

[32] Eric Zelikman, Yuhuai Wu, Jesse Mu, and Noah Goodman. STaR: Bootstrapping Reasoning With Reasoning.

[33] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. 2025.

[34] Qwen3 Technical Report. 2025.

[35] Qwen2.5 Technical Report. 2025.

[36] Mind2Web: Towards a Generalist Agent for the Web. 2023.

[37] Proposer-Agent-Evaluator (PAE): Autonomous Skill Discovery For Foundation Model Internet Agents. 2024.

[38] AutoWebGLM: A Large Language Model-based Web Navigating Agent. 2024.

[39] Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction. 2025.

[40] WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning. 2025.

[41] WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning. 2025.

[42] Synatra: Turning Indirect Knowledge into Direct Demonstrations for Digital Agents at Scale. 2024.

[43] AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials. 2025.

[44] NNetNav: Unsupervised Learning of Browser Agents Through Environment Interaction in the Wild. 2025.

[45] AgentBench: Evaluating LLMs as Agents. 2023.

[46] WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks? 2024.

[47] WebArena: A Realistic Web Environment for Building Autonomous Agents. 2024.

[48] WebRL: Training LLM web agents via self-evolving online curriculum reinforcement learning. arXiv preprint arXiv:2411.02337, 2025.

[49] Voyager: An Open-Ended Embodied Agent with Large Language Models. 2023.

[50] P. Langley. Proceedings of the 17th International Conference on Machine Learning (ICML 2000), 2000.

[51] T. M. Mitchell. The Need for Biases in Learning Generalizations. 1980.

[52] M. J. Kearns.

[53] Machine Learning: An Artificial Intelligence Approach, Vol. I. 1983.

[54] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. 2000.

[55] Suppressed for Anonymity.

[56] A. Newell and P. S. Rosenbloom. Mechanisms of Skill Acquisition and the Law of Practice. Cognitive Skills and Their Acquisition. 1981.

[57] A. L. Samuel. Some Studies in Machine Learning Using the Game of Checkers. IBM Journal of Research and Development. 1959.

[58] AgentTrek: Agent trajectory synthesis via guiding replay with web tutorials. arXiv preprint arXiv:2412.09605.

[59] WebLINX: Real-world website navigation with multi-turn dialogue. arXiv preprint arXiv:2402.05930.