Skim profiles website patterns offline to enable fast-path speculative execution for web agents, cutting median cost by 1.9x and latency by 33.4% with no accuracy loss on benchmarks.
Understanding html with large language models
4 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
AndroidWorld is a dynamic, reproducible Android benchmark that generates unlimited natural-language tasks for autonomous agents and shows current agents succeed on only 30.6 percent of them.
Pre-trained LLMs using recursive criticism and improvement prompting achieve state-of-the-art results on the MiniWoB++ computer task benchmark with only a handful of demonstrations and no task-specific reward function.
Plan-and-Act trains a dedicated Planner on synthetic plan-annotated trajectories to generate high-level plans that an Executor follows, reaching 57.58% success on WebArena-Lite and 81.36% on WebVoyager.
citing papers explorer
-
Skim: Speculative Execution for Fast and Efficient Web Agents
Skim profiles website patterns offline to enable fast-path speculative execution for web agents, cutting median cost by 1.9x and latency by 33.4% with no accuracy loss on benchmarks.
-
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents
AndroidWorld is a dynamic, reproducible Android benchmark that generates unlimited natural-language tasks for autonomous agents and shows current agents succeed on only 30.6 percent of them.
-
Language Models can Solve Computer Tasks
Pre-trained LLMs using recursive criticism and improvement prompting achieve state-of-the-art results on the MiniWoB++ computer task benchmark with only a handful of demonstrations and no task-specific reward function.
-
Plan-and-Act: Improving Planning of Agents for Long-Horizon Tasks
Plan-and-Act trains a dedicated Planner on synthetic plan-annotated trajectories to generate high-level plans that an Executor follows, reaching 57.58% success on WebArena-Lite and 81.36% on WebVoyager.