Chatshop: Interactive information seeking with language agents. arXiv preprint arXiv:2404.09911

Sanxing Chen, Sam Wiseman, Bhuwan Dhingra · 2024 · arXiv 2404.09911

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

$\tau$-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains

cs.AI · 2024-06-17 · unverdicted · novelty 7.0

τ-bench shows state-of-the-art agents like GPT-4o succeed on under 50% of tool-using, rule-following tasks and are inconsistent across repeated trials.

SimGym: A Framework for A/B Test Simulation in E-Commerce with Traffic-Grounded VLM Agents

cs.AI · 2026-05-19 · unverdicted · novelty 6.0

SimGym is a browser-based VLM agent framework that simulates A/B test outcomes on e-commerce storefronts with 77% directional agreement on add-to-cart shifts from real buyer traffic.

A Functionality-Grounded Benchmark for Evaluating Web Agents in E-commerce Domains

cs.CL · 2025-08-18 · unverdicted · novelty 6.0

The paper proposes Amazon-Bench, a functionality-grounded benchmark for web agents in e-commerce that generates diverse task queries from webpage elements and evaluates both task performance and safety risks.

citing papers explorer

$\tau$-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains

cs.AI · 2024-06-17 · unverdicted · none · ref 6

τ-bench shows state-of-the-art agents like GPT-4o succeed on under 50% of tool-using, rule-following tasks and are inconsistent across repeated trials.

SimGym: A Framework for A/B Test Simulation in E-Commerce with Traffic-Grounded VLM Agents

cs.AI · 2026-05-19 · unverdicted · none · ref 4

SimGym is a browser-based VLM agent framework that simulates A/B test outcomes on e-commerce storefronts with 77% directional agreement on add-to-cart shifts from real buyer traffic.

A Functionality-Grounded Benchmark for Evaluating Web Agents in E-commerce Domains

cs.CL · 2025-08-18 · unverdicted · none · ref 4

The paper proposes Amazon-Bench, a functionality-grounded benchmark for web agents in e-commerce that generates diverse task queries from webpage elements and evaluates both task performance and safety risks.
