Frontiers of Computer Science 11(5), 746–761 (2017)
4 Pith papers cite this work. Polarity classification is still indexing.
Citation roles: background (2)
Citation polarities: background (2)
Citing papers
- WebMall -- A Multi-Shop Benchmark for Evaluating Web Agents
  WebMall is the first offline multi-shop benchmark for evaluating LLM web agents on complex comparison shopping tasks across heterogeneous product data from multiple simulated e-shops.
- Ontology-based knowledge graph infrastructure for interoperable atomistic simulation data
  An ontology-aligned framework for atomistic simulations that integrates over 750,000 triples to enable interoperable data querying and automated provenance tracking.
- Adaptation of AI-accelerated CFD Simulations to the IPU platform
  Porting AI-accelerated CFD model training to IPU-POD16 yields 34% data-feeding speedup and scales throughput to 2805 samples/s on 16 IPUs despite inter-IPU communication limits.
- A Survey of Scaling in Large Language Model Reasoning
  A survey categorizing scaling in LLM reasoning across input size, steps, rounds, training, and future directions, noting that scaling can negatively affect performance.