{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2023:BPCVUG4YF75TKRQPMT5BNG6XWN","short_pith_number":"pith:BPCVUG4Y","schema_version":"1.0","canonical_sha256":"0bc55a1b982ffb35460f64fa169bd7b343e8c06c3bf433e0006d6409eb904ae2","source":{"kind":"arxiv","id":"2306.06070","version":3},"attestation_state":"computed","paper":{"title":"Mind2Web: Towards a Generalist Agent for the Web","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Mind2Web supplies over 2000 real-world tasks on 137 live websites so language models can act as generalist agents that follow instructions across unseen sites and domains.","cross_cats":[],"primary_cat":"cs.CL","authors_text":"Boshi Wang, Boyuan Zheng, Huan Sun, Samuel Stevens, Shijie Chen, Xiang Deng, Yu Gu, Yu Su","submitted_at":"2023-06-09T17:44:31Z","abstract_excerpt":"We introduce Mind2Web, the first dataset for developing and evaluating generalist agents for the web that can follow language instructions to complete complex tasks on any website. Existing datasets for web agents either use simulated websites or only cover a limited set of websites and tasks, thus not suitable for generalist web agents. With over 2,000 open-ended tasks collected from 137 websites spanning 31 domains and crowdsourced action sequences for the tasks, Mind2Web provides three necessary ingredients for building generalist web agents: 1) diverse domains, websites, and tasks, 2) use "},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":true},"canonical_record":{"source":{"id":"2306.06070","kind":"arxiv","version":3},"metadata":{"license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","primary_cat":"cs.CL","submitted_at":"2023-06-09T17:44:31Z","cross_cats_sorted":[],"title_canon_sha256":"eff6e31b30051f423c394a542c1f3c9b35a370c8ee9a0b56596318dc1aa40794","abstract_canon_sha256":"4904546ccd9001c8e9c284d9a2400549c9fc60089e24477b9071ce804fdf6d7a"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:38:50.329975Z","signature_b64":"t95s2plaZie3AYVhkqBMehFUPCKN7TM/s/vmdE2sSlH6Fb8yWVzxkn5x4AfealiEupBCEHuG7S6lSa2EJoWzDQ==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"0bc55a1b982ffb35460f64fa169bd7b343e8c06c3bf433e0006d6409eb904ae2","last_reissued_at":"2026-05-17T23:38:50.329564Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:38:50.329564Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"Mind2Web: Towards a Generalist Agent for the Web","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Mind2Web supplies over 2000 real-world tasks on 137 live websites so language models can act as generalist agents that follow instructions across unseen sites and domains.","cross_cats":[],"primary_cat":"cs.CL","authors_text":"Boshi Wang, Boyuan Zheng, Huan Sun, Samuel Stevens, Shijie Chen, Xiang Deng, Yu Gu, Yu Su","submitted_at":"2023-06-09T17:44:31Z","abstract_excerpt":"We introduce Mind2Web, the first dataset for developing and evaluating generalist agents for the web that can follow language instructions to complete complex tasks on any website. Existing datasets for web agents either use simulated websites or only cover a limited set of websites and tasks, thus not suitable for generalist web agents. With over 2,000 open-ended tasks collected from 137 websites spanning 31 domains and crowdsourced action sequences for the tasks, Mind2Web provides three necessary ingredients for building generalist web agents: 1) diverse domains, websites, and tasks, 2) use "},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Mind2Web provides three necessary ingredients for building generalist web agents: diverse domains, websites, and tasks; use of real-world websites instead of simulated ones; and a broad spectrum of user interaction patterns. LLMs with HTML filtering by a small LM achieve decent performance even on unseen websites or domains.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the crowdsourced action sequences collected from workers accurately capture the steps a typical user would take to complete each open-ended task on live websites, and that the 137 sites sufficiently represent the diversity needed for generalization to arbitrary new sites.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Mind2Web is the first large-scale dataset of real-world web tasks for developing generalist language-guided agents that complete complex actions on diverse websites.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Mind2Web supplies over 2000 real-world tasks on 137 live websites so language models can act as generalist agents that follow instructions across unseen sites and domains.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"37cc697a3a3c8bbddda49f537f6cacbf003710aca6f25fc168157319c752d256"},"source":{"id":"2306.06070","kind":"arxiv","version":3},"verdict":{"id":"f7056531-aa8f-45b7-8a78-128e1deaa68d","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T20:02:09.262992Z","strongest_claim":"Mind2Web provides three necessary ingredients for building generalist web agents: diverse domains, websites, and tasks; use of real-world websites instead of simulated ones; and a broad spectrum of user interaction patterns. LLMs with HTML filtering by a small LM achieve decent performance even on unseen websites or domains.","one_line_summary":"Mind2Web is the first large-scale dataset of real-world web tasks for developing generalist language-guided agents that complete complex actions on diverse websites.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the crowdsourced action sequences collected from workers accurately capture the steps a typical user would take to complete each open-ended task on live websites, and that the 137 sites sufficiently represent the diversity needed for generalization to arbitrary new sites.","pith_extraction_headline":"Mind2Web supplies over 2000 real-world tasks on 137 live websites so language models can act as generalist agents that follow instructions across unseen sites and domains."},"references":{"count":45,"sample":[{"doi":"","year":2021,"title":"Puppeteer headless chrome node.js api. https://github.com/puppeteer/puppeteer, 2021","work_id":"a5824c85-53a2-4ade-b04e-371de7ad9c44","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"10.48550/arxiv.2204.01691","year":2022,"title":"Do As I Can, Not As I Say: Grounding Language in Robotic Affordances","work_id":"037320f1-b0a9-4cbe-a639-bfb25409ce71","ref_index":2,"cited_arxiv_id":"2204.01691","is_internal_anchor":true},{"doi":"","year":2021,"title":"On the Opportunities and Risks of Foundation Models","work_id":"a18039e9-928d-47c9-a836-32656a71bf71","ref_index":3,"cited_arxiv_id":"2108.07258","is_internal_anchor":true},{"doi":"","year":1901,"title":"Language models are few-shot learners","work_id":"6b6d3f79-d100-4af7-8cb8-c2670a73c7f5","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2022,"title":"Andrea Burns, Deniz Arsan, Sanjna Agrawal, Ranjitha Kumar, Kate Saenko, and Bryan A. Plum- mer. A dataset for interactive vision-language navigation with unknown command feasibility. In European Confe","work_id":"2738624f-90fc-4a0a-a90b-3dee81d465bd","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":45,"snapshot_sha256":"a14791859106fb749491bda0ae494f97a48bea18ab70fca3459475b5baf44e28","internal_anchors":13},"formal_canon":{"evidence_count":2,"snapshot_sha256":"0fbfd475296b6e8dc07c9ce6b4886871895ad000fb5c614bdb58a4ee123b13df"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2306.06070","created_at":"2026-05-17T23:38:50.329635+00:00"},{"alias_kind":"arxiv_version","alias_value":"2306.06070v3","created_at":"2026-05-17T23:38:50.329635+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2306.06070","created_at":"2026-05-17T23:38:50.329635+00:00"},{"alias_kind":"pith_short_12","alias_value":"BPCVUG4YF75T","created_at":"2026-05-18T12:33:33.725879+00:00"},{"alias_kind":"pith_short_16","alias_value":"BPCVUG4YF75TKRQP","created_at":"2026-05-18T12:33:33.725879+00:00"},{"alias_kind":"pith_short_8","alias_value":"BPCVUG4Y","created_at":"2026-05-18T12:33:33.725879+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":40,"internal_anchor_count":40,"sample":[{"citing_arxiv_id":"2605.23262","citing_title":"Design and Report Benchmarks for Knowledge Work","ref_index":41,"is_internal_anchor":true},{"citing_arxiv_id":"2505.10924","citing_title":"A Survey on the Safety and Security Threats of Computer-Using Agents: JARVIS or Ultron?","ref_index":1,"is_internal_anchor":true},{"citing_arxiv_id":"2601.14348","citing_title":"Legal Retrieval for Public Defenders","ref_index":8,"is_internal_anchor":true},{"citing_arxiv_id":"2406.09187","citing_title":"GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning","ref_index":2,"is_internal_anchor":true},{"citing_arxiv_id":"2605.18636","citing_title":"SPIKE: An Adaptive Dual Controller Framework for Cost-Efficient Long-Horizon Game Agents","ref_index":4,"is_internal_anchor":true},{"citing_arxiv_id":"2605.19149","citing_title":"Agent Meltdowns: The Road to Hell Is Paved with Helpful Agents","ref_index":8,"is_internal_anchor":true},{"citing_arxiv_id":"2605.19099","citing_title":"DecisionBench: A Benchmark for Emergent Delegation in Long-Horizon Agentic Workflows","ref_index":7,"is_internal_anchor":true},{"citing_arxiv_id":"2605.19743","citing_title":"EngiAI: A Multi-Agent Framework and Benchmark Suite for LLM-Driven Engineering Design","ref_index":43,"is_internal_anchor":true},{"citing_arxiv_id":"2507.04227","citing_title":"Mobile GUI Agents under Real-world Threats: Are We There Yet?","ref_index":5,"is_internal_anchor":true},{"citing_arxiv_id":"2401.03568","citing_title":"Agent AI: Surveying the Horizons of Multimodal Interaction","ref_index":245,"is_internal_anchor":true},{"citing_arxiv_id":"2510.22933","citing_title":"How Can AI Augment Access to Justice? Public Defenders' Perspectives on AI Adoption","ref_index":29,"is_internal_anchor":true},{"citing_arxiv_id":"2510.23883","citing_title":"Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges","ref_index":166,"is_internal_anchor":true},{"citing_arxiv_id":"2511.06101","citing_title":"SynthAgent: Adapting Web Agents with Synthetic Supervision","ref_index":1,"is_internal_anchor":true},{"citing_arxiv_id":"2401.10935","citing_title":"SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents","ref_index":74,"is_internal_anchor":true},{"citing_arxiv_id":"2511.22074","citing_title":"Real-Time Procedural Learning From Experience for AI Agents","ref_index":5,"is_internal_anchor":true},{"citing_arxiv_id":"2309.02427","citing_title":"Cognitive Architectures for Language Agents","ref_index":17,"is_internal_anchor":true},{"citing_arxiv_id":"2309.16797","citing_title":"Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution","ref_index":91,"is_internal_anchor":true},{"citing_arxiv_id":"2401.13919","citing_title":"WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models","ref_index":1,"is_internal_anchor":true},{"citing_arxiv_id":"2604.09571","citing_title":"Tuning Qwen2.5-VL to Improve Its Web Interaction Skills","ref_index":4,"is_internal_anchor":true},{"citing_arxiv_id":"2401.01614","citing_title":"GPT-4V(ision) is a Generalist Web Agent, if Grounded","ref_index":6,"is_internal_anchor":true},{"citing_arxiv_id":"2604.09581","citing_title":"Avenir-UX: Automated UX Evaluation via Simulated Human Web Interaction with GUI Grounding","ref_index":5,"is_internal_anchor":true},{"citing_arxiv_id":"2605.13527","citing_title":"MMSkills: Towards Multimodal Skills for General Visual Agents","ref_index":8,"is_internal_anchor":true},{"citing_arxiv_id":"2403.07718","citing_title":"WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks?","ref_index":3,"is_internal_anchor":true},{"citing_arxiv_id":"2602.20867","citing_title":"SoK: Agentic Skills -- Beyond Tool Use in LLM Agents","ref_index":61,"is_internal_anchor":true},{"citing_arxiv_id":"2605.11533","citing_title":"Checkup2Action: A Multimodal Clinical Check-up Report Dataset for Patient-Oriented Action Card Generation","ref_index":5,"is_internal_anchor":true}]},"formal_canon":{"evidence_count":2,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/BPCVUG4YF75TKRQPMT5BNG6XWN","json":"https://pith.science/pith/BPCVUG4YF75TKRQPMT5BNG6XWN.json","graph_json":"https://pith.science/api/pith-number/BPCVUG4YF75TKRQPMT5BNG6XWN/graph.json","events_json":"https://pith.science/api/pith-number/BPCVUG4YF75TKRQPMT5BNG6XWN/events.json","paper":"https://pith.science/paper/BPCVUG4Y"},"agent_actions":{"view_html":"https://pith.science/pith/BPCVUG4YF75TKRQPMT5BNG6XWN","download_json":"https://pith.science/pith/BPCVUG4YF75TKRQPMT5BNG6XWN.json","view_paper":"https://pith.science/paper/BPCVUG4Y","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2306.06070&json=true","fetch_graph":"https://pith.science/api/pith-number/BPCVUG4YF75TKRQPMT5BNG6XWN/graph.json","fetch_events":"https://pith.science/api/pith-number/BPCVUG4YF75TKRQPMT5BNG6XWN/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/BPCVUG4YF75TKRQPMT5BNG6XWN/action/timestamp_anchor","attest_storage":"https://pith.science/pith/BPCVUG4YF75TKRQPMT5BNG6XWN/action/storage_attestation","attest_author":"https://pith.science/pith/BPCVUG4YF75TKRQPMT5BNG6XWN/action/author_attestation","sign_citation":"https://pith.science/pith/BPCVUG4YF75TKRQPMT5BNG6XWN/action/citation_signature","submit_replication":"https://pith.science/pith/BPCVUG4YF75TKRQPMT5BNG6XWN/action/replication_record"}},"created_at":"2026-05-17T23:38:50.329635+00:00","updated_at":"2026-05-17T23:38:50.329635+00:00"}