{"paper":{"title":"AllSERP: Exhaustive Per-Element Enrichment of the Versatile AdSERP Dataset","license":"http://creativecommons.org/licenses/by/4.0/","headline":"AllSERP enriches the AdSERP dataset with pixel-accurate bounding boxes and semantic types for every SERP element.","cross_cats":[],"primary_cat":"cs.IR","authors_text":"K. Andrew Edmonds","submitted_at":"2026-05-06T14:14:35Z","abstract_excerpt":"We release AllSERP, a typed AOI and per-element behavioral enrichment of the AdSERP commercial-intent SERP corpus [4]. AdSERP ships 2,776 trials of full-page screenshots, captured SERP HTML, 150 Hz Gazepoint eye tracking, evtrack mouse telemetry, scroll, and pupil signals against real Google SERPs collected before AI Overviews -- but its bounding boxes cover only ad surfaces (15.5 % of attributable clicks). AllSERP adds pixel-accurate organic and widget bboxes via screenshot-anchored CV, semantic types across thirteen element types via an HTML parser, an inter-result gap-fill flavor (typed_gap"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"AllSERP adds pixel-accurate organic and widget bboxes via screenshot-anchored CV, semantic types across thirteen element types via an HTML parser, an inter-result gap-fill flavor (typed_gapfill), and X+Y click attribution that reaches 91.7 % of the corpus while flagging the rest at trial level. The Phase C ad-vs-non-ad partition is internally consistent with the shipped ad rectangles (0 disagreements across 38,250 classifications).","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The computer vision pipeline for extracting bounding boxes from screenshots and the HTML parser for semantic typing produce accurate results without substantial errors or missed elements, as no independent ground-truth validation or error metrics beyond internal ad consistency are described.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"AllSERP enriches the AdSERP SERP corpus with per-element bounding boxes, semantic types, typed gap-fill, and 91.7% click attribution via CV and HTML parsing, with full pipeline and artifacts shipped.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"AllSERP enriches the AdSERP dataset with pixel-accurate bounding boxes and semantic types for every SERP element.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"833e3b327dbaa5d26dac92132d3527a3610e04f6670d9644da8b6e411a154c29"},"source":{"id":"2605.04949","kind":"arxiv","version":2},"verdict":{"id":"403f59bb-06c7-41d2-94c4-e0ec097185e4","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-08T16:07:38.596684Z","strongest_claim":"AllSERP adds pixel-accurate organic and widget bboxes via screenshot-anchored CV, semantic types across thirteen element types via an HTML parser, an inter-result gap-fill flavor (typed_gapfill), and X+Y click attribution that reaches 91.7 % of the corpus while flagging the rest at trial level. The Phase C ad-vs-non-ad partition is internally consistent with the shipped ad rectangles (0 disagreements across 38,250 classifications).","one_line_summary":"AllSERP enriches the AdSERP SERP corpus with per-element bounding boxes, semantic types, typed gap-fill, and 91.7% click attribution via CV and HTML parsing, with full pipeline and artifacts shipped.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The computer vision pipeline for extracting bounding boxes from screenshots and the HTML parser for semantic typing produce accurate results without substantial errors or missed elements, as no independent ground-truth validation or error metrics beyond internal ad consistency are described.","pith_extraction_headline":"AllSERP enriches the AdSERP dataset with pixel-accurate bounding boxes and semantic types for every SERP element."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2605.04949/integrity.json","findings":[],"available":true,"detectors_run":[{"name":"doi_title_agreement","ran_at":"2026-05-19T21:31:19.941129Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"doi_compliance","ran_at":"2026-05-19T13:59:49.614609Z","status":"completed","version":"1.0.0","findings_count":0}],"snapshot_sha256":"21d2c9c90beef71df6c3fe34d5cf2652c43ec6f703deedba016befd03e515045"},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}