pith:VZP4VYJK
AllSERP: Exhaustive Per-Element Enrichment of the Versatile AdSERP Dataset
AllSERP enriches the AdSERP dataset with pixel-accurate bounding boxes and semantic types for every SERP element.
arxiv:2605.04949 v2 · 2026-05-06 · cs.IR
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{VZP4VYJKSJMZ7ZHSSTAYQQ6ZLX}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
AllSERP adds four things. The first is pixel-accurate bounding boxes (bboxes) for organic results and widgets, extracted by screenshot-anchored computer vision (CV). The second is semantic labels covering thirteen element types, assigned by an HTML parser. The third is an inter-result gap-fill flavor (typed_gapfill). The fourth is X+Y click attribution, which covers 91.7% of the corpus and flags the remainder at the trial level. The Phase C ad-vs-non-ad partition is internally consistent with the shipped ad rectangles (0 disagreements across 38,250 classifications).
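Both the click-attribution coverage and the ad-consistency count reduce to geometry over bounding boxes. The sketch below is a hypothetical illustration, not the AllSERP implementation. It assumes that "X+Y" attribution means a point-in-box hit test on a click's x and y page coordinates, that a trial is flagged when any of its clicks lands in zero or several boxes, and that an element counts as an ad when its box center lies inside a shipped ad rectangle. The `Box` and `Element` types and all function names are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


# Hypothetical types: the real AllSERP schema is not described in this record.
@dataclass(frozen=True)
class Box:
    x0: float  # left edge, page pixels
    y0: float  # top edge
    x1: float  # right edge (exclusive)
    y1: float  # bottom edge (exclusive)

    def contains(self, x: float, y: float) -> bool:
        # Half-open bounds so two abutting boxes never both claim a point.
        return self.x0 <= x < self.x1 and self.y0 <= y < self.y1

    def center(self) -> Tuple[float, float]:
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)


@dataclass(frozen=True)
class Element:
    element_id: str
    element_type: str  # illustrative labels such as "organic", "ad", "widget"
    box: Box


Click = Tuple[float, float]


def attribute_click(x: float, y: float, elements: Sequence[Element]) -> Optional[str]:
    """Return the id of the single element containing (x, y), else None."""
    hits = [e.element_id for e in elements if e.box.contains(x, y)]
    # Zero or multiple hits are treated as unattributable rather than guessed.
    return hits[0] if len(hits) == 1 else None


def attribute_trial(
    clicks: Sequence[Click], elements: Sequence[Element]
) -> Tuple[List[Optional[str]], bool]:
    """Attribute every click in one trial; the bool flags the trial if any click failed."""
    ids = [attribute_click(x, y, elements) for x, y in clicks]
    return ids, any(i is None for i in ids)


def coverage(trials: Sequence[Tuple[Sequence[Click], Sequence[Element]]]) -> float:
    """Fraction of trials whose clicks are all attributed (that is, not flagged)."""
    if not trials:
        return 0.0
    ok = sum(1 for clicks, elements in trials if not attribute_trial(clicks, elements)[1])
    return ok / len(trials)


def ad_disagreements(elements: Sequence[Element], ad_rects: Sequence[Box]) -> int:
    """Count elements whose ad/non-ad label disagrees with the shipped ad rectangles."""
    count = 0
    for e in elements:
        cx, cy = e.box.center()
        inside_ad_rect = any(r.contains(cx, cy) for r in ad_rects)
        if inside_ad_rect != (e.element_type == "ad"):
            count += 1
    return count
```

Requiring exactly one hit per click is a conservative choice. It gives up some coverage so that a click is never attributed to the wrong element, which matches the idea of flagging unresolved trials rather than guessing.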
The claims assume two things: that the computer vision pipeline extracting bounding boxes from screenshots is accurate, and that the HTML parser assigning semantic types is accurate, with neither making substantial errors or missing elements. This is not independently verified. No ground-truth validation or error metrics are described beyond the internal ad-consistency check.
Using CV and HTML parsing, AllSERP enriches the AdSERP SERP corpus with per-element bounding boxes, semantic types, typed gap-fill, and click attribution covering 91.7% of the corpus. The full pipeline and artifacts are shipped.
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-20T01:05:15.356717Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) |
| Schema | pith-number/v1.0 |
Canonical hash
ae5fcae12a92599fe4f294c18843d95de6543a27a9e95bbe86387d6c36dda2d3
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/VZP4VYJKSJMZ7ZHSSTAYQQ6ZLX \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: ae5fcae12a92599fe4f294c18843d95de6543a27a9e95bbe86387d6c36dda2d3
Canonical record JSON
{
"metadata": {
"abstract_canon_sha256": "c6be80545ba9130ae9084c603ec30bba10514d2526c86580eb2ac3fa41ab605b",
"cross_cats_sorted": [],
"license": "http://creativecommons.org/licenses/by/4.0/",
"primary_cat": "cs.IR",
"submitted_at": "2026-05-06T14:14:35Z",
"title_canon_sha256": "ee52c958a706b9e173ed214519b6a4595afbaa8eb23a245cf6ad6990f67c89bc"
},
"schema_version": "1.0",
"source": {
"id": "2605.04949",
"kind": "arxiv",
"version": 2
}
}
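The shell pipeline above (curl, jq, python3) can also be written in plain Python. The sketch below mirrors the one-liner's canonicalization, applying it to the canonical record JSON shown above after transcribing that record as a dict. The canonicalization sorts keys, uses `,` and `:` separators with no spaces, sets `ensure_ascii=False`, encodes to UTF-8, and then takes SHA-256. `fetch_canonical_record` assumes the endpoint returns a JSON object with a `canonical_record` field, as the jq filter implies. The helper names are illustrative, not part of any published Pith client.

```python
import hashlib
import json
import urllib.request

PITH_URL = "https://pith.science/pith/VZP4VYJKSJMZ7ZHSSTAYQQ6ZLX"

# Canonical record JSON from this page, transcribed as a Python dict.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "c6be80545ba9130ae9084c603ec30bba10514d2526c86580eb2ac3fa41ab605b",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.IR",
        "submitted_at": "2026-05-06T14:14:35Z",
        "title_canon_sha256": "ee52c958a706b9e173ed214519b6a4595afbaa8eb23a245cf6ad6990f67c89bc",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.04949", "kind": "arxiv", "version": 2},
}


def canonical_json(record: dict) -> str:
    """Serialize like the one-liner: sorted keys, no whitespace, non-ASCII kept as-is."""
    return json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def canonical_sha256(record: dict) -> str:
    """SHA-256 hex digest of the UTF-8 encoded canonical JSON."""
    return hashlib.sha256(canonical_json(record).encode("utf-8")).hexdigest()


def fetch_canonical_record(url: str = PITH_URL) -> dict:
    """Fetch the JSON-LD document and return its canonical_record field (needs network)."""
    req = urllib.request.Request(url, headers={"Accept": "application/ld+json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["canonical_record"]


if __name__ == "__main__":
    print(canonical_sha256(RECORD))
```

Comparing `canonical_sha256(fetch_canonical_record())` with the canonical hash listed on this page performs the same check as the shell pipeline. The transcribed `RECORD` reproduces that hash only if it matches the served record exactly after canonicalization, and the sketch does not assert that.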