AllSERP: Exhaustive Per-Element Enrichment of the Versatile AdSERP Dataset
Pith reviewed 2026-05-20 23:22 UTC · model grok-4.3
The pith
AllSERP enriches the AdSERP corpus with pixel-accurate bounding boxes for organic and widget elements plus semantic types for thirteen categories.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
AllSERP adds four things to AdSERP: pixel-accurate organic and widget bounding boxes via screenshot-anchored computer vision, semantic types across thirteen element types via an HTML parser, an inter-result gap-fill flavor called typed_gapfill, and X+Y click attribution that reaches 91.7 percent of the corpus while flagging the rest at trial level. The Phase C ad-versus-non-ad partition shows zero disagreements with the shipped ad rectangles across 38,250 classifications.
What carries the argument
The argument rests on three components: a screenshot-anchored computer vision pipeline for bounding-box generation, HTML parsing for semantic element typing, and typed_gapfill for inter-result coverage.
If this is right
- Enables per-element click, fixation, regression, and above-fold analyses that the original ads-versus-organic split could not resolve.
- The ad-versus-non-ad partition remains internally consistent with the shipped ad rectangles across the entire corpus.
- Provides a reproducible pipeline, per-trial JSONs, corpus CSV, and browser-based replay viewer for community use.
- Supports finer-grained study of user behavior on search engine result pages before the introduction of AI Overviews.
Where Pith is reading between the lines
- The same pipeline could be applied to post-AI-Overview SERPs to measure how new summary elements alter attention and click patterns.
- Integration with additional eye-tracking or interaction datasets from other search engines would allow cross-platform comparisons of element-level engagement.
- The typed element labels may support training or evaluation of models that predict user attention at the level of individual widgets or organic results.
Load-bearing premise
The computer vision procedure anchored to screenshots produces bounding boxes accurate enough to support reliable per-element behavioral analysis.
What would settle it
A manual audit or independent ground-truth comparison that finds systematic mismatches between the generated organic and widget bounding boxes and the actual rendered positions of those elements on the screenshots.
Original abstract
We release AllSERP, a typed AOI and per-element behavioral enrichment of the AdSERP commercial-intent SERP corpus [4]. AdSERP ships 2,776 trials of full-page screenshots, captured SERP HTML, 150 Hz Gazepoint eye tracking, evtrack mouse telemetry, scroll, and pupil signals against real Google SERPs collected before AI Overviews -- but its bounding boxes cover only ad surfaces (15.5 % of attributable clicks). AllSERP adds pixel-accurate organic and widget bboxes via screenshot-anchored CV, semantic types across thirteen element types via an HTML parser, an inter-result gap-fill flavor (typed_gapfill), and X+Y click attribution that reaches 91.7 % of the corpus while flagging the rest at trial level. The Phase C ad-vs-non-ad partition is internally consistent with the shipped ad rectangles (0 disagreements across 38,250 classifications). We ship the pipeline, per-trial JSONs, a corpus CSV, and a browser-based replay viewer; everything is reproducible from the AdSERP Zenodo volume. The release enables per-element click, fixation, regression, and above-fold analyses that the shipped ads-vs-organic split could not resolve.
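The X+Y click attribution named in the abstract can be pictured as a point-in-rectangle lookup over typed boxes. The sketch below is illustrative only: the `TypedBox` record, the `attribute_click` function, the smallest-box tie-break, and the element type strings are all assumptions made for this example, not the released pipeline's schema or logic. A click that lands in no box returns `None`, which stands in for the "flagged" case.

```python
from typing import NamedTuple, Optional, Sequence


class TypedBox(NamedTuple):
    """Hypothetical record: one typed element box in page pixel coordinates."""
    element_type: str  # placeholder label, e.g. "organic"; not the paper's type names
    x: float           # left edge
    y: float           # top edge
    w: float           # width
    h: float           # height


def attribute_click(cx: float, cy: float,
                    boxes: Sequence[TypedBox]) -> Optional[TypedBox]:
    """Return the box containing the click, or None if no box contains it.

    Assumption: when boxes nest, the smallest containing box wins.
    """
    hits = [
        b for b in boxes
        if b.x <= cx < b.x + b.w and b.y <= cy < b.y + b.h
    ]
    if not hits:
        return None  # unattributed click; a caller could flag the trial
    return min(hits, key=lambda b: b.w * b.h)


def attribution_rate(clicks: Sequence[tuple], boxes: Sequence[TypedBox]) -> float:
    """Share of (x, y) clicks that land inside at least one box."""
    if not clicks:
        return 0.0
    attributed = sum(attribute_click(x, y, boxes) is not None for x, y in clicks)
    return attributed / len(clicks)
```

Both coordinates must fall inside a box for a click to count, which is one plausible reading of "X+Y" attribution; the source text does not spell out the rule.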
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript releases AllSERP, a typed AOI and per-element behavioral enrichment of the AdSERP corpus of 2,776 real Google SERP trials. It augments the original ad-only bounding boxes (covering 15.5% of clicks) by adding pixel-accurate organic and widget bounding boxes via screenshot-anchored computer vision, semantic types across thirteen element types via an HTML parser, a typed_gapfill inter-result method, and X+Y click attribution reaching 91.7% of the corpus (with the remainder flagged at trial level). The work reports an internal consistency check for the Phase C ad-vs-non-ad partition (zero disagreements across 38,250 classifications) and ships the full pipeline, per-trial JSONs, corpus CSV, and browser replay viewer, all reproducible from the AdSERP Zenodo volume.
Significance. If the CV-derived bounding boxes are sufficiently accurate, the release would enable valuable per-element analyses of clicks, fixations, regressions, and above-fold behavior on SERPs that the prior ads-vs-organic split could not support. The shipped artifacts, reproducibility, high attribution coverage, and direct consistency check against existing ad rectangles are clear strengths for a data-release contribution in information retrieval.
major comments (2)
- [Abstract] Abstract: The central claim that AllSERP supplies 'pixel-accurate organic and widget bboxes via screenshot-anchored CV' (plus semantic types for thirteen element types) is load-bearing for all downstream per-element analyses, yet the only quantitative check described is the ad-vs-non-ad partition consistency (0 disagreements on 38,250 classifications). No IoU, pixel-offset, precision/recall, or ground-truth comparison is reported for the new CV outputs or the organic/widget boxes.
- [Abstract] Abstract (Phase C description): The ad-vs-non-ad consistency check relies on the pre-existing shipped ad rectangles and therefore provides no validation for the accuracy of the newly derived CV bounding boxes, the typed_gapfill method, or the HTML-parser semantic types.
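For readers unfamiliar with the metrics the first comment asks for, the following is a minimal sketch of intersection-over-union (IoU) and a worst-case edge offset between a generated box and a hand-annotated one. The `Box` type and both function names are hypothetical, and the `(x, y, w, h)` pixel layout is an assumption rather than the AllSERP JSON format.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Hypothetical axis-aligned box in screenshot pixels."""
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def iou(a: Box, b: Box) -> float:
    """Intersection area divided by union area; 0.0 for disjoint or empty boxes."""
    inter_w = max(0.0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    inter_h = max(0.0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    inter = inter_w * inter_h
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0


def max_edge_offset(a: Box, b: Box) -> float:
    """Largest absolute difference, in pixels, between matching box edges."""
    return max(
        abs(a.x - b.x),                      # left edges
        abs(a.y - b.y),                      # top edges
        abs((a.x + a.w) - (b.x + b.w)),      # right edges
        abs((a.y + a.h) - (b.y + b.h)),      # bottom edges
    )
```

An audit along these lines would report the distribution of both values per element type, since a high mean IoU can hide a small class of badly misplaced boxes.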
minor comments (2)
- [Abstract] The abstract would be strengthened by briefly stating any available error rates or validation approach for the CV component even if full details appear later in the text.
- Ensure the reference to the original AdSERP work [4] is fully expanded in the bibliography and that any new dependencies (e.g., specific CV libraries) are cited.
Simulated Author's Rebuttal
We thank the referee for their careful and constructive review of the AllSERP manuscript. We address each major comment point by point below, clarifying the scope of our reported checks while acknowledging limitations in direct validation for the new components. Revisions have been made to improve transparency.
- Referee: [Abstract] Abstract: The central claim that AllSERP supplies 'pixel-accurate organic and widget bboxes via screenshot-anchored CV' (plus semantic types for thirteen element types) is load-bearing for all downstream per-element analyses, yet the only quantitative check described is the ad-vs-non-ad partition consistency (0 disagreements on 38,250 classifications). No IoU, pixel-offset, precision/recall, or ground-truth comparison is reported for the new CV outputs or the organic/widget boxes.
  Authors: We appreciate the referee drawing attention to the validation gap. The reported consistency check (zero disagreements on 38,250 classifications) confirms that our Phase C ad/non-ad partition aligns with the pre-existing AdSERP ad rectangles, serving as an internal sanity check for the overall enrichment pipeline. However, we acknowledge that this does not provide direct quantitative accuracy metrics such as IoU, pixel offset, precision, or recall for the newly derived CV bounding boxes on organic and widget elements, nor for the semantic types. The CV method is screenshot-anchored and the semantic types are extracted via deterministic HTML parsing, but no independent ground-truth annotations were collected for these additions. We have revised the abstract to qualify the bounding-box claim as 'high-resolution' rather than 'pixel-accurate' and added a dedicated limitations subsection describing the CV and parsing approaches along with the absence of external validation metrics. The complete open-source pipeline is provided to support future community-led accuracy assessments.
  Revision: yes
- Referee: [Abstract] Abstract (Phase C description): The ad-vs-non-ad consistency check relies on the pre-existing shipped ad rectangles and therefore provides no validation for the accuracy of the newly derived CV bounding boxes, the typed_gapfill method, or the HTML-parser semantic types.
  Authors: We agree that the consistency check is scoped specifically to the ad/non-ad partition against the original rectangles and therefore does not constitute validation of the CV-derived organic/widget boxes, the typed_gapfill interpolation, or the HTML-derived semantic labels. The typed_gapfill method is a rule-based procedure for filling inter-result gaps, and semantic types are produced by parsing the captured SERP HTML. We have updated the manuscript text to explicitly delimit the scope of the consistency check, expanded the methods section with pseudocode for typed_gapfill and the HTML parser, and inserted a limitations paragraph noting that accuracy of the new elements rests on the reproducibility of the released pipeline rather than on new ground-truth comparisons. We maintain that the 91.7 % click attribution rate and the release of all per-trial JSONs, CSV, and viewer still provide substantial value for per-element analyses.
  Revision: partial
- We do not possess independent ground-truth annotations for the organic and widget bounding boxes, so we cannot supply IoU, precision/recall, or pixel-offset metrics without new data collection.
Circularity Check
No circularity: data release with direct consistency check only
The manuscript is a dataset release paper that describes adding bounding boxes via screenshot-anchored CV, semantic labels via HTML parser, and typed gap-fill to the existing AdSERP corpus. The sole quantitative statement is a direct consistency check (0 disagreements on 38,250 ad-vs-non-ad classifications) that compares the new partition against the pre-existing ad rectangles shipped with the original dataset. No equations, fitted parameters, predictions, or derivations are present that could reduce to inputs by construction. The cited prior work [4] is an external corpus release, not a self-citation chain supporting a uniqueness claim or ansatz. The contribution is therefore self-contained as reproducible artifacts and a simple internal consistency verification.
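To make the shape of such a consistency check concrete, here is a small sketch that counts disagreements between a pipeline's ad/non-ad labels and labels derived from reference ad rectangles. The overlap rule (an element counts as an ad in the reference if its box intersects any shipped ad rectangle), the `Rect` type, and the function names are assumptions made for illustration; the source does not describe how the Phase C comparison is actually computed.

```python
from typing import NamedTuple, Sequence, Tuple


class Rect(NamedTuple):
    """Hypothetical axis-aligned rectangle in page pixels."""
    x: float
    y: float
    w: float
    h: float


def overlaps(a: Rect, b: Rect) -> bool:
    """True when the two rectangles share a region of positive area."""
    return (a.x < b.x + b.w and b.x < a.x + a.w and
            a.y < b.y + b.h and b.y < a.y + a.h)


def count_disagreements(elements: Sequence[Tuple[Rect, bool]],
                        ad_rects: Sequence[Rect]) -> int:
    """Count elements whose pipeline label differs from the reference label.

    Each element is (box, labeled_as_ad). The reference label is assumed
    to be "ad" exactly when the box overlaps some shipped ad rectangle.
    """
    disagreements = 0
    for box, labeled_as_ad in elements:
        reference_is_ad = any(overlaps(box, ad) for ad in ad_rects)
        if labeled_as_ad != reference_is_ad:
            disagreements += 1
    return disagreements
```

Whatever the exact rule, a check of this kind only tests agreement with the rectangles that already shipped, which is why it says nothing about the placement of the new organic and widget boxes.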
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "AllSERP adds pixel-accurate organic and widget bboxes via screenshot-anchored CV, semantic types across thirteen element types via an HTML parser, an inter-result gap-fill flavor (typed_gapfill), and X+Y click attribution"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Andrew T. Duchowski. 2026. Real-Time Cognitive Load Measurement of Pupillary Oscillation. Proceedings of the ACM on Computer Graphics and Interactive Techniques 9, 2 (2026). doi:10.1145/3803537
- [2] Wai-Tat Fu and Peter Pirolli. 2007. SNIF-ACT: A Cognitive Model of User Navigation on the World Wide Web. Human-Computer Interaction 22, 4 (2007), 355–412. doi:10.1080/07370020701638806
- [3] Yasith Jayawardena, Gavindya Jayawardana, and Jacek Gwizdka. 2025. Real-Time Pupillometry-based Index of Pupillary Activity (RIPA2). Journal of Eye Movement Research (2025).
- [4] Kayhan Latifzadeh, Jacek Gwizdka, and Luis A. Leiva. 2025. A Versatile Dataset of Mouse and Eye Movements on Search Engine Results Pages. In Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’25). ACM, Padua, Italy, 3412–3421. doi:10.1145/3726302.3730325 Dataset on Zenodo: https://zenodo...
- [5] Luis A. Leiva and Roberto Vivó. 2013. Web Browsing Behavior Analysis and Interactive Hypervideo. ACM Transactions on the Web 7, 4 (2013), 1–28. doi:10.1145/2529995.2529996
- [6] Peter Pirolli and Stuart Card. 1999. Information Foraging. Psychological Review 106, 4 (1999), 643–675. doi:10.1037/0033-295X.106.4.643
- [7] Mario Villaizán-Vallelado, Matteo Salvatori, Kayhan Latifzadeh, Antonio Penta, Luis A. Leiva, and Ioannis Arapakis. 2025. AdSight: Scalable and Accurate Quantification of User Attention in Multi-Slot Sponsored Search. In Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’25). ACM, Padua...