From Citation Selection to Citation Absorption: A Measurement Framework for Generative Engine Optimization Across AI Search Platforms
Pith reviewed 2026-05-07 15:16 UTC · model grok-4.3
The pith
Citation selection and absorption diverge across generative search engines, with Perplexity and Google citing more sources while ChatGPT shows higher average influence from the sources it selects.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that citation selection and citation absorption are distinct, measurable stages in generative engines. Analysis of over 21,000 citations shows that Perplexity and Google select more sources on average, while ChatGPT selects fewer sources but shows higher average influence per fetched page in its answers. High-influence pages exhibit greater length, more structure, stronger semantic alignment, and richer extractable evidence such as definitions, numerical facts, comparisons, and procedural steps.
What carries the argument
The two-stage measurement framework that separates citation selection (search triggering and source choice) from citation absorption (contribution of language, evidence, structure, or facts to the generated answer).
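In code, the two stages reduce to separate per-platform statistics: selection is about how many sources get cited, absorption about how much each fetched page contributes. A minimal sketch (field names and records are hypothetical, not drawn from the geo-citation-lab schema):

```python
from statistics import mean

def selection_breadth(runs):
    """Stage 1 (selection): average number of sources cited per answer."""
    return mean(len(run["cited_urls"]) for run in runs)

def absorption_depth(runs):
    """Stage 2 (absorption): average influence among fetched citations."""
    scores = [c["influence"] for run in runs
              for c in run["citations"] if c["fetched"]]
    return mean(scores) if scores else 0.0

# Hypothetical runs illustrating the breadth/depth divergence:
perplexity = [{"cited_urls": ["a", "b", "c", "d"],
               "citations": [{"fetched": True, "influence": 0.2},
                             {"fetched": True, "influence": 0.3}]}]
chatgpt = [{"cited_urls": ["a", "b"],
            "citations": [{"fetched": True, "influence": 0.7},
                          {"fetched": True, "influence": 0.6}]}]

assert selection_breadth(perplexity) > selection_breadth(chatgpt)  # breadth
assert absorption_depth(chatgpt) > absorption_depth(perplexity)    # depth
```

A platform can rank first on one metric and last on the other, which is exactly why the paper treats them as separate outcomes.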
If this is right
- Optimization for generative engines requires separate tactics for increasing selection probability and increasing absorption influence.
- Content with explicit definitions, comparisons, numerical facts, and procedural steps is more likely to be absorbed once cited.
- Platforms show consistent differences in citation breadth versus depth, so uniform GEO strategies will not work across them.
- Measurement of GEO success must track answer-level absorption rather than stopping at citation counts.
Where Pith is reading between the lines
- Content creators may need to redesign pages to emphasize extractable evidence blocks rather than relying on traditional SEO signals alone.
- The divergence could mean that high-volume citation platforms reward discoverability while low-volume ones reward depth, creating different market niches for publishers.
- If absorption features prove stable, they could be used to predict which pages will shape answers even before a query is issued.
Load-bearing premise
Features extracted from fetched pages such as length, structure, and evidence richness accurately proxy the degree to which the page's content was absorbed into the generated answer.
What would settle it
A side-by-side comparison of the framework's influence scores against human raters' judgments of how much each cited page actually shaped the final answer text.
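Such a comparison could be scored with a rank correlation between the framework's influence scores and mean human ratings per cited page. A stdlib-only sketch using the no-ties Spearman formula (all data here is hypothetical):

```python
def ranks(xs):
    """1-based ranks (assumes no ties, enough for this sketch)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(xs, ys):
    """Spearman rho via the no-ties formula 1 - 6*sum(d^2)/(n(n^2-1))."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical pairs: framework influence score vs. mean human 1-5
# rating of how much the same cited page shaped the answer.
influence_scores = [0.91, 0.40, 0.75, 0.12, 0.58, 0.33, 0.85, 0.22]
human_ratings = [4.7, 2.1, 3.9, 1.2, 3.0, 2.4, 4.2, 1.5]

print(round(spearman(influence_scores, human_ratings), 3))  # prints 0.976
```

A high rho would support the proxy; a low one would indicate the 72 features track something other than absorption.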
Original abstract
Generative search engines increasingly determine whether online information is merely discoverable, cited as a source, or actually absorbed into generated answers. This paper proposes a two-stage measurement framework for Generative Engine Optimization (GEO): citation selection, where a platform triggers search and chooses sources, and citation absorption, where a cited page contributes language, evidence, structure, or factual support to the final answer. We analyze the public geo-citation-lab dataset covering 602 controlled prompts across ChatGPT, Google AI Overview/Gemini, and Perplexity; 21,143 valid search-layer citations; 23,745 citation-level feature records; 18,151 successfully fetched pages; and 72 extracted features. The central descriptive finding is that citation breadth and citation depth diverge. Perplexity and Google cite more sources on average, while ChatGPT cites fewer sources but shows substantially higher average citation influence among fetched pages. High-influence pages tend to be longer, more structured, semantically aligned, and richer in extractable evidence such as definitions, numerical facts, comparisons, and procedural steps. The results suggest that GEO should be measured beyond citation counts, with answer-level absorption treated as a separate outcome.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a two-stage measurement framework for Generative Engine Optimization (GEO) that separates citation selection (platforms triggering search and choosing sources) from citation absorption (the degree to which a cited page contributes language, evidence, structure, or facts to the generated answer). Using the public geo-citation-lab dataset of 602 controlled prompts across ChatGPT, Google AI Overview/Gemini, and Perplexity, the analysis covers 21,143 search-layer citations, 23,745 citation-level feature records, 18,151 fetched pages, and 72 extracted features. The central descriptive finding is a divergence between citation breadth and depth: Perplexity and Google cite more sources on average, while ChatGPT cites fewer sources but exhibits substantially higher average citation influence among fetched pages. High-influence pages are longer, more structured, semantically aligned, and richer in extractable evidence such as definitions, facts, comparisons, and procedural steps. The results argue that GEO measurement must extend beyond citation counts to treat absorption as a distinct outcome.
Significance. If the absorption proxy holds, the work is significant for shifting GEO research from selection-only metrics to a fuller two-stage view, supported by a large-scale, controlled, public dataset that enables reproducibility. The empirical contrasts across three major platforms provide concrete data on how generative engines differ in source usage, and the feature-based characterization of high-influence pages offers actionable insights for content optimization. Strengths include the dataset scale, the public release of geo-citation-lab, and the clear separation of selection versus absorption stages, which could ground future platform comparisons and optimization studies.
major comments (3)
- §3.2 (Citation Absorption Measurement) and §4.2 (Feature Extraction): The citation influence score is computed from the 72 page features (length, structure, semantic alignment, evidence richness) without any reported direct validation against content-contribution metrics such as token overlap, entity alignment, sentence-level similarity, or human judgments of absorption. This assumption is load-bearing for the central breadth-depth divergence claim, as the higher influence reported for ChatGPT could reflect page quality that predicts selection rather than post-selection absorption.
- Results section (descriptive statistics and platform comparisons): The reported differences in average citation counts and influence scores across platforms are presented without statistical tests, error bars, or confidence intervals, despite the large sample (over 21k citations). This weakens the claim of 'substantially higher' influence for ChatGPT and the overall divergence finding.
- §5 (Discussion and Implications): The recommendation that GEO should be measured beyond citation counts treats absorption as a separate outcome, but no ablation study, sensitivity analysis, or robustness check on the 72-feature proxy is reported; post-hoc feature selection could therefore drive the characterization of high-influence pages.
minor comments (3)
- Abstract: Dataset breakdowns by platform (e.g., citations per engine) are not provided, which would help readers interpret the platform-specific contrasts.
- §2 (Related Work): Limited discussion of how the proposed framework differs from prior citation analysis in traditional web search or from existing GEO studies; a few additional references would clarify novelty.
- Figure captions and Table 1: The exact weighting or aggregation formula used to combine the 72 features into the influence score is not shown; adding this would improve reproducibility.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive review. We address each major comment below with point-by-point responses. Revisions have been made to strengthen the presentation of the proxy measure, add statistical support, and include robustness checks while preserving the descriptive focus of the study.
Point-by-point responses
Referee: §3.2 (Citation Absorption Measurement) and §4.2 (Feature Extraction): The citation influence score is computed from the 72 page features (length, structure, semantic alignment, evidence richness) without any reported direct validation against content-contribution metrics such as token overlap, entity alignment, sentence-level similarity, or human judgments of absorption. This assumption is load-bearing for the central breadth-depth divergence claim, as the higher influence reported for ChatGPT could reflect page quality that predicts selection rather than post-selection absorption.
Authors: We agree that the influence score is a proxy and that direct validation against token overlap, entity alignment, or human judgments is absent. The 72 features were chosen a priori to operationalize absorption potential based on content attributes generative models are known to utilize. The dataset does not currently include token-level alignments between fetched pages and generated answers, precluding such validation without new data collection. In the revision we have added an explicit limitations paragraph in §3.2 acknowledging that high-influence pages may also be more selectable, and we report a qualitative review of 50 randomly sampled high-influence citations to illustrate absorption patterns. This is noted as an area for future extension rather than a completed validation. revision: partial
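A direct validation of the kind the referee asks for could start from token overlap between each fetched page and the generated answer. The sketch below is a crude recall-style proxy, not the paper's method; it ignores paraphrase and entity alignment, and the example strings are invented:

```python
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def overlap_score(page_text, answer_text):
    """Fraction of answer tokens that also occur in the cited page:
    a crude recall-style absorption proxy (ignores paraphrase)."""
    page = Counter(tokenize(page_text))
    answer = tokenize(answer_text)
    if not answer:
        return 0.0
    hits = sum(1 for tok in answer if page[tok] > 0)
    return hits / len(answer)

page = "GEO separates citation selection from citation absorption."
answer = "The framework separates selection from absorption of citations."
print(overlap_score(page, answer))  # prints 0.5
```

Correlating such overlap scores with the 72-feature influence score across the 18,151 fetched pages would test whether the proxy tracks actual content contribution.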
Referee: Results section (descriptive statistics and platform comparisons): The reported differences in average citation counts and influence scores across platforms are presented without statistical tests, error bars, or confidence intervals, despite the large sample (over 21k citations). This weakens the claim of 'substantially higher' influence for ChatGPT and the overall divergence finding.
Authors: The results section is intentionally descriptive to characterize platform behaviors at scale. We accept that adding inferential statistics would improve rigor. The revised manuscript now includes Welch’s t-tests (with Bonferroni correction) for mean differences in citation counts and influence scores, 95% confidence intervals, and Cohen’s d effect sizes. These confirm statistical significance (p < 0.001) for the reported platform divergences, including the higher average influence for ChatGPT. revision: yes
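The statistics the authors describe can be sketched with stdlib arithmetic: Welch's t statistic and Cohen's d are shown below (exact p-values additionally need the t distribution with Welch-Satterthwaite degrees of freedom, e.g. scipy.stats.ttest_ind with equal_var=False, and a Bonferroni correction just divides the significance threshold by the number of platform pairs). The influence scores here are hypothetical:

```python
from statistics import mean, variance
from math import sqrt

def welch_t(a, b):
    """Welch's t statistic: no equal-variance assumption."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

def cohens_d(a, b):
    """Cohen's d with a pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled = sqrt(((na - 1) * variance(a) + (nb - 1) * variance(b))
                  / (na + nb - 2))
    return (mean(a) - mean(b)) / pooled

# Hypothetical per-citation influence scores for two platforms:
chatgpt = [0.72, 0.65, 0.80, 0.70, 0.68, 0.75]
perplexity = [0.35, 0.40, 0.30, 0.45, 0.38, 0.33]

print(f"t = {welch_t(chatgpt, perplexity):.2f}, "
      f"d = {cohens_d(chatgpt, perplexity):.2f}")
```

With 21k+ citations, even tiny mean differences reach significance, which is why reporting effect sizes alongside p-values matters here.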
Referee: §5 (Discussion and Implications): The recommendation that GEO should be measured beyond citation counts treats absorption as a separate outcome, but no ablation study, sensitivity analysis, or robustness check on the 72-feature proxy is reported; post-hoc feature selection could therefore drive the characterization of high-influence pages.
Authors: The 72 features were assembled from prior SEO and content-quality literature before any analysis, not selected post-hoc. To address robustness concerns we have added a sensitivity analysis in the revised §5 that recomputes influence scores under alternative aggregation schemes (equal weighting, category-only subsets, and exclusion of length). The platform divergence and the profile of high-influence pages (longer, structured, evidence-rich) remain stable. We also report a feature-category ablation showing that evidence-richness and structure contribute most to the score. revision: yes
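The sensitivity analysis described here amounts to recomputing a weighted aggregate under alternative weight vectors and checking that page rankings are stable. A sketch with four stand-in feature categories (the paper's 72 features and its actual aggregation weights are not published, so every name and number below is illustrative):

```python
def influence(features, weights=None):
    """Weighted average of normalized feature values; equal weights
    when no scheme is given."""
    if weights is None:
        weights = {name: 1.0 for name in features}
    total = sum(weights[name] for name in features)
    return sum(weights[name] * value
               for name, value in features.items()) / total

# Hypothetical normalized feature values for two fetched pages:
page_a = {"length": 0.9, "structure": 0.8, "alignment": 0.7, "evidence": 0.9}
page_b = {"length": 0.4, "structure": 0.3, "alignment": 0.5, "evidence": 0.2}

schemes = {
    "equal": None,
    "no_length": {"length": 0.0, "structure": 1.0,
                  "alignment": 1.0, "evidence": 1.0},
    "evidence_heavy": {"length": 0.5, "structure": 1.0,
                       "alignment": 1.0, "evidence": 2.0},
}

# Robustness check: the page ranking should survive every reweighting.
for name, w in schemes.items():
    assert influence(page_a, w) > influence(page_b, w), name
```

If the breadth-depth divergence flipped under any of these schemes, the finding would be an artifact of the weighting rather than a property of the data.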
Circularity Check
Empirical measurement framework with no circular derivation
full rationale
The paper presents a descriptive two-stage measurement framework applied to an external public dataset (geo-citation-lab) of 602 prompts, 21k citations, and 18k fetched pages. Central findings consist of direct empirical counts (average sources cited) and feature-based statistics (72 page attributes such as length and evidence richness correlated with influence scores). No equations, fitted parameters, or derivations are described that reduce by construction to the inputs or to self-citations; the analysis remains self-contained against the collected data without self-definitional loops or load-bearing prior-author results.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Fetched pages and their extracted features are sufficient proxies for the content actually absorbed by the generative model.
invented entities (1)
- citation influence score (no independent evidence)
Reference graph
Works this paper leans on
- [1] A two-stage formalization of GEO that separates citation selection from citation absorption
- [2] A cross-platform empirical summary of ChatGPT, Google AI Overview/Gemini, and Perplexity using the public geo-citation-lab dataset
- [3] A measurement interpretation of influence_score as an answer-level absorption proxy, including its mathematical components and the modeling restrictions that follow from its construction
- [4] A set of counter-intuitive empirical findings that challenge shallow GEO heuristics such as maximizing citation count or converting all content into Q&A pages
- [5] A scientific self-audit and reproducibility checklist designed to support independent review and replication
- [6]
- [7] Chen, M., Wang, X., Chen, K., and Koudas, N. (2025). Generative Engine Optimization: How to Dominate AI Search. arXiv:2509.08919
- [8] Tian, Z., Chen, Y., Tang, Y., Liu, J., and Jia, R. (2026). Diagnosing and Repairing Citation Failures in Generative Engine Optimization. arXiv:2603.09296
- [9] Liu, Z., and Xu, P. (2026). Think Before Writing: Feature-Level Multi-Objective Optimization for Generative Citation Visibility. arXiv:2604.19113
- [10] Yuan, J., Wang, J., Wang, Z., Sun, Q., Wang, R., and Li, J. (2026). AgenticGEO: A Self-Evolving Agentic System for Generative Engine Optimization. arXiv:2603.20213
- [11] Yu, J., Yang, M., Ding, Y., and Sato, H. (2026). Structural Feature Engineering for Generative Engine Optimization: How Content Structure Shapes Citation Behavior. arXiv:2603.29979
- [12] Narayanan Venkit, P., Laban, P., Zhou, Y., Mao, Y., and Wu, C.-S. (2024). Search Engines in an AI Era: The False Promise of Factual and Verifiable Source-Cited Responses. arXiv:2410.22349
- [13] Yang, K.-C. (2025). News Source Citing Patterns in AI Search Systems. arXiv:2507.05301
- [14] Xu, Y., Qi, P., Chen, J., Liu, K., Han, R., Liu, L., Min, B., Castelli, V., Gupta, A., and Wang, Z. (2025). CiteEval: Principle-Driven Citation Evaluation for Source Attribution. arXiv:2506.01829
- [15] Qian, H., Fan, Y., Zhang, R., and Guo, J. (2024). On the Capacity of Citation Generation by Large Language Models. arXiv:2410.11217
- [16] Kirsten, E., Grosse Perdekamp, J., Upadhyay, M., Gummadi, K. P., and Zafar, M. B. (2025). Characterizing Web Search in The Age of Generative AI. arXiv:2510.11560
- [17] Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. Advances in Neural Information Processing Systems
- [18] Karpukhin, V., Oguz, B., Min, S., Lewis, P., Wu, L., Edunov, S., Chen, D., and Yih, W.-t. (2020). Dense Passage Retrieval for Open-Domain Question Answering. EMNLP
- [19] Nakano, R., et al. (2021). WebGPT: Browser-assisted question-answering with human feedback. arXiv:2112.09332
- [20] Menick, J., et al. (2022). Teaching Language Models to Support Answers with Verified Quotes. arXiv:2203.11147
- [21] geo-citation-lab repository. (2026). A dataset and analysis pipeline for studying how AI search engines select and use citations. GitHub: https://github.com/yaojingang/geo-citation-lab. Accessed April 28, 2026
- [22] geo-citation-lab final report. (2026). Overseas GEO Research Long Report, recalculated version. https://yaojingang.github.io/geo-citation-lab/04-repet/final_report.html. Accessed April 28, 2026
- [23] Yao Jingang. (2026). GitHub profile. https://github.com/yaojingang. Accessed April 29, 2026
- [24] Yao Jingang. (2026). X profile. https://x.com/yaojingang. Accessed April 29, 2026