Think Before Writing: Feature-Level Multi-Objective Optimization for Generative Citation Visibility
Pith reviewed 2026-05-10 02:32 UTC · model grok-4.3
The pith
Optimizing high-level features of webpages before generating text raises citation rates in generative engines more effectively than direct word edits.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
FeatGEO represents webpages as vectors of structural, content, and linguistic features, performs multi-objective optimization over this space to raise citation visibility without degrading quality, and then uses a language model to realize the chosen feature values as natural language. This yields higher citation rates than token-level baselines on GEO-Bench across three generative engines, shows that broader document properties influence citations more strongly than lexical edits, and demonstrates that the learned feature configurations generalize across language models of varying scales.
What carries the argument
An interpretable feature space of structural, content, and linguistic properties that decouples high-level multi-objective optimization from text realization by a downstream language model.
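The decoupling described above can be illustrated with a small sketch of the optimization half only: choosing non-dominated feature configurations under two objectives. This is not the paper's implementation. The feature names (`heading_count`, `stat_density`), the scores, and the plain Pareto filter are all assumptions made for illustration; the review does not state which search procedure or features the authors actually use.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    features: dict   # hypothetical feature names and values
    visibility: float  # assumed citation-visibility score, higher is better
    quality: float     # assumed content-quality score, higher is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.visibility >= b.visibility and a.quality >= b.quality
    better = a.visibility > b.visibility or a.quality > b.quality
    return no_worse and better


def pareto_front(cands: list) -> list:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


# Illustrative placeholder scores, not results from the paper.
candidates = [
    Candidate({"heading_count": 2, "stat_density": 0.1}, 0.30, 0.80),
    Candidate({"heading_count": 5, "stat_density": 0.3}, 0.45, 0.78),
    Candidate({"heading_count": 8, "stat_density": 0.6}, 0.50, 0.60),
    Candidate({"heading_count": 3, "stat_density": 0.2}, 0.28, 0.70),
]
front = pareto_front(candidates)
```

In this toy setup the fourth candidate is dropped because the first is better on both objectives. Each surviving configuration would then be handed to a language model for realization as text, a step that is omitted here.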
If this is right
- Content creators can adjust high-level webpage traits to increase the likelihood of citation in AI-generated answers.
- Multi-objective search enables explicit control over the trade-off between visibility and quality.
- Document-level content properties matter more for citations than isolated token edits.
- Optimized feature configurations transfer to generative models of different sizes.
Where Pith is reading between the lines
- Optimization strategies for AI visibility may shift focus from phrasing tweaks to planning overall structure and emphasis.
- The same feature-decoupling approach could be tested for improving other generative outputs such as factual grounding or user engagement.
- Further experiments could check whether human raters agree that the optimized features produce more citable content.
Load-bearing premise
The selected structural, content, and linguistic features form a sufficient abstraction of the factors that drive citation decisions inside generative engines.
What would settle it
A test in which pages are rewritten to match the optimized feature values but show no increase in citation frequency relative to unmodified pages or pages altered by random feature changes when submitted to the same generative engines.
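One way such a test could be scored is a one-sided permutation test on per-page citation outcomes. The sketch below assumes each page yields a binary outcome (1 if cited, 0 if not) and compares feature-optimized pages against pages with random feature changes. The function name, the outcome encoding, and the data are hypothetical placeholders, not measurements.

```python
import random


def permutation_p_value(treated, control, n_perm=10_000, seed=0):
    """One-sided p-value for mean(treated) - mean(control) > 0 under label shuffling."""
    rng = random.Random(seed)
    observed = sum(treated) / len(treated) - sum(control) / len(control)
    pooled = list(treated) + list(control)
    k = len(treated)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = sum(pooled[:k]) / k - sum(pooled[k:]) / (len(pooled) - k)
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Placeholder outcomes for illustration only: 1 = page was cited, 0 = not cited.
optimized_pages = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
random_edit_pages = [0, 1, 0, 0, 1, 0, 0, 1, 0, 0]
p_value = permutation_p_value(optimized_pages, random_edit_pages)
```

A large p-value on real data would be the outcome described above: matching the optimized feature values would not be distinguishable from random feature changes.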
Original abstract
Generative answer engines expose content through selective citation rather than ranked retrieval, fundamentally altering how visibility is determined. This shift calls for new optimization methods beyond traditional search engine optimization. Existing generative engine optimization (GEO) approaches primarily rely on token-level text rewriting, offering limited interpretability and weak control over the trade-off between citation visibility and content quality. We propose FeatGEO, a feature-level, multi-objective optimization framework that abstracts webpages into interpretable structural, content, and linguistic properties. Instead of directly editing text, FeatGEO optimizes over this feature space and uses a language model to realize feature configurations into natural language, decoupling high-level optimization from surface-level generation. Experiments on GEO-Bench across three generative engines demonstrate that FeatGEO consistently improves citation visibility while maintaining or improving content quality, substantially outperforming token-level baselines. Further analyses show that citation behavior is more strongly influenced by document-level content properties than by isolated lexical edits, and that the learned feature configurations generalize across language models of different scales.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes FeatGEO, a feature-level multi-objective optimization framework for improving citation visibility in generative answer engines. Webpages are abstracted into interpretable structural, content, and linguistic features; these are optimized under a multi-objective trade-off between visibility and quality; an LM then realizes the optimal feature vector as natural language text. Experiments on GEO-Bench across three generative engines report that FeatGEO improves citation visibility while maintaining or improving content quality and substantially outperforms token-level baselines. Additional analyses indicate that document-level content properties exert stronger influence on citation decisions than isolated lexical edits and that the learned feature configurations generalize across LMs of different scales.
Significance. If the empirical results prove robust, the work offers a principled shift from token-level rewriting to higher-level, interpretable feature optimization in the emerging area of generative engine optimization. The explicit multi-objective formulation, the decoupling of optimization from surface generation, and the cross-LM generalization finding are potentially valuable contributions. The post-hoc observation that document-level properties dominate could inform future modeling of generative citation behavior. Significance hinges on the strength and completeness of the experimental evidence.
major comments (2)
- [Experiments] Experiments section: the central claim of consistent improvement and outperformance over token-level baselines is presented without reported numerical metrics, confidence intervals, ablation results on the feature vocabulary, or statistical significance tests. Because the paper's primary contribution is empirical, the absence of these details prevents verification of effect sizes and reproducibility; this directly affects the soundness of the strongest claim.
- [Analysis] Analysis section: the finding that document-level properties matter more than lexical edits is derived from post-hoc correlation on the same experimental data. This does not constitute a test of feature-set sufficiency (e.g., whether adding omitted drivers such as query-document semantic alignment or recency signals would yield further gains). Without such a test, it remains possible that the reported advantage over token baselines is an artifact of the particular realization LM rather than a property of the feature-level abstraction itself.
minor comments (2)
- [Abstract] Abstract: positive results are asserted without any quantitative values, error bars, or ablation summaries; a concise numerical statement of the main gains would strengthen the summary.
- [Method] Method: the multi-objective trade-off weights are listed as free parameters; their selection procedure and sensitivity analysis should be described explicitly.
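The sensitivity analysis requested in the last minor comment could take the following shape. The sketch assumes the trade-off is a weighted sum `w * visibility + (1 - w) * quality` with a single weight `w`. The review only says that the weights are free parameters, so this scalarization and the candidate scores are assumptions for illustration.

```python
def select(cands, w):
    """Index of the candidate maximizing w * visibility + (1 - w) * quality."""
    return max(
        range(len(cands)),
        key=lambda i: w * cands[i][0] + (1 - w) * cands[i][1],
    )


# (visibility, quality) pairs; placeholder values, not results from the paper.
cands = [(0.30, 0.80), (0.45, 0.78), (0.50, 0.60)]

# Sweep the weight from 0.0 (quality only) to 1.0 (visibility only).
sweep = {w / 10: select(cands, w / 10) for w in range(11)}
```

Reporting the weights at which the selected configuration changes, rather than a single hand-picked weight, would show how sensitive the outcome is to this free parameter.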
Simulated Author's Rebuttal
We thank the referee for the constructive feedback, which highlights important areas for strengthening the empirical rigor of our work. We address each major comment below and commit to revisions that enhance the presentation of results and analysis without altering the core contributions.
-
Referee: [Experiments] Experiments section: the central claim of consistent improvement and outperformance over token-level baselines is presented without reported numerical metrics, confidence intervals, ablation results on the feature vocabulary, or statistical significance tests. Because the paper's primary contribution is empirical, the absence of these details prevents verification of effect sizes and reproducibility; this directly affects the soundness of the strongest claim.
Authors: We agree that the Experiments section requires more detailed quantitative support to substantiate the claims of consistent improvement and outperformance. In the revised manuscript, we will add specific numerical metrics for citation visibility gains and quality scores across the three generative engines, 95% confidence intervals, ablation studies on the feature vocabulary, and statistical significance tests (such as paired t-tests) against the token-level baselines. These additions will directly address concerns about effect sizes, reproducibility, and verification of the primary empirical contribution.
Revision: yes
-
Referee: [Analysis] Analysis section: the finding that document-level properties matter more than lexical edits is derived from post-hoc correlation on the same experimental data. This does not constitute a test of feature-set sufficiency (e.g., whether adding omitted drivers such as query-document semantic alignment or recency signals would yield further gains). Without such a test, it remains possible that the reported advantage over token baselines is an artifact of the particular realization LM rather than a property of the feature-level abstraction itself.
Authors: We acknowledge that the current analysis relies on post-hoc correlations from the existing data and does not fully test feature-set sufficiency against all potential omitted drivers. In revision, we will expand the Analysis section with a clearer discussion of this limitation and include additional experiments incorporating drivers such as query-document semantic alignment to assess further gains. The manuscript already demonstrates that the learned feature configurations generalize across language models of different scales, which provides evidence that the advantages arise from the feature-level abstraction rather than a specific realization LM; we will highlight and strengthen this point to address potential artifact concerns.
Revision: partial
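The statistics promised in the first response (paired t-tests and 95% confidence intervals) could be computed along these lines. This is a standard-library sketch, not the authors' analysis code. It assumes per-query scores are available for both methods on the same queries, uses a percentile bootstrap for the interval as a stand-in for a t-based interval, and uses placeholder scores throughout.

```python
import math
import random
import statistics


def paired_t(a, b):
    """Paired t statistic: mean difference divided by its standard error."""
    diffs = [x - y for x, y in zip(a, b)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


def bootstrap_ci(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean paired difference."""
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(a, b)]
    means = sorted(
        statistics.mean(rng.choices(diffs, k=len(diffs))) for _ in range(n_boot)
    )
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Placeholder per-query visibility scores, not results from the paper.
feature_level = [0.42, 0.51, 0.38, 0.47, 0.55, 0.40]
token_level = [0.35, 0.44, 0.39, 0.41, 0.46, 0.37]

t_stat = paired_t(feature_level, token_level)
ci_low, ci_high = bootstrap_ci(feature_level, token_level)
```

Turning the t statistic into a p-value needs the t distribution with n - 1 degrees of freedom, which the standard library does not provide; a statistics package would be the usual route.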
Circularity Check
No significant circularity: empirical validation independent of inputs
The paper's core contribution is a proposed FeatGEO framework that abstracts webpages into structural/content/linguistic features, performs multi-objective optimization over that space, and delegates realization to a downstream LM. Claims rest on experimental results from GEO-Bench across three engines, showing visibility gains over token-level baselines. No equations, fitted parameters, or self-citations are presented that reduce the reported improvements to the optimization inputs by construction. The feature abstraction and LM realization step are treated as independent design choices whose effectiveness is tested externally rather than assumed. Further analyses on document-level vs. lexical influence are post-hoc correlations on the same data but do not serve as load-bearing premises for the main result. The derivation chain is therefore self-contained and falsifiable via the reported benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- multi-objective trade-off weights
axioms (1)
- domain assumption: The selected structural, content, and linguistic features capture the factors that drive citation decisions in generative engines.
Forward citations
Cited by 1 Pith paper
-
From Citation Selection to Citation Absorption: A Measurement Framework for Generative Engine Optimization Across AI Search Platforms
A measurement study of 602 prompts across ChatGPT, Google AI Overview, and Perplexity finds that citation selection breadth and absorption depth diverge, with high-influence pages being longer, structured, and evidence-rich.