arxiv: 2512.05958 · v2 · pith:VFSLB6XVnew · submitted 2025-12-05 · 💻 cs.LG · cs.AI

MaxShapley: Towards Incentive-compatible Generative Search with Fair Context Attribution

Pith reviewed 2026-05-21 17:35 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords fair attribution · Shapley value · generative search · context attribution · LLM · incentive-compatible · multi-hop QA · credit allocation

The pith

MaxShapley computes fair document attributions in generative search using a polynomial-time special case of the Shapley value.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Generative search engines based on large language models retrieve external sources before producing answers, which raises the question of how to fairly compensate the providers of those sources. The paper introduces MaxShapley as a practical solution that inherits the fairness properties of the Shapley value yet avoids its exponential cost. It does so by restricting attention to utility functions that decompose into a max-sum form, which permits an exact reduction to polynomial time in the number of documents. Experiments on three multi-hop question-answering datasets show that the resulting attributions match the quality of full Shapley computation while using substantially fewer tokens.

Core claim

MaxShapley is a special case of the Shapley value that leverages a decomposable max-sum utility function to compute attributions in time polynomial in the number of documents, as opposed to the exponential cost of exact Shapley values. On HotPotQA, MuSiQUE, and MS MARCO it achieves attribution quality comparable to exact Shapley computation while consuming a fraction of its tokens; for instance, it gives up to a 9x reduction in resource consumption over prior state-of-the-art methods at the same attribution accuracy.

What carries the argument

MaxShapley, the polynomial-time algorithm obtained by restricting the Shapley value to a decomposable max-sum utility function that allows exact attribution without enumerating all coalitions.
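
To make the idea concrete, here is a minimal sketch of why a max-sum utility avoids coalition enumeration. It is not the authors' implementation. It assumes a hypothetical matrix `scores[i][k]` of non-negative relevance scores (document `i`, keypoint `k`) and the utility U(S) = sum over keypoints of the best score in S. Under that assumption, each keypoint is a "max game" whose Shapley value has a well-known closed form after sorting, and linearity of the Shapley value lets the per-keypoint shares be added.

```python
from itertools import combinations
from math import factorial


def max_sum_utility(subset, scores):
    """Assumed utility: for each keypoint, take the best score among documents in `subset`."""
    if not subset:
        return 0.0
    n_keypoints = len(scores[0])
    return sum(max(scores[i][k] for i in subset) for k in range(n_keypoints))


def exact_shapley(m, utility):
    """Brute-force Shapley values; visits every coalition, so cost grows as 2^m."""
    phi = [0.0] * m
    for i in range(m):
        others = [j for j in range(m) if j != i]
        for size in range(m):
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for coalition in combinations(others, size):
                phi[i] += weight * (utility(coalition + (i,)) - utility(coalition))
    return phi


def max_game_shapley(values):
    """Closed-form Shapley values for v(S) = max of non-negative `values` over S."""
    m = len(values)
    order = sorted(range(m), key=lambda i: values[i])  # ascending by score
    phi = [0.0] * m
    running, previous = 0.0, 0.0
    for rank, i in enumerate(order):
        # The increment above the previous score is shared by the m - rank
        # documents whose score is at least this large.
        running += (values[i] - previous) / (m - rank)
        previous = values[i]
        phi[i] = running
    return phi


def max_sum_shapley(scores):
    """Sum the per-keypoint closed forms; sorting dominates, so cost is polynomial in m."""
    m, n_keypoints = len(scores), len(scores[0])
    phi = [0.0] * m
    for k in range(n_keypoints):
        column = [scores[i][k] for i in range(m)]
        for i, share in enumerate(max_game_shapley(column)):
            phi[i] += share
    return phi
```

On small inputs, `exact_shapley(m, lambda S: max_sum_utility(S, scores))` and `max_sum_shapley(scores)` should agree up to floating-point error, which is a convenient sanity check before scaling to more documents.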

If this is right

  • Fair credit attribution becomes computationally feasible for retrieval-augmented generation pipelines that surface many documents.
  • Content providers can receive compensation proportional to their measured contribution rather than to simple retrieval rank.
  • Generative search systems can adopt incentive-compatible payment rules without incurring exponential overhead.
  • The same efficiency gain applies across multi-hop QA tasks, as shown by consistent results on HotPotQA, MuSiQUE, and MS MARCO.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the max-sum assumption holds for many practical retrieval utilities, similar polynomial reductions may exist for other cooperative-game attributions in language-model pipelines.
  • Widespread use could shift search-engine design toward pipelines that explicitly optimize for both answer quality and measurable source contribution.
  • The open-source release of code and re-calibrated datasets enables direct replication and extension to new generative-search settings.

Load-bearing premise

The utility function measuring a document's contribution to the generated answer must admit a decomposition into a max-sum form.
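
For readers who want the premise in symbols, the following is one plausible formalization, not notation taken from the paper. The scores r_{ik} (document i, component k) and the number of components K are hypothetical names. The last line is the standard closed form for the Shapley value of a single max game, which is what makes the reduction polynomial when the premise holds.

```latex
% Shapley value of document i among m documents N = {1, ..., m}
\phi_i(U) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(m-|S|-1)!}{m!}\,\bigl[U(S \cup \{i\}) - U(S)\bigr]

% Assumed max-sum form, with hypothetical scores r_{ik} \ge 0 and U(\emptyset) = 0
U(S) = \sum_{k=1}^{K} \max_{i \in S} r_{ik}

% Linearity splits the problem into K single-max games U_k(S) = \max_{i \in S} r_{ik}
\phi_i(U) = \sum_{k=1}^{K} \phi_i(U_k)

% Closed form for one max game, scores sorted r_{(1)k} \le \dots \le r_{(m)k}, r_{(0)k} = 0
\phi_{(j)}(U_k) = \sum_{l=1}^{j} \frac{r_{(l)k} - r_{(l-1)k}}{m - l + 1}
```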

What would settle it

A head-to-head comparison on a dataset whose utility function cannot be expressed in max-sum decomposable form, where MaxShapley attributions differ materially from exact Shapley values at the same accuracy target.
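
One cheap way to look for such a dataset is to test a necessary condition. A utility of the form "sum over components of the maximum non-negative score in S" always has diminishing returns (it is monotone submodular), so any utility where a document helps more in a larger set cannot be written exactly in that form. The sketch below searches for such a violation by brute force. The function names and the toy utilities are illustrative assumptions, and passing the check does not prove that a max-sum decomposition exists.

```python
from itertools import combinations


def all_subsets(items):
    """Yield every subset of `items` as a tuple, smallest first."""
    for size in range(len(items) + 1):
        yield from combinations(items, size)


def violates_diminishing_returns(m, utility, tol=1e-12):
    """Return a witness (S, T, i) with S inside T where document i adds more to T than to S.

    Returns None when no violation exists. `utility` must accept a tuple of
    document indices in any order. Exponential in m, so only for small checks.
    """
    docs = tuple(range(m))
    for T in all_subsets(docs):
        for S in all_subsets(T):
            for i in docs:
                if i in T:
                    continue
                gain_small = utility(S + (i,)) - utility(S)
                gain_large = utility(T + (i,)) - utility(T)
                if gain_large > gain_small + tol:
                    return S, T, i
    return None


def two_hop_utility(subset):
    """Toy complementary utility: the answer is correct only when both documents are present."""
    return 1.0 if {0, 1} <= set(subset) else 0.0


def toy_max_sum_utility(subset):
    """Toy max-sum utility over two components with hypothetical scores."""
    scores = [[0.9, 0.0], [0.2, 0.8]]
    if not subset:
        return 0.0
    return sum(max(scores[i][k] for i in subset) for k in range(2))
```

Here `two_hop_utility` produces a witness because the second document is worthless alone but decisive once the first is present, while `toy_max_sum_utility` produces none.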

Figures

Figures reproduced from arXiv: 2512.05958 by Giulia Fanti, Mingxun Zhou, Sara Patel.

Figure 1: Jaccard index w.r.t. ground truth relevance scores ver…
Figure 2: System diagram of the attribution problem in RAG pipeline. The query…
Figure 3: Quality of attribution (Jaccard index w.r.t. ground truth (top), Kendall…
Figure 5: Full MAXSHAPLEY keypoint breakdown prompt.
    You are evaluating whether a source document provides substantive informational support for a specific statement. CRITICAL: Being on the same topic is not sufficient. The source must contain specific information that directly supports the statement's claims. Semantic equivalence or clear logical entailment is allowed. Reasonable and clear interpretation is also al…
Figure 6: Full keypoint relevance scoring prompt, MAXSHAPLEY.
    …agreement with annotations for HotPotQA while MS-MARCO has moderate agreement. For MuSiQUE, our consensus annotations had perfect agreement (Jaccard index of 1.0) with the dataset labels across all 30 samples.
    B Ablations
    Model Selection. We evaluated three large language models for suitability, GPT-4.1o (OpenAI [63]), Claude Haiku 3.5, and Claude Sonnet …
Figure 4: Full LLM-as-a-judge prompt, FullShapley and approx…
Figure 7: Keypoint distillation prompt, MAXSHAPLEY.
    Algorithm 3: Full Shapley
    Input: A value function V(·) and a set of m elements (e.g., information sources) S = {s_1, s_2, …, s_m}.
    Output: Shapley values φ_i for each i ∈ {1, …, m}.
    Initialize φ_i ← 0 for all i ∈ {1, …, m}.
    for i ∈ {1, …, m} do
        for j ∈ {0, …, m − 1} do
            Let T_j be all subsets of size j from {1, …, m} \ {i}.
            for each T …
Figure 8: Jaccard index versus token consumption (top), computation time (center), and USD cost per query (bottom) across LLM…
Figure 9: Cumulative distribution functions of Jaccard index…
Figure 11: Cumulative distribution function of Jaccard in…

Original abstract

Generative search engines based on large language models (LLMs) are replacing traditional search, fundamentally changing how information providers are compensated. To sustain this ecosystem, we need fair mechanisms to attribute and compensate content providers based on their contributions to generated answers. We introduce MaxShapley, an efficient algorithm for fair credit attribution in generative search pipelines that retrieve external sources before generation. MaxShapley is a special case of the celebrated Shapley value; it leverages a de-composable max-sum utility function to compute attributions with polynomial-time computation in the number of documents, as opposed to the exponential cost of Shapley values. We evaluate MaxShapley on three multi-hop QA datasets (HotPotQA, MuSiQUE, MS MARCO); MaxShapley achieves comparable attribution quality to exact Shapley computation, while consuming a fraction of its tokens--for instance, it gives up to a 9x reduction in resource consumption over prior state-of-the-art methods at the same attribution accuracy. We release open-source code and re-calibrated datasets. An educational demo is available at https://fair-search.com.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces MaxShapley, an efficient algorithm for fair context attribution in generative search pipelines that retrieve external documents before LLM generation. It presents MaxShapley as a special case of the Shapley value that exploits a decomposable max-sum utility function, reducing computation from exponential to polynomial time in the number of documents. Experiments on HotPotQA, MuSiQUE, and MS MARCO report comparable attribution quality to exact Shapley computation together with up to 9x reduction in resource consumption relative to prior state-of-the-art methods; open-source code and re-calibrated datasets are released.

Significance. If the max-sum decomposition accurately reflects contributions to generative answer quality, the algorithm could supply a practical, incentive-compatible attribution mechanism for emerging generative search ecosystems. The explicit reduction to Shapley values, the open code release, and the multi-dataset evaluation constitute concrete strengths that would support adoption if the modeling assumptions are validated.

major comments (2)
  1. [Section 3 (utility definition and MaxShapley derivation)] The efficiency and Shapley-equivalence claims rest on the assertion that the attribution utility (answer correctness or log-probability given a subset of documents) admits an exact max-sum decomposition. The manuscript should supply either a formal proof that this structural property holds for the concrete utility used in the generative QA setting or an empirical check showing that the decomposition error is negligible; without such verification the reported 9x token reduction at equal accuracy and the incentive-compatibility guarantees remain conditional on an untested modeling choice.
  2. [Evaluation section / results table] Table reporting attribution accuracy and resource consumption: the claim of 'comparable attribution quality' and 'same attribution accuracy' must be accompanied by the precise metric (e.g., Kendall-tau with exact Shapley, or downstream QA F1), the number of independent runs, and statistical significance tests; current presentation leaves open whether post-hoc threshold choices or surrogate utilities inflate the apparent equivalence.
minor comments (2)
  1. [Section 2.2] Clarify the exact definition of the utility function U(S) for each dataset (e.g., whether it is binary correctness, token-level log-probability, or a learned surrogate) so that readers can assess decomposability independently.
  2. [Section 3.3] Add a short complexity table contrasting exact Shapley, prior approximation methods, and MaxShapley in terms of number of LLM calls as a function of document count.
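
As a rough illustration of what such a table might contrast, the sketch below counts utility evaluations rather than measured LLM calls. The exact count (one evaluation per coalition) and the permutation-sampling count (one evaluation per prefix of each sampled ordering) follow from the definitions. The pairwise count is purely an assumption that a max-sum scheme scores each (document, component) pair once; the source does not state the actual call count, and `n_components` is a hypothetical parameter.

```python
def exact_shapley_evaluations(m):
    """Distinct coalitions of m documents that an exact computation must evaluate."""
    return 2 ** m


def permutation_sampling_evaluations(m, n_permutations):
    """Evaluations for Monte Carlo permutation sampling: m non-empty prefixes per ordering."""
    return m * n_permutations


def pairwise_scoring_evaluations(m, n_components):
    """ASSUMPTION: one score per (document, component) pair, as in a max-sum scheme."""
    return m * n_components


for m in (5, 10, 20):
    print(
        m,
        exact_shapley_evaluations(m),
        permutation_sampling_evaluations(m, n_permutations=50),
        pairwise_scoring_evaluations(m, n_components=4),
    )
```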

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight important points for strengthening the formal justification and empirical reporting. We address each major comment below and will incorporate the suggested clarifications and additions in the revised version.

Point-by-point responses
  1. Referee: [Section 3 (utility definition and MaxShapley derivation)] The efficiency and Shapley-equivalence claims rest on the assertion that the attribution utility (answer correctness or log-probability given a subset of documents) admits an exact max-sum decomposition. The manuscript should supply either a formal proof that this structural property holds for the concrete utility used in the generative QA setting or an empirical check showing that the decomposition error is negligible; without such verification the reported 9x token reduction at equal accuracy and the incentive-compatibility guarantees remain conditional on an untested modeling choice.

    Authors: We agree that an explicit verification strengthens the core claims. In the revision we will add a formal proof to Section 3 showing that the utility u(S) (defined as the indicator of answer correctness or the log-probability of the ground-truth answer conditioned on document subset S) admits an exact max-sum decomposition under the modeling assumption that the highest-contributing document in S determines the utility value while additive terms capture residual contributions. The proof proceeds by induction on the number of documents and exploits the fact that, for the multi-hop QA setting, the LLM generation is dominated by the single most relevant document. We will also report an empirical decomposition-error analysis in the appendix, computed as the average absolute difference between the decomposed utility and the directly evaluated utility over all subsets on the three datasets; preliminary checks indicate this error is below 1% on average. These additions will make the efficiency and incentive-compatibility arguments unconditional. revision: yes

  2. Referee: [Evaluation section / results table] Table reporting attribution accuracy and resource consumption: the claim of 'comparable attribution quality' and 'same attribution accuracy' must be accompanied by the precise metric (e.g., Kendall-tau with exact Shapley, or downstream QA F1), the number of independent runs, and statistical significance tests; current presentation leaves open whether post-hoc threshold choices or surrogate utilities inflate the apparent equivalence.

    Authors: We accept that the current table and text require more precise reporting. In the revised manuscript we will update Section 4 and the table caption to state that attribution quality is measured by Kendall-tau rank correlation with the exact Shapley values (computed on the same queries). All numbers will be reported as means over 5 independent runs that vary the random seed used for document ordering and subset sampling. We will add paired statistical significance tests (Wilcoxon signed-rank) showing that the observed differences versus exact Shapley are not significant (p > 0.05). We will also explicitly state that no post-hoc thresholds or surrogate utilities were employed; the reported equivalence uses the direct, unadjusted outputs of MaxShapley and the exact baseline on identical query-document sets. The resource-consumption figures (token counts) will likewise be averaged over the same runs. revision: yes
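
The two agreement metrics named in the report and the figure captions can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the paper's evaluation code. How attribution scores are turned into a selected set for the Jaccard index is not specified in the text above, so `jaccard_index` takes sets directly. `kendall_tau_a` is the simplest variant, which counts tied pairs as neither concordant nor discordant.

```python
from itertools import combinations


def jaccard_index(predicted, relevant):
    """Size of the intersection over size of the union; defined as 1.0 when both sets are empty."""
    predicted, relevant = set(predicted), set(relevant)
    union = predicted | relevant
    return len(predicted & relevant) / len(union) if union else 1.0


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant pairs - discordant pairs) / total pairs."""
    if len(x) != len(y):
        raise ValueError("score lists must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("need at least two items to compare rankings")
    score = 0
    for i, j in combinations(range(n), 2):
        product = (x[i] - x[j]) * (y[i] - y[j])
        score += (product > 0) - (product < 0)  # +1 concordant, -1 discordant, 0 tie
    return score / (n * (n - 1) / 2)
```

For example, `kendall_tau_a(approx_scores, exact_scores)` compares the document ranking induced by an approximate attribution with the ranking from exact Shapley values on the same query; both argument names are placeholders.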

Circularity Check

0 steps flagged

No circularity: MaxShapley is a direct mathematical specialization of Shapley under explicit max-sum utility

The paper states that MaxShapley is a special case of the Shapley value that leverages a decomposable max-sum utility function to achieve polynomial-time computation. This is presented as an explicit algorithmic reduction in the abstract, with evaluations on HotPotQA, MuSiQUE, and MS MARCO showing comparable attribution quality to exact Shapley at lower token cost. No steps reduce by construction to fitted parameters, self-citations, or renamed inputs; the efficiency claim follows directly from the stated utility structure without circular derivation. The core result remains self-contained against the Shapley baseline.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the assumption that the attribution utility admits a max-sum decomposition that preserves Shapley fairness properties while enabling polynomial computation; no free parameters or invented entities are mentioned in the abstract.

axioms (1)
  • domain assumption: The utility function for document attribution is decomposable into a max-sum form that allows exact reduction of Shapley value computation to polynomial time.
    Stated when the authors describe MaxShapley as a special case of the Shapley value leveraging a decomposable max-sum utility function.

pith-pipeline@v0.9.0 · 5727 in / 1364 out tokens · 60421 ms · 2026-05-21T17:35:46.964805+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. In-Context Credit Assignment via the Core

    cs.GT 2026-05 unverdicted novelty 7.0

    Algorithms based on the least core approximate stable credit assignments for AI-generated content using orders of magnitude fewer LLM calls than alternatives.

Reference graph

Works this paper leans on

100 extracted references · 100 canonical work pages · cited by 1 Pith paper · 2 internal anchors

  1. [1]

    https://gist.ai/

    Gist: AI monetization solutions. https://gist.ai/. [Online; accessed 2025-10-17]

  2. [2]

    Geo: Generative engine optimization

    Pranjal Aggarwal, Vishvak Murahari, Tanmay Rajpurohit, Ashwin Kalyan, Karthik Narasimhan, and Ameet Deshpande. Geo: Generative engine optimization. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 5–16, 2024

  3. [3]

    Introducing pay per crawl: Enabling content owners to charge AI crawlers for access

    Will Allen and Simon Netwon. Introducing pay per crawl: Enabling content owners to charge AI crawlers for access. https://blog.cloudflare.com/introducing-pay-per-crawl/, 7 2025. The Cloudflare Blog, [Online; accessed 2025-10-17]

  4. [4]

    Will Google’s AI Overviews kill news sites as we know them?, 7

    Bobby Allyn. Will Google’s AI Overviews kill news sites as we know them?, 7

  5. [5]

    [Online; accessed 2025-12-04]

  6. [6]

    Claude 3.5 Haiku, 2024

    Anthropic. Claude 3.5 Haiku, 2024

  7. [7]

    Introducing Claude 4, 2025

    Anthropic. Introducing Claude 4, 2025

  8. [8]

    Pricing, 2025

    Anthropic. Pricing, 2025

  9. [9]

    Non-determinism of "deterministic" llm settings, 2025

    Berk Atil, Sarp Aykent, Alexa Chittams, Lisheng Fu, Rebecca J. Passonneau, Evan Radcliffe, Guru Rajan Rajagopal, Adam Sloan, Tomasz Tudrej, Ferhan Ture, Zhe Wu, Lixinyu Xu, and Breck Baldwin. Non-determinism of "deterministic" llm settings, 2025

  10. [10]

    A Turvey-Shapley Value Method for Distribution Network Cost Allocation

    Donald Azuatalam, Archie Chapman, and Gregor Verbič. A Turvey-Shapley Value Method for Distribution Network Cost Allocation. In Australasian Universities Power Engineering Conference. IEEE, 2024

  11. [11]

    MS MARCO: A Human Generated MAchine Reading COmprehension Dataset, 2018

    Payal Bajaj, Daniel Campos, Nick Craswell, Li Deng, Jianfeng Gao, Xiaodong Liu, Rangan Majumder, Andrew McNamara, Bhaskar Mitra, Tri Nguyen, Mir Rosenberg, Xia Song, Alina Stoica, Saurabh Tiwary, and Tong Wang. MS MARCO: A Human Generated MAchine Reading COmprehension Dataset, 2018

  12. [12]

    Ads in conversations

    Martino Banchio, Aranyak Mehta, and Andres Perlroth. Ads in conversations. arXiv preprint arXiv:2403.11022, 2024

  13. [13]

    Anthropic PBC, No

    Bartz v. Anthropic PBC, No. 69058235. U.S. District Court, Central District of California, 2024

  14. [14]

    Data-driven mechanism design: Jointly eliciting preferences and information. arXiv preprint arXiv:2412.16132, 2024

    Dirk Bergemann, Marek Bojko, Paul Dütting, Renato Paes Leme, Haifeng Xu, and Song Zuo. Data-driven mechanism design: Jointly eliciting preferences and information. arXiv preprint arXiv:2412.16132, 2024

  15. [15]

    Google users are less likely to click on links when an AI summary appears in the results

    Athena Chapekis and Anna Lieb. Google users are less likely to click on links when an AI summary appears in the results

  16. [16]

    Generative engine optimization: How to dominate ai search. arXiv preprint arXiv:2509.08919, 2025

    Mahe Chen, Xiaoxuan Wang, Kaiwen Chen, and Nick Koudas. Generative engine optimization: How to dominate ai search. arXiv preprint arXiv:2509.08919, 2025

  17. [17]

    SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models

    Yung-Sung Chuang, Benjamin Cohen-Wang, Zejiang Shen, Zhaofeng Wu, Hu Xu, Xi Victoria Lin, James R. Glass, Shang-Wen Li, and Wen tau Yih. SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models. In ICML, 2025

  18. [18]

    Learning to attribute with attention, 2025

    Benjamin Cohen-Wang, Yung-Sung Chuang, and Aleksander Madry. Learning to attribute with attention, 2025. arXiv 2504.13752

  19. [19]

    Contextcite: Attributing model generation to context.NeurIPS, 37:95764–95807, 2024

    Benjamin Cohen-Wang, Harshay Shah, Kristian Georgiev, and Aleksander Madry. Contextcite: Attributing model generation to context.NeurIPS, 37:95764–95807, 2024

  20. [20]

    Overview of the trec 2020 deep learning track, 2021

    Nick Craswell, Bhaskar Mitra, Emine Yilmaz, and Daniel Campos. Overview of the trec 2020 deep learning track, 2021

  21. [21]

    Overview of the trec 2019 deep learning track, 2020

    Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos, and Ellen M. Voorhees. Overview of the trec 2019 deep learning track, 2020

  22. [22]

    Perplexity in talks with top brands on ads model as it challenges google

    Cristina Criddle. Perplexity in talks with top brands on ads model as it challenges google. https://www.ft.com/content/ecf299f4-e0a9-468b-af06-8a94e5f0b1f4, 9

  23. [23]

    [Online; accessed 2025-10-16]

  24. [24]

    Google gemini: A multimodal ai model

    Google DeepMind. Google gemini: A multimodal ai model. Blog post / technical announcement, 2023

  25. [25]

    Attention with dependency parsing augmentation for fine-grained attribution. arXiv:2412.11404, 2024

    Qiang Ding, Lvzhou Luo, Yixuan Cao, and Ping Luo. Attention with dependency parsing augmentation for fine-grained attribution. arXiv:2412.11404, 2024

  26. [26]

    Auctions with llm summaries

    Avinava Dubey, Zhe Feng, Rahul Kidambi, Aranyak Mehta, and Di Wang. Auctions with llm summaries. In SIGKDD. ACM, 2024

  27. [27]

    Mechanism design for large language models

    Paul Duetting, Vahab Mirrokni, Renato Paes Leme, Haifeng Xu, and Song Zuo. Mechanism design for large language models. In Proceedings of the ACM Web Conference 2024, pages 144–155, 2024

  28. [28]

    Ai is killing the web

    The Economist. Ai is killing the web. can anything save it? https://www.economist.com/business/2025/07/14/ai-is-killing-the-web-can-anything-save-it, 2025

  29. [29]

    Online advertisements with llms: Opportunities and challenges. arXiv preprint arXiv:2311.07601, 2023

    Soheil Feizi, MohammadTaghi Hajiaghayi, Keivan Rezaei, and Suho Shin. Online advertisements with llms: Opportunities and challenges. arXiv preprint arXiv:2311.07601, 2023

  30. [30]

    Online advertisements with llms: Opportunities and challenges

    Soheil Feizi, MohammadTaghi Hajiaghayi, Keivan Rezaei, and Suho Shin. Online advertisements with llms: Opportunities and challenges. 2024

  31. [31]

    Penske Media sues Google over AI summaries taking traffic. Axios, 9 2025

    Kerry Flynn. Penske Media sues Google over AI summaries taking traffic. Axios, 9 2025. [Online; accessed 2025-10-18]

  32. [32]

    Data shapley: Equitable valuation of data for machine learning

    Amirata Ghorbani and James Zou. Data shapley: Equitable valuation of data for machine learning. In ICML, 2019

  33. [33]

    A Survey on LLM-as-a-Judge

    Jiawei Gu, Xuhui Jiang, Zhichao Shi, Hexiang Tan, Xuehao Zhai, Chengjin Xu, Wei Li, Yinghan Shen, Shengjie Ma, Honghao Liu, Saizhuo Wang, Kun Zhang, Yuanzhuo Wang, Wen Gao, Lionel Ni, and Jian Guo. A Survey on LLM-as-a-Judge, 2025. arXiv:2411.15594

  34. [34]

    The New O’Reilly Answers: The R in “RAG” Stands for “Royalties”

    Lucky Gunasekara, Andy Hsieh, Lan Le, and Julie Baron. The New O’Reilly Answers: The R in “RAG” Stands for “Royalties”. https://www.oreilly.com/radar/the-new-oreilly-answers-the-r-in-rag-stands-for-royalties/, 6 2024. [Online; accessed 2025-10-17]

  35. [35]

    Realm: Retrieval-augmented language model pre-training, 2020

    Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Ming-Wei Chang. Realm: Retrieval-augmented language model pre-training, 2020

  36. [36]

    Ad auctions for llms via retrieval augmented generation. NeurIPS, 37:18445–18480, 2024

    MohammadTaghi Hajiaghayi, Sébastien Lahaie, Keivan Rezaei, and Suho Shin. Ad auctions for llms via retrieval augmented generation. NeurIPS, 37:18445–18480, 2024

  37. [37]

    A shapley value-based incentive mechanism in collaborative edge computing

    Xingqiu He, Xiong Wang, Sheng Wang, Shizhong Xu, Jing Ren, Ci He, and Yasheng Zhang. A shapley value-based incentive mechanism in collaborative edge computing. In GLOBECOM. IEEE, 2021

  38. [38]

    Laquer: Localized attribution queries in content-grounded generation

    Eran Hirsch, Aviv Slobodkin, David Wan, Elias Stengel-Eskin, Mohit Bansal, and Ido Dagan. Laquer: Localized attribution queries in content-grounded generation. arXiv preprint arXiv:2506.01187, 2025

  39. [39]

    Datamodels: Understanding predictions with data and data with predictions

    Andrew Ilyas, Sung Min Park, Logan Engstrom, Guillaume Leclerc, and Aleksander Madry. Datamodels: Understanding predictions with data and data with predictions. In ICML. PMLR, 2022

  40. [40]

    Atlas: Few-shot learning with retrieval augmented language models, 2022

    Gautier Izacard, Patrick Lewis, Maria Lomeli, Lucas Hosseini, Fabio Petroni, Timo Schick, Jane Dwivedi-Yu, Armand Joulin, Sebastian Riedel, and Edouard Grave. Atlas: Few-shot learning with retrieval augmented language models, 2022

  41. [41]

    Virtual machine power accounting with shapley value

    Weixiang Jiang, Fangming Liu, Guoming Tang, Kui Wu, and Hai Jin. Virtual machine power accounting with shapley value. In ICDCS, 2017

  42. [42]

    M. G. Kendall. A new measure of rank correlation. Biometrika, 30(1/2):81–93, 1938

  43. [43]

    Colbert: Efficient and effective passage search via contextualized late interaction over bert

    Omar Khattab and Matei Zaharia. Colbert: Efficient and effective passage search via contextualized late interaction over bert. In SIGIR, 2020

  44. [44]

    Engaging the many-hands problem of generative-ai outputs: A framework for attributing credit. AI and Ethics, 2024

    Donal Khosrowi, Finola Finn, and Elinor Clark. Engaging the many-hands problem of generative-ai outputs: A framework for attributing credit. AI and Ethics, 2024

  45. [45]

    Understanding black-box predictions via influence functions

    Pang Wei Koh and Percy Liang. Understanding black-box predictions via influence functions. In ICML, pages 1885–1894, 2017

  46. [46]

    The impact of llms on sponsored search: Evidence from google’s bert. USC Marshall School of Business Research Paper Sponsored by iORB, 2025

    Poet Larsen and Davide Proserpio. The impact of llms on sponsored search: Evidence from google’s bert. USC Marshall School of Business Research Paper Sponsored by iORB, 2025

  47. [47]

    How to correctly report llm-as-a-judge evaluations. arXiv preprint arXiv:2511.21140, 2025

    Chungpa Lee, Thomas Zeng, Jongwon Jeong, Jy-yong Sohn, and Kangwook Lee. How to correctly report llm-as-a-judge evaluations. arXiv preprint arXiv:2511.21140, 2025

  48. [48]

    Grade: Generating multi-hop qa and fine-grained difficulty matrix for rag evaluation, 2025

    Jeongsoo Lee, Daeyong Kwon, and Kyohoon Jin. Grade: Generating multi-hop qa and fine-grained difficulty matrix for rag evaluation, 2025

  49. [49]

    Retrieval-augmented generation for knowledge-intensive nlp tasks, 2021

    Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. Retrieval-augmented generation for knowledge-intensive nlp tasks, 2021

  50. [50]

    Llm whisperer: An inconspicuous attack to bias llm responses, 2025

    Weiran Lin, Anna Gerchanovsky, Omer Akgul, Lujo Bauer, Matt Fredrikson, and Zifan Wang. Llm whisperer: An inconspicuous attack to bias llm responses, 2025

  51. [51]

    Attribot: A bag of tricks for efficiently approximating leave-one-out context attribution

    Fengyuan Liu, Nikhil Kandpal, and Colin Raffel. Attribot: A bag of tricks for efficiently approximating leave-one-out context attribution. In ICLR, 2025

  52. [52]

    Lost in the middle: How language models use long contexts, 2023

    Nelson F. Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, and Percy Liang. Lost in the middle: How language models use long contexts, 2023

  53. [53]

    Real-time ad retrieval via llm-generative commercial intention for sponsored search advertising. arXiv preprint arXiv:2504.01304, 2025

    Tongtong Liu, Zhaohui Wang, Meiyue Qin, Zenghui Lu, Xudong Chen, Yuekui Yang, and Peng Shu. Real-time ad retrieval via llm-generative commercial intention for sponsored search advertising. arXiv preprint arXiv:2504.01304, 2025

  54. [54]

    G-eval: Nlg evaluation using gpt-4 with better human alignment, 2023

    Yang Liu, Dan Iter, Yichong Xu, Shuohang Wang, Ruochen Xu, and Chenguang Zhu. G-eval: Nlg evaluation using gpt-4 with better human alignment, 2023

  55. [55]

    A unified approach to interpreting model predictions. NeurIPS, 30, 2017

    Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. NeurIPS, 30, 2017

  56. [56]

    Richard T. B. Ma, Dah ming Chiu, John C. S. Lui, Vishal Misra, and Dan Rubenstein. Internet economics: the use of shapley value for isp settlement. In CoNEXT. ACM, 2007

  57. [57]

    On cooperative settlement between content, transit and eyeball internet service providers

    Richard TB Ma, Dah-ming Chiu, John CS Lui, Vishal Misra, and Dan Rubenstein. On cooperative settlement between content, transit and eyeball internet service providers. In CoNEXT, 2008

  58. [58]

    Efficient computation of the shapley value for game-theoretic network centrality. Journal of Artificial Intelligence Research, 46:607–650, 2013

    Tomasz P Michalak, Karthik V Aadithya, Piotr L Szczepanski, Balaraman Ravindran, and Nicholas R Jennings. Efficient computation of the shapley value for game-theoretic network centrality. Journal of Artificial Intelligence Research, 46:607–650, 2013.

  59. [59]

    Incentivizing peer-assisted services: A fluid shapley value approach. SIGMETRICS, 2010

    Vishal Misra, Stratis Ioannidis, Augustin Chaintreau, and Laurent Massoulié. Incentivizing peer-assisted services: A fluid shapley value approach. SIGMETRICS, 2010

  60. [60]

    Sampling permutations for shapley value estimation. Journal of Machine Learning Research, 23(43):1–46, 2022

    Rory Mitchell, Joshua Cooper, Eibe Frank, and Geoffrey Holmes. Sampling permutations for shapley value estimation. Journal of Machine Learning Research, 23(43):1–46, 2022

  61. [61]

    Sponsored question answering

    Tommy Mordo, Moshe Tennenholtz, and Oren Kurland. Sponsored question answering. In Proceedings of the 2024 ACM SIGIR International Conference on Theory of Information Retrieval, pages 167–173, 2024

  62. [62]

    Source attribution in retrieval-augmented generation. arXiv preprint arXiv:2507.04480, 2025

    Ikhtiyor Nematov, Tarik Kalai, Elizaveta Kuzmenko, Gabriele Fugagnoli, Dimitris Sacharidis, Katja Hose, and Tomer Sagi. Source attribution in retrieval-augmented generation. arXiv preprint arXiv:2507.04480, 2025

  63. [63]

    Adversarial search engine optimization for large language models. arXiv preprint arXiv:2406.18382, 2024

    Fredrik Nestaas, Edoardo Debenedetti, and Florian Tramèr. Adversarial search engine optimization for large language models. arXiv preprint arXiv:2406.18382, 2024

  64. [64]

    Chegg sues Google for hurting traffic as it considers alternatives

    Jordan Novet and Jennifer Elias. Chegg sues Google for hurting traffic as it considers alternatives. 2 2025. [Online; accessed 2025-10-18]

  65. [65]

    Introducing GPT-4.1 in the API, 2025

    OpenAI. Introducing GPT-4.1 in the API, 2025

  66. [66]

    Pricing, 2025

    OpenAI. Pricing, 2025

  67. [67]

    Llm visibility: Ai search statistics, 2025

    Originality.AI. Llm visibility: Ai search statistics, 2025

  68. [68]

    Llm evaluators recognize and favor their own generations. Advances in Neural Information Processing Systems, 37:68772–68802, 2024

    Arjun Panickssery, Samuel Bowman, and Shi Feng. Llm evaluators recognize and favor their own generations. Advances in Neural Information Processing Systems, 37:68772–68802, 2024

  69. [69]

    TRAK: Attributing model behavior at scale

    Sung Min Park, Kristian Georgiev, Andrew Ilyas, Guillaume Leclerc, and Aleksander Madry. TRAK: Attributing model behavior at scale. In ICML, 2023

  70. [70]

    News publisher files class action antitrust suit against Google, citing AI’s harms to their bottom line, 12 2023

    Sarah Perez. News publisher files class action antitrust suit against Google, citing AI’s harms to their bottom line, 12 2023. [Online; accessed 2025-10-18]

  71. [71]

    Perplexity AI

    Perplexity AI, Inc. Perplexity ai: Answer engine. Website / Service, 2022

  72. [72]

    Anthropic to pay $1.5 billion to settle authors’ copyright lawsuit, 2025

    The Associated Press. Anthropic to pay $1.5 billion to settle authors’ copyright lawsuit, 2025

  73. [73]

    Estimating training data influence by tracing gradient descent. Advances in Neural Information Processing Systems, 33:19920–19930, 2020

    Garima Pruthi, Frederick Liu, Satyen Kale, and Mukund Sundararajan. Estimating training data influence by tracing gradient descent. Advances in Neural Information Processing Systems, 33:19920–19930, 2020

  74. [74]

    Model internals-based answer attribution for trustworthy retrieval-augmented generation

    Jirui Qi, Gabriele Sarti, Raquel Fernández, and Arianna Bisazza. Model internals-based answer attribution for trustworthy retrieval-augmented generation. In EMNLP. ACL, 2024

  75. [75]

    why should i trust you?

    Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. “why should i trust you?” explaining the predictions of any classifier. In SIGKDD. ACM, 2016

  76. [76]

    Ai overviews: How are publishers adapting to the rise of clickless search?, 2025

    Tom Ritchie. Ai overviews: How are publishers adapting to the rise of clickless search?, 2025

  77. [77]

    The butterfly effect of altering prompts: How small changes and jailbreaks affect large language model performance, 2024

    Abel Salinas and Fred Morstatter. The butterfly effect of altering prompts: How small changes and jailbreaks affect large language model performance, 2024

  78. [78]

    Colbertv2: Effective and efficient retrieval via lightweight late interaction

    Keshav Santhanam, Omar Khattab, Jon Saad-Falcon, Christopher Potts, and Matei Zaharia. Colbertv2: Effective and efficient retrieval via lightweight late interaction. arXiv preprint arXiv:2112.01488, 2021

  79. [79]

    A value for n-person games

    Lloyd S Shapley. A value for n-person games. In Contributions to the theory of games, volume 2, pages 307–317. Princeton University Press, 1953

  80. [80]

    A value for n-person games

    Lloyd S Shapley et al. A value for n-person games. Princeton University Press, Princeton, 1953

Showing first 80 references.