arxiv: 1907.09328 · v1 · pith:BESUJ7NYnew · submitted 2019-07-22 · 💻 cs.IR

A Conceptual Framework for Evaluating Fairness in Search

Pith reviewed 2026-05-24 17:54 UTC · model grok-4.3

classification 💻 cs.IR
keywords distributional fairness · search evaluation · fairness axioms · TREC collections · relevance metrics · metric interpolation · information retrieval

The pith

A conceptual framework evaluates search fairness via axioms for distributional fairness.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper defines distributional fairness as a property of search result distributions and builds a conceptual framework around it. It formulates axioms that any ideal fairness evaluation framework must satisfy. Existing TREC collections are shown to be reusable for fairness studies once data bias is measured. Analyses demonstrate divergence between relevance and fairness metrics, and a simple interpolation combines the two into one score. A sympathetic reader would care because search systems have long optimized only for relevance, and this supplies a principled way to incorporate fairness without discarding prior evaluation practices.
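
As a rough illustration of what a distributional fairness score might look like, the sketch below compares the share of each group among the top-k results with a target distribution. The group labels, the function names, and the use of total variation distance are all assumptions made for this example; the summary above does not state which divergence measure the paper uses.

```python
from collections import Counter


def group_distribution(ranked_groups, k, groups):
    """Return the share of each group among the top-k results."""
    top = ranked_groups[:k]
    if not top:
        raise ValueError("need at least one ranked result")
    counts = Counter(top)
    return {g: counts.get(g, 0) / len(top) for g in groups}


def fairness_score(observed, target):
    """1 minus total variation distance (an assumed measure, not the paper's).

    A score of 1.0 means the observed shares match the target exactly.
    """
    tv = 0.5 * sum(abs(observed[g] - target[g]) for g in target)
    return 1.0 - tv


# Hypothetical ranking: the group label of each result, best rank first.
ranking = ["A", "A", "A", "B"]
groups = ["A", "B"]
observed = group_distribution(ranking, k=4, groups=groups)

uniform_target = {"A": 0.5, "B": 0.5}    # equal share per group
dataset_target = {"A": 0.75, "B": 0.25}  # shares assumed to mirror the collection

score_uniform = fairness_score(observed, uniform_target)
score_dataset = fairness_score(observed, dataset_target)
```

The same ranking scores differently against a uniform target and a dataset-derived target, which is why the choice of target distribution has to be stated alongside any fairness number.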

Core claim

We define a notion of distributional fairness and provide a conceptual framework for evaluating search results based on it. As part of this, we formulate a set of axioms which an ideal evaluation framework should satisfy for distributional fairness. We show how existing TREC test collections can be repurposed to study fairness, measure potential data bias to inform test collection design, demonstrate metric divergence between relevance and fairness, and describe a simple but flexible interpolation strategy for integrating relevance and fairness into a single metric.

What carries the argument

The set of axioms that an ideal distributional fairness evaluation framework must satisfy, around which the conceptual framework is constructed.

If this is right

  • Fairness metrics diverge from relevance metrics on real collections, requiring explicit trade-off handling.
  • An interpolation strategy produces a single metric usable for both optimization and evaluation.
  • Repurposed TREC collections become viable for fairness studies after bias quantification.
  • Test collection design for fair search can be guided by measured data bias levels.
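
The interpolation mentioned in the list above can be sketched as a convex combination of a relevance score and a fairness score. This is a minimal sketch under stated assumptions: the weight name `lam`, the function names, and the example system scores are hypothetical, and the exact formula in the paper is not reproduced on this page.

```python
def interpolate(relevance, fairness, lam=0.5):
    """Convex combination of two scores; lam=1.0 is pure relevance (assumed form)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * relevance + (1.0 - lam) * fairness


def rank_systems(scores, lam):
    """Order system names by interpolated score, best first."""
    return sorted(scores, key=lambda name: interpolate(*scores[name], lam), reverse=True)


# Hypothetical (relevance, fairness) scores for two systems.
systems = {"sysA": (0.80, 0.40), "sysB": (0.60, 0.90)}

by_relevance_only = rank_systems(systems, lam=1.0)
by_equal_weight = rank_systems(systems, lam=0.5)
```

With these made-up numbers, the system ordering flips as the weight moves from relevance toward fairness, which is the kind of trade-off a single interpolated metric is meant to expose.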

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same axiomatic approach could be tested on fairness in recommendation or question-answering systems.
  • Future collections might be built from the start to satisfy the axioms rather than retrofitted.
  • The framework offers a template for defining fairness axioms in other ranked-output domains.

Load-bearing premise

The axioms correctly capture what an ideal distributional fairness evaluation framework must satisfy, and repurposing existing TREC collections introduces no critical new biases.
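
One way to inspect data bias in a repurposed collection is to tabulate, per topic, how relevant documents are spread across categories. The sketch below assumes relevance judgments are available as (topic, document, relevance) triples and that a document-to-category mapping exists; both the mapping and the category names are hypothetical and not taken from the paper.

```python
from collections import Counter, defaultdict


def relevant_category_shares(qrels, doc_category):
    """Per topic, the share of relevant documents falling in each category."""
    counts = defaultdict(Counter)
    for topic, doc_id, rel in qrels:
        # Count only documents judged relevant and with a known category.
        if rel > 0 and doc_id in doc_category:
            counts[topic][doc_category[doc_id]] += 1
    shares = {}
    for topic, counter in counts.items():
        total = sum(counter.values())
        shares[topic] = {cat: n / total for cat, n in counter.items()}
    return shares


# Hypothetical judgments and category labels.
qrels = [
    ("t1", "d1", 1), ("t1", "d2", 1), ("t1", "d3", 0), ("t1", "d4", 1),
    ("t2", "d5", 1),
]
doc_category = {"d1": "news", "d2": "news", "d3": "blog", "d4": "blog", "d5": "blog"}

shares = relevant_category_shares(qrels, doc_category)
```

A topic whose relevant documents all fall in one category, like `t2` here, cannot distinguish a fair system from an unfair one, so a table like this is a useful first check before drawing fairness conclusions.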

What would settle it

A concrete case in which search results judged fair by external criteria violate one or more of the stated axioms, or in which the interpolated metric produces rankings that are worse on both relevance and fairness than optimizing the two separately.

Figures

Figures reproduced from arXiv: 1907.09328 by Anubrata Das, Matthew Lease.

Figure 1: Distribution of relevant documents across topics by category for two test collections.
Figure 2: Correlation in system scores by metrics for relevance vs. fairness (for uniform vs. dataset target distributions).
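
Divergence between a relevance metric and a fairness metric can be checked by correlating the scores each metric assigns to the same set of systems. The sketch below uses Kendall's tau (tau-a, without tie correction) as an assumed choice of correlation; the system scores are invented for illustration and are not results from the paper.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a between two equally long score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical per-system scores under each metric (same system order).
relevance_scores = [0.80, 0.70, 0.60, 0.50]
fairness_scores = [0.40, 0.90, 0.50, 0.70]

tau = kendall_tau(relevance_scores, fairness_scores)
```

A tau near 1 would mean the two metrics order systems almost identically; a value near zero or below, as in this toy example, signals that optimizing one does not optimize the other.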
Original abstract

While search efficacy has been evaluated traditionally on the basis of result relevance, fairness of search has attracted recent attention. In this work, we define a notion of distributional fairness and provide a conceptual framework for evaluating search results based on it. As part of this, we formulate a set of axioms which an ideal evaluation framework should satisfy for distributional fairness. We show how existing TREC test collections can be repurposed to study fairness, and we measure potential data bias to inform test collection design for fair search. A set of analyses show metric divergence between relevance and fairness, and we describe a simple but flexible interpolation strategy for integrating relevance and fairness into a single metric for optimization and evaluation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper defines a notion of distributional fairness for search results and presents a conceptual framework for evaluating search systems with respect to it. It formulates a set of axioms that any ideal evaluation framework for distributional fairness should satisfy, demonstrates how existing TREC test collections can be repurposed for fairness studies while quantifying associated data biases, reports analyses showing divergence between relevance-based and fairness-based metrics, and describes a flexible interpolation strategy for combining relevance and fairness into a single optimization metric.

Significance. If the proposed axioms and framework gain acceptance, the work could provide a useful foundation for standardizing fairness evaluation in information retrieval, moving beyond ad-hoc fairness measures. The practical elements—repurposing of TREC collections with bias measurement and the interpolation approach—are concrete contributions that could aid adoption. The paper is explicitly conceptual rather than empirical or axiomatic-derivational, so its value lies in the clarity and utility of the proposed definitions and strategy.

major comments (2)
  1. [Axioms formulation section] The central contribution rests on the set of axioms for an ideal distributional fairness framework, yet the manuscript provides no formal argument, completeness proof, or comparison showing why these particular axioms (as opposed to alternatives) are necessary and sufficient; without this, the framework's status as 'ideal' remains a definitional choice rather than a derived property.
  2. [TREC repurposing and bias measurement section] The repurposing of TREC collections for fairness analysis includes a data-bias measurement, but the manuscript does not quantify how large the measured bias must be before it invalidates downstream fairness conclusions or provide a mitigation strategy; this directly affects the claim that the collections can be reliably used for fairness studies.
minor comments (2)
  1. [Interpolation strategy section] Notation for the interpolation strategy should be introduced with an explicit equation rather than described only in prose, to allow readers to reproduce the combined metric exactly.
  2. [Introduction] The abstract and introduction use 'distributional fairness' without an immediate formal definition; a one-sentence mathematical characterization early in the paper would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive review and recommendation of minor revision. Below we respond point by point to the major comments.

  1. Referee: [Axioms formulation section] The central contribution rests on the set of axioms for an ideal distributional fairness framework, yet the manuscript provides no formal argument, completeness proof, or comparison showing why these particular axioms (as opposed to alternatives) are necessary and sufficient; without this, the framework's status as 'ideal' remains a definitional choice rather than a derived property.

    Authors: The paper is explicitly positioned as conceptual rather than axiomatic-derivational. The axioms are offered as an initial set of desirable properties motivated by the requirements of distributional fairness in search, not as a formally proven minimal or complete basis. We agree that necessity, sufficiency, and comparisons to alternatives are not established here. We will revise the axioms section to state this scope explicitly and to frame the axioms as a proposal open to refinement by the community. revision: partial

  2. Referee: [TREC repurposing and bias measurement section] The repurposing of TREC collections for fairness analysis includes a data-bias measurement, but the manuscript does not quantify how large the measured bias must be before it invalidates downstream fairness conclusions or provide a mitigation strategy; this directly affects the claim that the collections can be reliably used for fairness studies.

    Authors: The referee correctly notes the absence of a specific bias threshold or mitigation strategy. The bias quantification is presented to inform users of the repurposed collections rather than to certify them as suitable without qualification. We will revise the relevant section to state this limitation clearly and to identify the development of such thresholds and strategies as an open research question. revision: yes

Circularity Check

0 steps flagged

Conceptual proposal with no circular derivation chain

The paper introduces a definition of distributional fairness, formulates axioms that an ideal framework should satisfy, demonstrates repurposing of existing TREC collections, measures associated data bias, and describes an interpolation between relevance and fairness metrics. These steps are presented as definitional and conceptual contributions rather than empirical predictions or derivations from first principles. No equations reduce outputs to fitted inputs by construction, no self-citations serve as load-bearing uniqueness theorems, and the work remains self-contained against external benchmarks without renaming known results or smuggling ansatzes. The reader's assessment of score 2 aligns with minor self-citation potential that is not load-bearing.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests primarily on the appropriateness of the newly formulated axioms for distributional fairness and on the validity of repurposing TREC collections; no free parameters are mentioned and the distributional fairness notion is the main invented entity.

axioms (1)
  • domain assumption: A set of axioms exists that any ideal evaluation framework for distributional fairness in search must satisfy
    The paper formulates these axioms as the foundation of the proposed framework.
invented entities (1)
  • Distributional fairness (no independent evidence)
    purpose: To provide a measurable notion of fairness based on result distribution across groups
    New concept introduced to ground the evaluation framework

pith-pipeline@v0.9.0 · 5629 in / 1228 out tokens · 37125 ms · 2026-05-24T17:54:14.635123+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Software Fairness: An Analysis and Survey

    cs.SE 2022-05 unverdicted novelty 4.0

    A literature survey of 164 papers on software fairness reveals gaps in requirements engineering, intersectional measures, unstructured data, and white-box ML methods.
