pith. machine review for the scientific record.

arxiv: 2604.25032 · v1 · submitted 2026-04-27 · 💻 cs.IR


Offline Evaluation Measures of Fairness in Recommender Systems


Pith reviewed 2026-05-08 01:31 UTC · model grok-4.3

classification 💻 cs.IR
keywords recommender systems · fairness evaluation · offline evaluation · evaluation measures · interpretability · user fairness · item fairness · evaluation guidelines

The pith

Existing offline fairness measures for recommender systems have interpretability and applicability limits that new methods and guidelines can fix.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The thesis analyzes offline fairness evaluation measures in recommender systems, categorizing them by evaluation subject (users or items) and by granularity (groups versus individual subjects). Through theoretical examination and empirical tests it exposes problems including scores that cannot be calculated due to division by zero, unclear patterns in score distributions, and difficulty determining which model outputs count as fairest or least fair. The work then develops new evaluation approaches designed to remove these problems and supplies guidelines for selecting the most suitable measure in a given scenario. These contributions matter because reliable fairness scores are needed to meet regulatory expectations for responsible recommender systems and to allow clear comparisons across recommendation models.
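One of the flagged failure modes, scores that cannot be calculated due to division by zero, is easy to reproduce with the Gini coefficient, a widely used inequality-style fairness measure. This is an illustrative sketch, not the thesis's own formulation of any specific measure:

```python
def gini(scores):
    """Gini coefficient of per-user utility scores.

    0 = perfectly equal (fair); values toward 1 = utility concentrated
    on few users (unfair).
    """
    n = len(scores)
    mean = sum(scores) / n
    if mean == 0:
        # The formula below divides by the mean: with all-zero utilities
        # the measure is simply not computable.
        raise ZeroDivisionError("Gini undefined: every user's utility is zero")
    # Mean absolute difference over all pairs, normalised by 2 * mean.
    mean_abs_diff = sum(abs(x - y) for x in scores for y in scores) / (n * n)
    return mean_abs_diff / (2 * mean)

print(gini([0.9, 0.1, 0.5, 0.5]))      # computable: some inequality
try:
    gini([0.0, 0.0, 0.0, 0.0])         # a model output with no measure score
except ZeroDivisionError as e:
    print(e)
```

The degenerate case is not exotic: any model that retrieves nothing relevant for any user produces exactly this all-zero utility vector.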

Core claim

The central claim is that a wide range of existing offline fairness measures, when examined by evaluation subject and granularity, exhibit theoretical, empirical, and conceptual limitations that reduce their robustness, and that targeted analysis combined with new measures and usage guidelines can overcome these limitations to support more precise fairness assessment in recommender systems.

What carries the argument

Categorization of fairness measures by subject (users or items) and granularity (group or individual), serving as the structure for theoretical proofs of flaws, empirical distribution studies, and the derivation of replacement measures plus selection rules.

If this is right

  • Model outputs that produce the highest or lowest fairness scores under each measure can be identified through the empirical analysis.
  • Distributions of scores for each measure can be characterized to show typical ranges and outliers.
  • Cases where measures cannot be computed are documented, allowing avoidance of invalid applications.
  • New measures provide alternatives that maintain computability and improve expressiveness for both group and individual fairness.
  • Guidelines enable selection of measures matched to the specific fairness notion and evaluation subject at hand.
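The second bullet, characterising empirical score distributions, can be sketched with a toy lower-is-fairer measure (the standard deviation of per-user utility standing in for any real measure); the thesis's datasets and measures are not reproduced here:

```python
import random
import statistics

random.seed(0)

def utility_spread(scores):
    """Toy lower-is-fairer measure: population std. dev. of per-user utility."""
    return statistics.pstdev(scores)

# Sample many synthetic model outputs (100 users, uniform utilities in [0, 1])
# and summarise where the measure's scores actually land. Its theoretical
# range is [0, 0.5], but the empirical mass sits in a much narrower band,
# which is exactly the kind of gap the distribution analysis makes visible.
samples = [utility_spread([random.random() for _ in range(100)])
           for _ in range(2000)]
print(f"min {min(samples):.3f}  median {statistics.median(samples):.3f}  "
      f"max {max(samples):.3f}")
```

A reader comparing two models a few hundredths apart on such a measure would need this empirical band, not the theoretical range, to judge whether the gap is meaningful.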

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Adoption of the guidelines could lead to more uniform reporting of fairness results in published recommender system studies.
  • The same combination of theoretical checks and empirical testing could be applied to fairness metrics used in other machine learning tasks beyond recommendation.
  • Integration of the new measures into evaluation toolkits would allow practitioners to run side-by-side comparisons of fairness across algorithms without manual adjustments for edge cases.

Load-bearing premise

That the limitations found for the examined measures cover the main barriers to reliable use and that the new approaches will improve interpretability without introducing their own unexamined drawbacks in practice.

What would settle it

A test that applies one of the proposed new measures to recommender outputs where an original measure returns an undefined result due to division by zero and shows that the new scores remain consistent and match expected fairness orderings across multiple independent datasets.
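A minimal sketch of such a test, using a hypothetical correction to the Gini coefficient that defines an all-zero utility distribution as perfectly equal (score 0); the thesis's actual corrected measures are not specified here and may use a different convention:

```python
def gini_corrected(scores):
    """Gini-style unfairness score (lower = fairer).

    The uncorrected formula divides by the mean utility and is undefined
    when all utilities are zero. Hypothetical fix: an all-zero
    distribution is still an equal one, so score it 0.
    """
    n = len(scores)
    mean = sum(scores) / n
    if mean == 0:
        return 0.0
    mean_abs_diff = sum(abs(x - y) for x in scores for y in scores) / (n * n)
    return mean_abs_diff / (2 * mean)

# Outputs ordered from most to least equal; the corrected score should
# preserve that ordering and stay defined on the degenerate case.
outputs = [
    [0.8, 0.8, 0.8],   # equal utilities
    [0.9, 0.6, 0.3],   # moderately unequal
    [1.0, 0.0, 0.0],   # utility concentrated on one user
]
scores = [gini_corrected(o) for o in outputs]
assert scores == sorted(scores)            # expected fairness ordering holds
assert gini_corrected([0.0] * 5) == 0.0    # defined where the original is not
print(scores)
```

The actual settling test would repeat this check across multiple independent datasets and real recommender outputs rather than hand-built utility vectors.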

Figures

Figures reproduced from arXiv: 2604.25032 by Theresia Veronika Rampisela.

Figure 1.1: Overview of the three types of limitations identified, analysed, or resolved.
Figure 1.2: In this example, Model A is best for fairness, Model B is best for relevance, …
Figure 1.3: Left: NDCG score distribution of two handpicked, non-overlapping user …
Figure 2.1: Correlation (Kendall’s τ) between relevance and fairness measures for Lastfm. Asterisk (∗) denotes a statistically significant correlation (α = 0.05), after applying the Benjamini-Hochberg procedure.
Figure 2.2: Correlation (Kendall’s τ) between relevance and fairness measures for Ml-1m. Asterisk (∗) denotes a statistically significant correlation (α = 0.05), after applying the Benjamini-Hochberg procedure.
Figure 2.3: Most fair scores with varying k for higher-is-fairer fairness measures for Lastfm and Ml-1m. All scores from the corrected measures (denoted by ‘our’) overlap with each other …
Figure 2.4: Most fair scores with varying k for lower-is-fairer fairness measures for Lastfm and Ml-1m.
Figure 2.5: Most unfair scores with varying k for higher-is-fairer fairness measures for Lastfm and Ml-1m. On Repeatable MostUnfair, all scores from the corrected measures (denoted by ‘our’) overlap with each other for the shown values of k > 1 for Lastfm and for all shown values of k for Ml-1m.
Figure 2.6: Most unfair scores with varying k for lower-is-fairer fairness measures for Lastfm and Ml-1m. On Repeatable MostUnfair, all scores from the corrected measures (denoted by ‘our’) overlap with each other for all shown values of k.
Figure 2.7: Sliding window evaluation for BPR model, on Lastfm and Ml-1m. …
Figure 2.8: Results for jointly LE and relevant item insertion. All measures are at …
Figure 2.9: Correlation (Kendall’s τ) between relevance and fairness measures for Amazon-lb. Asterisk (∗) denotes a statistically significant correlation (α = 0.05), after applying the Benjamini-Hochberg procedure.
Figure 2.10: Correlation (Kendall’s τ) between relevance and fairness measures for Book-x. Asterisk (∗) denotes a statistically significant correlation (α = 0.05), after applying the Benjamini-Hochberg procedure.
Figure 2.11: Correlation (Kendall’s τ) between relevance and fairness measures for Amazon-is. Asterisk (∗) denotes a statistically significant correlation (α = 0.05), after applying the Benjamini-Hochberg procedure.
Figure 2.12: Correlation (Kendall’s τ) between relevance and fairness measures for Amazon-dm. Asterisk (∗) denotes a statistically significant correlation (α = 0.05), after applying the Benjamini-Hochberg procedure.
Figure 2.13: Most fair scores with varying k for higher-is-fairer fairness measures on Amazon-* and Book-x.
Figure 2.14: Most fair scores with varying k for lower-is-fairer fairness measures on Amazon-* and Book-x.
Figure 2.15: Most unfair scores with varying k for higher-is-fairer fairness measures on Amazon-* and Book-x.
Figure 2.16: Most unfair scores with varying k for lower-is-fairer fairness measures on Amazon-* and Book-x.
Figure 2.17: Sliding window evaluation for BPR model, on Amazon-lb and Book-x.
Figure 2.18: Sliding window evaluation for BPR model, on Amazon-is and Amazon…
Figure 2.19: Results for jointly LE and relevant item insertion for …
Figure 2.20: Results for jointly most exposed (ME) and irrelevant item insertion. All …
Figure 3.1: Kendall’s τ correlation between joint Fair+Rel measures, Rel, and Fair measures.
Figure 3.2: Sliding window evaluation (k = 5) of NCL for Lastfm, Amazon-lb, and ML-10M. The last column is in exponential scale.
Figure 3.3: Artificial insertion of items with m = 1000 (users).
Figure 4.1: Kendall’s τ correlation between Joint, Eff, and Fair measures. Asterisk (∗) denotes a statistically significant correlation (α = 0.05) after applying Bonferroni’s correction.
Figure 4.2: The fairest achievable ↓Joint measure scores for varying k, both the original (ori) and corrected (our) versions. IAAori is incomputable at k = 1 due to the undefinedness limitation. Both IFD× versions overlap.
Figure 4.3: The unfairest achievable ↓Joint measure scores for varying k, both the original (ori) and corrected (our) versions. IAAori is incomputable at k = 1 due to the undefinedness limitation.
Figure 4.4: The original and corrected versions of IFD …
Figure 4.5: Joint measure scores for different numbers of artificially added relevant items per user. We treat unobserved items or low-rated items as ‘relevant’, starting from position k + 1 downwards (top) or from position n upwards (bottom).
Figure 4.6: Artificial insertion of items with m = 1000 (users). IAAori e is IAAour computed with the original examination function eli.
Figure 5.1: (x, y) denotes the pair of relevance and fairness score. Example: Model A is best for fairness, Model B is best for relevance, and Model C is the closest to the Pareto Frontier (PF) midpoint, when relevance and fairness are equally weighted (α = 0.5). Averaging relevance and fairness (Avg) leads to falsely concluding that Model A is best for both aspects. Note that distance to PF also beats other existin…
Figure 5.2: Pareto Frontier of fairness and relevance (in blue) and recommender …
Figure 5.3: Kendall’s τ correlation heatmap between the rank ordering of existing joint evaluation measures (including the average of Fair and Rel scores, avg), and DPFR.
Figure 5.4: Pareto Frontier of fairness and relevance (in blue), together with recom…
Figure 6.1: Kendall’s τ correlation between Eff measures, existing Fair measures, and PUFs for all recommenders.
Figure 6.2: Effectiveness (Eff) and fairness (Fair) scores of QK-video and ML-20M, when artificially varying % of users with all irrelevant items (zero relevance), and the rest of the users receiving all relevant items. All PUF variants overlap. Gini is missing points at 100% users with zero relevance as it is undefined when each user has zero Eff scores.
Figure 6.3: Artificially varying the skewness of the user similarity distribution for …
Figure 6.4: Artificially varying the % of users with zero relevance for QK-video and …
Figure 6.5: UF and PUF computed with user similarity …
Figure 6.6: Effectiveness (Eff) and fairness (Fair) scores of Lastfm and ML-10M, when artificially varying % of users with all irrelevant items (zero relevance), and the rest of the users receiving all relevant items. All PUF variants overlap. Gini is missing points at 100% users with zero relevance as it is undefined when each user has zero Eff scores.
Figure 6.7: Artificially varying the skewness of the user similarity distribution …
Figure 6.8: Artificially varying the % of users with zero relevance for Lastfm and …
Figure 6.9: PUF-Prec computed with user similarity simUF (Eq. (6.8)), varying the weighted sum of the users’ past interactions (simJacc) and their item feature distribution (simJS) with γ.
Figure 6.10: Varying the weighted sum of users’ past interactions (…
Figure 7.1: Agreement (Kendall’s τ) of NDCG-based measures for individual Fair (y-axis) and group Fair (x-axis) in ranking LLMRecs. User groups are based on 3 sensitive attributes. Due to 8-way ties, τ cannot be computed for Min (JobRec).
Figure 7.2: NDCG-based Group (Grp) and individual (Ind) …
Figure 7.3: NDCG-based individual, between- and within-group unfairness of GLM …
Figure 7.4: Agreement (Kendall’s τ) between the same family of measures in ranking LLMRecs for NDCG-based group fairness (y-axis) and individual fairness (x-axis). Group fairness is computed for each combination of users’ sensitive attributes.
Figure 7.5: NDCG-based individual, between- and within-group unfairness of GLM …
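Figure 5.1 contrasts averaging relevance and fairness with distance to the Pareto-frontier midpoint; the effect is easy to reproduce on made-up scores. The numbers below are illustrative, and the ideal point (1, 1) stands in for the frontier midpoint at α = 0.5:

```python
import math

# Hypothetical (relevance, fairness) scores, both higher-is-better.
models = {
    "A": (0.20, 1.00),   # best fairness, weak relevance
    "B": (1.00, 0.10),   # best relevance, weak fairness
    "C": (0.55, 0.55),   # balanced trade-off
}

# A plain average crowns an extreme model...
avg_best = max(models, key=lambda m: sum(models[m]) / 2)

# ...while distance to the balanced ideal point rewards the trade-off model.
dist_best = min(models, key=lambda m: math.dist(models[m], (1.0, 1.0)))

print(avg_best, dist_best)   # the two criteria disagree: A vs C
```

This is the disagreement the caption calls "falsely concluding that Model A is best for both aspects": averaging lets a surplus in one dimension paper over a deficit in the other.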
Original abstract

The evaluation of recommender system fairness has become increasingly important, especially with recent legislation that emphasises the development of fair and responsible artificial intelligence. This has led to the emergence of various fairness evaluation measures, which quantify fairness based on different definitions. However, many of such measures are simply proposed and used without further analysis on their robustness. As a result, there is insufficient understanding and awareness of the measures' limitations. Among other issues, it is not known what kind of model outputs produce the (un)fairest score, how the measure scores are empirically distributed, and whether there are cases where the measures cannot be computed (e.g., due to division by zero). These issues cause difficulty in interpreting the measure scores and confusion on which measure(s) should be used for a specific case. This thesis presents a series of papers that assess and overcome various theoretical, empirical, and conceptual limitations of existing recommender system fairness evaluation measures. We investigate a wide range of offline evaluation measures for different fairness notions, divided based on the evaluation subjects (users and items) and for different evaluation granularities (groups of subjects and individual subjects). Firstly, we perform theoretical and empirical analysis on the measures, exposing flaws that limit their interpretability, expressiveness, or applicability. Secondly, we contribute novel evaluation approaches and measures that overcome these limitations. Finally, considering the measures' limitations, we recommend guidelines for the appropriate measure usage, thereby allowing for more precise selection of fairness evaluation measures in practical scenarios. Overall, this thesis contributes to advancing the state-of-the-art offline evaluation of fairness in recommender systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. This thesis analyzes theoretical, empirical, and conceptual limitations of offline fairness evaluation measures in recommender systems. It covers measures for user and item fairness at both group and individual levels, identifies issues such as limited interpretability, unclear score distributions, and cases where measures cannot be computed (e.g., division by zero), proposes new evaluation approaches and measures to address these flaws, and provides guidelines for selecting appropriate measures in practice.

Significance. If the analyses and new approaches hold, the work advances fairness evaluation in recommender systems by improving the robustness and interpretability of measures, which is timely given legislative emphasis on responsible AI. The empirical distributions and theoretical breakdowns of existing measures, along with the proposed alternatives, could help practitioners avoid misinterpretation and select measures more precisely.

minor comments (2)
  1. The abstract mentions specific limitations (e.g., division by zero cases and empirical score distributions) but does not indicate whether the full thesis includes concrete examples or proofs for each; a dedicated section summarizing these across all papers would strengthen the synthesis.
  2. As a collection of papers, the thesis would benefit from an explicit cross-paper comparison table showing which limitations each paper addresses and how the new measures compare to baselines in terms of applicability.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive and accurate summary of our thesis, which correctly identifies its focus on theoretical, empirical, and conceptual limitations of offline fairness measures in recommender systems, along with the proposed alternatives and guidelines. We appreciate the recommendation for minor revision and the recognition of the work's timeliness given legislative emphasis on responsible AI. No specific major comments were provided in the report.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The thesis abstract and description outline a series of papers performing theoretical/empirical analysis on existing fairness measures in recommender systems, identifying limitations in interpretability and applicability, proposing novel approaches/measures, and issuing usage guidelines. No equations, derivation steps, fitted parameters presented as predictions, or load-bearing self-citations are visible in the provided text. The work is self-contained with independent analytical and prescriptive content that does not reduce to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no details on parameters, axioms, or new entities; assessment limited to high-level description only.

pith-pipeline@v0.9.0 · 5582 in / 966 out tokens · 34110 ms · 2026-05-08T01:31:03.442985+00:00 · methodology


Reference graph

Works this paper leans on

297 extracted references · 217 canonical work pages · 6 internal anchors

  1. [1]

    Evaluation of Fairness in Recommender Systems: A Review

    Syed Wajid Aalam, Abdul Basit Ahanger, Muzafar Rasool Bhat, and Assif Assad. Evaluation of Fairness in Recommender Systems: A Review. In Balas Valentina E., G R. Sinha, Agarwal Basant, Sharma Tarun Kumar, Dadheech Pankaj, and Mahrishi Mehul, editors,Emerging Technologies in Computer En- gineering: Cognitive Computing and Intelligent IoT, pages 456–465, Cham,

  2. [2]

    ISBN 978-3-031-07012-9

    Springer International Publishing. ISBN 978-3-031-07012-9. Cited on pages 2, 8, 12, 180, and 226

  3. [3]

    On over-specialization and concentration bias of recommendations: probabilistic neighborhood selection in collaborative filtering systems

    Panagiotis Adamopoulos and Alexander Tuzhilin. On over-specialization and concentration bias of recommendations: probabilistic neighborhood selection in collaborative filtering systems. InProceedings of the 8th ACM Conference on Recommender Systems, RecSys ’14, pages 153–160, New York, NY, USA,

  4. [4]

    ISBN 9781450326681

    Association for Computing Machinery. ISBN 9781450326681. doi: 10. 1145/2645710.2645752. URLhttps://doi.org/10.1145/2645710.2645752. Cited on page 237

  5. [5]

    Context-aware recommender systems.AI Magazine, 32(3):67–80, Oct

    Gediminas Adomavicius, Bamshad Mobasher, Francesco Ricci, and Alexander Tuzhilin. Context-aware recommender systems.AI Magazine, 32(3):67–80, Oct. 2011. doi: 10.1609/aimag.v32i3.2364. URLhttps://ojs.aaai.org/ aimagazine/index.php/aimagazine/article/view/2364. Cited on page 6

  6. [6]

    Aggarwal.Recommender Systems: The Textbook

    Charu C. Aggarwal.Recommender Systems: The Textbook. Springer Publishing Company, Incorporated, 1st edition, 2016. ISBN 3319296574. Cited on pages 1 and 6

  7. [7]

    Desirable properties for diversity and truncated effectiveness met- rics

    Ameer Albahem, Damiano Spina, Falk Scholer, Alistair Moffat, and Lawrence Cavedon. Desirable properties for diversity and truncated effectiveness met- rics. InProceedings of the 23rd Australasian Document Computing Sym- posium, ADCS ’18, New York, NY, USA, 2018. Association for Comput- ing Machinery. ISBN 9781450365499. doi: 10.1145/3291992.3291996. URL h...

  8. [8]

    Survey on the objec- tives of recommender systems: Measures, solutions, evaluation methodology, and new perspectives.ACM Comput

    Bushra Alhijawi, Arafat Awajan, and Salam Fraihat. Survey on the objec- tives of recommender systems: Measures, solutions, evaluation methodology, and new perspectives.ACM Comput. Surv., 55(5), December 2022. ISSN 0360-0300. doi: 10.1145/3527449. URLhttps://doi.org/10.1145/3527449. Cited on page 8

  9. [9]

    Measures of Inequality.American Sociological Review, 43(6): 865–880, 12 1978

    Paul D Allison. Measures of Inequality.American Sociological Review, 43(6): 865–880, 12 1978. Cited on pages 15 and 45

  10. [10]

    An axiomatic analysis of diversity evaluation metrics: Introducing the rank-biased utility metric

    Enrique Amig´ o, Damiano Spina, and Jorge Carrillo-de Albornoz. An axiomatic analysis of diversity evaluation metrics: Introducing the rank-biased utility metric. InThe 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, SIGIR ’18, page 625–634, New York, NY, USA, 2018. Association for Computing Machinery. ISBN 9781...

  11. [11]

    A unifying and general account of fairness measurement in recommender systems

    Enrique Amig´ o, Yashar Deldjoo, Stefano Mizzaro, and Alejandro Bellog´ ın. A unifying and general account of fairness measurement in recommender systems. Information Processing & Management, 60(1):103115, 1 2023. ISSN 0306-4573. doi: 10.1016/J.IPM.2022.103115. Cited on pages 2, 8, 12, 13, 15, 71, 72, 76, 102, 103, 119, 180, 188, 226, 253, and 264

  12. [12]

    On the measurement of inequality.Journal of Eco- nomic Theory, 2(3):244–263, 1970

    Anthony B Atkinson. On the measurement of inequality.Journal of Eco- nomic Theory, 2(3):244–263, 1970. ISSN 0022-0531. doi: https://doi. org/10.1016/0022-0531(70)90039-6. URLhttps://www.sciencedirect.com/ science/article/pii/0022053170900396. Cited on pages 254 and 264

  13. [13]

    Nicolás, The bar derived category of a curved dg algebra, Journal of Pure and Applied Algebra 212 (2008) 2633–2659

    Charles Audet, Jean Bigeon, Dominique Cartier, S´ ebastien Le Digabel, and Ludovic Salomon. Performance indicators in multiobjective optimization.Eu- ropean Journal of Operational Research, 292(2):397–422, 2020. doi: 10.1016/j. ejor.2020.11.016. URLhttps://hal.science/hal-03048871. Cited on page 190

  14. [14]

    rapidfuzz/RapidFuzz: Release 3.8.1

    Max Bachmann. rapidfuzz/rapidfuzz: Release 3.8.1, April 2024. URLhttps: //doi.org/10.5281/zenodo.10938887. Cited on page 261

  15. [15]

    Rankers, judges, and assis- tants: Towards understanding the interplay of llms in information retrieval evaluation, 2025

    Krisztian Balog, Donald Metzler, and Zhen Qin. Rankers, judges, and assis- tants: Towards understanding the interplay of llms in information retrieval evaluation, 2025. URLhttps://arxiv.org/abs/2503.19092. Cited on page 31. 277

  16. [16]

    MIT Press, 2023

    Solon Barocas, Moritz Hardt, and Arvind Narayanan.Fairness and Machine Learning: Limitations and Opportunities. MIT Press, 2023. Cited on page 10

  17. [17]

    Evaluation Perspectives of Recommender Systems: Driving Research and Education (Dagstuhl Seminar 24211).Dagstuhl Reports, 14(5):58–172, 2024

    Christine Bauer, Alan Said, and Eva Zangerle. Evaluation Perspectives of Recommender Systems: Driving Research and Education (Dagstuhl Seminar 24211).Dagstuhl Reports, 14(5):58–172, 2024. ISSN 2192-5283. doi: 10.4230/ DagRep.14.5.58. URLhttps://drops.dagstuhl.de/entities/document/ 10.4230/DagRep.14.5.58. Cited on page 3

  18. [18]

    A comparison of offline evaluations, online evaluations, and user studies in the context of research-paper recommender systems

    Joeran Beel and Stefan Langer. A comparison of offline evaluations, online evaluations, and user studies in the context of research-paper recommender systems. In Sarantos Kapidakis, Cezary Mazurek, and Marcin Werla, editors, Proceedings of the 19th International Conference on Theory and Practice of Digital Libraries (TPDL), volume 9316 ofLecture Notes in ...

  19. [19]

    Yoav Benjamini and Yosef Hochberg. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing.Journal of the Royal Statistical Society: Series B (Methodological), 57(1):289–300, 1 1995. ISSN 2517-6161. doi: 10.1111/J.2517-6161.1995.TB02031.X. Cited on page 59

  20. [20]

    Chi, and Cristos Goodrow

    Alex Beutel, Jilin Chen, Tulsee Doshi, Hai Qian, Li Wei, Yi Wu, Lukasz Heldt, Zhe Zhao, Lichan Hong, Ed H. Chi, and Cristos Goodrow. Fairness in rec- ommendation ranking through pairwise comparisons. InProceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD ’19, page 2212–2220, New York, NY, USA, 2019. Associa...

  21. [21]

    Biega, Krishna P

    Asia J. Biega, Krishna P. Gummadi, and Gerhard Weikum. Equity of attention: Amortizing individual fairness in rankings. In41st International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2018, volume 18, pages 405–414. Association for Computing Machinery, Inc, 6 2018. ISBN 9781450356572. doi: 10.1145/3209978.3210063. URL...

  22. [22]

    Toward Fair Recommendation in Two-sided Platforms

    Arpita Biswas, Gourab K Patro, Niloy Ganguly, Krishna P Gummadi, and Abhijnan Chakraborty. Toward Fair Recommendation in Two-sided Platforms. ACM Trans. Web, 16(2), 12 2021. ISSN 1559-1131. doi: 10.1145/3503624. URL https://doi.org/10.1145/3503624. Cited on pages 25, 225, 226, and 227. 278

  23. [23]

    Springer Netherlands, Dordrecht, 1999

    Charles Blackorby, Walter Bossert, and David Donaldson.Income Inequality Measurement: The Normative Approach, pages 133–161. Springer Netherlands, Dordrecht, 1999. ISBN 978-94-011-4413-1. doi: 10.1007/978-94-011-4413-1 4. URLhttps://doi.org/10.1007/978-94-011-4413-1_4. Cited on pages 254 and 265

  24. [24]

    Learning Recom- mendations from User Actions in the Item-poor Insurance Domain

    Simone Borg Bruun, Maria Maistro, and Christina Lioma. Learning Recom- mendations from User Actions in the Item-poor Insurance Domain. InRec- Sys 2022 - Proceedings of the 16th ACM Conference on Recommender Sys- tems, pages 113–123. Association for Computing Machinery, Inc, 9 2022. ISBN 9781450392785. doi: 10.1145/3523227.3546775. URLhttps://dl.acm.org/ d...

  25. [25]

    Enhancing Long Term Fairness in Recommendations with Variational Autoencoders

    Rodrigo Borges and Kostas Stefanidis. Enhancing Long Term Fairness in Recommendations with Variational Autoencoders. InProceedings of the 11th International Conference on Management of Digital EcoSystems, New York, NY, USA, 2019. ACM. ISBN 9781450362382. doi: 10.1145/3297662. URL https://doi.org/10.1145/3297662.3365798. Cited on pages 30, 71, 103, 104, 10...

  26. [26]

    Decomposable Income Inequality Measures.Economet- rica, 47(4):901–920, 1979

    Francois Bourguignon. Decomposable Income Inequality Measures.Economet- rica, 47(4):901–920, 1979. Cited on pages 15, 254, and 265

  27. [27]

    Indi- vidually Fair Ranking

    Amanda Bower, Hamid Eftekhari, Mikhail Yurochkin, and Yuekai Sun. Indi- vidually Fair Ranking. InICLR 2021 - 9th International Conference on Learn- ing Representations. OpenReview.net, 2021. URLhttps://openreview.net/ forum?id=71zCSP_HuBN. Cited on page 102

  28. [28]

    Voorhees

    Chris Buckley and Ellen M. Voorhees. Evaluating evaluation measure sta- bility. InProceedings of the 23rd Annual International ACM SIGIR Confer- ence on Research and Development in Information Retrieval, SIGIR ’00, page 33–40, New York, NY, USA, 2000. Association for Computing Machinery. ISBN 1581132263. doi: 10.1145/345508.345543. URLhttps://doi.org/10.1...

  29. [29]

    Multisided Fairness for Recommendation

    Robin Burke. Multisided fairness for recommendation, 2017. URLhttps: //arxiv.org/abs/1707.00093. Cited on page 9

  30. [30]

    Balanced Neigh- borhoods for Multi-sided Fairness in Recommendation, 1 2018

    Robin Burke, Nasim Sonboli, and Aldo Ordonez-Gauger. Balanced Neigh- borhoods for Multi-sided Fairness in Recommendation, 1 2018. ISSN 2640- 279

  31. [31]

    Cited on pages 1 and 12

    URLhttps://proceedings.mlr.press/v81/burke18a.html. Cited on pages 1 and 12

  32. [32]

    De-centering the (Traditional) User: Multistake- holder Evaluation of Recommender Systems, 2025

    Robin Burke, Gediminas Adomavicius, Toine Bogers, Tommaso Di Noia, Do- minik Kowald, Julia Neidhardt, ¨Ozlem ¨Ozg¨ obek, Maria Soledad Pera, Nava Tintarev, and J¨ urgen Ziegler. De-centering the (Traditional) User: Multistake- holder Evaluation of Recommender Systems, 2025. URLhttps://arxiv.org/ abs/2501.05170. Cited on page 182

  33. [33]

    Inherent Limitations of AI Fairness.Commun

    Maarten Buyl and Tijl De Bie. Inherent Limitations of AI Fairness.Commun. ACM, 67(2):48–55, 1 2024. ISSN 0001-0782. doi: 10.1145/3624700. URL https://doi.org/10.1145/3624700. Cited on pages 229 and 242

  34. [34]

    Offline evaluation options for recommender systems

    Roc´ ıo Ca˜ namares, Pablo Castells, and Alistair Moffat. Offline evaluation options for recommender systems.Information Retrieval Journal, 23(4): 387–410, 2020. ISSN 15737659. doi: 10.1007/s10791-020-09371-3. URL https://doi.org/10.1007/s10791-020-09371-3. Cited on pages 6, 7, 8, and 142

  35. [35]

    Iván Cantador, Peter Brusilovsky, and Tsvi Kuflik. 2nd Workshop on Information Heterogeneity and Fusion in Recommender Systems (HetRec 2011). In Proceedings of the 5th ACM Conference on Recommender Systems, RecSys 2011, New York, NY, USA, 2011. ACM. Cited on pages 110, 150, 194, 195, and 230.

  36. [36]

    Pablo Castells and Alistair Moffat. Offline recommender system evaluation: Challenges and new directions. AI Magazine, 43(2):225–238, 2022. doi: 10.1002/aaai.12051. URL https://onlinelibrary.wiley.com/doi/abs/10.1002/aaai.12051. Cited on pages 7, 8, and 29.

  37. [37]

    Pablo Castells, Neil J. Hurley, and Saul Vargas. Novelty and Diversity in Recommender Systems, pages 881–918. Springer US, Boston, MA, 2015. ISBN 978-1-4899-7637-6. doi: 10.1007/978-1-4899-7637-6_26. URL https://doi.org/10.1007/978-1-4899-7637-6_26. Cited on page 7.

  38. [38]

    O. Celma. Music Recommendation and Discovery in the Long Tail. Springer.

  39. [39]

    Yair Censor. Pareto optimality in multiobjective problems. Applied Mathematics & Optimization, 4(1):41–59, 3 1977. ISSN 14320606. doi: 10.1007/BF01442131. URL https://link.springer.com/article/10.1007/BF01442131. Cited on page 189.

  40. [40]

    Lidia Ceriani and Paolo Verme. The origins of the Gini index: extracts from Variabilità e Mutabilità (1912) by Corrado Gini. J Econ Inequal, 10:421–443, 2012. doi: 10.1007/s10888-011-9188-x. URL http://www.umass.edu/wsp/statistics/tales/gini.html. Cited on page 38.

  42. [42]

    Wei Chen, Yiqing Wu, Zhao Zhang, Fuzhen Zhuang, Zhongshi He, Ruobing Xie, and Feng Xia. FairGap: Fairness-Aware Recommendation via Generating Counterfactual Graph. ACM Trans. Inf. Syst., 42(4), 2 2024. ISSN 1046-8188. doi: 10.1145/3638352. URL https://doi.org/10.1145/3638352. Cited on page 242.

  43. [43]

    Weixin Chen, Li Chen, and Yuhan Zhao. Investigating user-side fairness in outcome and process for multi-type sensitive attributes in recommendations. ACM Trans. Recomm. Syst., April 2025. doi: 10.1145/3731568. URL https://doi.org/10.1145/3731568. Just Accepted. Cited on page 10.

  44. [44]

    Sachin Pathiyan Cherumanal, Damiano Spina, Falk Scholer, and W. Bruce Croft. Evaluating Fairness in Argument Retrieval. In International Conference on Information and Knowledge Management, Proceedings, pages 3363–3367. Association for Computing Machinery, 10 2021. ISBN 9781450384469. doi: 10.1145/3459637.3482099. URL https://dl.acm.org/doi/10.1145/3459637.3482099.

  45. [45]

    Konstantina Christakopoulou, Alex Beutel, Rui Li, Sagar Jain, and Ed H Chi. Q&R: A Two-Stage Approach toward Interactive Recommendation. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD ’18, pages 139–148, New York, NY, USA, 2018. Association for Computing Machinery. ISBN 9781450355520. doi: 10.1145/3219819.3219894. URL https://doi.org/10.1145/3219819.3219894. Cited on page 140.

  47. [47]

    Team Cohere: Aakanksha, Arash Ahmadian, Marwan Ahmed, Jay Alammar, Milad Alizadeh, Yazeed Alnumay, Sophia Althammer, Arkady Arkhangorodsky, Viraat Aryabumi, Dennis Aumiller, Raphaël Avalos, Zahara Aviv, Sammie Bae, Saurabh Baji, Alexandre Barbet, Max Bartolo, Björn Bebensee, Neeral Beladia, Walter Beller-Morales, Alexandre Bérard, Andrew ...

  48. [48]

    Paolo Cremonesi, Yehuda Koren, and Roberto Turrin. Performance of recommender algorithms on top-n recommendation tasks. In Proceedings of the Fourth ACM Conference on Recommender Systems, RecSys ’10, page 39–46, New York, NY, USA, 2010. Association for Computing Machinery. ISBN 9781605589060. doi: 10.1145/1864708.1864721. URL https://doi.org/10.1145/1864708.1864721.

  49. [49]

    Kimberle Crenshaw. Mapping the Margins: Intersectionality, Identity Politics, and Violence against Women of Color. Stanford Law Review, 43(6):1241–1299, 1991. ISSN 00389765. URL http://www.jstor.org/stable/1229039. Cited on pages 11 and 102.

  51. [51]

    Maurizio Ferrari Dacrema, Nicolò Felicioni, and Paolo Cremonesi. Offline evaluation of recommender systems in a user interface with multiple carousels. Frontiers in Big Data, 5:910030, 6 2022. ISSN 2624909X. doi: 10.3389/FDATA.2022.910030/BIBTEX. URL www.frontiersin.org. Cited on pages 5 and 8.

  52. [52]

    James Davidson, Benjamin Liebald, Junning Liu, Palash Nandy, Taylor Van Vleet, Ullas Gargi, Sujoy Gupta, Yu He, Mike Lambert, Blake Livingston, and Dasarathi Sampath. The youtube video recommendation system. In Proceedings of the Fourth ACM Conference on Recommender Systems, RecSys ’10, page 293–296, New York, NY, USA, 2010. Association for Computing ...

  53. [53]

    Meltem Dayıoğlu and Cem Başlevent. Imputed rents and regional income inequality in Turkey: A subgroup decomposition of the Atkinson index. Regional Studies, 40(8):889–905, 2006. doi: 10.1080/00343400600984395. URL https://doi.org/10.1080/00343400600984395. Cited on pages 254 and 265.

  54. [54]

    Fernando G. De Maio. Income inequality measures. Journal of Epidemiology and Community Health, 61(10):849, 10 2007. ISSN 0143005X. doi: 10.1136/JECH.2006.052969. URL https://pmc.ncbi.nlm.nih.gov/articles/PMC2652960/. Cited on page 265.

  55. [55]

    DeepSeek-AI, Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, Xiaokang Zhang, Xingkai Yu, Yu Wu, Z. F. Wu, Zhibin Gou, Zhihong Shao, Zhuoshu Li, Ziyi Gao, Aixin Liu, Bing Xue, Bingxuan Wang, Bochao Wu, Bei Feng, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenyu Zhang, Chong Ruan, Damai D...

  56. [56]

    Yashar Deldjoo and Tommaso Di Noia. Cfairllm: Consumer fairness evaluation in large-language model recommender system. ACM Trans. Intell. Syst. Technol., March 2025. ISSN 2157-6904. doi: 10.1145/3725853. URL https://doi.org/10.1145/3725853. Just Accepted. Cited on pages 11, 31, 252, and 253.

  57. [57]

    Yashar Deldjoo and Fatemeh Nazary. A normative framework for benchmarking consumer fairness in large language model recommender system, 2024. URL https://arxiv.org/abs/2405.02219. Cited on pages 31 and 253.

  58. [58]

    Yashar Deldjoo, Vito Walter Anelli, Hamed Zamani, Alejandro Bellogín, and Tommaso Di Noia. Recommender Systems Fairness Evaluation via Generalized Cross Entropy. In Proceedings of the Workshop on Recommendation in Multi-stakeholder Environments co-located with the 13th ACM Conference on Recommender Systems (RecSys 2019). CEUR-WS, 2019. Cited on pages 1...

  59. [59]

    Yashar Deldjoo, Vito Walter Anelli, Hamed Zamani, Alejandro Bellogín, and Tommaso Di Noia. A flexible framework for evaluating user and item fairness in recommender systems. User Modeling and User-Adapted Interaction, 31:457–511, 2021. doi: 10.1007/s11257-020-09285-1. URL https://doi.org/10.1007/s11257-020-09285-1. Cited on pages 253 and 264.

  60. [60]

    Yashar Deldjoo, Dietmar Jannach, Alejandro Bellogín, Alessandro Difonzo, and Dario Zanzonelli. Fairness in recommender systems: research landscape and future directions. User Modeling and User-Adapted Interaction, 34(1):59–108, 2024. ISSN 1573-1391. doi: 10.1007/s11257-023-09364-z. URL https://doi.org/10.1007/s11257-023-09364-z. Cited on pages 2, 6, 9, 12...

  61. [61]

    Mukund Deshpande and George Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177, 1 2004. ISSN 10468188. doi: 10.1145/963770.963776. URL https://dl.acm.org/doi/10.1145/963770.963776. Cited on pages 6, 53, 110, 151, 195, and 230.

  62. [62]

    Fernando Diaz, Bhaskar Mitra, Michael D Ekstrand, Asia J Biega, and Ben Carterette. Evaluating Stochastic Rankings with Expected Exposure. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, New York, NY, USA, 2020. ACM. ISBN 9781450368599. doi: 10.1145/3340531. URL https://doi.org/10.1145/3340531.3411962. ...

  63. [63]

    Karlijn Dinnissen and Christine Bauer. Amplifying artists’ voices: Item provider perspectives on influence and fairness of music streaming platforms. In Proceedings of the 31st ACM Conference on User Modeling, Adaptation and Personalization, UMAP ’23, page 238–249, New York, NY, USA, 2023. Association for Computing Machinery. ISBN 9781450399326. doi: 10....

  64. [64]

    Virginie Do. Fairness in recommender systems: insights from social choice. Theses, Université Paris sciences et lettres, July 2023. URL https://theses.hal.science/tel-04213955. Cited on page 249.

  65. [65]

    Virginie Do and Nicolas Usunier. Optimizing Generalized Gini Indices for Fairness in Rankings. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, volume 1, pages 737–747, New York, NY, USA, 2022. ACM. ISBN 9781450387323. doi: 10.1145/3477495. URL https://doi.org/10.1145/3477495.3532035. Cited ...

  66. [66]

    Virginie Do, Sam Corbett-Davies, Jamal Atif, and Nicolas Usunier. Two-sided fairness in rankings via Lorenz dominance. In Advances in Neural Information Processing Systems, volume 34, pages 8596–8608, 2021. Cited on pages 20, 35, 36, 38, 39, 48, and 71.

  67. [67]

    Virginie Do, Sam Corbett-Davies, Jamal Atif, and Nicolas Usunier. Online Certification of Preference-Based Fairness for Personalized Recommender Systems. Proceedings of the AAAI Conference on Artificial Intelligence, 36(6):6532–6540, 6 2022. doi: 10.1609/aaai.v36i6.20606. URL https://ojs.aaai.org/index.php/AAAI/article/view/20606. Cited on pages 25, 225...

  68. [68]

    Yijiang River Dong, Tiancheng Hu, and Nigel Collier. Can LLM be a personalized judge? In Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen, editors, Findings of the Association for Computational Linguistics: EMNLP 2024, pages 10126–10141, Miami, Florida, USA, November 2024. Association for Computational Linguistics. doi: 10.18653/v1/2024.findings-e...

  69. [69]

    Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. Fairness through awareness. In ITCS 2012 - Innovations in Theoretical Computer Science Conference, pages 214–226, 2012. ISBN 9781450311151. doi: 10.1145/2090236.2090255. URL https://dl.acm.org/doi/10.1145/2090236.2090255. Cited on pages 3, 10, 14, 17, 25, 40, 102, 111, 125, 151...

  70. [70]

    Michael D Ekstrand, Mucun Tian, Ion Madrazo Azpiazu, Jennifer D Ekstrand, Oghenemaro Anuyah, David McNeill, and Maria Soledad Pera. All The Cool Kids, How Do They Fit In?: Popularity and Demographic Biases in Recommender Evaluation and Effectiveness. In Sorelle A Friedler and Christo Wilson, editors, Proceedings of the 1st Conference on Fairness, Account...

  71. [71]

    Michael D. Ekstrand, Anubrata Das, Robin Burke, and Fernando Diaz. Fairness in information access systems. Foundations and Trends® in Information Retrieval, 16(1-2):1–177, 2022. ISSN 1554-0669. doi: 10.1561/1500000079. URL http://dx.doi.org/10.1561/1500000079. Cited on pages 2, 4, 8, 10, 11, 29, 102, 125, and 180.

  72. [72]

    Michael D Ekstrand, Ben Carterette, and Fernando Diaz. Distributionally-Informed Recommender System Evaluation. ACM Trans. Recomm. Syst., 8 2023. doi: 10.1145/3613455. URL https://doi.org/10.1145/3613455. Cited on page 181.

  73. [73]

    Mehdi Elahi, Himan Abdollahpouri, Masoud Mansoury, and Helma Torkamaan. Beyond algorithmic fairness in recommender systems. In Adjunct Proceedings of the 29th ACM Conference on User Modeling, Adaptation and Personalization, UMAP ’21, page 41–46, New York, NY, USA, 2021. Association for Computing Machinery. ISBN 9781450383677. doi: 10.1145/3450614. ...

  74. [74]

    Guido Erreygers, Roselinde Kessels, Linkun Chen, and Philip Clarke. Subgroup decomposability of income-related inequality of health, with an application to Australia. Economic Record, 94(304):39–50, 2018. doi: 10.1111/1475-4932.12373. URL https://onlinelibrary.wiley.com/doi/abs/10.1111/1475-4932.12373. Cited on page 265.

  75. [75]

    European Parliament and Council. Regulation (EU) 2024/1689: Artificial Intelligence Act, 2024. URL https://eur-lex.europa.eu/eli/reg/2024/1689/oj/eng. Recital 27 discusses principles for trustworthy and ethical AI. Cited on page 1.

  76. [76]

    Eurostat. Young people on the labour market - statistics, 2018. URL https://ec.europa.eu/eurostat/statistics-explained/index.php?title=Young_people_on_the_labour_market_-_statistics. Cited on page 252.

  77. [77]

    Alessandro Fabris, Gianmaria Silvello, Gian Antonio Susto, and Asia J Biega. Pairwise Fairness in Ranking as a Dissatisfaction Measure. In Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, WSDM ’23, pages 931–939, New York, NY, USA, 2023. Association for Computing Machinery. ISBN 9781450394079. doi: 10.1145/3539597....

  78. [78]

    Michael Feldman, Sorelle A. Friedler, John Moeller, Carlos Scheidegger, and Suresh Venkatasubramanian. Certifying and removing disparate impact. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’15, page 259–268, New York, NY, USA, 2015. Association for Computing Machinery. ISBN 9781450336642. doi: 10.1145/2783258.2783311. URL https://doi.org/10.1145/2783258.2783311. Cited on page 11.

  80. [80]

    Andres Ferraro, Xavier Serra, and Christine Bauer. Break the loop: Gender imbalance in music recommenders. In Proceedings of the 2021 Conference on Human Information Interaction and Retrieval, CHIIR ’21, page 249–254, New York, NY, USA, 2021. Association for Computing Machinery. ISBN 9781450380553. doi: 10.1145/3406522.3446033. URL https://doi.org/10.1145/3406522.3446033.

Showing first 80 references.