Unbiased Learning to Rank: Counterfactual and Online Approaches

Harrie Oosterhuis; Maarten de Rijke; Rolf Jagerman

arxiv: 1907.07260 · v1 · pith:DWRALWZZnew · submitted 2019-07-16 · 💻 cs.IR

Pith reviewed 2026-05-24 20:26 UTC · model grok-4.3

classification: cs.IR

keywords: unbiased learning to rank, counterfactual LTR, online LTR, position bias, implicit feedback, user interactions, ranking systems, bias correction

The pith

Both counterfactual and online methods achieve unbiased learning to rank from biased user feedback but differ substantially in guarantees, performance, user effects, and applicability.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This tutorial establishes that counterfactual LTR corrects biases in historical interaction data through explicit models, while online LTR removes bias effects via randomization during live user interactions. Both approaches aim to produce unbiased rankings despite position bias and other distortions in implicit feedback. A sympathetic reader would care because the documented differences affect which method suits a given search system. The paper contrasts their theoretical guarantees, empirical results, impacts on users during learning, and practical applicability to guide selection. It reports no new experiments and positions the overview as an essential reference for understanding the trade-offs.

Core claim

The paper claims that both counterfactual LTR and online LTR lead to unbiased learning to rank, but that they differ considerably in theoretical guarantees, empirical results, effects on the user experience during learning, and applicability. The choice between them is therefore substantial for practitioners, who must weigh these factors when deploying systems that learn from user interactions.

What carries the argument

The side-by-side contrast between counterfactual methods, which explicitly model and correct biases in logged data, and online methods, which rely on randomization to neutralize bias during interactive learning.
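The counterfactual side of this contrast can be illustrated with a minimal sketch. It assumes a position-based click model, in which a click requires both examination (depending only on rank) and attraction (depending only on the document), and it assumes the examination propensities are known. The relevance values, propensities, and function names below are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import random


def simulate_clicks(relevance, propensities, ranking, n_sessions, rng):
    """Log clicks under an assumed position-based model:
    click = examined(rank) AND attracted(document)."""
    clicks = {doc: 0 for doc in ranking}
    for _ in range(n_sessions):
        for rank, doc in enumerate(ranking):
            examined = rng.random() < propensities[rank]
            attracted = rng.random() < relevance[doc]
            if examined and attracted:
                clicks[doc] += 1
    return clicks


def naive_estimate(clicks, n_sessions):
    """Raw click rate per document; confounded by display position."""
    return {doc: c / n_sessions for doc, c in clicks.items()}


def ips_estimate(clicks, ranking, propensities, n_sessions):
    """Inverse-propensity-weighted click rate: divide each document's
    click rate by the examination propensity of its logged rank."""
    return {
        doc: clicks[doc] / (propensities[rank] * n_sessions)
        for rank, doc in enumerate(ranking)
    }


rng = random.Random(0)
relevance = {"a": 0.3, "b": 0.9, "c": 0.6}  # hypothetical ground truth
propensities = [1.0, 0.5, 0.25]             # hypothetical examination probabilities per rank
ranking = ["a", "b", "c"]                   # logging policy shows the weakest document first
n_sessions = 20000

clicks = simulate_clicks(relevance, propensities, ranking, n_sessions, rng)
naive = naive_estimate(clicks, n_sessions)
ips = ips_estimate(clicks, ranking, propensities, n_sessions)

print("naive:", naive)
print("ips:  ", ips)
```

Under these assumptions the naive click rate places document `a` above `c` only because `a` was logged at the top rank, while the propensity-weighted estimate recovers the underlying order. The correction depends entirely on the assumed propensities being accurate.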

If this is right

Practitioners gain concrete criteria for selecting between historical correction and live randomization based on available data and tolerance for randomization.
Theoretical analysis can be used to predict when one method will provide stronger bias removal than the other.
Empirical benchmarks from prior work can be consulted to anticipate performance gaps in new ranking tasks.
System designers must account for different user-experience costs during the learning phase when choosing a method.
Applicability is limited by whether historical logs exist or live user traffic can be randomized.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The contrast suggests that systems with strict latency or privacy constraints on live randomization may default to counterfactual approaches.
Hybrid methods could combine historical correction with selective online exploration to balance the strengths of each.
The tutorial's framing implies that bias types beyond position bias may require tailored extensions of one method over the other.
Deployment in production could benefit from monitoring metrics that the paper identifies as differing between the approaches.

Load-bearing premise

That differences in theoretical guarantees, empirical results, user experience effects, and applicability between the two methodologies can be reliably assessed and contrasted from the existing literature without new empirical validation.

What would settle it

A new controlled experiment directly comparing both methods on the same datasets and user populations that finds equivalent theoretical guarantees, empirical performance, user experience effects, and applicability would undermine the claimed substantial differences.

Original abstract

This tutorial covers and contrasts the two main methodologies in unbiased Learning to Rank (LTR): Counterfactual LTR and Online LTR. There has long been an interest in LTR from user interactions, however, this form of implicit feedback is very biased. In recent years, unbiased LTR methods have been introduced to remove the effect of different types of bias caused by user-behavior in search. For instance, a well addressed type of bias is position bias: the rank at which a document is displayed heavily affects the interactions it receives. Counterfactual LTR methods deal with such types of bias by learning from historical interactions while correcting for the effect of the explicitly modelled biases. Online LTR does not use an explicit user model, in contrast, it learns through an interactive process where randomized results are displayed to the user. Through randomization the effect of different types of bias can be removed from the learning process. Though both methodologies lead to unbiased LTR, their approaches differ considerably, furthermore, so do their theoretical guarantees, empirical results, effects on the user experience during learning, and applicability. Consequently, for practitioners the choice between the two is very substantial. By providing an overview of both approaches and contrasting them, we aim to provide an essential guide to unbiased LTR so as to aid in understanding and choosing between methodologies.
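The interactive process described in the abstract can be sketched in the spirit of dueling bandit gradient descent, one family of online LTR methods. This is a simplified illustration, not the tutorial's own algorithm. The hidden `utility` function and the noisy `candidate_wins` oracle are hypothetical stand-ins for what a live system would infer from clicks on randomized or interleaved result lists, and the parameter names `delta` and `alpha` are assumptions.

```python
import math
import random


def utility(w, w_star):
    """Hidden ranking quality (hypothetical); higher is better."""
    return -sum((a - b) ** 2 for a, b in zip(w, w_star))


def candidate_wins(w, w_cand, w_star, rng, sharpness=2.0):
    """Noisy preference between two rankers, standing in for a
    click-based comparison on a randomized result list."""
    gap = utility(w_cand, w_star) - utility(w, w_star)
    return rng.random() < 1.0 / (1.0 + math.exp(-sharpness * gap))


def random_unit_vector(dim, rng):
    """Uniformly random exploration direction."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dueling_bandit_descent(w0, w_star, steps, rng, delta=1.0, alpha=0.1):
    """Perturb the current ranker in a random direction, compare the two,
    and take a small step toward the candidate only if it wins."""
    w = list(w0)
    for _ in range(steps):
        d = random_unit_vector(len(w), rng)
        w_cand = [wi + delta * di for wi, di in zip(w, d)]
        if candidate_wins(w, w_cand, w_star, rng):
            w = [wi + alpha * di for wi, di in zip(w, d)]
    return w


rng = random.Random(0)
w_star = [2.0, -1.0]   # hypothetical optimal ranker weights
w_start = [0.0, 0.0]
w_final = dueling_bandit_descent(w_start, w_star, 2000, rng)

start_dist = math.dist(w_start, w_star)
final_dist = math.dist(w_final, w_star)
print("distance to optimum before:", start_dist)
print("distance to optimum after: ", final_dist)
```

No user model appears anywhere in the loop: the learner relies only on randomized exploration and pairwise outcomes. The price is that users are shown perturbed rankings while learning is in progress.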

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

A clear tutorial contrasting counterfactual and online unbiased LTR but with no new results or validation.

This is a tutorial that summarizes and contrasts counterfactual and online approaches to unbiased learning to rank. It adds nothing new in methods, derivations, or experiments. The authors explain the shared problem of biased implicit feedback from users and show how counterfactual methods correct for modeled biases like position effects using logged data, while online methods rely on randomization during live interactions to remove bias by design. They then lay out differences in theoretical guarantees, results from prior studies, effects on users during learning, and practical applicability. That side-by-side is the main contribution and could help a practitioner decide which route to take for a real system. The paper does this in plain terms without overclaiming.

The limitation is straightforward: everything rests on the accuracy of the cited literature, with no fresh checks or comparisons here to test the contrasts. That is normal for a tutorial but means the work does not strengthen the evidence on its own. Readers will still need the original papers for the details behind the claimed differences.

This is aimed at information retrieval people who know basic LTR and want a map of the unbiased options. It could serve as background in a reading group. The thinking is straightforward and engages the relevant distinctions without internal contradictions. I would send it for peer review at a venue that accepts tutorials rather than desk reject, since the topic is practical and the overview is focused.

Referee Report

0 major / 0 minor

Summary. This tutorial provides an overview and contrast of the two main methodologies for unbiased learning to rank (LTR) from implicit user feedback: counterfactual LTR, which corrects for biases such as position bias using historical interaction data and explicit user models, and online LTR, which removes bias effects through interactive randomization of results without an explicit user model. The central claim is that both approaches produce unbiased LTR but differ substantially in theoretical guarantees, empirical results, effects on user experience during learning, and applicability, making the choice between them consequential for practitioners.
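A short derivation shows why correcting with an explicit user model can remove position bias. It assumes a position-based click model, which is an illustrative choice rather than something this report specifies. The symbols $c_d$ (click on document $d$), $\theta_k$ (probability that rank $k$ is examined), and $r_d$ (probability that $d$ is relevant) are introduced only for this sketch.

```latex
\begin{align}
P(c_d = 1 \mid \text{rank } k) &= \theta_k \, r_d, \\
\mathbb{E}\!\left[\frac{c_d}{\theta_k}\right]
  &= \frac{\theta_k \, r_d}{\theta_k} = r_d
  \qquad (\theta_k > 0).
\end{align}
```

Dividing each click by the examination probability of its rank gives an estimate whose expectation no longer depends on the rank. This holds only if the assumed $\theta_k$ are correct and strictly positive.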

Significance. If the synthesis of the existing literature is accurate, the tutorial could be a useful guide for the IR community by clarifying trade-offs between established counterfactual and online methods for unbiased LTR. It explicitly positions itself as an aid for understanding and choosing methodologies rather than advancing new derivations or experiments, which aligns with the scope of a tutorial.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of the tutorial and the recommendation to accept. The review accurately captures the scope and intent of the work as a synthesis and contrast of counterfactual and online approaches to unbiased LTR.

Circularity Check

0 steps flagged

Tutorial overview with no derivations or self-referential claims

The paper is a tutorial that synthesizes and contrasts two established methodologies (counterfactual LTR and online LTR) from prior literature. No novel theorems, equations, derivations, fitted parameters, or predictions are asserted; the central claim is an overview of known differences in guarantees, results, UX, and applicability. This is self-contained against external benchmarks with no opportunity for circular reduction by construction, self-citation load-bearing, or ansatz smuggling. No steps identified.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Tutorial paper with no central mathematical or empirical claim; no free parameters, axioms, or invented entities are introduced.

Reference graph

Works this paper leans on

[1] Aman Agarwal, Xuanhui Wang, Cheng Li, Michael Bendersky, and Marc Najork. 2019. Addressing Trust Bias for Unbiased Learning-to-Rank. In The World Wide Web Conference. ACM, 4–14.

[2] Aman Agarwal, Ivan Zaitsev, and Thorsten Joachims. 2018. Consistent position bias estimation without online interventions for learning-to-rank. arXiv preprint arXiv:1806.03555 (2018).

[3] Qingyao Ai, Keping Bi, Cheng Luo, Jiafeng Guo, and W Bruce Croft. 2018. Unbiased learning to rank with unbiased propensity estimation. arXiv preprint arXiv:1804.05938 (2018).

[4] Mike Bendersky, Xuanhui Wang, Marc Najork, and Don Metzler. 2018. Learning with sparse and biased feedback for personal search. In Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI). 5219–5223.

[5] Ben Carterette and Praveen Chandar. 2018. Offline comparative evaluation with incremental, minimally-invasive online feedback. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval. ACM, 705–714.

[6] Olivier Chapelle and Yi Chang. 2011. Yahoo! Learning to rank challenge overview. In Proceedings of the Learning to Rank Challenge. 1–24.

[7] Aleksandr Chuklin, Ilya Markov, and Maarten de Rijke. 2015. Click models for web search. Synthesis Lectures on Information Concepts, Retrieval, and Services 7, 3 (2015), 1–115.

[8] Norbert Fuhr and Chris Buckley. 1991. A probabilistic learning approach for document indexing. ACM Transactions on Information Systems (TOIS) 9, 3 (1991), 223–248.

[9] Artem Grotov and Maarten de Rijke. 2016. Online learning to rank for information retrieval: SIGIR 2016 tutorial. In SIGIR. ACM, 1215–1218.

[10] Katja Hofmann, Anne Schuth, Shimon Whiteson, and Maarten de Rijke. 2013. Reusing historical interaction data for faster online learning to rank for IR. In Proceedings of the sixth ACM international conference on Web search and data mining. ACM, 183–192.

[11] Katja Hofmann, Shimon Whiteson, and Maarten de Rijke. 2013. Balancing exploration and exploitation in listwise and pairwise online learning to rank for information retrieval. Information Retrieval 16, 1 (2013), 63–90.

[12] Katja Hofmann, Shimon Whiteson, and Maarten de Rijke. 2013. Fidelity, soundness, and efficiency of interleaved comparison methods. ACM Transactions on Information Systems (TOIS) 31, 4 (2013), 17.

[13] Rolf Jagerman, Harrie Oosterhuis, and Maarten de Rijke. 2019. To model or to intervene: A comparison of counterfactual and online learning to rank from user interactions. In 42nd International ACM SIGIR Conference on Research & Development in Information Retrieval. ACM, (to appear).

[14] Thorsten Joachims. 2002. Optimizing search engines using clickthrough data. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 133–142.

[15] Thorsten Joachims. 2003. Evaluating retrieval performance using clickthrough data. In Text Mining, J. Franke, G. Nakhaeizadeh, and I. Renz (Eds.). Physica/Springer Verlag, 79–96.

[16] Thorsten Joachims and Adith Swaminathan. 2016. Counterfactual evaluation and learning for search, recommendation and ad placement. In Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval. ACM, 1199–1201.

[17] Thorsten Joachims, Adith Swaminathan, and Tobias Schnabel. 2017. Unbiased learning-to-rank with biased feedback. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining. ACM, 781–789.

[18] Sumeet Katariya, Branislav Kveton, Csaba Szepesvari, and Zheng Wen. 2016. DCM bandits: Learning to rank with multiple clicks. arXiv preprint arXiv:1602.03146 (2016).

[19] Branislav Kveton, Csaba Szepesvari, Zheng Wen, and Azin Ashkan. 2015. Cascading bandits: Learning to rank in the cascade model. arXiv preprint arXiv:1502.02763 (2015).

[20] Paul Lagrée, Claire Vernade, and Olivier Cappé. 2016. Multiple-play bandits in the position-based model. In Advances in Neural Information Processing Systems. 1597–1605.

[21] Tie-Yan Liu. 2009. Learning to rank for information retrieval. Foundations and Trends in Information Retrieval 3, 3 (2009), 225–331.

[22] Harrie Oosterhuis. 2018. Learning to rank and evaluation in the online setting. 12th Russian Summer School in Information Retrieval (RuSSIR 2018). (2018).

[23] Harrie Oosterhuis and Maarten de Rijke. 2017. Balancing speed and quality in online learning to rank for information retrieval. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. ACM, 277–286.

[24] Harrie Oosterhuis and Maarten de Rijke. 2018. Differentiable unbiased online learning to rank. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management. ACM, 1293–1302.

[25] Harrie Oosterhuis and Maarten de Rijke. 2019. Optimizing Ranking Models in an Online Setting. In Advances in Information Retrieval, Leif Azzopardi, Benno Stein, Norbert Fuhr, Philipp Mayr, Claudia Hauff, and Djoerd Hiemstra (Eds.). Springer International Publishing, Cham, 382–396.

[26] Harrie Oosterhuis, Anne Schuth, and Maarten de Rijke. 2016. Probabilistic multileave gradient descent. In European Conference on Information Retrieval. Springer, 661–668.

[27] Filip Radlinski, Madhu Kurup, and Thorsten Joachims. 2008. How does clickthrough data reflect retrieval quality?. In Proceedings of the 17th ACM conference on Information and knowledge management. ACM, 43–52.

[28] Mark Sanderson. 2010. Test collection based evaluation of information retrieval systems. Foundations and Trends in Information Retrieval 4, 4 (2010), 247–375.

[29] Anne Schuth, Harrie Oosterhuis, Shimon Whiteson, and Maarten de Rijke. 2016. Multileave gradient descent for fast online learning to rank. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining. ACM, 457–466.

[30] Adith Swaminathan and Thorsten Joachims. 2015. Counterfactual risk minimization: Learning from logged bandit feedback. In International Conference on Machine Learning. 814–823.

[31] Xuanhui Wang, Michael Bendersky, Donald Metzler, and Marc Najork. 2016. Learning to rank with selection bias in personal search. In Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval. ACM, 115–124.

[32] Xuanhui Wang, Nadav Golbandi, Michael Bendersky, Donald Metzler, and Marc Najork. 2018. Position bias estimation for unbiased learning to rank in personal search. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. ACM, 610–618.

[33] Yisong Yue and Thorsten Joachims. 2009. Interactively optimizing information retrieval systems as a dueling bandits problem. In Proceedings of the 26th Annual International Conference on Machine Learning. ACM, 1201–1208.

[34] Yisong Yue, Rajan Patel, and Hein Roehrig. 2010. Beyond position bias: Examining result attractiveness as a source of presentation bias in clickthrough data. In Proceedings of the 19th international conference on World wide web. ACM, 1011–1018.

[35] Tong Zhao and Irwin King. 2016. Constructing Reliable Gradient Exploration for Online Learning to Rank. In CIKM. ACM, 1643–1652.

Footnote 1: SIGIR’19 slides will be published on: http://ltr-tutorial-sigir19.isti.cnr.it/
