Following the Eye-Tracking Evidence: Established Web-Search Assumptions Fail in Carousel Interfaces

Harrie Oosterhuis; Jingwei Kang; Maarten de Rijke

arxiv: 2604.21019 · v1 · submitted 2026-04-22 · 💻 cs.IR · cs.HC


Pith reviewed 2026-05-09 23:03 UTC · model grok-4.3


keywords carousel interfaces · eye tracking · user examination · click models · recommendation systems · web search assumptions · user behavior


The pith

Eye-tracking data from carousel interfaces shows that web-search assumptions about scanning patterns and click behavior do not hold.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether user behaviors established for single-list web search carry over to carousel interfaces, the rows of horizontally swipeable items common in streaming services. Two assumptions are central: the F-pattern of visual scanning and the examination hypothesis, which ties clicks to examination. Using a recently released eye-tracking dataset, the authors analyze gaze paths and clicks and find systematic deviations: the F-pattern holds for vertical examination but not for horizontal swiping, examination conditioned on a click follows an L-shaped pattern instead, and click-through rates conditioned on examination are inconsistent with the examination hypothesis. Users also generally skip carousel headings and look at the items directly. These results matter because click models and evaluation metrics for carousel recommendation still rely on the transferred assumptions, so they may be systematically miscalibrated.

Core claim

The analysis of eye-tracking recordings reveals that the F-pattern applies only to vertical examination and not to horizontal swiping; conditioned on a click, examination traces an L-pattern specific to carousels; click-through rates conditioned on examination contradict the examination hypothesis; and users generally ignore carousel headings, focusing directly on the displayed items. These patterns contradict the assumptions imported from single-list web search.
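
For orientation, the examination hypothesis is commonly formalized in the web-search click-model literature roughly as below. This is a sketch in our own notation, not necessarily the formulation the paper uses: the symbols C (click), E (examination), R (perceived relevance), d (item), and k (display position) are assumptions introduced here for illustration.

```latex
% Examination hypothesis (common click-model form; notation assumed here):
% an item d shown at position k is clicked iff it is examined and found relevant.
P(C_{d,k} = 1) = P(E_k = 1) \cdot P(R_d = 1)

% Consequence that can be checked with gaze data:
% once an item is known to have been examined, its click rate
% should depend on the item but not on where it was displayed.
P(C_{d,k} = 1 \mid E_k = 1) = P(R_d = 1),
\qquad
P(C_{d,k} = 1 \mid E_k = 0) = 0
```

Under this reading, click-through rates conditioned on examination that vary systematically with display position would count as evidence against the hypothesis.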

What carries the argument

Comparison of gaze sequences and click logs recorded during controlled carousel browsing against the F-pattern and examination hypothesis imported from web-search studies.

If this is right

Click models built on the examination hypothesis would need to be re-derived or re-validated for carousel settings.
Offline evaluation metrics that embed position bias from web-search lists will misrank items in carousels.
Interface designs that place important information in headings are likely to be overlooked by users.
Behavioral models for recommendation need separate parameters for vertical and horizontal examination.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Designers could test whether separating vertical and horizontal interaction logs improves personalization accuracy in live recommendation systems.
The L-pattern finding raises the question of whether similar gaze shapes appear in other swipe-based mobile interfaces such as social feeds.

Load-bearing premise

The eye-tracking dataset accurately records representative examination and clicking behavior without artifacts from the laboratory setup or participant pool that would create the observed L-pattern or hypothesis failures.

What would settle it

A new eye-tracking study on a different carousel interface that records the classic F-pattern across both directions or finds that examined items receive clicks at rates predicted by the examination hypothesis would contradict the central claims.

Figures

Figures reproduced from arXiv: 2604.21019 by Harrie Oosterhuis, Jingwei Kang, Maarten de Rijke.

**Figure 2.** Design of a screen in the RecGaze eye-tracking …

**Figure 3.** Empirical examination frequency heatmap across positions and pages. The number in each small unit represents the …

**Figure 4.** In context of evaluating Assumption 2: The raw and machine-learning–smoothed examination frequency, conditioned on a click at a specific position. Each subplot corresponds to one conditioning position (highlighted by a white dot and labeled above); for the empirical heatmap, the number of observed clicks at that position (𝑛) is additionally annotated. Within a subplot, each cell represents the raw or smoot…

**Figure 5.** Click frequency given examination per item position in percentage. The positions …

**Figure 6.** Empirical frequency of item positions being examined earlier than their corresponding title, conditioned on examina…
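
The quantity named in the Figure 5 caption, click frequency given examination per item position, can be sketched as follows. This is a minimal illustration under stated assumptions: the record layout `(row, column, examined, clicked)` and the toy values are hypothetical and do not reflect the actual RecGaze schema or the paper's numbers.

```python
from collections import defaultdict

# Hypothetical impression records: (row, column, examined, clicked).
# The field layout and values are illustrative assumptions only.
impressions = [
    (0, 0, True, True),
    (0, 0, True, False),
    (0, 1, True, False),
    (0, 1, False, False),
    (1, 0, True, True),
    (1, 0, True, True),
]


def ctr_given_examination(records):
    """Return {(row, col): clicks / examinations}, counting examined impressions only."""
    examined = defaultdict(int)
    clicked = defaultdict(int)
    for row, col, was_examined, was_clicked in records:
        if not was_examined:
            continue  # unexamined impressions do not enter the conditional rate
        examined[(row, col)] += 1
        if was_clicked:
            clicked[(row, col)] += 1
    return {pos: clicked[pos] / n for pos, n in examined.items()}


rates = ctr_given_examination(impressions)
for position in sorted(rates):
    print(position, f"{100 * rates[position]:.0f}%")
```

Positions with no examined impressions are simply absent from the result, which avoids dividing by zero and keeps empty cells distinguishable from a true rate of zero.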

Original abstract

Carousel interfaces have been the de-facto standard for streaming media services for over a decade. Yet, there has been very little research into user behavior with such interfaces, which thus remains poorly understood. Due to this lack of empirical research, previous work has assumed that behaviors established in single-list web-search interfaces, such as the F-pattern and the examination hypothesis, also apply to carousel interfaces, for instance when designing click models or evaluation metrics. We analyze a recently-released interaction and examination dataset resulting from an eye-tracking study performed on carousel interfaces to verify whether these assumptions actually hold. We find that (i) the F-pattern holds only for vertical examination and not for horizontal swiping; additionally, we discover that, when conditioned on a click, user examination follows an L-pattern unique to carousel interfaces; (ii) click-through-rates conditioned on examination indicate that the well-known examination hypothesis does not hold in carousel interfaces; and (iii) contrary to the assumptions of previous work, users generally ignore carousel headings and focus directly on the content items. Our findings show that many user behavior assumptions, especially concerning examination patterns, do not transfer from web search interfaces to carousel recommendation settings. Our work shows that the field lacks a reliable foundation on which to build models of user behavior with these interfaces. Consequently, a re-evaluation of existing metrics and click models for carousel interfaces may be warranted.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

Eye-tracking data shows F-pattern and examination hypothesis from web search do not transfer to carousels, with an L-pattern appearing on clicks and users ignoring headings.


The core point is that standard assumptions about how users scan and click do not carry over from single-list web search to carousel interfaces. The paper uses eye-tracking data to show that the F-pattern is limited to vertical examination, that examination conditioned on a click follows an L-shaped path instead, that click-through rates conditioned on examination do not behave as the examination hypothesis predicts, and that users largely skip the headings altogether. These are direct empirical observations rather than model fits or simulations.

The work is useful because it takes a recently released dataset and tests the transfer of long-standing IR assumptions in a setting that has been the de facto standard in streaming services for over a decade. That kind of targeted disconfirmation is worth having on record. The analysis is grounded in external observations instead of circular derivations, which keeps the claims falsifiable.

The main limitation is that everything rests on a single eye-tracking study whose collection details, fixation thresholds, participant count, task instructions, and statistical controls are not visible in the abstract. If those choices introduced interface-specific artifacts or small-sample effects, the L-pattern and hypothesis failure could be narrower than presented. The paper does not report error bars or sensitivity checks in the summary, so the strength of the conclusions depends on the methods section holding up.

This is relevant for researchers who build click models, evaluation metrics, or ranking algorithms for carousel recommendation. Anyone working on real-world streaming or media interfaces would get concrete value from seeing which old assumptions break. It is worth sending to peer review so referees can check the data processing and decide how far the non-transfer generalizes.

Referee Report

3 major / 2 minor

Summary. The manuscript analyzes a recently-released eye-tracking dataset collected on carousel interfaces to test transfer of web-search user behavior assumptions, including the F-pattern, examination hypothesis, and attention to headings. The authors report that the F-pattern holds only for vertical (not horizontal) examination, an L-pattern emerges when examination is conditioned on clicks, click-through rates conditioned on examination do not support the examination hypothesis, and users largely ignore headings in favor of content items. They conclude that these assumptions fail to transfer and that metrics and click models for carousels require re-evaluation.

Significance. If the empirical patterns are robust, the work is significant for information retrieval and recommender systems because carousel interfaces dominate streaming and recommendation platforms yet lack dedicated behavioral models. The eye-tracking methodology provides direct evidence of examination behavior beyond click logs, strengthening the case against direct transfer of web-search findings. This could motivate interface-specific click models and evaluation metrics.

major comments (3)

[§4.2] Examination hypothesis analysis: The claim that CTR conditioned on examination rejects the examination hypothesis does not report the number of users, total examinations, or statistical tests (e.g., regression coefficients or p-values). Without these, it is impossible to assess whether the null result reflects a true interface difference or insufficient power, directly affecting the load-bearing conclusion that the hypothesis fails.
[§4.1] L-pattern result: The L-pattern (examination conditioned on click) is presented as unique to carousels, but the section provides no operational definition of 'examination' (fixation duration threshold or AOI boundaries), no count of conditioned clicks, and no comparison to a within-study web-search baseline. These omissions make it difficult to rule out measurement artifacts or task differences as drivers of the reported pattern.
[§3] Dataset and methods: Participant count, demographics, task instructions, exclusion criteria, and eye-tracking preprocessing steps (e.g., fixation detection parameters) are not detailed. Because all central claims rest on patterns extracted from this single dataset, missing methodological specifics prevent evaluation of whether study design choices artifactually produce the L-pattern or hypothesis failures.
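
To make the §4.1 request concrete, an operational definition of 'examination' could look like the sketch below: a fixation counts as examining an area of interest (AOI) when it lasts at least a threshold duration and falls inside the AOI's bounding box. Everything here is an assumption for illustration, including the `AOI` class, the 100 ms threshold, the pixel coordinates, and the fixation tuples; none of it is taken from the paper or from the RecGaze dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AOI:
    """Axis-aligned area of interest in screen pixels (hypothetical layout)."""
    name: str
    x: float
    y: float
    width: float
    height: float

    def contains(self, px, py):
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


MIN_FIXATION_MS = 100  # assumed threshold, not the paper's value


def examined_aois(fixations, aois, min_ms=MIN_FIXATION_MS):
    """Return AOI names in order of first examination.

    fixations: iterable of (x, y, duration_ms) tuples in temporal order.
    A fixation shorter than min_ms, or outside every AOI, is ignored.
    """
    order = []
    for x, y, duration_ms in fixations:
        if duration_ms < min_ms:
            continue
        for aoi in aois:
            if aoi.contains(x, y):
                if aoi.name not in order:
                    order.append(aoi.name)
                break  # AOIs are assumed not to overlap
    return order


aois = [
    AOI("heading_0", 0, 0, 400, 40),
    AOI("item_0_0", 0, 40, 200, 120),
    AOI("item_0_1", 200, 40, 200, 120),
]
fixations = [(50, 90, 220), (60, 20, 60), (250, 100, 180), (55, 95, 300)]
order = examined_aois(fixations, aois)
print(order)
```

In this toy trace the only fixation on the heading is shorter than the threshold, so the heading never counts as examined. That illustrates why the choice of threshold and AOI boundaries matters for claims about headings being skipped.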

minor comments (2)

[Abstract] The abstract refers to 'a recently-released' dataset without a citation; adding the reference in §3 or the introduction would improve traceability.
[Figures] Figure captions for the L-pattern and F-pattern visualizations could explicitly state the conditioning (e.g., 'conditioned on click') and axis scales to aid interpretation.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments on our manuscript. We address each major comment point by point below. Where the concerns identify gaps in reporting or clarity, we have revised the manuscript to incorporate the requested details and strengthen the presentation of our results.


Referee: [§4.2] Examination hypothesis analysis: The claim that CTR conditioned on examination rejects the examination hypothesis does not report the number of users, total examinations, or statistical tests (e.g., regression coefficients or p-values). Without these, it is impossible to assess whether the null result reflects a true interface difference or insufficient power, directly affecting the load-bearing conclusion that the hypothesis fails.

Authors: We agree that explicit reporting of sample sizes, examination counts, and statistical tests is necessary to allow readers to evaluate the strength of the evidence against the examination hypothesis. In the revised manuscript we will state the number of users and total examinations drawn from the public dataset, and we will add a statistical test (logistic regression or equivalent) with coefficients and p-values to quantify the relationship between examination and click probability. This will directly address concerns about statistical power and support the claim that the hypothesis does not transfer to carousel interfaces.
revision: yes

Referee: [§4.1] L-pattern result: The L-pattern (examination conditioned on click) is presented as unique to carousels, but the section provides no operational definition of 'examination' (fixation duration threshold or AOI boundaries), no count of conditioned clicks, and no comparison to a within-study web-search baseline. These omissions make it difficult to rule out measurement artifacts or task differences as drivers of the reported pattern.

Authors: We accept that the current version of §4.1 lacks an explicit operational definition and supporting counts. In revision we will define examination precisely (including fixation-duration threshold and AOI boundaries) and report the number of clicks conditioned on examination. Because the study collected data exclusively on carousel interfaces, a within-study web-search baseline is unavailable; however, we will add a comparison to established F-pattern results from prior web-search eye-tracking literature to better substantiate the claim that the observed L-pattern is interface-specific rather than an artifact.
revision: yes

Referee: [§3] Dataset and methods: Participant count, demographics, task instructions, exclusion criteria, and eye-tracking preprocessing steps (e.g., fixation detection parameters) are not detailed. Because all central claims rest on patterns extracted from this single dataset, missing methodological specifics prevent evaluation of whether study design choices artifactually produce the L-pattern or hypothesis failures.

Authors: We acknowledge that the methods section is currently brief. Although the underlying dataset is publicly released and its original paper contains the full protocol, we agree that the present manuscript should be self-contained. In the revised version we will expand §3 to summarize participant count and demographics, task instructions, exclusion criteria, and key preprocessing parameters such as fixation-detection thresholds. This will enable readers to assess potential design artifacts without needing to consult the dataset paper.
revision: yes
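
As an illustration of the kind of statistical test discussed in the §4.2 exchange, the sketch below compares click-through rates given examination at two positions with a two-proportion z-test. This is a simpler stand-in chosen for brevity, not the logistic regression the simulated authors mention, and the counts are made up for illustration. It uses the normal approximation, so it is only reasonable when both positions have enough examinations.

```python
import math


def two_proportion_z_test(clicks_a, exams_a, clicks_b, exams_b):
    """Two-sided z-test for equal click rates given examination at two positions.

    Returns (z, p_value). Uses the pooled-variance normal approximation;
    undefined if the pooled click rate is exactly 0 or 1.
    """
    rate_a = clicks_a / exams_a
    rate_b = clicks_b / exams_b
    pooled = (clicks_a + clicks_b) / (exams_a + exams_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / exams_a + 1 / exams_b))
    z = (rate_a - rate_b) / std_err
    # Two-sided tail probability of a standard normal: erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts: 30 clicks in 100 examinations vs. 15 clicks in 100 examinations.
z, p = two_proportion_z_test(30, 100, 15, 100)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Under the examination hypothesis as commonly stated, items of comparable relevance should show similar click rates once examined, so a small p-value across positions would point to a position effect that the hypothesis does not allow for. A regression with relevance controls would be needed to separate position from item effects.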

Circularity Check

0 steps flagged

No circularity: empirical analysis of independent external dataset

full rationale

The paper conducts a direct empirical analysis of a recently-released eye-tracking dataset to test transfer of web-search assumptions (F-pattern, examination hypothesis, heading attention) to carousel interfaces. No equations, fitted parameters, derivations, or self-citations appear in the provided text or abstract. Claims reduce solely to observed patterns in the external data rather than any self-referential construction, renaming, or load-bearing prior work by the authors. This is a standard observational study whose central findings are falsifiable against the dataset itself and carry no internal circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Empirical verification paper with no free parameters, no invented entities, and minimal axioms beyond standard assumptions of eye-tracking validity.

axioms (1)

Domain assumption: Eye-tracking data provides a valid and unbiased measure of user examination behavior in digital interfaces.
Invoked implicitly to interpret patterns like F-pattern, L-pattern, and examination hypothesis from the dataset.





work page doi:10.1145/3726302.3730301 2025

[9] [9]

Santiago de Leon-Martinez, Robert Moro, Branislav Kveton, and Maria Bielikova

work page

[10] [10]

InProceedings of the 31st International Conference on Intelligent User Interfaces (IUI ’26)

Riding the Carousel: The First Extensive Eye Tracking Analysis of Browsing Behavior in Carousel Recommenders. InProceedings of the 31st International Conference on Intelligent User Interfaces (IUI ’26). Association for Computing Machinery, New York, NY, USA, 2120–2130. doi:10.1145/3742413.3789166

work page doi:10.1145/3742413.3789166

[11] [11]

Dupret and Benjamin Piwowarski

Georges E. Dupret and Benjamin Piwowarski. 2008. A User Browsing Model to Predict Search Engine Click Data from Past Observations. InProceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval(Singapore, Singapore)(SIGIR ’08). Association for Comput- ing Machinery, New York, NY, USA, 331–338. doi:...

work page doi:10.1145/1390334.1390392 2008

[12] [12]

Nicolò Felicioni, Maurizio Ferrari Dacrema, and Paolo Cremonesi. 2021. Measur- ing the User Satisfaction in a Recommendation Interface with Multiple Carousels. InProceedings of the 2021 ACM International Conference on Interactive Media Ex- periences(Virtual Event, USA)(IMX ’21). Association for Computing Machinery, New York, NY, USA, 212–217. doi:10.1145/...

work page doi:10.1145/3452918.3465493 2021

[13] [13]

Nicolò Felicioni, Maurizio Ferrari Dacrema, and Paolo Cremonesi. 2021. A Methodology for the Offline Evaluation of Recommender Systems in a User Interface with Multiple Carousels. InAdjunct Proceedings of the 29th ACM Con- ference on User Modeling, Adaptation and Personalization(Utrecht, Netherlands) (UMAP ’21). Association for Computing Machinery, New Yo...

work page doi:10.1145/3450614.3461680 2021

[14] [14]

Maurizio Ferrari Dacrema, Nicolò Felicioni, and Paolo Cremonesi. 2022. Offline Evaluation of Recommender Systems in a User Interface With Multiple Carousels. Frontiers in Big DataVolume 5 - 2022 (2022). doi:10.3389/fdata.2022.910030

work page doi:10.3389/fdata.2022.910030 2022

[15] [15]

In: Proc

Laura Granka, Matthew Feusner, and Lori Lorigo. 2008.Eye Monitoring in Online Search. Springer Berlin Heidelberg, Berlin, Heidelberg, 347–372. doi:10.1007/978- 3-540-75412-1_16

work page doi:10.1007/978- 2008

[16] [16]

Granka, Thorsten Joachims, and Geri Gay

Laura A. Granka, Thorsten Joachims, and Geri Gay. 2004. Eye-tracking Analy- sis of User Behavior in WWW Search(SIGIR ’04). Association for Computing Machinery, New York, NY, USA, 478–479. doi:10.1145/1008992.1009079

work page doi:10.1145/1008992.1009079 2004

[17] [17]

Fan Guo, Chao Liu, Anitha Kannan, Tom Minka, Michael Taylor, Yi-Min Wang, and Christos Faloutsos. 2009. Click Chain Model in Web Search. InProceedings of the 18th International Conference on World Wide Web(Madrid, Spain)(WWW ’09). Association for Computing Machinery, New York, NY, USA, 11–20. doi:10. 1145/1526709.1526712

work page arXiv 2009

[18] [18]

Fan Guo, Chao Liu, and Yi Min Wang. 2009. Efficient Multiple-click Models in Web Search. InProceedings of the Second ACM International Conference on Web Search and Data Mining(Barcelona, Spain)(WSDM ’09). Association for Computing Machinery, New York, NY, USA, 124–131. doi:10.1145/1498759.1498818

work page doi:10.1145/1498759.1498818 2009

[19] [19]

Kalervo Järvelin and Jaana Kekäläinen. 2000. IR Evaluation Methods for Retriev- ing Highly Relevant Documents. InProceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (Athens, Greece)(SIGIR ’00). Association for Computing Machinery, New York, NY, USA, 41–48. doi:10.1145/345508.345545 Followi...

work page doi:10.1145/345508.345545 2000

[20] [20]

Kalervo Järvelin and Jaana Kekäläinen. 2002. Cumulated Gain-based Evaluation of IR Techniques.ACM Trans. Inf. Syst.20, 4 (Oct. 2002), 422–446. doi:10.1145/ 582415.582418

work page arXiv 2002

[21] [21]

Thorsten Joachims, Laura Granka, Bing Pan, Helene Hembrooke, and Geri Gay

work page

[22] [22]

InProceed- ings of the 28th Annual International ACM SIGIR Conference on Research and Devel- opment in Information Retrieval(Salvador, Brazil)(SIGIR ’05)

Accurately Interpreting Clickthrough Data as Implicit Feedback. InProceed- ings of the 28th Annual International ACM SIGIR Conference on Research and Devel- opment in Information Retrieval(Salvador, Brazil)(SIGIR ’05). Association for Com- puting Machinery, New York, NY, USA, 154–161. doi:10.1145/1076034.1076063

work page doi:10.1145/1076034.1076063

[23] [23]

Yvonne Kammerer and Peter Gerjets. 2010. How the Interface Design Influences Users’ Spontaneous Trustworthiness Evaluations of Web Search Results: Com- paring a List and a Grid Interface. InProceedings of the 2010 Symposium on Eye- Tracking Research & Applications(Austin, Texas)(ETRA ’10). Association for Com- puting Machinery, New York, NY, USA, 299–306....

work page doi:10.1145/1743666.1743736 2010

[24] [24]

Yvonne Kammerer and Peter Gerjets. 2014. The Role of Search Result Position and Source Trustworthiness in the Selection of Web Search Results When Using a List or a Grid Interface.International Journal of Human–Computer Interaction 30, 3 (2014), 177–191. doi:10.1080/10447318.2013.846790

work page doi:10.1080/10447318.2013.846790 2014

[25] [25]

White, and Imed Zitouni

Youngho Kim, Ahmed Hassan, Ryen W. White, and Imed Zitouni. 2014. Compar- ing Client and Server Dwell Time Estimates for Click-level Satisfaction Predic- tion. InProceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval(Gold Coast, Queensland, Australia)(SI- GIR ’14). Association for Computing Machinery,...

work page doi:10.1145/2600428.2609468 2014

[26] [26]

Kornaropoulos

Youngho Kim, Ahmed Hassan, Ryen W. White, and Imed Zitouni. 2014. Modeling Dwell Time to Predict Click-level Satisfaction. InProceedings of the 7th ACM International Conference on Web Search and Data Mining(New York, New York, USA)(WSDM ’14). Association for Computing Machinery, New York, NY, USA, 193–202. doi:10.1145/2556195.2556220

work page doi:10.1145/2556195.2556220 2014

[27] [27]

Benedikt Loepp. 2023. Multi-list interfaces for recommender systems: survey and future directions.Frontiers in Big Data6 (2023). doi:10.3389/fdata.2023.1239705

work page doi:10.3389/fdata.2023.1239705 2023

[28] [28]

Benedikt Loepp and Jürgen Ziegler. 2023. How Users Ride the Carousel: Exploring the Design of Multi-List Recommender Interfaces From a User Perspective. In Proceedings of the 17th ACM Conference on Recommender Systems(Singapore, Singapore)(RecSys ’23). Association for Computing Machinery, New York, NY, USA, 1090–1095. doi:10.1145/3604915.3610638

work page doi:10.1145/3604915.3610638 2023

[29] [29]

Lori Lorigo, Bing Pan, Helene Hembrooke, Thorsten Joachims, Laura Granka, and Geri Gay. 2006. The Influence of Task and Gender on Search and Evaluation Behavior using Google.Information Processing & Management42, 4 (2006), 1123–

work page 2006

[30] [30]

doi:10.1016/j.ipm.2005.10.001

work page doi:10.1016/j.ipm.2005.10.001 2005

[31] [31]

Behnam Rahdari and Peter Brusilovsky. 2025. Under the Hood of Carousels: Investigating User Engagement and Navigation Effort in Multi-list Recommender Systems. InProceedings of the 30th International Conference on Intelligent User Interfaces (IUI ’25). Association for Computing Machinery, New York, NY, USA, 1485–1498. doi:10.1145/3708359.3712130

work page doi:10.1145/3708359.3712130 2025

[32] [32]

Behnam Rahdari, Peter Brusilovsky, and Branislav Kveton. 2024. Towards Simulation-Based Evaluation of Recommender Systems with Carousel Inter- faces.ACM Trans. Recomm. Syst.2, 1, Article 9 (March 2024), 25 pages. doi:10.1145/3643709

work page doi:10.1145/3643709 2024

[33] [33]

Behnam Rahdari, Branislav Kveton, and Peter Brusilovsky. 2022. The Magic of Carousels: Single vs. Multi-List Recommender Systems. InProceedings of the 33rd ACM Conference on Hypertext and Social Media(Barcelona, Spain)(HT ’22). Association for Computing Machinery, New York, NY, USA, 166–174. doi:10. 1145/3511095.3531278

work page arXiv 2022

[34] [34]

Matthew Richardson, Ewa Dominowska, and Robert Ragno. 2007. Predicting Clicks: Estimating the Click-through Rate for New Ads. InProceedings of the 16th International Conference on World Wide Web(Banff, Alberta, Canada)(WWW ’07). Association for Computing Machinery, New York, NY, USA, 521–530. doi:10. 1145/1242572.1242643

work page arXiv 2007

[35] [35]

Chaparro

Christina Siu and Barbara S. Chaparro. 2014. First Look: Examining the Horizontal Grid Layout using Eye-tracking.Proceedings of the Human Factors and Ergonomics Society Annual Meeting58, 1 (2014), 1119–1123. doi:10.1177/1541931214581234

work page doi:10.1177/1541931214581234 2014

[36] [36]

Ali Vardasbi, Harrie Oosterhuis, and Maarten de Rijke. 2020. When Inverse Propensity Scoring does not Work: Affine Corrections for Unbiased Learning to Rank. InProceedings of the 29th ACM International Conference on Information & Knowledge Management(Virtual Event, Ireland)(CIKM ’20). Association for Computing Machinery, New York, NY, USA, 1475–1484. doi:...

work page doi:10.1145/3340531 2020

[37] [37]

Chao Wang, Yiqun Liu, Meng Wang, Ke Zhou, Jian-yun Nie, and Shaoping Ma

work page

[38] [38]

InProceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval(Santiago, Chile)(SIGIR ’15)

Incorporating Non-sequential Behavior into Click Models. InProceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval(Santiago, Chile)(SIGIR ’15). Association for Computing Machinery, New York, NY, USA, 283–292. doi:10.1145/2766462.2767712

work page doi:10.1145/2766462.2767712

[39] [39]

Alvino, Alexander J

Chao-Yuan Wu, Christopher V. Alvino, Alexander J. Smola, and Justin Basilico

work page

[40] [40]

InProceed- ings of the 10th ACM Conference on Recommender Systems(Boston, Massachusetts, USA)(RecSys ’16)

Using Navigation to Improve Recommendations in Real-Time. InProceed- ings of the 10th ACM Conference on Recommender Systems(Boston, Massachusetts, USA)(RecSys ’16). Association for Computing Machinery, New York, NY, USA, 341–348. doi:10.1145/2959100.2959174

work page doi:10.1145/2959100.2959174

[41] [41]

Xiaohui Xie, Yiqun Liu, Xiaochuan Wang, Meng Wang, Zhijing Wu, Yingying Wu, Min Zhang, and Shaoping Ma. 2017. Investigating Examination Behavior of Image Search Users. InProceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval(Shinjuku, Tokyo, Japan) (SIGIR ’17). Association for Computing Machinery, N...

work page doi:10.1145/3077136.3080799 2017

[42] [42]

Xiaohui Xie, Jiaxin Mao, Maarten de Rijke, Ruizhe Zhang, Min Zhang, and Shaop- ing Ma. 2018. Constructing an Interaction Behavior Model for Web Image Search. InThe 41st International ACM SIGIR Conference on Research & Development in In- formation Retrieval(Ann Arbor, MI, USA)(SIGIR ’18). Association for Computing Machinery, New York, NY, USA, 425–434. doi...

work page doi:10.1145/3209978.3209990 2018

[43] [43]

Xiaohui Xie, Jiaxin Mao, Yiqun Liu, Maarten de Rijke, Yunqiu Shao, Zixin Ye, Min Zhang, and Shaoping Ma. 2019. Grid-based Evaluation Metrics for Web Image Search. InThe World Wide Web Conference(San Francisco, CA, USA)(WWW ’19). Association for Computing Machinery, New York, NY, USA, 2103–2114. doi:10.1145/3308558.3313514

work page doi:10.1145/3308558.3313514 2019

[44] [44]

Danqing Xu, Yiqun Liu, Min Zhang, Shaoping Ma, and Liyun Ru. 2012. Incorpo- rating Revisiting Behaviors into Click Models. InProceedings of the Fifth ACM International Conference on Web Search and Data Mining(Seattle, Washington, USA)(WSDM ’12). Association for Computing Machinery, New York, NY, USA, 303–312. doi:10.1145/2124295.2124334

work page doi:10.1145/2124295.2124334 2012

[45] [45]

Maxwell Harper, and Joseph A

Qian Zhao, Shuo Chang, F. Maxwell Harper, and Joseph A. Konstan. 2016. Gaze Prediction for Recommender Systems. InProceedings of the 10th ACM Conference on Recommender Systems(Boston, Massachusetts, USA)(RecSys ’16). Association for Computing Machinery, New York, NY, USA, 131–138. doi:10.1145/2959100. 2959150

work page doi:10.1145/2959100 2016