Your Mouse and Eyes Secretly Leak Your Preference: LLM Alignment using Implicit Feedback from Users

Aryan Sajith; Hamed Zamani; Haw-Shiuan Chang; Jeffrey Gomez; Mehul Patwari

arxiv: 2606.20482 · v1 · pith:A2PSOHYDnew · submitted 2026-06-18 · 💻 cs.CL · cs.HC · cs.LG

Pith reviewed 2026-06-26 17:24 UTC · model grok-4.3

classification 💻 cs.CL · cs.HC · cs.LG

keywords implicit feedback · LLM alignment · reward model · mouse tracking · eye gaze · preference learning · DPO

The pith

Implicit feedback from mouse trajectories and eye gaze improves LLM reward models from 55 percent to 64 percent accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds a dataset called IFLLM that records 1336 multi-turn questions along with mouse trajectories and webcam eye-gaze points from 59 workers as they read LLM responses. It trains a reward model that combines these implicit signals with response text and shows higher accuracy at predicting which response a user prefers. When this reward model is used for Direct Preference Optimization on eight different LLMs, the relative quality gains nearly triple compared with a text-only reward model. The work argues that such passive signals can reduce reliance on expensive explicit ratings because users already produce them during ordinary interaction.

Core claim

A reward model trained on implicit user feedback collected as mouse trajectories and eye-gaze points during response reading outperforms a text-only reward model, raising preference-prediction accuracy from 55 percent to 64 percent and nearly tripling the relative response-quality gains obtained after Direct Preference Optimization on eight LLMs.

What carries the argument

IFLLM dataset and multimodal reward model that fuses mouse-trajectory and eye-gaze features with response text to predict user preference.
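As a rough illustration of what "features extracted from the trajectories" could look like, the sketch below summarizes one mouse or gaze trajectory with a few simple statistics. The `Point` type, the `trajectory_features` function, and the chosen statistics are all assumptions made for illustration; this page does not describe the paper's actual feature extraction pipeline.

```python
import math
from typing import Dict, NamedTuple, Sequence


class Point(NamedTuple):
    """One sample of a mouse or gaze trajectory (hypothetical layout)."""
    t: float  # seconds since the response was shown
    x: float  # horizontal position in pixels
    y: float  # vertical position in pixels


def trajectory_features(points: Sequence[Point]) -> Dict[str, float]:
    """Summarize a trajectory with illustrative, hand-picked statistics."""
    if len(points) < 2:
        # Too few samples to measure movement; return neutral values.
        return {"duration": 0.0, "path_length": 0.0, "mean_speed": 0.0, "max_y": 0.0}
    duration = points[-1].t - points[0].t
    # Total Euclidean distance travelled between consecutive samples.
    path_length = sum(
        math.dist((a.x, a.y), (b.x, b.y)) for a, b in zip(points, points[1:])
    )
    mean_speed = path_length / duration if duration > 0 else 0.0
    # Lowest point reached on the page, a crude proxy for how far the user read.
    max_y = max(p.y for p in points)
    return {
        "duration": duration,
        "path_length": path_length,
        "mean_speed": mean_speed,
        "max_y": max_y,
    }


mouse = [Point(0.0, 0.0, 0.0), Point(1.0, 3.0, 4.0), Point(2.0, 3.0, 10.0)]
features = trajectory_features(mouse)
print(features)
```

A feature dictionary like this could be concatenated with text-derived features and passed to any tabular classifier, such as the random forest mentioned in the Figure 1 caption.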

If this is right

Preference data for alignment can be gathered at scale without prompting users for explicit ratings.
Direct Preference Optimization yields substantially larger gains when the reward model incorporates implicit signals.
Diverse gazing and mouse behaviors across users can be aggregated into a single improved reward function.
The same implicit signals could be collected continuously during normal LLM use rather than in dedicated annotation sessions.
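For readers unfamiliar with Direct Preference Optimization, the sketch below computes the standard DPO loss for a single preference pair. The reward model decides which response counts as "chosen" and which as "rejected". The function name, the example log-probabilities, and `beta=0.1` are assumptions for illustration; this page does not report the paper's training hyperparameters.

```python
import math


def dpo_loss(
    logp_chosen: float,
    logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,  # assumed value, not taken from the paper
) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are sequence log-probabilities under the policy being trained
    and under the frozen reference model.
    """
    # How much more the policy favors each response than the reference does.
    chosen_shift = logp_chosen - ref_logp_chosen
    rejected_shift = logp_rejected - ref_logp_rejected
    z = beta * (chosen_shift - rejected_shift)
    # -log(sigmoid(z)), written in a numerically stable form.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# Policy identical to the reference: the loss equals log(2).
baseline = dpo_loss(-5.0, -7.0, -5.0, -7.0)
# Policy has moved toward the chosen response: the loss drops.
improved = dpo_loss(-4.0, -8.0, -5.0, -7.0)
print(baseline, improved)
```

Because the loss depends only on which response is labeled as preferred, a more accurate reward model changes the training signal by flipping fewer pairs the wrong way.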

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Continuous collection of mouse and eye data during live interactions could support ongoing model updates without separate feedback campaigns.
Privacy and consent mechanisms would need to be addressed before deploying such tracking at internet scale.
Similar implicit signals might be captured from other interfaces such as touch or scroll patterns on mobile devices.

Load-bearing premise

The mouse and eye signals gathered from 59 paid Mechanical Turk workers reliably reflect genuine preferences and will appear the same way for ordinary users in real deployments.

What would settle it

Run the same data-collection protocol with a larger and more diverse unpaid user pool and measure whether the implicit-feedback reward model still outperforms the text-only baseline by the reported margin.

Figures

Figures reproduced from arXiv: 2606.20482 by Aryan Sajith, Hamed Zamani, Haw-Shiuan Chang, Jeffrey Gomez, Mehul Patwari.

**Figure 1.** IFLLM records the trajectories of eye gazing and mouse from a question answering session between a user and two LLMs. Then, we train our random forest reward model on the features extracted from the trajectories and preference labels from the user. Finally, we show that applying DPO to preferences predicted by our reward model improves LLM outputs more than a standard text-based reward model. This improv…

**Figure 2.** Diagram of webpage navigation for a worker. 1 cycle of the webpages correlates to 1 task, equivocally

**Figure 3.** Average fixation weight over the response text

**Figure 5.** Distribution of the per-session Pearson correlation between mouse and gaze position, grouped by response length

**Figure 8.** Gaze trajectory clusters over normalized

**Figure 10.** The importance weights of the top 10 features

**Figure 11.** The importance weights of the top 50 fea

**Figure 12.** Partial dependency analysis on the last char

**Figure 13.** An example of gazing trajectory for a topic

**Figure 16.** Average fixation weight over the response

**Figure 17.** Average mouse position over normalized time, grouped by response length

**Figure 18.** Average mouse position over normalized time, for the pointwise setting and for the left and right responses in the pairwise setting.

**Figure 19.** Gaze position distribution across the re

**Figure 21.** Gaze position distribution across the re

**Figure 23.** Distribution of the per-query Pearson corre

**Figure 25.** Distribution of the per-session Pearson cor

**Figure 26.** The crowdsourcing template we used in our

**Figure 28.** Macro average of each user’s Average Nor

**Figure 27.** Our website instruction page
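Several of the captions above refer to a per-session Pearson correlation between mouse and gaze position. A minimal sketch of that statistic follows, assuming the two signals have already been resampled to the same timestamps and that vertical position is the coordinate being compared. Both are assumptions, since the captions do not say how the signals were aligned, and the sample values are made up.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant series (e.g. a mouse that never moved) has no defined correlation.
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical vertical positions (pixels) sampled at the same four timestamps.
mouse_y = [100.0, 150.0, 220.0, 300.0]
gaze_y = [90.0, 160.0, 210.0, 310.0]
r = pearson(mouse_y, gaze_y)
print(r)
```

A value near 1 would mean the cursor follows the eyes down the response; a value near 0 would mean the mouse carries little information about where the user is reading.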

Original abstract

To align a Large Language Model (LLM), most existing methods collect explicit human feedback and train a reward model to predict the human preference based on the response text. These existing methods have two key limitations. First, the users rarely provide explicit feedback for LLM responses, which makes the high-quality preference annotation expensive to collect. Second, the methods do not leverage implicit human feedback, which has proven vital to the economic moats of Internet giants. To quantify the value of implicit feedback, we build a new dataset called IFLLM, which collects 1336 multi-turn questions from the 59 Mechanical Turk workers, their mouse trajectories, and eye gazing points to the LLMs' responses from their webcams. IFLLM shows that the users have very diverse types of gazing behavior and mouse trajectories. Our reward model based on the implicit user feedback boosts the accuracy of the text-based reward model from 55% to 64% and nearly triples the relative response quality improvements after applying the DPO to eight LLMs, demonstrating the value of implicit feedback in the wild. Our data collection website, dataset, and codes can be found at https://github.com/themehulpatwari/llm-implicit-feedback/.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper releases a new IFLLM dataset of mouse and gaze signals from 59 MTurk workers and reports accuracy and DPO lifts, but the small paid sample leaves the generalization claim thin.

The main point is that this work collects mouse trajectories and webcam gaze from 59 Mechanical Turk workers on 1336 multi-turn LLM interactions, builds a reward model that adds those signals to text, and reports lifting accuracy from 55% to 64% while nearly tripling relative DPO gains across eight LLMs.

What the paper does well is release the actual dataset, the collection website, and code. That makes the implicit signals available for others to inspect and extend. Noting the diversity in gazing and mouse patterns across users is also a useful concrete observation rather than just an assertion.

The soft spots are in the validation and scale. The results rest on a small, paid sample with no reported comparison of implicit signals to explicit preference labels on the same turns, no cross-user or demographic hold-outs, and no controls for possible confounds like reading time or interface effects. The abstract gives the headline numbers but no error bars, significance tests, or details on how trajectories were turned into features. Without those, the 9-point lift could be tied to the specific collection setup rather than a robust signal that transfers to unpaid users.
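To make the point about missing error bars concrete, the sketch below computes a 95% Wilson score interval for an accuracy figure. The test-set size `N_TEST` is purely hypothetical, because the abstract does not report how many held-out examples the 55% and 64% figures were measured on; the resulting intervals are therefore illustrative only.

```python
import math
from typing import Tuple


def wilson_interval(correct: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half_width, center + half_width


N_TEST = 200  # hypothetical test-set size, NOT taken from the paper

for accuracy in (0.55, 0.64):
    low, high = wilson_interval(round(accuracy * N_TEST), N_TEST)
    print(f"accuracy {accuracy:.2f}: interval [{low:.3f}, {high:.3f}]")
```

If the intervals for the two models overlap substantially at the true test-set size, the 9-point lift would need a paired significance test before it could be read as a reliable difference.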

This is for people working on cheaper or higher-volume preference data for alignment. A reader who wants to experiment with new signals would get value from the released data. It deserves a serious referee because the dataset is new and the application is worth testing, even though the current evidence needs more controls and larger-scale checks before the "in the wild" claim can be taken as settled.

Referee Report

3 major / 2 minor

Summary. The paper introduces the IFLLM dataset, collected from 59 Mechanical Turk workers across 1336 multi-turn interactions, capturing mouse trajectories and webcam-based eye gaze points alongside LLM responses. It claims that a reward model trained on this implicit feedback raises preference-prediction accuracy from 55% (text only) to 64% and nearly triples relative response quality gains when its predictions are used for DPO on eight LLMs, arguing for the value of implicit signals 'in the wild.' The authors release the dataset, collection website, and code.

Significance. If the implicit signals are shown to reliably proxy genuine preferences and generalize beyond the small paid sample, the work would meaningfully reduce reliance on expensive explicit annotations for LLM alignment while demonstrating a practical way to leverage natural user behavior, akin to implicit signals in web systems. The public release of IFLLM, the website, and code is a clear strength that enables direct reproducibility and follow-on studies.

major comments (3)

[Data collection / §4] Data collection and evaluation sections: The headline accuracy lift (55% to 64%) and DPO gains rest on signals from only 59 MTurk workers with no reported cross-user hold-out, cohort-level validation, or direct comparison against explicit preference labels collected on the same turns; this leaves open whether the 9-point improvement reflects robust implicit preference or sample-specific artifacts.
[Reward model / §5] Reward model section: The manuscript states the accuracy improvement but supplies no description of the reward-model architecture, the precise feature extraction pipeline from mouse trajectories and gaze points, or any statistical significance testing or controls for confounds such as reading time or interface effects.
[DPO experiments / §6] DPO experiments: The claim that implicit feedback 'nearly triples' quality improvements across eight LLMs lacks details on the exact evaluation protocol, baseline definitions, or human evaluation rubric, making it impossible to assess whether the tripling is robust or driven by the particular reward model.
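The cross-user hold-out requested in the first major comment can be sketched as a split that keeps every record from a given worker on the same side of the train/test boundary. The record layout and the `worker_id` field name are hypothetical; the released dataset may use a different schema.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Dict[str, object]


def split_by_worker(
    records: List[Record], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[Record], List[Record]]:
    """Hold out whole workers so no user appears in both train and test."""
    by_worker: Dict[object, List[Record]] = defaultdict(list)
    for rec in records:
        by_worker[rec["worker_id"]].append(rec)  # "worker_id" is an assumed field name
    workers = sorted(by_worker, key=str)
    random.Random(seed).shuffle(workers)  # seeded for reproducibility
    n_test = max(1, round(len(workers) * test_fraction))
    test = [rec for w in workers[:n_test] for rec in by_worker[w]]
    train = [rec for w in workers[n_test:] for rec in by_worker[w]]
    return train, test


# Toy data: 10 hypothetical workers with 5 records each.
records = [{"worker_id": f"w{i % 10}", "label": i % 2} for i in range(50)]
train, test = split_by_worker(records)
print(len(train), len(test))
```

Accuracy measured on such a split estimates how the reward model behaves on users it has never seen, which is the setting that matters for the "in the wild" claim.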

minor comments (2)

[Abstract] Abstract: The phrase 'nearly triples the relative response quality improvements' is imprecise; reporting the exact relative gain and the underlying metric would improve clarity.
[Dataset description] The paper would benefit from a table summarizing the 1336 interactions (e.g., turns per worker, average trajectory length) to allow readers to gauge data scale and diversity.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the constructive comments. We address each major point below and will revise the manuscript to supply the requested details where feasible.

Referee: [Data collection / §4] Data collection and evaluation sections: The headline accuracy lift (55% to 64%) and DPO gains rest on signals from only 59 MTurk workers with no reported cross-user hold-out, cohort-level validation, or direct comparison against explicit preference labels collected on the same turns; this leaves open whether the 9-point improvement reflects robust implicit preference or sample-specific artifacts.

Authors: The study collected data from 59 workers across 1336 interactions, which we present as an initial demonstration of implicit signals in the wild. The revised version will add user-level cross-validation results to test for cohort-specific effects. Explicit preference labels were not collected on the same turns, preventing a direct paired comparison. revision: partial

Referee: [Reward model / §5] Reward model section: The manuscript states the accuracy improvement but supplies no description of the reward-model architecture, the precise feature extraction pipeline from mouse trajectories and gaze points, or any statistical significance testing or controls for confounds such as reading time or interface effects.

Authors: We agree these technical details are absent from the current text. The revision will expand the reward-model section with the architecture, feature extraction pipeline, significance testing, and confound controls. revision: yes

Referee: [DPO experiments / §6] DPO experiments: The claim that implicit feedback 'nearly triples' quality improvements across eight LLMs lacks details on the exact evaluation protocol, baseline definitions, or human evaluation rubric, making it impossible to assess whether the tripling is robust or driven by the particular reward model.

Authors: The revision will provide the full evaluation protocol, baseline definitions, and human evaluation rubric used for the DPO quality measurements across the eight models. revision: yes

standing simulated objections not resolved

Direct comparison against explicit preference labels collected on the same turns, as this paired data was not gathered in the original study.

Circularity Check

0 steps flagged

No significant circularity; results are empirical measurements on newly collected data.

The paper collects a fresh dataset (IFLLM) of 1336 interactions from 59 MTurk workers including mouse trajectories and webcam gaze, trains a reward model on these implicit signals, and reports accuracy gains (55% to 64%) plus DPO quality improvements on eight LLMs. These are direct empirical outcomes on the collected data rather than any derivation that reduces by construction to fitted parameters, self-citations, or renamed inputs. No equations, uniqueness theorems, or ansatzes are invoked that loop back to the paper's own definitions or prior author work. The central claims rest on external validation against text-only baselines and DPO runs, making the work self-contained against its own inputs.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the assumption that eye gaze and mouse data serve as valid proxies for preference; the reward model parameters are fitted to this new data.

free parameters (1)

Reward model parameters
Parameters of the implicit-feedback reward model are fitted to the collected mouse and gaze data.

axioms (1)

Domain assumption: mouse trajectories and eye gaze points collected during response viewing correlate with user preferences for LLM outputs.
This premise underpins the construction of the reward model from implicit signals.
