AI Washing Inflates Expected Performance but Not Interaction Outcomes: An AI Placebo Study Using Fitts' Law
Pith reviewed 2026-05-09 18:41 UTC · model grok-4.3
The pith
AI-labeled mice raise user expectations of better performance but leave actual Fitts' Law task outcomes unchanged.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Participants expected significantly improved performance in the two placebo AI conditions compared to baseline, but these expectations did not translate into differences in objective Fitts' Law performance indicators or in subjective assessments of workload and usability.
What carries the argument
A within-subjects Fitts' Law experiment that compares a standard mouse against two placebo-labeled "AI-supported" versions of the same mouse, recording both expected and actual task performance.
If this is right
- Overstated AI claims in product marketing can raise user expectations without delivering measurable interaction benefits.
- Actual task performance and perceived workload remain the same across labeled and unlabeled conditions.
- Fitts' Law tasks offer a repeatable method for auditing performance claims made about input devices.
- Transparency requirements for AI-labeled consumer products are needed to prevent misleading expectations.
Where Pith is reading between the lines
- The same expectation-inflation pattern may appear with other everyday AI-labeled devices such as keyboards or touchscreens.
- Empirical tests like this could help regulators define what counts as a deceptive AI claim.
- Varying how believable the AI label is could quantify how strongly expectations must be held before they affect behavior.
Load-bearing premise
Participants fully believed the placebo AI labels were genuine, and the Fitts' Law task was sensitive enough to detect small real performance differences had any existed.
What would settle it
A follow-up study in which participants are told the labels are fake yet still show no change in performance, or one in which genuine AI support is added and performance improves while expectations remain matched.
Original abstract
Expectations about the support of artificial intelligence (AI) may influence interaction outcomes similar to placebos. Such expectations may result from AI washing, a practice of overstating a system's AI capabilities when actual functionality is limited. For example, some computer mice are marketed as "AI-assisted" despite lacking AI in core functions. In a within-subjects study, 28 participants completed Fitts' Law tasks with a computer mouse under three conditions: no support, supposed predictive AI support, and supposed biosignal-enhanced AI support. Objective Fitts' Law performance indicators and subjective performance expectations, perceived workload, and perceived usability were measured. Compared to baseline, participants expected significantly improved performance in placebo conditions. However, these expectations did not translate into differences in objective or subjective assessments. This paper contributes evidence that AI washing inflates user expectations without altering actual interaction outcomes, highlighting a critical transparency issue. By exposing how deceptive AI marketing can shape user expectations, we underscore the need for accountability in AI product claims. Further, we establish Fitts' Law as a rigorous methodological lens for auditing AI-labelled input devices.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper reports a within-subjects experiment with 28 participants performing Fitts' Law pointing tasks on a computer mouse under three conditions: baseline (no support), placebo predictive AI support, and placebo biosignal-enhanced AI support. Participants showed significantly higher performance expectations in the AI-labeled conditions, but no differences emerged in objective Fitts' Law metrics (throughput, movement time, error rate) or subjective measures of workload and usability. The authors conclude that AI washing inflates expectations without altering actual interaction outcomes and position Fitts' Law as a rigorous method for auditing AI-labeled input devices.
Significance. If the null results on performance hold after the power and sensitivity concerns are addressed, the work provides controlled empirical evidence that deceptive AI marketing creates placebo expectations without corresponding performance gains. This matters for HCI because it highlights transparency issues in AI product claims. The use of objective, standardized Fitts' Law metrics is a methodological strength: it supports falsifiable claims about interaction outcomes rather than relying solely on subjective reports.
major comments (2)
- [Methods] Methods section: No power analysis, minimum detectable effect size, or sensitivity discussion is reported for the primary Fitts' Law dependent variables (throughput, movement time, error rate). With n=28 in a within-subjects design across three conditions, the null performance result central to the claim ('does not alter actual interaction outcomes') cannot be confidently distinguished from low statistical power to detect small genuine placebo effects.
- [Abstract] Abstract and Methods: The abstract and methods description omit key Fitts' Law task parameters (target widths/distances, number of trials, index of difficulty range). Without these, it is impossible to evaluate whether the task had adequate sensitivity to detect the magnitude of performance change one might expect from even a weak placebo effect, directly affecting interpretation of the null result.
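The task parameters requested above determine the index of difficulty range, which in turn bounds how sensitive the task can be. The following is a minimal Python sketch, assuming the Shannon formulation of the index of difficulty and the effective-width convention (4.133 times the standard deviation of endpoint offsets) that is common in the Fitts' Law literature. The function names and the distance and width grid are hypothetical illustrations; the page does not report the study's actual values.

```python
import math
from statistics import stdev

def index_of_difficulty(distance: float, width: float) -> float:
    """Shannon formulation: ID = log2(D / W + 1), in bits."""
    return math.log2(distance / width + 1)

def effective_width(endpoint_offsets: list[float]) -> float:
    """Effective target width: 4.133 * SD of endpoint offsets along the movement axis."""
    return 4.133 * stdev(endpoint_offsets)

def throughput(id_bits: float, movement_time_s: float) -> float:
    """Throughput in bits per second for one distance-width condition."""
    return id_bits / movement_time_s

# Hypothetical grid in pixels; these are NOT the study's parameters.
distances = [128, 256, 512]
widths = [16, 32, 64]

# Distinct ID values covered by the grid, rounded for reporting.
ids = sorted({round(index_of_difficulty(d, w), 2) for d in distances for w in widths})
id_range = (ids[0], ids[-1])
print(f"ID values: {ids}, range: {id_range[0]} to {id_range[1]} bits")
```

Reporting the grid and the resulting ID range in this form would let a reader judge whether the task spans easy and hard targets widely enough to expose a small performance shift.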
minor comments (2)
- [Results] Results: Report effect sizes (e.g., partial eta-squared or Cohen's d) for the significant expectation differences and for the non-significant performance comparisons to aid interpretation of the null findings.
- [Discussion] Discussion: Clarify whether the Fitts' Law task was chosen because it is insensitive to placebo effects or because it provides an objective benchmark; this would strengthen the methodological contribution claim.
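The effect sizes requested in the first minor comment can be computed from paired data or recovered from reported test statistics. The sketch below is illustrative only: it assumes paired comparisons between conditions, and it includes Kendall's W as an option in case a Friedman test was used, which this page does not confirm. All data values and test statistics are hypothetical, not taken from the study.

```python
from statistics import mean, stdev

def cohens_dz(condition_a, condition_b):
    """Cohen's d_z for paired data: mean of the differences / SD of the differences."""
    diffs = [a - b for a, b in zip(condition_a, condition_b)]
    return mean(diffs) / stdev(diffs)

def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta-squared from a reported F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

def kendalls_w(friedman_chi2, n_participants, k_conditions):
    """Kendall's W, an effect size for a Friedman test: chi2 / (N * (k - 1))."""
    return friedman_chi2 / (n_participants * (k_conditions - 1))

# Hypothetical throughput values (bits/s) for five participants; not study data.
baseline = [4.1, 3.9, 4.4, 4.0, 4.2]
placebo = [4.2, 3.8, 4.5, 4.1, 4.1]
dz = cohens_dz(placebo, baseline)

# Hypothetical test statistics for 28 participants and 3 conditions
# (df_effect = 3 - 1 = 2, df_error = (28 - 1) * 2 = 54).
eta_p2 = partial_eta_squared(f_value=3.0, df_effect=2, df_error=54)
w = kendalls_w(friedman_chi2=5.6, n_participants=28, k_conditions=3)
print(f"d_z = {dz:.3f}, partial eta-squared = {eta_p2:.3f}, Kendall's W = {w:.3f}")
```

Reporting these values for the non-significant performance comparisons, ideally with confidence intervals, would show whether the null result reflects an effect near zero or merely an imprecise estimate.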
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which highlight important aspects of methodological transparency. We will revise the manuscript to address the concerns about statistical power and task parameter reporting. Our responses to the major comments are provided below.
- Referee: [Methods] Methods section: No power analysis, minimum detectable effect size, or sensitivity discussion is reported for the primary Fitts' Law dependent variables (throughput, movement time, error rate). With n=28 in a within-subjects design across three conditions, the null performance result central to the claim ('does not alter actual interaction outcomes') cannot be confidently distinguished from low statistical power to detect small genuine placebo effects.
  Authors: We agree that a formal power or sensitivity analysis strengthens interpretation of null results. In the revision, we will add a post-hoc sensitivity analysis (e.g., using G*Power for repeated-measures ANOVA) to report the minimum detectable effect size for throughput, movement time, and error rate given n=28, alpha=0.05, and 80% power. This will clarify whether small placebo effects were detectable. We maintain that the standardized Fitts' Law paradigm is sensitive to performance differences, but explicit reporting addresses the concern directly.
  Revision: yes
- Referee: [Abstract] Abstract and Methods: The abstract and methods description omit key Fitts' Law task parameters (target widths/distances, number of trials, index of difficulty range). Without these, it is impossible to evaluate whether the task had adequate sensitivity to detect the magnitude of performance change one might expect from even a weak placebo effect, directly affecting interpretation of the null result.
  Authors: We acknowledge the omission of these details. The revised Methods section will include the exact Fitts' Law parameters used in the study (target widths, distances, trial counts per condition, and index of difficulty range). We will also ensure the abstract references the standardized task setup to support evaluation of sensitivity. These additions will allow readers to assess whether the task could detect small effects.
  Revision: yes
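To give a rough sense of what a sensitivity analysis at n=28 might report, here is a minimal Python sketch. It is a simplification under stated assumptions: it treats a single two-sided paired comparison (for example, baseline versus one placebo condition) and uses a normal approximation, whereas the authors propose G*Power for a repeated-measures ANOVA. The function name is hypothetical, and the result is not a substitute for the analysis the authors describe.

```python
import math
from statistics import NormalDist

def min_detectable_dz(n: int, alpha: float = 0.05, power: float = 0.80) -> float:
    """Approximate smallest paired effect size (Cohen's d_z) detectable
    with the given sample size, two-sided alpha, and power.

    Normal approximation: d_z = (z_{1 - alpha/2} + z_{power}) / sqrt(n).
    An exact calculation with the noncentral t distribution gives a
    slightly larger value for small n.
    """
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / math.sqrt(n)

# Sample size, alpha, and power as stated in the authors' response.
d_min = min_detectable_dz(n=28, alpha=0.05, power=0.80)
print(f"Approximate minimum detectable d_z: {d_min:.2f}")  # about 0.53
```

Under these assumptions, a single uncorrected comparison could reliably detect only medium-sized effects, which is why the referee's concern about small placebo effects is worth quantifying. Correcting for multiple comparisons would raise the threshold further.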
Circularity Check
No circularity: empirical measurements with no derivations or self-referential claims
The paper reports a within-subjects experiment measuring objective Fitts' Law metrics (throughput, movement time, error rate) and subjective ratings under three mouse conditions. No equations, fitted parameters, predictions derived from prior results, or self-citations are used to support the central claim. The null performance result and the expectation inflation are direct outcomes of the collected data rather than reductions to the inputs by construction. Fitts' Law is used as a standard measurement tool, not as a derived or renamed result. This is a self-contained empirical study with no load-bearing theoretical steps.
Axiom & Free-Parameter Ledger
axioms (1)
- [standard math] Standard assumptions underlying the statistical tests used to detect significant differences in expectations versus performance measures.
Reference graph
Works this paper leans on
[1] Aaron Bangor, Philip T. Kortum, and James T. Miller. 2009. Determining What Individual SUS Scores Mean: Adding an Adjective Rating Scale. Journal of Usability Studies 4, 3 (May 2009), 114–123. https://uxpajournal.org/wp-content/uploads/sites/7/pdf/JUS_Bangor_May2009.pdf
[2] Esther Bosch, Robin Welsch, Tamim Ayach, Christopher Katins, and Thomas Kosch. 2024. The Illusion of Performance: The Effect of Phantom Display Refresh Rates on User Expectations and Reaction Times. In Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA) (CHI EA ’24). Association for Computing Machinery, New York...
[3] John Brooke. 1996. SUS: A ’quick and Dirty’ Usability Scale. Usability Evaluation in Industry 189 (1996), 189–194.
[4] Rachael A. Burno, Bing Wu, Rina Doherty, Hannah Colett, and Rania Elnaggar. 2015. Applying Fitts’ Law to Gesture Based Computer Interactions. Procedia Manufacturing 3 (2015), 4342–4349. doi:10.1016/j.promfg.2015.07.429. 6th International Conference on Applied Human Factors and Ergonomics (AHFE 2015) and the Affiliated Conferences, AHFE 2015.
[5] Gregory W. Corder and Dale I. Foreman. 2009. Comparing More Than Two Related Samples: The Friedman Test. John Wiley & Sons, Ltd, Chapter 5, 79–98. doi:10.1002/9781118165881.ch5
[6] Alena Denisova and Paul Cairns. 2015. The Placebo Effect in Digital Games: Phantom Perception of Adaptive Artificial Intelligence. In Proceedings of the 2015 Annual Symposium on Computer-Human Interaction in Play (London, United Kingdom) (CHI PLAY ’15). Association for Computing Machinery, New York, NY, USA, 23–33. doi:10.1145/2793107.2793109
[7] Alena Denisova and Eliott Cook. 2019. Power-Ups in Digital Games: The Rewarding Effect of Phantom Game Elements on Player Experience. In Proceedings of the Annual Symposium on Computer-Human Interaction in Play (Barcelona, Spain) (CHI PLAY ’19). Association for Computing Machinery, New York, NY, USA, 161–168. doi:10.1145/3311350.3347173
[8] Sarah A. Douglas, Arthur E. Kirkpatrick, and I. Scott MacKenzie. 1999. Testing pointing device performance and user assessment with the ISO 9241, Part 9 standard. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Pittsburgh, Pennsylvania, USA) (CHI ’99). Association for Computing Machinery, New York, NY, USA, 215–222. doi:10.1145...
[9] European Parliament and Council of the European Union. 2024. Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 laying down harmonised rules on artificial intelligence (Artificial Intelligence Act). Official Journal of the European Union, L 2024/1689. https://eur-lex.europa.eu/eli/reg/2024/1689/oj/eng
[10] Franz Faul, Edgar Erdfelder, Axel Buchner, and Albert-Georg Lang. 2009. Statistical Power Analyses Using G*Power 3.1: Tests for Correlation and Regression Analyses. Behavior Research Methods 41, 4 (Nov. 2009), 1149–1160. doi:10.3758/BRM.41.4.1149
[11] Franz Faul, Edgar Erdfelder, Albert-Georg Lang, and Axel Buchner. 2007. G*Power 3: A Flexible Statistical Power Analysis Program for the Social, Behavioral, and Biomedical Sciences. Behavior Research Methods 39, 2 (May 2007), 175–191. doi:10.3758/BF03193146
[12] Mayara Costa Figueiredo, Elizabeth Ankrah, Jacquelyn E. Powell, Daniel A. Epstein, and Yunan Chen. 2023. Powered by AI: Examining How AI Descriptions Influence Perceptions of Fertility Tracking Applications. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 7, 4 (Dec. 2023), 1–24. doi:10.1145/3631414
[13] Paul M. Fitts. 1954. The Information Capacity of the Human Motor System in Controlling the Amplitude of Movement. Journal of Experimental Psychology 47, 6 (1954), 381–391. doi:10.1037/h0055392
[14] Kristina Flägel, Britta Galler, Jost Steinhäuser, and Katja Götz. 2019. Der „National Aeronautics and Space Administration-Task Load Index“ (NASA-TLX) – Ein Instrument Zur Erfassung Der Arbeitsbelastung in Der Hausärztlichen Sprechstunde: Bestimmung Der Psychometrischen Eigenschaften. Zeitschrift für Evidenz, Fortbildung und Qualität im Gesundheitswesen 147...
[15] Erin D. Foster and Ariel Deardorff. 2017. Open Science Framework (OSF). Journal of the Medical Library Association 105, 2 (April 2017). doi:10.5195/jmla.2017.88
[16] Pedro García García, Enrico Costanza, Jhim Verame, Diana Nowacka, and Sarvapali D. Ramchurn. 2021. Seeing (Movement) is Believing: The Effect of Motion on Perception of Automatic Systems Performance. Human–Computer Interaction 36, 1 (2021), 1–51. doi:10.1080/07370024.2018.1453815
[17] Gartner. 2017. Gartner Says AI Technologies Will Be in Almost Every New Software Product by 2020. Gartner, Inc. Retrieved March 24, 2026 from https://www.gartner.com/en/newsroom/press-releases/2017-07-18-gartner-says-ai-technologies-will-be-in-almost-every-new-software-product-by-2020
[18] Jana Gonnermann-Müller, Kristina Sahling, and Jennifer Haase. 2025. Let’s Be Realistic: AI-Recommender Use in a Complex Management Setting. In Proceedings of the Extended Abstracts of the CHI Conference on Human Factors in Computing Systems. ACM, Yokohama Japan, 1–10. doi:10.1145/3706599.3720131
[19] Sandra G. Hart. 2006. Nasa-Task Load Index (NASA-TLX); 20 Years Later. Proceedings of the Human Factors and Ergonomics Society Annual Meeting 50, 9 (Oct. 2006), 904–908. doi:10.1177/154193120605000909
[20] Sandra G. Hart and Lowell E. Staveland. 1988. Development of NASA-TLX (Task Load Index): Results of Empirical and Theoretical Research. In Human Mental Workload, Peter A. Hancock and Najmedin Meshkati (Eds.). Advances in Psychology, Vol. 52. North-Holland, 139–183. doi:10.1016/S0166-4115(08)62386-9
[21] Morten Hertzum. 2021. Reference Values and Subscale Patterns for the Task Load Index (TLX): A Meta-Analytic Review. Ergonomics 64, 7 (July 2021), 869–878. doi:10.1080/00140139.2021.1876927
[22] Silas Hsu, Vinay Koshy, Kristen Vaccaro, Christian Sandvig, and Karrie Karahalios. 2025. Placebo Effect of Control Settings in Feeds Are Not Always Strong. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI ’25). Association for Computing Machinery, New York, NY, USA, Article 226, 16 pages. doi:10.1145/3706598.3714197
[23] Philip Hurst, Lieke Schipof-Godart, Attila Szabo, John Raglin, Florentina Hettinga, Bart Roelands, Andrew Lane, Abby Foad, Damian Coleman, and Chris Beedie. 2020. The Placebo and Nocebo Effect on Sports Performance: A Systematic Review. European Journal of Sport Science 20, 3 (April 2020), 279–292. doi:10.1080/17461391.2019.1655098
[24] Adesso Inc. [n. d.]. iMouse A30P – Wireless Mouse with AI CoPilot Shortcut Button. Retrieved April 30, 2026 from https://download.adesso.com/ds/iMouse_A30P_DS.pdf
[25] Interaction Design Group. [n. d.]. NASA-TLX (deutsche Übersetzung). Retrieved April 30, 2026 from https://web.archive.org/web/20221221074553/http://interaction-design-group.de/toolbox/wp-content/uploads/2016/05/NASA-TLX.pdf
[26] Agnes Mercedes Kloft, Robin Welsch, Thomas Kosch, and Steeven Villa. 2024. "AI Enhances Our Performance, I Have No Doubt This One Will Do the Same": The Placebo Effect Is Robust to Negative Descriptions of AI. In Proceedings of the CHI Conference on Human Factors in Computing Systems. ACM, Honolulu HI USA, 1–24. doi:10.1145/3613904.3642633
[27] Jing Kong and Xiangshi Ren. 2006. Calculation of Effective Target Width and Its Effects on Pointing Tasks. IPSJ Digital Courier 2 (2006), 235–237. doi:10.2197/ipsjdc.2.235
[28] Thomas Kosch, Robin Welsch, Lewis Chuang, and Albrecht Schmidt. 2023. The Placebo Effect of Artificial Intelligence in Human–Computer Interaction. ACM Trans. Comput.-Hum. Interact. 29, 6, Article 56 (Jan. 2023), 32 pages. doi:10.1145/3529225
FAccT ’26, June 25–28, 2026, Montreal, QC, Canada. Nick von Felten, Luisa Müller, and Johannes Schöning
[29] Markus Langer, Tim Hunsicker, Tina Feldkamp, Cornelius J. König, and Nina Grgić-Hlača. 2022. “Look! It’s a Computer Program! It’s an Algorithm! It’s AI!”: Does Terminology Affect Human Perceptions and Evaluations of Algorithmic Decision-Making Systems? In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (New Orleans, LA, USA) (C...
[30] Dirk Leffrang and Oliver Mueller. 2023. AI Washing: The Framing Effect of Labels on Algorithmic Advice Utilization. In ICIS 2023 Proceedings. 10. https://aisel.aisnet.org/icis2023/aiinbus/aiinbus/10
[31] Edo Liberty, Zohar Karnin, Bing Xiang, Laurence Rouesnel, Baris Coskun, Ramesh Nallapati, Julio Delgado, Amir Sadoughi, Yury Astashonok, Piali Das, Can Balioglu, Saswata Chakravarty, Madhav Jha, Philip Gautier, David Arpin, Tim Januschowski, Valentin Flunkert, Yuyang Wang, Jan Gasthaus, Lorenzo Stella, Syama Rangapuram, David Salinas, Sebastian Schelter, ...
[32] Logitech. 2024. Logitech präsentiert den neuen Logi AI Prompt Builder – direkter Zugang zu KI, ohne dass der Workflow gestört wird. Retrieved April 30, 2026 from https://press.logitech.eu/de-de/logitech-praesentiert-den-neuen-logi-ai-prompt-builder-direkter-zugang-zu-ki-ohne-dass-der-workflow-gestoert-wird/
[33] Lew Ludwig. 2026. AI Snake Oil: What Artificial Intelligence Can Do, What It Can’t, and How to Tell the Difference: By Arvind Narayanan and Sayash Kapoor. The Mathematical Intelligencer (Jan. 2026), s00283–025–10489–9. doi:10.1007/s00283-025-10489-9
[34] I. Scott MacKenzie. 1992. Fitts’ Law as a Research and Design Tool in Human-Computer Interaction. Human–Computer Interaction 7, 1 (March 1992), 91–139. doi:10.1207/s15327051hci0701_3
[35] I. Scott MacKenzie. 2018. Fitts’ Law. In The Wiley Handbook of Human Computer Interaction, Kent L. Norman and Jurek Kirakowski (Eds.). John Wiley & Sons, Ltd, Chapter 17, 347–370. doi:10.1002/9781118976005.ch17
[36] Arvind Narayanan and Sayash Kapoor. 2024. AI Snake Oil: What Artificial Intelligence Can Do, What It Can’t, and How to Tell the Difference. Princeton University Press, Princeton Oxford.
[37] Nixtla. 2024. About TimeGPT. Retrieved April 30, 2026 from https://docs.nixtla.io/docs/getting-started-about_timegpt
[38] Sebastian A. C. Perrig, Lena Fanya Aeschbach, Nicolas Scharowski, Nick von Felten, Klaus Opwis, and Florian Brühlmann. 2024. Measurement Practices in User Experience (UX) Research: A Systematic Quantitative Literature Review. Frontiers in Computer Science 6 (March 2024), 1368860. doi:10.3389/fcomp.2024.1368860
[39] Sebastian A. C. Perrig, Nick von Felten, Beat Vollenwyder, and Klaus Opwis. 2025. Development and Psychometric Validation of a Positively Worded German Version of the System Usability Scale (SUS). International Journal of Human–Computer Interaction 41, 16 (2025), 10399–10419. doi:10.1080/10447318.2024.2434720
[40] RingConn. 2026. RingConn Smart Ring. Retrieved March 24, 2026 from https://ringconn.com/
[41] B. Rummel. 2016. System Usability Scale – jetzt auch auf Deutsch [System usability scale – Now also in German]. Retrieved March 24, 2026 from https://blogs.sap.com/2016/02/01/system-usability-scale-jetzt-auch-auf-deutsch/
[42] Nicolas Scharowski, Sebastian A. C. Perrig, Nick von Felten, Lena Fanya Aeschbach, Klaus Opwis, Philipp Wintersberger, and Florian Brühlmann. 2025. To Trust or Distrust AI: A Questionnaire Validation Study. In Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’25). Association for Computing Machinery, New York, NY,...
[43] Peter Seele and Mario D. Schultz. 2022. From Greenwashing to Machinewashing: A Model and Future Directions Derived from Reasoning by Analogy. Journal of Business Ethics 178, 4 (July 2022), 1063–1089. doi:10.1007/s10551-022-05054-9
[44] A. K. Shapiro. 1959. The Placebo Effect in the History of Medical Treatment: Implications for Psychiatry. The American Journal of Psychiatry 116 (Oct. 1959), 298–304. doi:10.1176/ajp.116.4.298
[45] Margaret Shih, Todd L. Pittinsky, and Nalini Ambady. 1999. Stereotype Susceptibility: Identity Salience and Shifts in Quantitative Performance. Psychological Science 10, 1 (Jan. 1999), 80–83. doi:10.1111/1467-9280.00111
[46] Baba Shiv, Ziv Carmon, and Dan Ariely. 2005. Placebo Effects of Marketing Actions: Consumers May Get What They Pay For. Journal of Marketing Research 42, 4 (Nov. 2005), 383–393. doi:10.1509/jmkr.2005.42.4.383
[47] R. William Soukoreff and I. Scott MacKenzie. 2004. Towards a Standard for Pointing Device Evaluation, Perspectives on 27 Years of Fitts’ Law Research in HCI. International Journal of Human-Computer Studies 61, 6 (Dec. 2004), 751–789. doi:10.1016/j.ijhcs.2004.09.001
[48] Katta Spiel, Oliver L. Haimson, and Danielle Lottridge. 2019. How to do better with gender on surveys: a guide for HCI researchers. Interactions 26, 4 (June 2019), 62–65. doi:10.1145/3338283
[49] Sarita Sridharan, Steeven Villa, Alena Denisova, and Johanna Pirker. 2025. Exploring the (Placebo) Effect of an AI-Powered Dialogue System on Player Experience. In 2025 IEEE Conference on Games (CoG). IEEE, Lisbon, Portugal, 1–8. doi:10.1109/CoG64752.2025.11114156
[50] Kristen Vaccaro, Dylan Huang, Motahhare Eslami, Christian Sandvig, Kevin Hamilton, and Karrie Karahalios. 2018. The Illusion of Control: Placebo Effects of Control Settings. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. ACM, Montreal QC Canada, 1–13. doi:10.1145/3173574.3173590