Functional Misalignment in Human-AI Interactions on Digital Platforms
Pith reviewed 2026-05-10 15:44 UTC · model grok-4.3
The pith
Algorithmic systems optimize for predictable user behavior in ways that structurally misalign with the human goals these predictions are meant to serve.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Functional misalignment arises through a bias toward modeling fast, reactive behavioral signals over reflective judgment, feedback loops that couple user behavior with algorithmic learning, and emergent collective dynamics that amplify these effects at scale. Accurate individual-level predictions can therefore produce adverse societal outcomes because the optimization target of predictability diverges from the human goals these predictions are intended to serve.
What carries the argument
Functional misalignment, the structural gap between what algorithms optimize (predictable behavior) and the human goals these predictions are intended to serve.
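A minimal sketch of how the feedback-loop mechanism could play out, assuming a feed with just two content types. Nothing here comes from the paper: the function name `simulate_feedback_loop`, the click rates, the endorsed-value numbers, and both update rules (a replicator-style shift of feed share toward the more-clicked type, and a habit term that raises the propensity to click reactive content with exposure) are hypothetical choices made for illustration.

```python
def simulate_feedback_loop(steps=50, learning_rate=0.2, habit_rate=0.05):
    """Toy expected-value dynamics for a two-type feed (all numbers hypothetical)."""
    # Probability of a fast reaction (click) for each content type.
    click = {"reactive": 0.6, "reflective": 0.4}
    # Value the user would assign on reflection (assumed, not measured).
    endorsed_value = {"reactive": 0.2, "reflective": 0.8}
    share = 0.5  # fraction of the feed given to reactive content
    history = []
    for _ in range(steps):
        engagement = share * click["reactive"] + (1 - share) * click["reflective"]
        endorsed = (share * endorsed_value["reactive"]
                    + (1 - share) * endorsed_value["reflective"])
        history.append((share, engagement, endorsed))
        # Algorithmic learning: shift feed share toward the type that earns
        # more clicks (replicator-style update, damped by learning_rate).
        target = share * click["reactive"] / engagement
        share += learning_rate * (target - share)
        # User adaptation: exposure to reactive content raises the
        # propensity to click on it (assumed habit formation).
        click["reactive"] += habit_rate * share * (1 - click["reactive"])
    return history


if __name__ == "__main__":
    history = simulate_feedback_loop()
    for label, (share, engagement, endorsed) in (("start", history[0]),
                                                 ("end", history[-1])):
        print(f"{label}: reactive share={share:.2f}, "
              f"engagement={engagement:.2f}, endorsed value={endorsed:.2f}")
```

Under these assumed numbers, expected engagement rises over the run while the value the user would endorse on reflection falls, which is the divergence the core claim describes. Other parameter choices could weaken or remove the effect.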
If this is right
- Accurate predictions at the individual level can still generate negative societal outcomes when the three mechanisms are active.
- Mitigation efforts must address the misalignment itself rather than focus solely on increasing prediction accuracy.
- The framework accounts for multiple harms, including polarization and trust erosion, under a single structural account.
- Research can now target the three mechanisms to study and reduce adverse effects in human-AI interaction systems.
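The third mechanism, emergent collective dynamics, can be illustrated with a deterministic popularity-ranking toy model. This is a sketch under assumptions that are not in the paper: `simulate_popularity_ranking`, the geometric `position_decay` attention model, and the starting counts are hypothetical, and every item is assumed to be equally appealing so that any inequality comes from ranking alone.

```python
def simulate_popularity_ranking(initial_counts, rounds=20,
                                views_per_round=100.0, position_decay=0.5):
    """Rank items by accumulated views; higher positions attract more new views."""
    counts = [float(c) for c in initial_counts]
    for _ in range(rounds):
        # Platform: sort items by current popularity (ties keep input order).
        order = sorted(range(len(counts)), key=lambda i: counts[i], reverse=True)
        # Users: attention falls geometrically with position (assumed).
        weights = [position_decay ** position for position in range(len(order))]
        total_weight = sum(weights)
        for position, item in enumerate(order):
            counts[item] += views_per_round * weights[position] / total_weight
    return counts


def leader_share(counts):
    """Fraction of all views held by the most-viewed item."""
    return max(counts) / sum(counts)


if __name__ == "__main__":
    start = [11, 10, 10, 10]  # one extra early view for the first item
    end = simulate_popularity_ranking(start)
    print(f"leader share: {leader_share(start):.2f} -> {leader_share(end):.2f}")
```

In this toy setup a single extra early view is enough to lock in the top position, and the leader's share of all views roughly doubles even though no item is better than another. Each individual ranking step is "accurate" about current popularity, yet the aggregate outcome is an amplified disparity.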
Where Pith is reading between the lines
- Interventions that favor reflective signals or interrupt feedback loops could reduce harms while preserving useful prediction capabilities.
- The same misalignment pattern may appear in other AI systems where short-term observable signals conflict with longer-term user or societal interests.
- Controlled platform modifications that alter signal weighting or loop strength offer direct tests of whether the mechanisms drive the listed harms.
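One way to picture the signal-weighting intervention in the bullets above is a ranker that blends a fast signal with a reflective one. This is an illustrative sketch only: the `Item` fields, the `rank_feed` function, the `reflective_weight` knob, and the example scores are hypothetical, and the paper does not specify how a reflective signal would be collected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    name: str
    predicted_click: float  # fast, reactive signal (model output)
    stated_value: float     # hypothetical reflective signal, e.g. a survey rating


def rank_feed(items, reflective_weight):
    """Order items by a convex blend of the fast and reflective signals."""
    if not 0.0 <= reflective_weight <= 1.0:
        raise ValueError("reflective_weight must be between 0 and 1")

    def score(item):
        return ((1.0 - reflective_weight) * item.predicted_click
                + reflective_weight * item.stated_value)

    return sorted(items, key=score, reverse=True)


# Made-up example items; the scores are placeholders, not measurements.
FEED = [
    Item("outrage_clip", predicted_click=0.9, stated_value=0.2),
    Item("friend_update", predicted_click=0.6, stated_value=0.7),
    Item("explainer", predicted_click=0.5, stated_value=0.9),
]

if __name__ == "__main__":
    for weight in (0.0, 0.5):
        print(weight, [item.name for item in rank_feed(FEED, weight)])
```

With `reflective_weight=0.0` the ordering follows predicted clicks alone; at `0.5` the same three hypothetical items appear in reverse order. Varying this weight across user groups is the kind of controlled modification the last bullet has in mind.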
Load-bearing premise
The three mechanisms—bias toward fast reactive signals, feedback loops coupling behavior and learning, and emergent collective dynamics—are the primary and sufficient causes of the observed societal harms.
What would settle it
An experiment that reduces algorithmic bias toward fast reactive signals or breaks feedback loops between behavior and learning, then measures whether rates of polarization, mental health concerns, or trust erosion decline on the affected platforms.
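The analysis side of such an experiment could be as simple as comparing an outcome score between modified and unmodified platform conditions. The sketch below runs an exact one-sided permutation test on the difference in means. The function name, the choice of test, and above all the scores are placeholders invented for illustration; they are not data from the paper or from any study.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_pvalue(treated, control):
    """One-sided exact test: is the treated mean lower than the control mean?"""
    observed = mean(treated) - mean(control)
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    extreme = 0
    total = 0
    # Enumerate every way of relabelling n_treated pooled units as "treated".
    for chosen in combinations(range(len(pooled)), n_treated):
        chosen_set = set(chosen)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        diff = mean(group_a) - mean(group_b)
        if diff <= observed + 1e-12:  # small tolerance for floating-point ties
            extreme += 1
        total += 1
    return observed, extreme / total


if __name__ == "__main__":
    # Placeholder polarization scores per community (hypothetical values).
    control = [0.62, 0.58, 0.65, 0.60]
    treated = [0.51, 0.55, 0.49, 0.53]
    diff, p_value = exact_permutation_pvalue(treated, control)
    print(f"difference in means = {diff:.4f}, one-sided p = {p_value:.4f}")
```

Exact enumeration is only practical for very small groups, so a real platform study would likely need a sampled permutation test or a different design altogether, along with care about interference between users.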
Original abstract
Algorithmic systems, particularly social media recommenders, have achieved remarkable success in predicting behavior. By optimizing for observable signals such as clicks, views, and engagement, these systems effectively capture user attention and guide interaction. Yet their widespread adoption has coincided with troubling outcomes, including rising mental health concerns, increasing polarization, and erosion of trust. This paper argues that these effects are consequences of a structural functional misalignment between what algorithms optimize - predictable behavior - and the human goals these predictions are intended to serve. We propose that this misalignment arises through three mechanisms: (1) a bias toward modeling fast, reactive behavioral signals over reflective judgment, (2) feedback loops that couple user behavior with algorithmic learning, and (3) emergent collective dynamics that amplify these effects at scale. Together, these mechanisms explain how accurate individual-level predictions can produce adverse societal outcomes. We present functional misalignment as a unifying framework and outline a research agenda for studying and mitigating its effects in human-AI interaction systems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript argues that troubling societal outcomes associated with social media recommenders—rising mental health concerns, polarization, and erosion of trust—are consequences of a structural functional misalignment between algorithmic optimization for predictable behavior (via engagement signals like clicks and views) and the human goals these predictions serve. It identifies three mechanisms through which this misalignment operates: (1) bias toward modeling fast, reactive behavioral signals over reflective judgment, (2) feedback loops coupling user behavior with algorithmic learning, and (3) emergent collective dynamics that amplify effects at scale. The paper presents 'functional misalignment' as a unifying framework and outlines a research agenda for studying and mitigating its effects in human-AI interaction systems.
Significance. If substantiated through modeling or evidence, the framework could offer a useful conceptual lens for diagnosing and redesigning recommendation systems to reduce unintended societal harms. At present, however, the work remains at the level of qualitative description without empirical tests, formal derivations, or simulations, so its significance is limited to hypothesis generation rather than explanatory power.
Major comments (3)
- [Abstract] Abstract: The central claim that accurate individual-level predictions produce adverse societal outcomes 'specifically through' the three mechanisms is not supported by any derivation, agent-based model, simulation, or empirical comparison. The mechanisms are described rather than shown to be necessary or sufficient, leaving open whether the harms would arise under different optimization targets or exogenous factors.
- [Mechanisms description] Mechanisms section: The feedback-loop mechanism is asserted to couple user behavior with algorithmic learning in ways that amplify polarization and mental-health decline, yet no concrete illustration, equation, or example is supplied showing how optimization for predictable behavior (as opposed to other platform policies) produces these specific outcomes.
- [Framework and research agenda] Framework proposal: The definition of functional misalignment is framed in terms of the very outcomes (polarization, mental-health decline, trust erosion) it is invoked to explain, creating a circularity that prevents independent verification of the causal link.
Minor comments (2)
- [Abstract] The abstract and introduction would benefit from explicit statements distinguishing the proposed framework from prior work on algorithmic bias, filter bubbles, and engagement optimization.
- Notation for the three mechanisms is introduced informally; a numbered list or table summarizing each mechanism, its hypothesized pathway, and the predicted societal harm would improve clarity.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive comments, which highlight important opportunities to clarify the scope and presentation of our conceptual framework. We respond to each major comment below and indicate the revisions we will make to strengthen the manuscript.
Point-by-point responses
- Referee: [Abstract] Abstract: The central claim that accurate individual-level predictions produce adverse societal outcomes 'specifically through' the three mechanisms is not supported by any derivation, agent-based model, simulation, or empirical comparison. The mechanisms are described rather than shown to be necessary or sufficient, leaving open whether the harms would arise under different optimization targets or exogenous factors.
  Authors: We agree that the paper is a conceptual contribution and does not include formal derivations, simulations, or new empirical tests. The central claim is advanced as a synthesized argument drawing on existing literature and logical analysis of the mechanisms, rather than a demonstration of necessity or sufficiency. In the revised manuscript, we will modify the abstract to state explicitly that the framework proposes these mechanisms as plausible pathways contributing to the observed outcomes and positions the work as hypothesis-generating. This revision will temper the language and directly address the concern about unsupported causal assertions.
  Revision: partial
- Referee: [Mechanisms description] Mechanisms section: The feedback-loop mechanism is asserted to couple user behavior with algorithmic learning in ways that amplify polarization and mental-health decline, yet no concrete illustration, equation, or example is supplied showing how optimization for predictable behavior (as opposed to other platform policies) produces these specific outcomes.
  Authors: The referee correctly identifies that the feedback-loop description would be strengthened by additional concreteness. While developing a full simulation or set of equations exceeds the scope of this framework paper, we will incorporate a stylized illustrative example in the revised mechanisms section. This example will describe how optimization for short-term engagement signals can create reinforcing loops favoring reactive content, drawing on documented patterns from recommender system research. We will also add citations to relevant studies on feedback dynamics to provide grounding without claiming new empirical results.
  Revision: partial
- Referee: [Framework and research agenda] Framework proposal: The definition of functional misalignment is framed in terms of the very outcomes (polarization, mental-health decline, trust erosion) it is invoked to explain, creating a circularity that prevents independent verification of the causal link.
  Authors: We appreciate the identification of potential circularity. In the revision, we will redefine functional misalignment more independently as the structural mismatch between algorithmic optimization for immediate, predictable behavioral signals and the broader human goals of reflective judgment and sustained well-being. The societal outcomes will then be presented as downstream consequences that the framework seeks to explain, supported by references to independent empirical literature on those phenomena. This adjustment should facilitate future verification through targeted studies.
  Revision: yes
Circularity Check
No circularity; conceptual framework with independent definitions
Full rationale
The paper advances a qualitative argument that societal harms arise from a defined mismatch between algorithmic optimization for predictable behavior and intended human goals, mediated by three descriptive mechanisms. No equations, fitted parameters, predictions, or self-citations appear in the provided text that would reduce any claim to its own inputs by construction. The misalignment is posited as an explanatory hypothesis rather than a self-referential tautology, and the mechanisms are listed as proposed pathways without derivation from the outcomes they address. The derivation chain is therefore self-contained as a conceptual proposal.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Optimizing for observable fast signals such as clicks produces accurate individual predictions but conflicts with reflective human goals.
invented entities (1)
- functional misalignment: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "We propose that this misalignment arises through three mechanisms: (1) a bias toward modeling fast, reactive behavioral signals over reflective judgment, (2) feedback loops that couple user behavior with algorithmic learning, and (3) emergent collective dynamics that amplify these effects at scale."
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "functional misalignment as a unifying framework"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.