
arxiv: 2604.11459 · v1 · submitted 2026-04-13 · 💻 cs.CY · cs.HC

Functional Misalignment in Human-AI Interactions on Digital Platforms

Pith reviewed 2026-05-10 15:44 UTC · model grok-4.3

classification 💻 cs.CY cs.HC
keywords functional misalignment · human-AI interaction · algorithmic recommenders · predictable behavior · societal harms · feedback loops · polarization

The pith

Algorithmic systems optimize for predictable user behavior in ways that structurally misalign with the human goals these predictions are meant to serve.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that troubling outcomes associated with algorithmic recommenders, including mental health concerns, polarization, and erosion of trust, are consequences of a structural functional misalignment. Algorithms predict observable signals such as clicks and engagement well by targeting predictable behavior, yet this target diverges from the broader human goals the systems are intended to serve. The misalignment operates through three mechanisms: a bias toward fast reactive signals over reflective judgment, feedback loops that couple behavior with learning, and emergent collective dynamics that amplify the effects at scale. The account matters because it explains how accurate individual predictions can still produce adverse collective results without contradicting the systems' technical success.

Core claim

Functional misalignment arises through a bias toward modeling fast, reactive behavioral signals over reflective judgment, feedback loops that couple user behavior with algorithmic learning, and emergent collective dynamics that amplify these effects at scale. Accurate individual-level predictions can therefore produce adverse societal outcomes because the optimization target of predictability diverges from the human goals these predictions are intended to serve.
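The feedback-loop mechanism can be made concrete with a toy simulation. The sketch below is not taken from the paper. It assumes a hypothetical recommender that splits exposure between two content types, "reactive" and "reflective", and shifts exposure toward whichever type earned the larger share of clicks. The function name, the click probabilities, and the learning rate are all illustrative assumptions, and the model works with expected clicks so that it stays deterministic.

```python
def simulate_feedback_loop(
    steps=50,
    p_click_reactive=0.30,    # assumed click rate for reactive content
    p_click_reflective=0.10,  # assumed click rate for reflective content
    learning_rate=0.5,        # assumed speed at which exposure follows clicks
    initial_share=0.5,        # starting share of exposure given to reactive content
):
    """Return the reactive-content exposure share after each step."""
    share = initial_share
    history = [share]
    for _ in range(steps):
        # Expected clicks each content type earns under the current exposure split.
        clicks_reactive = share * p_click_reactive
        clicks_reflective = (1.0 - share) * p_click_reflective
        # Engagement-based target: the fraction of all clicks that came from reactive items.
        target = clicks_reactive / (clicks_reactive + clicks_reflective)
        # Learning step: move exposure toward where the clicks came from.
        share += learning_rate * (target - share)
        history.append(share)
    return history


if __name__ == "__main__":
    trajectory = simulate_feedback_loop()
    print(f"reactive share: start={trajectory[0]:.3f}, end={trajectory[-1]:.3f}")
```

In this toy setting, any gap in click rates is enough for the loop to push exposure steadily toward the content with the faster signal, while equal click rates leave the split unchanged. That matches the qualitative shape of the claim, but it is an illustration under the stated assumptions and not evidence for it.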

What carries the argument

Functional misalignment, the structural gap between what algorithms optimize (predictable behavior) and the human goals these predictions are intended to serve.
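One hedged way to write this gap down, offered as a reading aid and not as notation from the paper: let $\pi$ denote a recommendation policy, $E(\pi)$ the expected observable engagement it produces, and $U(\pi)$ a hypothetical measure of how well it serves users' reflective goals. All three symbols are assumptions introduced here.

```latex
\pi_E = \arg\max_{\pi} E(\pi), \qquad
\pi_U = \arg\max_{\pi} U(\pi), \qquad
\Delta = U(\pi_U) - U(\pi_E) \ge 0 .
```

The inequality holds by construction, since $\pi_U$ maximizes $U$. In this notation, the structural claim would be that $\Delta$ remains strictly positive even when the predictor behind $E$ is highly accurate.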

If this is right

  • Accurate predictions at the individual level can still generate negative societal outcomes when the three mechanisms are active.
  • Mitigation efforts must address the misalignment itself rather than focus solely on increasing prediction accuracy.
  • The framework accounts for multiple harms, including polarization and trust erosion, under a single structural account.
  • Research can now target the three mechanisms to study and reduce adverse effects in human-AI interaction systems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Interventions that favor reflective signals or interrupt feedback loops could reduce harms while preserving useful prediction capabilities.
  • The same misalignment pattern may appear in other AI systems where short-term observable signals conflict with longer-term user or societal interests.
  • Controlled platform modifications that alter signal weighting or loop strength offer direct tests of whether the mechanisms drive the listed harms.
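The first and third extensions above can be sketched as a ranking rule with an adjustable signal weight. This is a minimal illustration and not a method described in the paper: the `Item` fields, the `reflective_weight` parameter, and the example scores are hypothetical, and `p_endorse` stands in for any slower, reflective signal a platform might collect (for instance a survey response).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    name: str
    p_click: float    # predicted fast, reactive signal (hypothetical value)
    p_endorse: float  # predicted reflective signal (hypothetical value)


def rank(items, reflective_weight):
    """Order items by a convex blend of the fast and reflective signals."""
    if not 0.0 <= reflective_weight <= 1.0:
        raise ValueError("reflective_weight must lie in [0, 1]")

    def score(item):
        return (1.0 - reflective_weight) * item.p_click + reflective_weight * item.p_endorse

    return sorted(items, key=score, reverse=True)


# Illustrative items only; the numbers are invented for the example.
ITEMS = [
    Item("outrage_post", p_click=0.30, p_endorse=0.05),
    Item("explainer", p_click=0.12, p_endorse=0.40),
    Item("cat_video", p_click=0.20, p_endorse=0.20),
]

if __name__ == "__main__":
    for weight in (0.0, 0.5, 1.0):
        print(weight, [item.name for item in rank(ITEMS, weight)])
```

Sweeping `reflective_weight` from 0 to 1 is one way a controlled platform modification could vary signal weighting while holding everything else fixed.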

Load-bearing premise

The three mechanisms—bias toward fast reactive signals, feedback loops coupling behavior and learning, and emergent collective dynamics—are the primary and sufficient causes of the observed societal harms.

What would settle it

An experiment that reduces algorithmic bias toward fast reactive signals or breaks feedback loops between behavior and learning, then measures whether rates of polarization, mental health concerns, or trust erosion decline on the affected platforms.
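The source does not specify how such an experiment would be analyzed. As one possible reading, the sketch below uses a difference-in-differences comparison, a standard estimator chosen here for illustration: it compares the change in an outcome on modified platforms with the change on unmodified ones. The function name and every number are placeholders, not data or results from the paper.

```python
from statistics import mean


def difference_in_differences(treated_before, treated_after, control_before, control_after):
    """Change in the treated group minus change in the control group."""
    treated_change = mean(treated_after) - mean(treated_before)
    control_change = mean(control_after) - mean(control_before)
    return treated_change - control_change


if __name__ == "__main__":
    # Placeholder polarization scores per user cohort (invented for illustration).
    effect = difference_in_differences(
        treated_before=[0.60, 0.62, 0.58],
        treated_after=[0.50, 0.52, 0.48],
        control_before=[0.61, 0.59, 0.60],
        control_after=[0.60, 0.58, 0.59],
    )
    print(f"estimated effect of the intervention: {effect:+.2f}")
```

A negative estimate would be consistent with the mechanisms driving the harm, provided the usual assumption holds that both groups would otherwise have followed parallel trends.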

Figures

Figures reproduced from arXiv: 2604.11459 by Kristina Lerman.

Figure 1: Functional misalignment framework: Social media algorithms trained to accurately predict user engagement learn to amplify ...

Original abstract

Algorithmic systems, particularly social media recommenders, have achieved remarkable success in predicting behavior. By optimizing for observable signals such as clicks, views, and engagement, these systems effectively capture user attention and guide interaction. Yet their widespread adoption has coincided with troubling outcomes, including rising mental health concerns, increasing polarization, and erosion of trust. This paper argues that these effects are consequences of a structural functional misalignment between what algorithms optimize - predictable behavior - and the human goals these predictions are intended to serve. We propose that this misalignment arises through three mechanisms: (1) a bias toward modeling fast, reactive behavioral signals over reflective judgment, (2) feedback loops that couple user behavior with algorithmic learning, and (3) emergent collective dynamics that amplify these effects at scale. Together, these mechanisms explain how accurate individual-level predictions can produce adverse societal outcomes. We present functional misalignment as a unifying framework and outline a research agenda for studying and mitigating its effects in human-AI interaction systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript argues that troubling societal outcomes associated with social media recommenders—rising mental health concerns, polarization, and erosion of trust—are consequences of a structural functional misalignment between algorithmic optimization for predictable behavior (via engagement signals like clicks and views) and the human goals these predictions serve. It identifies three mechanisms through which this misalignment operates: (1) bias toward modeling fast, reactive behavioral signals over reflective judgment, (2) feedback loops coupling user behavior with algorithmic learning, and (3) emergent collective dynamics that amplify effects at scale. The paper presents 'functional misalignment' as a unifying framework and outlines a research agenda for studying and mitigating its effects in human-AI interaction systems.

Significance. If substantiated through modeling or evidence, the framework could offer a useful conceptual lens for diagnosing and redesigning recommendation systems to reduce unintended societal harms. At present, however, the work remains at the level of qualitative description without empirical tests, formal derivations, or simulations, so its significance is limited to hypothesis generation rather than explanatory power.

major comments (3)
  1. [Abstract] Abstract: The central claim that accurate individual-level predictions produce adverse societal outcomes 'specifically through' the three mechanisms is not supported by any derivation, agent-based model, simulation, or empirical comparison. The mechanisms are described rather than shown to be necessary or sufficient, leaving open whether the harms would arise under different optimization targets or exogenous factors.
  2. [Mechanisms description] Mechanisms section: The feedback-loop mechanism is asserted to couple user behavior with algorithmic learning in ways that amplify polarization and mental-health decline, yet no concrete illustration, equation, or example is supplied showing how optimization for predictable behavior (as opposed to other platform policies) produces these specific outcomes.
  3. [Framework and research agenda] Framework proposal: The definition of functional misalignment is framed in terms of the very outcomes (polarization, mental-health decline, trust erosion) it is invoked to explain, creating a circularity that prevents independent verification of the causal link.
minor comments (2)
  1. [Abstract] The abstract and introduction would benefit from explicit statements distinguishing the proposed framework from prior work on algorithmic bias, filter bubbles, and engagement optimization.
  2. Notation for the three mechanisms is introduced informally; a numbered list or table summarizing each mechanism, its hypothesized pathway, and the predicted societal harm would improve clarity.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive comments, which highlight important opportunities to clarify the scope and presentation of our conceptual framework. We respond to each major comment below and indicate the revisions we will make to strengthen the manuscript.

Point-by-point responses
  1. Referee: [Abstract] Abstract: The central claim that accurate individual-level predictions produce adverse societal outcomes 'specifically through' the three mechanisms is not supported by any derivation, agent-based model, simulation, or empirical comparison. The mechanisms are described rather than shown to be necessary or sufficient, leaving open whether the harms would arise under different optimization targets or exogenous factors.

    Authors: We agree that the paper is a conceptual contribution and does not include formal derivations, simulations, or new empirical tests. The central claim is advanced as a synthesized argument drawing on existing literature and logical analysis of the mechanisms, rather than a demonstration of necessity or sufficiency. In the revised manuscript, we will modify the abstract to state explicitly that the framework proposes these mechanisms as plausible pathways contributing to the observed outcomes and positions the work as hypothesis-generating. This revision will temper the language and directly address the concern about unsupported causal assertions.
    revision: partial

  2. Referee: [Mechanisms description] Mechanisms section: The feedback-loop mechanism is asserted to couple user behavior with algorithmic learning in ways that amplify polarization and mental-health decline, yet no concrete illustration, equation, or example is supplied showing how optimization for predictable behavior (as opposed to other platform policies) produces these specific outcomes.

    Authors: The referee correctly identifies that the feedback-loop description would be strengthened by additional concreteness. While developing a full simulation or set of equations exceeds the scope of this framework paper, we will incorporate a stylized illustrative example in the revised mechanisms section. This example will describe how optimization for short-term engagement signals can create reinforcing loops favoring reactive content, drawing on documented patterns from recommender system research. We will also add citations to relevant studies on feedback dynamics to provide grounding without claiming new empirical results.
    revision: partial

  3. Referee: [Framework and research agenda] Framework proposal: The definition of functional misalignment is framed in terms of the very outcomes (polarization, mental-health decline, trust erosion) it is invoked to explain, creating a circularity that prevents independent verification of the causal link.

    Authors: We appreciate the identification of potential circularity. In the revision, we will redefine functional misalignment more independently as the structural mismatch between algorithmic optimization for immediate, predictable behavioral signals and the broader human goals of reflective judgment and sustained well-being. The societal outcomes will then be presented as downstream consequences that the framework seeks to explain, supported by references to independent empirical literature on those phenomena. This adjustment should facilitate future verification through targeted studies.
    revision: yes

Circularity Check

0 steps flagged

No circularity; conceptual framework with independent definitions

Full rationale

The paper advances a qualitative argument that societal harms arise from a defined mismatch between algorithmic optimization for predictable behavior and intended human goals, mediated by three descriptive mechanisms. No equations, fitted parameters, predictions, or self-citations appear in the provided text that would reduce any claim to its own inputs by construction. The misalignment is posited as an explanatory hypothesis rather than a self-referential tautology, and the mechanisms are listed as proposed pathways without derivation from the outcomes they address. The derivation chain is therefore self-contained as a conceptual proposal.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The claim rests on the untested premise that optimizing for predictable behavior necessarily conflicts with reflective human goals, plus the introduction of the misalignment concept itself.

axioms (1)
  • domain assumption: Optimizing for observable fast signals such as clicks produces accurate individual predictions but conflicts with reflective human goals.
    Stated directly in the abstract as the source of misalignment.
invented entities (1)
  • functional misalignment (no independent evidence)
    purpose: Unifying explanation for how accurate predictions produce adverse societal outcomes
    New term and framework introduced without external empirical grounding or falsifiable test.

pith-pipeline@v0.9.0 · 5457 in / 1341 out tokens · 29054 ms · 2026-05-10T15:44:09.881179+00:00 · methodology


