
arxiv: 2604.22143 · v1 · submitted 2026-04-24 · 💻 cs.CY · cs.CL

Recognition Without Authorization: LLMs and the Moral Order of Online Advice

Pith reviewed 2026-05-08 09:52 UTC · model grok-4.3

classification 💻 cs.CY cs.CL
keywords LLMs, online advice, relationship advice, Reddit, moral orders, safety alignment, directive advice, community consensus

The pith

LLMs spot the same relationship problems as human users but give far less directive advice on what to do.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper compares four large language models against human comments on more than eleven thousand posts from the subreddit r/relationship_advice. Models detect many of the same relational dynamics and harms that the community flags, yet they turn that detection into clear recommendations for action much less often. The difference shows up most clearly on posts where the community agrees strongly, such as those involving abuse or safety threats; here models suggest leaving the relationship at roughly half the rate humans do. Instead they lean on hedging, emotional validation, and therapeutic framing. The author calls this pattern recognition without authorization and traces it to the structural features of how assistant-style models are built.

Core claim

Across models, LLMs identify many of the same dynamics as human commenters, but are markedly less likely to convert that recognition into directive authorization for action. The gap is sharpest where community consensus is strongest: on high-consensus posts involving abuse or safety threats, models recommend exit at roughly half the human rate while maintaining elevated levels of hedging, validation, and therapeutic framing.

What carries the argument

Recognition without authorization: the capacity to register harm or problematic dynamics while withholding socially ratified permission for consequential action.

Load-bearing premise

Divergence from subreddit consensus can be attributed primarily to structural features of LLMs such as safety alignment and training-data averaging rather than to study-specific factors like prompt phrasing or how posts were selected and coded.

What would settle it

Re-running the comparison on a new sample of high-consensus abuse posts from the same subreddit, using varied prompt wordings and additional models, and checking whether the roughly 50 percent lower exit recommendation rate remains stable.
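
A minimal sketch of that stability check follows. It assumes every response has already been coded as a boolean (True when it recommends exit). The function names, the prompt-variant labels, and the toy codes are hypothetical illustrations, not the paper's pipeline or data.

```python
from typing import Dict, List


def exit_rate(flags: List[bool]) -> float:
    """Share of responses coded as recommending exit."""
    if not flags:
        raise ValueError("no coded responses")
    return sum(flags) / len(flags)


def rate_ratios(human: List[bool],
                model_by_prompt: Dict[str, List[bool]]) -> Dict[str, float]:
    """Model exit rate divided by the human exit rate, per prompt variant."""
    human_rate = exit_rate(human)
    if human_rate == 0:
        raise ValueError("human exit rate is zero; ratio undefined")
    return {name: exit_rate(flags) / human_rate
            for name, flags in model_by_prompt.items()}


# Toy, made-up codes for illustration only (not data from the paper).
human_codes = [True, True, True, True, False, True, True, True, False, True]
model_codes = {
    # Hypothetical prompt variants; the paper's actual prompts are not reported.
    "neutral_prompt": [True, False, False, True, False,
                       True, False, True, False, False],
    "directive_prompt": [True, True, False, True, False,
                         True, False, True, False, True],
}

ratios = rate_ratios(human_codes, model_codes)
for name, ratio in ratios.items():
    print(f"{name}: model/human exit-rate ratio = {ratio:.2f}")
```

If the ratio stays near 0.5 across prompt wordings and models, the structural reading gains support; if it moves substantially with the prompt, the gap looks more like a study-specific artifact.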

Figures

Figures reproduced from arXiv: 2604.22143 by Tom van Nuenen.

Figure 1: Therapeutic language density across eight topics. Reddit (r/relationship_advice) de…
Figure 2: Reddit–LLM divergence as a function of thread consensus. LLMs diverge most…
Figure 3: Top-voted r/relationship_advice comments (“Reddit”) and four LLMs compared on…
Original abstract

Large language models are increasingly used to mediate everyday interpersonal dilemmas, yet how their advisory defaults interact with the concentrated moral orders of specific communities remains poorly understood. This article compares four assistant-style LLMs with community-endorsed advice on 11,565 posts from r/relationship_advice, using the subreddit as a concentrated, vote-ratified moral formation whose prescriptive clarity makes divergence measurable. Across models, LLMs identify many of the same dynamics as human commenters, but are markedly less likely to convert that recognition into directive authorization for action. The gap is sharpest where community consensus is strongest: on high-consensus posts involving abuse or safety threats, models recommend exit at roughly half the human rate while maintaining elevated levels of hedging, validation, and therapeutic framing. The article describes this pattern as recognition without authorization: the capacity to register harm while withholding socially ratified permission for consequential action. This divergence is not incidental but structural: a portable advisory style that remains validating, risk-averse, and weakly directive across contexts. Safety alignment is one plausible contributor to this pattern, alongside training-data averaging and broader assistant design. The article argues that model divergence can be reframed from a technical error to a way of seeing what standardized assistant norms flatten when they encounter situated moral worlds.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper empirically compares four assistant-style LLMs against community-endorsed human comments on 11,565 posts from r/relationship_advice, arguing that models recognize similar interpersonal dynamics (e.g., abuse or safety threats) but convert that recognition into directive authorization for action at markedly lower rates than the subreddit community, especially on high-consensus posts. This pattern is labeled 'recognition without authorization' and attributed primarily to structural LLM features such as safety alignment, training-data averaging, and assistant design rather than prompt or coding artifacts. The work reframes the divergence as evidence of how standardized AI norms flatten situated moral orders.

Significance. If the measurement holds, the result identifies a systematic flattening effect in LLM advice that is most pronounced precisely where community consensus is strongest, with potential implications for AI deployment in interpersonal domains. The scale of the comparison (over 11k posts) and the focus on vote-ratified subreddit norms provide a concrete testbed for studying advisory divergence; the paper also supplies a portable conceptual label that could be applied to other contexts.

major comments (3)
  1. [Methods] Methods (prompting and model invocation): the manuscript provides no details on the exact prompts, temperature settings, system instructions, or few-shot examples used to elicit advice from the four LLMs. Without this, it is impossible to rule out that elevated hedging and reduced directive language are artifacts of prompt phrasing rather than intrinsic model properties, directly undermining the claim that the gap is 'structural.'
  2. [Methods / Results] Coding of 'directive authorization' and 'high-consensus' posts: the operationalization of these central variables is not described, nor is inter-rater reliability or robustness to alternative coding schemes reported. The skeptic concern is therefore load-bearing: if annotation criteria for 'authorization' (vs. hedging/validation) were applied inconsistently or if 'high-consensus' was defined post-hoc, the reported halving of exit recommendations on abuse posts could be measurement-induced rather than evidence of LLM flattening.
  3. [Discussion] Attribution section: the argument that divergence arises primarily from safety alignment and training averaging (rather than study-specific factors) rests on the untested assumption that the experimental frame is neutral. A concrete test—e.g., ablation with deliberately directive prompts or comparison to non-aligned base models—is absent, leaving the causal reframing from 'technical error' to 'flattened moral worlds' unsupported by the data presented.
minor comments (2)
  1. [Abstract / Results] The abstract states 'models recommend exit at roughly half the human rate' but the main text should report the exact percentages, confidence intervals, and the precise definition of the 'exit' category used in both human and LLM annotations.
  2. [Results] Table or figure presenting the per-model breakdown by consensus level would improve readability; currently the comparison is summarized at a high level without showing variance across the four LLMs.
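
The confidence intervals requested in minor comment 1 could be reported with a Wilson score interval, sketched below. The choice of the Wilson method and the function name are assumptions made for illustration; the paper does not state which interval, if any, it uses.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    successes: number of responses coded as recommending exit
    n:         total number of coded responses
    z:         normal quantile (1.96 gives an approximate 95% interval)
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must lie between 0 and n")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical counts for illustration only (not figures from the paper).
low, high = wilson_interval(successes=50, n=100)
print(f"exit rate 0.50, approximate 95% interval: [{low:.3f}, {high:.3f}]")
```

Reporting one such interval per model and per consensus level would also address minor comment 2, since overlapping or disjoint intervals show the variance across the four LLMs.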

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback, which identifies key areas where the manuscript can be strengthened through greater methodological transparency and more cautious interpretive framing. We address each major comment below and specify the revisions we will undertake.

  1. Referee: [Methods] Methods (prompting and model invocation): the manuscript provides no details on the exact prompts, temperature settings, system instructions, or few-shot examples used to elicit advice from the four LLMs. Without this, it is impossible to rule out that elevated hedging and reduced directive language are artifacts of prompt phrasing rather than intrinsic model properties, directly undermining the claim that the gap is 'structural.'

    Authors: We agree that the original manuscript omitted critical details on prompting and model parameters, which limits reproducibility and leaves open the possibility of prompt-induced effects. In the revised version, we will add a dedicated methods subsection that reports the exact system prompts, user templates, temperature (uniformly set to 0.7), top-p settings, and any few-shot or instruction elements used for each of the four models. We will also include a brief robustness discussion noting that the reduced directive pattern was observed consistently across models despite their differing alignment procedures, though we acknowledge that prompt variation remains a factor that future work should test directly. revision: yes

  2. Referee: [Methods / Results] Coding of 'directive authorization' and 'high-consensus' posts: the operationalization of these central variables is not described, nor is inter-rater reliability or robustness to alternative coding schemes reported. The skeptic concern is therefore load-bearing: if annotation criteria for 'authorization' (vs. hedging/validation) were applied inconsistently or if 'high-consensus' was defined post-hoc, the reported halving of exit recommendations on abuse posts could be measurement-induced rather than evidence of LLM flattening.

    Authors: We accept that the manuscript insufficiently documented the coding procedures for directive authorization and high-consensus posts. The revision will provide an explicit operationalization, including decision rules and illustrative examples distinguishing directive exit recommendations from hedging or validation-only responses. We will report inter-rater reliability (Cohen's kappa) from the annotation process and include sensitivity analyses using alternative thresholds for consensus (e.g., top 10% vs. top 20% by vote ratio). These additions will allow readers to evaluate whether the observed gap on abuse-related posts is robust to coding choices. revision: yes

  3. Referee: [Discussion] Attribution section: the argument that divergence arises primarily from safety alignment and training averaging (rather than study-specific factors) rests on the untested assumption that the experimental frame is neutral. A concrete test—e.g., ablation with deliberately directive prompts or comparison to non-aligned base models—is absent, leaving the causal reframing from 'technical error' to 'flattened moral worlds' unsupported by the data presented.

    Authors: The referee correctly identifies that we did not conduct ablations or comparisons to base models, so direct causal evidence for safety alignment as the dominant driver is not present. We will revise the discussion to present safety alignment, training-data averaging, and assistant-style design as plausible and mutually reinforcing contributors rather than claiming primacy for any single factor. The core empirical claim—that LLMs exhibit recognition without authorization relative to the subreddit's vote-ratified norms—will be retained as a descriptive finding supported by the scale of the human-LLM comparison. We will add an explicit limitations paragraph calling for future ablation studies while arguing that the current design already demonstrates a systematic flattening effect worth conceptualizing independently of full causal decomposition. revision: partial
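
The inter-rater reliability statistic promised in response 2 (Cohen's kappa) can be computed as in the sketch below. The label names "exit" and "hedge" and the two toy annotation lists are hypothetical; the paper's actual coding scheme is not described in the material above.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must label the same items")
    n = len(rater_a)
    if n == 0:
        raise ValueError("no items to compare")

    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is
        # undefined here, and returning 1.0 is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy annotations for illustration only (not data from the paper).
annotator_1 = ["exit", "exit", "hedge", "hedge", "exit", "hedge", "exit", "hedge"]
annotator_2 = ["exit", "exit", "hedge", "exit", "exit", "hedge", "hedge", "hedge"]

kappa = cohens_kappa(annotator_1, annotator_2)
print(f"Cohen's kappa: {kappa:.2f}")
```

Kappa is preferable to raw percentage agreement here because the "exit" category may be common on abuse posts, which inflates agreement that would occur by chance alone.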

Circularity Check

0 steps flagged

No circularity: empirical comparison of LLM and human advice


The paper conducts an external empirical study comparing four LLMs against human comments on 11,565 posts from r/relationship_advice, measuring divergence in recognition of dynamics versus directive authorization. No equations, derivations, or fitted parameters are present; the pattern labeled 'recognition without authorization' is defined directly from the observed output differences rather than by construction from inputs. Claims about structural features (safety alignment, training averaging) are presented as plausible contributors supported by the data contrast, not as self-referential definitions or predictions forced by prior self-citations. The work is self-contained against the subreddit benchmark with no load-bearing reductions to its own assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on treating the subreddit as a reliable proxy for concentrated moral order and on interpreting model outputs as structurally non-directive rather than prompt- or sampling-dependent.

axioms (1)
  • [domain assumption] r/relationship_advice constitutes a concentrated, vote-ratified moral formation whose prescriptive clarity makes divergence measurable.
    Invoked to justify using subreddit consensus as the human baseline for comparison.
invented entities (1)
  • recognition without authorization [no independent evidence]
    purpose: Descriptive label for the observed pattern of harm detection without directive permission.
    New term introduced to characterize the LLM-community gap; no independent falsifiable test provided.


