Algorithmic Cultivation: How Social Media Feeds Shape User Language
Pith reviewed 2026-05-19 18:34 UTC · model grok-4.3
The pith
Users exposed to algorithmic social media feeds adapt their language more than similar unexposed users.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Algorithmic feeds act as persistent linguistic environments that drive measurable changes in user writing. Users exposed to the feeds exhibit significantly greater stylistic accommodation, semantic alignment, and register formalization than matched controls. These shifts vary by feed identity, with Blacksky producing the broadest psycholinguistic restructuring in cognitive processing, affective expression, and pronoun use, while News and Science effects center on register and topical focus. Reposting emerges as the most consistent predictor of convergence across all feeds.
What carries the argument
Quasi-experimental matching of feed-exposed users to controls combined with longitudinal measurement of changes in lexico-semantic, psycholinguistic, and topical language features.
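The matching step can be pictured with a minimal sketch. The paper's actual matching procedure and covariates are not specified here, so this assumes greedy 1:1 nearest-neighbor matching without replacement on standardized pre-exposure activity covariates. The function names, the caliper value, and the example covariates (weekly posts and reposts) are all hypothetical.

```python
import math
from statistics import mean, pstdev


def standardize(rows):
    """Z-score every covariate column so no single covariate dominates the distance."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    sds = [pstdev(col) or 1.0 for col in columns]  # fall back to 1.0 for constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def match_users(exposed, controls, caliper=1.0):
    """Greedy 1:1 nearest-neighbor matching without replacement (illustrative only).

    exposed / controls: dict of user_id -> tuple of pre-exposure covariates.
    Returns a dict of exposed_id -> matched control_id.
    """
    exposed_ids, control_ids = sorted(exposed), sorted(controls)
    scaled = standardize(
        [exposed[u] for u in exposed_ids] + [controls[u] for u in control_ids]
    )
    z_exposed = dict(zip(exposed_ids, scaled[: len(exposed_ids)]))
    z_control = dict(zip(control_ids, scaled[len(exposed_ids):]))

    available = list(control_ids)
    pairs = {}
    for user in exposed_ids:
        if not available:
            break
        best = min(available, key=lambda c: math.dist(z_exposed[user], z_control[c]))
        # Accept the match only if it lies within the caliper distance.
        if math.dist(z_exposed[user], z_control[best]) <= caliper:
            pairs[user] = best
            available.remove(best)
    return pairs


# Hypothetical covariates: (posts per week, reposts per week) before exposure.
exposed = {"e1": (10, 5), "e2": (100, 40)}
controls = {"c1": (12, 6), "c2": (95, 38), "c3": (500, 300)}
print(match_users(exposed, controls))
```

Greedy matching is order-dependent and is used here only because it is short. Propensity-score or optimal matching would be equally plausible readings of "quasi-experimental matching".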
If this is right
- Reposting is the most consistent predictor of linguistic convergence regardless of which feed users engage with.
- Blacksky exposure produces deeper changes in cognitive processing, affect, and pronoun patterns than News or Science exposure.
- Linguistic effects accumulate through sustained feed engagement and can be isolated from other platform activities via matching.
- Cultivation theory applies to language behavior, showing feeds shape not only information access but communicative output.
Where Pith is reading between the lines
- If feeds cultivate language, users may gradually adopt the framing and emotional tone of the feed when discussing events outside the platform.
- Similar language alignment patterns could appear in recommendation systems or other algorithmic content environments beyond social feeds.
- Design choices in feed algorithms might be tuned to encourage specific registers or reduce certain types of linguistic polarization.
Load-bearing premise
The matching of exposed users to controls successfully balances all pre-existing differences and other platform behaviors so that language changes can be attributed to feed exposure.
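Whether matching has balanced the two groups is usually checked covariate by covariate. A minimal sketch, assuming the standardized mean difference (SMD) as the balance statistic and the common rule of thumb that |SMD| at or below 0.1 indicates acceptable balance. The covariate names and values are hypothetical, and the paper does not state which balance diagnostic, if any, it uses.

```python
import math
from statistics import mean, variance


def standardized_mean_difference(exposed, control):
    """Difference in group means divided by the pooled standard deviation."""
    pooled_sd = math.sqrt((variance(exposed) + variance(control)) / 2)
    if pooled_sd == 0:
        raise ValueError("covariate has zero variance in both groups")
    return (mean(exposed) - mean(control)) / pooled_sd


def balance_table(exposed_covs, control_covs, threshold=0.1):
    """Map each covariate name to (SMD, balanced?) using the given threshold."""
    table = {}
    for name in exposed_covs:
        smd = standardized_mean_difference(exposed_covs[name], control_covs[name])
        table[name] = (smd, abs(smd) <= threshold)
    return table


# Hypothetical pre-exposure covariates for matched exposed and control users.
exposed_covs = {
    "posts_per_week": [4, 6, 8, 10],
    "mean_post_length": [120, 140, 160, 180],
}
control_covs = {
    "posts_per_week": [5, 6, 8, 9],
    "mean_post_length": [100, 110, 120, 130],
}
for name, (smd, ok) in balance_table(exposed_covs, control_covs).items():
    print(f"{name}: SMD={smd:.3f} balanced={ok}")
```

In this toy data the groups are balanced on activity but not on post length, which is the situation the premise has to rule out: matching on activity alone does not guarantee balance on pre-exposure language.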
What would settle it
Re-running the matching analysis with added controls for users' prior network structure or baseline posting frequency: if the observed between-group differences in linguistic metrics disappear, they cannot be attributed to feed exposure.
Original abstract
Algorithmic feeds have become primary environments for encountering information online, yet while they shape what people see, less is known about how sustained feed exposure shapes how people write. Drawing on Cultivation Theory, we examine whether algorithmic feeds function as online environments that leave measurable traces in users' language. We leverage a large-scale longitudinal dataset of 235M posts by 4M users on Bluesky, and conduct a quasi-experimental study matching an initial pool of 368,513 users exposed to one of three feeds -- News, Science, and Blacksky -- with a pool of 2,001,915 active control users who did not engage with any of these feeds. We examine linguistic evolution across three dimensions: lexico-semantics, psycholinguistics, and topics. We find that users exposed to these feeds show significantly greater stylistic accommodation, semantic alignment, and register formalization than matched controls. These effects vary markedly by feed identity -- Blacksky produces the deepest psycholinguistic restructuring, with significant shifts in cognitive processing, affective expression, and pronoun use, while News and Science effects are largely confined to register and topical focus. Regression models reveal that reposting is the most consistent predictor of linguistic convergence across all feeds, whereas posting and bookmarking show feed-dependent effects, with effects differing more than fourfold across feeds. Our work extends Cultivation Theory beyond belief formation to linguistic behavior, demonstrating that feeds function as persistent linguistic environments that gradually shape what and how users write online. Our work has implications for studying algorithmic influence, online identity formation, and the design and governance of feed-based platforms that mediate online interactions.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that algorithmic feeds on Bluesky act as linguistic environments that cultivate changes in users' writing, based on a quasi-experimental matching of 368,513 users exposed to News, Science, or Blacksky feeds against 2,001,915 controls. Exposed users exhibit greater stylistic accommodation, semantic alignment, and register formalization than controls, with Blacksky producing the largest psycholinguistic shifts; reposting is the strongest predictor of convergence across feeds. The work extends Cultivation Theory to linguistic behavior using a longitudinal dataset of 235M posts.
Significance. If the identification strategy holds, the result would be significant for showing that sustained algorithmic feed exposure can measurably reshape not only information consumption but users' own language production across lexico-semantic, psycholinguistic, and topical dimensions. The scale of the Bluesky dataset and the feed-specific heterogeneity provide concrete evidence that could inform studies of online identity formation and platform governance.
major comments (3)
- [Methods (Quasi-experimental design)] Methods section on quasi-experimental design: the matching of 368,513 exposed users to 2,001,915 controls is described only in terms of activity levels, with no reported balance statistics on pre-exposure linguistic traits, topic interests, or cognitive-style proxies. This directly undermines the central causal claim that observed differences can be attributed to feed exposure rather than selection.
- [Results] Results section: statistically significant differences in stylistic accommodation, semantic alignment, and register formalization are reported without effect sizes, standardized coefficients, or robustness checks such as alternative matching specifications or pre-trend tests. This leaves the practical magnitude and reliability of the feed-specific effects (especially Blacksky's deeper restructuring) unevaluated.
- [Regression analysis] Regression models: the claim that reposting is the most consistent predictor and that effects differ more than fourfold across feeds rests on unspecified model specifications, without user fixed effects, multicollinearity diagnostics, or tests for reverse causality. These details are load-bearing for interpreting engagement-type coefficients as evidence of cultivation.
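The pre-trend test requested in the second major comment can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's analysis: it assumes each matched pair has one linguistic metric measured in two pre-exposure periods and one post-exposure period, and compares the exposed-minus-control change before and after exposure. The period labels, metric values, and function name are hypothetical.

```python
from statistics import mean


def mean_gap_change(pairs, start, end):
    """Average, over matched pairs, of (exposed change - control change) from start to end."""
    return mean(
        (exposed[end] - exposed[start]) - (control[end] - control[start])
        for exposed, control in pairs
    )


# Each pair: (exposed user's series, matched control's series); values are a
# hypothetical similarity-to-feed score per period.
pairs = [
    ({"pre2": 0.20, "pre1": 0.21, "post": 0.30}, {"pre2": 0.19, "pre1": 0.20, "post": 0.22}),
    ({"pre2": 0.40, "pre1": 0.42, "post": 0.50}, {"pre2": 0.41, "pre1": 0.43, "post": 0.44}),
]

pre_trend = mean_gap_change(pairs, "pre2", "pre1")  # should be near zero if trends were parallel
effect = mean_gap_change(pairs, "pre1", "post")     # difference-in-differences style estimate
print(f"pre-trend gap: {pre_trend:.3f}, post-exposure gap: {effect:.3f}")
```

A pre-trend gap close to zero is consistent with exposed and control users evolving in parallel before exposure. A clearly nonzero value would suggest the groups were already diverging, which weakens any causal reading of the post-exposure gap.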
minor comments (2)
- [Abstract] Abstract: the time window of the longitudinal data and per-feed sample sizes after matching are not stated, which would help readers assess the scope of the observed changes.
- [Introduction / Methods] Notation: the three linguistic dimensions (lexico-semantics, psycholinguistics, topics) are introduced without explicit operational definitions or example measures in the main text, making it harder to connect specific findings to the dimensions.
Simulated Authors' Rebuttal
We thank the referee for their insightful comments on our paper. We address each of the major comments below and outline the revisions we will make to strengthen the manuscript.
- Referee: Methods section on quasi-experimental design: the matching of 368,513 exposed users to 2,001,915 controls is described only in terms of activity levels, with no reported balance statistics on pre-exposure linguistic traits, topic interests, or cognitive-style proxies. This directly undermines the central causal claim that observed differences can be attributed to feed exposure rather than selection.
  Authors: We acknowledge the importance of demonstrating balance on pre-exposure characteristics to support the causal interpretation. Our current matching focuses on activity levels to ensure similar platform engagement prior to feed exposure. In the revised version, we will report balance statistics for pre-exposure linguistic traits, topic interests, and any available cognitive-style proxies. If additional matching variables are feasible, we will incorporate them and update the methods section accordingly.
  Revision: yes
- Referee: Results section: statistically significant differences in stylistic accommodation, semantic alignment, and register formalization are reported without effect sizes, standardized coefficients, or robustness checks such as alternative matching specifications or pre-trend tests. This leaves the practical magnitude and reliability of the feed-specific effects (especially Blacksky's deeper restructuring) unevaluated.
  Authors: We agree that providing effect sizes and additional robustness checks will better convey the magnitude and reliability of our findings. We will add standardized coefficients and effect sizes, and perform robustness analyses including alternative matching specifications and pre-trend tests leveraging the longitudinal data structure. These will be included in the revised results section.
  Revision: yes
- Referee: Regression models: the claim that reposting is the most consistent predictor and that effects differ more than fourfold across feeds rests on unspecified model specifications, without user fixed effects, multicollinearity diagnostics, or tests for reverse causality. These details are load-bearing for interpreting engagement-type coefficients as evidence of cultivation.
  Authors: We will provide full details on the regression model specifications in the revision, including any user fixed effects, multicollinearity diagnostics such as variance inflation factors, and discussions or tests addressing potential reverse causality. This will clarify how the coefficients support the cultivation interpretation and address concerns about model robustness.
  Revision: yes
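The variance inflation factors mentioned in the last response can be computed without a regression library. A minimal sketch, relying on the standard identity that the VIF of predictor j equals the j-th diagonal entry of the inverse of the predictors' correlation matrix. The engagement counts below are hypothetical, and the predictor names only mirror the engagement types named in the abstract.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant predictor has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(predictors):
    """Map each predictor name to its variance inflation factor."""
    names = list(predictors)
    corr = [[pearson(predictors[a], predictors[b]) for b in names] for a in names]
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}


# Hypothetical per-user engagement counts.
engagement = {
    "reposts": [1, 2, 3, 4, 5, 6],
    "posts": [2, 1, 4, 3, 6, 5],
    "bookmarks": [3, 1, 2, 6, 4, 5],
}
for name, value in vif(engagement).items():
    print(f"{name}: VIF={value:.2f}")
```

A VIF of 1 means a predictor is uncorrelated with the others. Values above roughly 5 to 10 are conventionally read as a sign that the coefficients for reposting, posting, and bookmarking cannot be cleanly separated.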
Circularity Check
No circularity: empirical quasi-experimental design relies on external data and standard methods
The paper conducts a quasi-experimental study by matching 368k exposed users to 2M controls on Bluesky activity data and then applies regression to measure linguistic shifts across lexico-semantics, psycholinguistics, and topics. No equations, predictions, or first-principles derivations are presented that reduce by construction to fitted parameters or self-citations; the central claims rest on observable differences in longitudinal post data rather than any self-referential loop. The design is self-contained against external benchmarks and does not invoke uniqueness theorems or ansatzes from prior author work.
Axiom & Free-Parameter Ledger
free parameters (1)
- engagement-type regression coefficients
axioms (1)
- Domain assumption: quasi-experimental matching on observables controls for self-selection and confounding factors