Algorithmic Cultivation: How Social Media Feeds Shape User Language

Agam Goyal; Eshwar Chandrasekharan; Koustuv Saha; Olivia Pal

arXiv: 2605.17010 · v1 · pith:6RXQ3N3Vnew · submitted 2026-05-16 · cs.SI · cs.AI · cs.CL · cs.CY · cs.HC

Olivia Pal, Agam Goyal, Eshwar Chandrasekharan, Koustuv Saha

Pith reviewed 2026-05-19 18:34 UTC · model grok-4.3

keywords: algorithmic feeds, language change, cultivation theory, social media, Bluesky, linguistic accommodation, quasi-experimental study, user language

The pith

Users exposed to algorithmic social media feeds adapt their language more than similar unexposed users.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper tests whether algorithmic feeds function as cultivation environments that gradually reshape users' language production, extending media effects theory beyond beliefs to writing habits. It matches 368,513 users who engaged with News, Science, or Blacksky feeds against over 2 million controls on Bluesky and tracks evolution across 235 million posts in style, meaning, and formality. Exposed users display greater accommodation and alignment, with the strongest restructuring from the Blacksky feed in cognitive and emotional expression. A reader would care because the claim implies that what algorithms show us also changes how we express ourselves and form online identities over time.

Core claim

Algorithmic feeds act as persistent linguistic environments that drive measurable changes in user writing. Users exposed to the feeds exhibit significantly greater stylistic accommodation, semantic alignment, and register formalization than matched controls. These shifts vary by feed identity, with Blacksky producing the broadest psycholinguistic restructuring in cognitive processing, affective expression, and pronoun use, while News and Science effects center on register and topical focus. Reposting emerges as the most consistent predictor of convergence across all feeds.

What carries the argument

Quasi-experimental matching of feed-exposed users to controls combined with longitudinal measurement of changes in lexico-semantic, psycholinguistic, and topical language features.
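
As a rough illustration of this design, the sketch below pairs treated users with controls on a propensity score and then computes the average treatment effect as the difference in post-period means, following the ATE definition quoted in the Figure 2 excerpt. The greedy 1:1 nearest-neighbour rule, the `caliper` value, the function names, and the toy data are all assumptions made for illustration; the summary does not state which matching procedure the authors used.

```python
from statistics import mean


def match_nearest(treated, controls, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching on propensity score.

    treated / controls: lists of (user_id, propensity_score, post_outcome).
    The caliper is an illustrative assumption, not a value from the paper.
    """
    available = list(controls)
    pairs = []
    for t in sorted(treated, key=lambda u: u[1]):
        if not available:
            break
        # Closest still-unmatched control by propensity score.
        best = min(available, key=lambda c: abs(c[1] - t[1]))
        if abs(best[1] - t[1]) <= caliper:
            pairs.append((t, best))
            available.remove(best)
    return pairs


def average_treatment_effect(pairs):
    """Mean post-period outcome of treated users minus that of matched controls."""
    return mean(t[2] for t, _ in pairs) - mean(c[2] for _, c in pairs)


# Hypothetical toy data: (user_id, propensity_score, post-period outcome).
treated = [("t1", 0.62, 0.48), ("t2", 0.35, 0.40), ("t3", 0.90, 0.55)]
controls = [("c1", 0.60, 0.41), ("c2", 0.33, 0.37),
            ("c3", 0.10, 0.30), ("c4", 0.58, 0.39)]

pairs = match_nearest(treated, controls)
ate = average_treatment_effect(pairs)
print([(t[0], c[0]) for t, c in pairs], round(ate, 3))
```

In this toy run the treated user with no control inside the caliper is left unmatched, which is the usual price of enforcing overlap: the estimate then describes only the users for whom a comparable control exists.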

If this is right

Reposting is the most consistent predictor of linguistic convergence regardless of which feed users engage with.
Blacksky exposure produces deeper changes in cognitive processing, affect, and pronoun patterns than News or Science exposure.
Linguistic effects accumulate through sustained feed engagement and can be isolated from other platform activities via matching.
Cultivation theory applies to language behavior, showing feeds shape not only information access but communicative output.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If feeds cultivate language, users may gradually adopt the framing and emotional tone of the feed when discussing events outside the platform.
Similar language alignment patterns could appear in recommendation systems or other algorithmic content environments beyond social feeds.
Design choices in feed algorithms might be tuned to encourage specific registers or reduce certain types of linguistic polarization.

Load-bearing premise

The matching of exposed users to controls successfully balances all pre-existing differences and other platform behaviors so that language changes can be attributed to feed exposure.
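
One common way to probe this premise is a covariate balance check. The sketch below computes a standardized mean difference (SMD) per covariate and flags those above the 0.15 threshold named in the Figure 3 caption. The pooled-standard-deviation form of the SMD is the conventional definition and is assumed here, since the summary does not give the paper's exact formula; the covariate names and values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(treated_vals, control_vals):
    """SMD = (mean_T - mean_C) / sqrt((var_T + var_C) / 2)."""
    diff = mean(treated_vals) - mean(control_vals)
    pooled_sd = sqrt((variance(treated_vals) + variance(control_vals)) / 2)
    if pooled_sd == 0:
        # No spread in either group: balanced only if the means coincide.
        return 0.0 if diff == 0 else float("inf")
    return diff / pooled_sd


def imbalanced_covariates(treated, control, threshold=0.15):
    """Names of covariates whose absolute SMD exceeds the threshold."""
    return [
        name for name in treated
        if abs(standardized_mean_difference(treated[name], control[name])) > threshold
    ]


# Hypothetical pre-exposure covariates for a handful of users per group.
treated = {"posts_per_week": [10, 12, 11, 13],
           "mean_post_length": [80, 95, 90, 85]}
control = {"posts_per_week": [10, 12, 11, 13.2],
           "mean_post_length": [60, 70, 65, 75]}

flagged = imbalanced_covariates(treated, control)
print(flagged)
```

A balance check of this kind only covers covariates that were measured, so it can support the premise but cannot rule out unobserved differences between the groups.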

What would settle it

Re-running the matching analysis after adding controls for users' prior network structure or baseline posting frequency eliminates the observed differences in linguistic metrics between groups.

Figures

Figures reproduced from arXiv: 2605.17010 by Agam Goyal, Eshwar Chandrasekharan, Koustuv Saha, Olivia Pal.

**Figure 1.** Treatment/Placebo Dates: Distribution of treatment and placebo dates across feeds.

**Figure 2.** Propensity Scores: Distribution of propensity scores across Treated and Control users for each feed. [Panels: News, Blacksky, Science; x-axis: Propensity Score; y-axis: Density (Users); series: Control, Treated.]

Excerpt from the surrounding text: $\mathrm{ATE} = \bar{Y}^{\text{post}}_{\text{Tr}} - \bar{Y}^{\text{post}}_{\text{Ct}}$ (Imbens and Rubin 2015). A positive ATE indicates that the linguistic outcome increased among feed-exposed (Treated) users relative to matched controls, whereas a negative ATE indicates a decrease following feed exposure. To compare across outcomes with different scales, we add…

**Figure 3.** Covariate balance: Distribution of standardized mean differences (SMD) across 520 covariates for unmatched and matched users. The dashed vertical line indicates the SMD threshold of 0.15.

Excerpt from the surrounding text: … Gurevych 2019), a sentence transformer that maps text into a dense semantic space where cosine similarity reflects similarity in content. For each user, we compute the mean sentence embedding of authored posts in each period a…
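
The embedding step mentioned in the Figure 3 excerpt can be sketched in a few lines. The example averages per-post vectors into one mean embedding per period and compares mean embeddings by cosine similarity. Because the excerpt is cut off before it says what the user's mean embedding is compared against, the comparison to a feed centroid below is an assumption, and the three-dimensional vectors are toy stand-ins for real sentence-transformer output.

```python
from math import sqrt


def mean_embedding(vectors):
    """Element-wise mean of equal-length embedding vectors."""
    n = len(vectors)
    return [sum(dim) / n for dim in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 3-d "embeddings" (hypothetical; real sentence embeddings are much larger).
user_pre = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]    # posts before exposure
user_post = [[0.5, 0.5, 0.0], [0.3, 0.7, 0.0]]   # posts after exposure
feed_posts = [[0.0, 1.0, 0.0], [0.2, 0.8, 0.0]]  # posts surfaced by the feed

feed_centroid = mean_embedding(feed_posts)
sim_pre = cosine_similarity(mean_embedding(user_pre), feed_centroid)
sim_post = cosine_similarity(mean_embedding(user_post), feed_centroid)

# Positive shift: the user's post-period content sits closer to the feed.
alignment_shift = sim_post - sim_pre
print(round(sim_pre, 3), round(sim_post, 3), round(alignment_shift, 3))
```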

Original abstract

Algorithmic feeds have become primary environments for encountering information online, yet while they shape what people see, less is known about how sustained feed exposure shapes how people write. Drawing on Cultivation Theory, we examine whether algorithmic feeds function as online environments that leave measurable traces in users' language. We leverage a large-scale longitudinal dataset of 235M posts by 4M users on Bluesky, and conduct a quasi-experimental study matching an initial pool of 368,513 users exposed to one of three feeds -- News, Science, and Blacksky -- with a pool of 2,001,915 active control users who did not engage with any of these feeds. We examine linguistic evolution across three dimensions: lexico-semantics, psycholinguistics, and topics. We find that users exposed to these feeds show significantly greater stylistic accommodation, semantic alignment, and register formalization than matched controls. These effects vary markedly by feed identity -- Blacksky produces the deepest psycholinguistic restructuring, with significant shifts in cognitive processing, affective expression, and pronoun use, while News and Science effects are largely confined to register and topical focus. Regression models reveal that reposting is the most consistent predictor of linguistic convergence across all feeds, whereas posting and bookmarking show feed-dependent effects, with effects differing more than fourfold across feeds. Our work extends Cultivation Theory beyond belief formation to linguistic behavior, demonstrating that feeds function as persistent linguistic environments that gradually shape what and how users write online. Our work has implications for studying algorithmic influence, online identity formation, and the design and governance of feed-based platforms that mediate online interactions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper claims that algorithmic feeds on Bluesky act as linguistic environments that cultivate changes in users' writing, based on a quasi-experimental matching of 368,513 users exposed to News, Science, or Blacksky feeds against 2,001,915 controls. Exposed users exhibit greater stylistic accommodation, semantic alignment, and register formalization than controls, with Blacksky producing the largest psycholinguistic shifts; reposting is the strongest predictor of convergence across feeds. The work extends Cultivation Theory to linguistic behavior using a longitudinal dataset of 235M posts.

Significance. If the identification strategy holds, the result would be significant for showing that sustained algorithmic feed exposure can measurably reshape not only information consumption but users' own language production across lexico-semantic, psycholinguistic, and topical dimensions. The scale of the Bluesky dataset and the feed-specific heterogeneity provide concrete evidence that could inform studies of online identity formation and platform governance.

major comments (3)

[Methods (Quasi-experimental design)] Methods section on quasi-experimental design: the matching of 368,513 exposed users to 2,001,915 controls is described only in terms of activity levels, with no reported balance statistics on pre-exposure linguistic traits, topic interests, or cognitive-style proxies. This directly undermines the central causal claim that observed differences can be attributed to feed exposure rather than selection.
[Results] Results section: statistically significant differences in stylistic accommodation, semantic alignment, and register formalization are reported without effect sizes, standardized coefficients, or robustness checks such as alternative matching specifications or pre-trend tests. This leaves the practical magnitude and reliability of the feed-specific effects (especially Blacksky's deeper restructuring) unevaluated.
[Regression analysis] Regression models: the claim that reposting is the most consistent predictor and that effects differ more than fourfold across feeds rests on unspecified model specifications, without user fixed effects, multicollinearity diagnostics, or tests for reverse causality. These details are load-bearing for interpreting engagement-type coefficients as evidence of cultivation.

minor comments (2)

[Abstract] Abstract: the time window of the longitudinal data and per-feed sample sizes after matching are not stated, which would help readers assess the scope of the observed changes.
[Introduction / Methods] Notation: the three linguistic dimensions (lexico-semantics, psycholinguistics, topics) are introduced without explicit operational definitions or example measures in the main text, making it harder to connect specific findings to the dimensions.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their insightful comments on our paper. We address each of the major comments below and outline the revisions we will make to strengthen the manuscript.

Referee: Methods section on quasi-experimental design: the matching of 368,513 exposed users to 2,001,915 controls is described only in terms of activity levels, with no reported balance statistics on pre-exposure linguistic traits, topic interests, or cognitive-style proxies. This directly undermines the central causal claim that observed differences can be attributed to feed exposure rather than selection.

Authors: We acknowledge the importance of demonstrating balance on pre-exposure characteristics to support the causal interpretation. Our current matching focuses on activity levels to ensure similar platform engagement prior to feed exposure. In the revised version, we will report balance statistics for pre-exposure linguistic traits, topic interests, and any available cognitive-style proxies. If additional matching variables are feasible, we will incorporate them and update the methods section accordingly. revision: yes
Referee: Results section: statistically significant differences in stylistic accommodation, semantic alignment, and register formalization are reported without effect sizes, standardized coefficients, or robustness checks such as alternative matching specifications or pre-trend tests. This leaves the practical magnitude and reliability of the feed-specific effects (especially Blacksky's deeper restructuring) unevaluated.

Authors: We agree that providing effect sizes and additional robustness checks will better convey the magnitude and reliability of our findings. We will add standardized coefficients, effect sizes, and perform robustness analyses including alternative matching specifications and pre-trend tests leveraging the longitudinal data structure. These will be included in the revised results section. revision: yes
Referee: Regression models: the claim that reposting is the most consistent predictor and that effects differ more than fourfold across feeds rests on unspecified model specifications, without user fixed effects, multicollinearity diagnostics, or tests for reverse causality. These details are load-bearing for interpreting engagement-type coefficients as evidence of cultivation.

Authors: We will provide full details on the regression model specifications in the revision, including any user fixed effects, multicollinearity diagnostics such as variance inflation factors, and discussions or tests addressing potential reverse causality. This will clarify how the coefficients support the cultivation interpretation and address concerns about model robustness. revision: yes
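
For readers unfamiliar with the multicollinearity diagnostic promised in the last response, the sketch below computes variance inflation factors (VIFs) for three engagement predictors. It relies on the standard identity that the VIF of predictor j equals the j-th diagonal entry of the inverse of the predictors' correlation matrix. The engagement counts are made-up toy values, and nothing here reflects the authors' actual model specification.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (small dense matrices)."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def variance_inflation_factors(predictors):
    """VIF per predictor: diagonal of the inverse correlation matrix."""
    names = list(predictors)
    corr = [[pearson(predictors[a], predictors[b]) for b in names] for a in names]
    inv = invert(corr)
    return {name: inv[i][i] for i, name in enumerate(names)}


# Hypothetical per-user engagement counts.
engagement = {
    "reposts": [1, 2, 3, 4, 5, 6],
    "posts": [2, 1, 4, 3, 6, 5],
    "bookmarks": [5, 1, 4, 2, 6, 3],
}
vifs = variance_inflation_factors(engagement)
print({name: round(v, 2) for name, v in vifs.items()})
```

A VIF of 1 means a predictor is uncorrelated with the others; values far above 1 signal that the engagement types move together, which would make it hard to attribute convergence to reposting specifically.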

Circularity Check

0 steps flagged

No circularity: empirical quasi-experimental design relies on external data and standard methods

The paper conducts a quasi-experimental study by matching 368k exposed users to 2M controls on Bluesky activity data and then applies regression to measure linguistic shifts across lexico-semantics, psycholinguistics, and topics. No equations, predictions, or first-principles derivations are presented that reduce by construction to fitted parameters or self-citations; the central claims rest on observable differences in longitudinal post data rather than any self-referential loop. The design is self-contained against external benchmarks and does not invoke uniqueness theorems or ansatzes from prior author work.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that observable matching approximates causal isolation and on standard statistical assumptions for regression; no new entities are postulated and no free parameters are explicitly named beyond the fitted regression coefficients for engagement types.

free parameters (1)

engagement-type regression coefficients
Coefficients fitted to predict linguistic convergence; reported to differ more than fourfold across feeds and to identify reposting as the strongest predictor.

axioms (1)

Domain assumption: quasi-experimental matching on observables controls for self-selection and confounding factors.
Invoked when constructing the exposed and control pools to support causal attribution of linguistic changes to feed exposure.

Reference graph

Works this paper leans on

63 extracted references · 63 canonical work pages · 1 internal anchor

[1] Aleksic, A. 2025. Algospeak: How social media is transforming the future of language. Knopf.
[2] Angrist, J. D.; and Pischke, J.-S. 2009. Mostly harmless econometrics: An empiricist's companion. Princeton university press.
[3] Bakshy, E.; Messing, S.; and Adamic, L. A. 2015. Exposure to ideologically diverse news and opinion on Facebook. Science.
[4] Blei, D. M.; Ng, A. Y.; and Jordan, M. I. 2003. Latent dirichlet allocation. JMLR, 3(Jan): 993–1022.
[5] Boyd, R. L.; Ashokkumar, A.; Seraj, S.; and Pennebaker, J. W. 2022. The development and psychometric properties of LIWC-22.
[6] Chan, J.; Choi, F.; Saha, K.; and Chandrasekharan, E. 2025. Examining algorithmic curation on social media: An empirical audit of Reddit's r/popular feed. arXiv preprint arXiv:2502.20491.
[7] Chan, J.; Lambert, C.; Choi, F.; Chancellor, S.; and Chandrasekharan, E. 2024. Understanding Community Resilience: Quantifying the Effects of Sudden Popularity via Algorithmic Curation. In ICWSM.
[8] Chandrasekharan, E.; Jhaver, S.; Bruckman, A.; and Gilbert, E. 2022. Quarantined! Examining the effects of a community-wide moderation intervention on Reddit. ACM TOCHI.
[9] Chandrasekharan, E.; Pavalanathan, U.; Srinivasan, A.; Glynn, A.; Eisenstein, J.; and Gilbert, E. 2017. You can't stay here: The efficacy of reddit's 2015 ban examined through hate speech. PACM HCI, (CSCW).
[10] Chandrasekharan, E.; Samory, M.; Jhaver, S.; Charvat, H.; Bruckman, A.; Lampe, C.; Eisenstein, J.; and Gilbert, E. 2018. The Internet's hidden rules: An empirical study of Reddit norm violations at micro, meso, and macro scales. PACM HCI, (CSCW).
[11] Choi, F.; and Chandrasekharan, E. 2025. Designing Usable Controls for Customizable Social Media Feeds. arXiv preprint arXiv:2509.19615.
[12] Chowdhury, F. A.; Saha, D.; Hasan, M. R.; Saha, K.; and Mueen, A. 2021. Examining factors associated with twitter account suspension following the 2020 us presidential election. In ASONAM.
[13] Cohn, M. A.; Mehl, M. R.; and Pennebaker, J. W. 2004. Linguistic markers of psychological change surrounding September 11, 2001. Psychological science, 15(10): 687–693.
[14] Danescu-Niculescu-Mizil, C.; Gamon, M.; and Dumais, S. 2011. Mark my words!: linguistic style accommodation in social media. In WWW.
[15] Danescu-Niculescu-Mizil, C.; Sudhof, M.; Jurafsky, D.; Leskovec, J.; and Potts, C. 2013a. A computational approach to politeness with application to social factors. In ACL.
[16] Danescu-Niculescu-Mizil, C.; West, R.; Jurafsky, D.; Leskovec, J.; and Potts, C. 2013b. No country for old members: User lifecycle and linguistic change in online communities. In WWW.
[17] De Choudhury, M.; and Kıcıman, E. 2017. The language of social support in social media and its effect on suicidal ideation risk. In ICWSM.
[18] Egger, R.; and Yu, J. 2022. A topic modeling comparison between lda, nmf, top2vec, and bertopic to demystify twitter posts. Frontiers in sociology.
[19] El Malki, O.; Aubin Le Quéré, M.; Monroy-Hernández, A.; and Horta Ribeiro, M. 2026. Bonsai: Intentional and personalized social media feeds. In Proc. CHI.
[20] Ernala, S. K.; Rizvi, A. F.; Birnbaum, M. L.; Kane, J. M.; and De Choudhury, M. 2017. Linguistic markers indicating therapeutic outcomes of social media disclosures of schizophrenia. PACM HCI, (CSCW).
[21] Failla, A.; and Rossetti, G. 2024. “I’m in the bluesky tonight”: insights from a year worth of social data. PloS one.
[22] Gerbner, G.; Gross, L.; Morgan, M.; and Signorielli, N. 1980. The “mainstreaming” of America: Violence profile number 11. Journal of communication, 30(3): 10–29.
[23] Gerbner, G.; et al. 1978. Cultural indicators: Violence profile no. 9. Journal of communication, 28(3): 176–207.
[24] Giles, H.; Coupland, J.; and Coupland, N. 1991. Contexts of accommodation: Developments in applied sociolinguistics.
[25] Goel, R.; Soni, S.; Goyal, N.; Paparrizos, J.; Wallach, H.; Diaz, F.; and Eisenstein, J. 2016. The social dynamics of language change in online networks. In SocInfo.
[26] Goffman, E.; et al. 1959. The presentation of self in everyday life.
[27] Goyal, A.; Lambert, C.; and Chandrasekharan, E. 2025. The language of approval: Identifying the drivers of positive feedback online. arXiv preprint arXiv:2509.10370.
[28] Grimmelmann, J. 2015. The law and ethics of experiments on social media users. Colo. Tech. LJ, 13: 219.
[29] Grootendorst, M. 2022. BERTopic: Neural topic modeling with a class-based TF-IDF procedure. arXiv preprint arXiv:2203.05794.
[30] Hermann, E.; Morgan, M.; and Shanahan, J. 2023. Cultivation and social media: A meta-analysis. New Media & Society, 25(9).
[31] Imbens, G. W.; and Rubin, D. B. 2015. Causal inference in statistics, social, and biomedical sciences. Cambridge university press.
[32] Kiciman, E.; Counts, S.; and Gasser, M. 2018. Using longitudinal social media analysis to understand the effects of early college alcohol use. In ICWSM.
[33] Kleppmann, M.; Frazee, P.; Gold, J.; Graber, J.; Holmgren, D.; Ivy, D.; Johnson, J.; Newbold, B.; and Volpert, J. 2024. Bluesky and the at protocol: Usable decentralized social media. In ACM Conext-2024.
[34] Kolden, G. G.; Klein, M. H.; Wang, C.-C.; and Austin, S. B. 2011. Congruence/genuineness. Psychotherapy, 48(1): 65.
[35] Lambert, C.; Saha, K.; and Chandrasekharan, E. 2025. Does Positive Reinforcement Work?: A Quasi-Experimental Study of the Effects of Positive Feedback on Reddit. In Proc. CHI.
[36] Lazarsfeld, P. F.; Berelson, B.; and Gaudet, H. 1968. The people’s choice: How the voter makes up his mind in a presidential campaign. Columbia University Press.
[37] Malki, O. E.; Quéré, M. A. L.; Monroy-Hernández, A.; and Ribeiro, M. H. 2025. Bonsai: Intentional and personalized social media feeds. arXiv preprint arXiv:2509.10776.
[38] McCombs, M. E.; and Shaw, D. L. 1972. The agenda-setting function of mass media. Public opinion quarterly.
[39] Metzler, H.; and Garcia, D. 2024. Social drivers and algorithmic mechanisms on digital media. Perspectives on Psychological Science.
[40] Moreno, M. A.; Goniu, N.; Moreno, P. S.; and Diekema, D. 2013. Ethics of social media research: Common concerns and practical considerations. Cyberpsychology, behavior, and social networking.
[41] Pal, O.; Goyal, A.; Chandrasekharan, E.; and Saha, K. 2026. The Hidden Toll of Social Media News: Causal Effects on Psychosocial Wellbeing. arXiv preprint arXiv:2601.13487.
[42] Pariser, E. 2011. The filter bubble: What the Internet is hiding from you. Penguin UK.
[43] Park, A.; Conway, M.; et al. 2018. Harnessing reddit to understand the written-communication challenges experienced by individuals with mental health disorders: analysis of texts from mental health communities. Journal of medical Internet research, 20(4): e8219.
[44] Parshall, A. 2025. The Internet Is Making Us Fluent in Algospeak. Scientific American.
[45] Pennebaker, J. W.; Chung, C. K.; Frazee, J.; Lavergne, G. M.; and Beaver, D. I. 2014. When small words foretell academic success: The case of college admissions essays. PloS one.
[46] Pennebaker, J. W.; Mehl, M. R.; and Niederhoffer, K. G. 2003. Psychological aspects of natural language use: Our words, our selves. Annual review of psychology.
[47] Reimers, N.; and Gurevych, I. 2019. Sentence-bert: Sentence embeddings using siamese bert-networks. In EMNLP-IJCNLP.
[48] Rosenbaum, P. R.; and Rubin, D. B. 1983. The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1): 41–55.
[49] Rubin, D. B. 2005. Causal inference using potential outcomes: Design, modeling, decisions. Journal of the American statistical Association.
[50] Saha, K.; Chandrasekharan, E.; and De Choudhury, M. 2019. Prevalence and psychological effects of hateful speech in online college communities. In ACM WebSci.
[51] Saha, K.; Jain, Y.; Liu, C.; Kaliappan, S.; and Karkar, R. 2025. Ai vs. humans for online support: Comparing the language of responses from llms and online communities of alzheimer’s disease. ACM HEALTH.
[52] Saha, K.; Liu, Y.; Vincent, N.; Chowdhury, F. A.; Neves, L.; Shah, N.; and Bos, M. W. 2021. Advertiming matters: Examining user ad consumption for effective ad allocations on social media. In CHI.
[53] Saha, K.; Sugar, B.; Torous, J.; Abrahao, B.; Kıcıman, E.; and De Choudhury, M. 2019. A Social Media Study on the Effects of Psychiatric Medication Use. In ICWSM.
[54] Saha, K.; Weber, I.; and De Choudhury, M. 2018. A Social Media Based Examination of the Effects of Counseling Recommendations After Student Deaths on College Campuses. In ICWSM.
[55] Sahneh, E. S.; Nogara, G.; DeVerna, M. R.; Liu, N.; Luceri, L.; Menczer, F.; Pierri, F.; and Giordano, S. 2024. The dawn of decentralized social media: an exploration of bluesky’s public opening. In ASONAM.
[56] Schlessinger, J.; Garimella, K.; Jakesch, M.; and Eckles, D. 2023. Effects of Algorithmic Trend Promotion: Evidence from Coordinated Campaigns in Twitter's Trending Topics. In ICWSM.
[57] Sharma, E.; and De Choudhury, M. 2018. Mental health support and its relationship to linguistic accommodation in online communities. In CHI.
[58] Stewart, I.; Chancellor, S.; De Choudhury, M.; and Eisenstein, J. 2017. #anorexia, #anarexia, #anarexyia: Characterizing online community practices with orthographic variation. In IEEE BigData.
[59] Tausczik, Y. R.; and Pennebaker, J. W. 2010. The psychological meaning of words: LIWC and computerized text analysis methods. JLS.
[60] Thorson, K.; and Wells, C. 2016. Curated flows: A framework for mapping media exposure in the digital age. Communication theory.
[61] Yuan, Y.; Saha, K.; Keller, B.; Isometsä, E. T.; and Aledavood, T. 2023. Mental health coping stories on social media: a causal-inference study of Papageno effect. In ACM WebConf, 2677–2685.
[62] Yuan, Y.; Zhang, J.; Aledavood, T.; Zhang, R.; and Saha, K. 2026. Mental Health Impacts of AI Companions: Triangulating Social Media Quasi-Experiments, User Perspectives, and Relational Lens. In Proc. CHI.
[63] Zhu, J.; Zou, H.; Rosset, S.; Hastie, T.; et al. 2009. Multi-class adaboost. Statistics and its Interface, 2(3): 349–360.

[1] [1]

Aleksic, A. 2025. Algospeak: How social media is transforming the future of language. Knopf

work page 2025

[2] [2]

D.; and Pischke, J.-S

Angrist, J. D.; and Pischke, J.-S. 2009. Mostly harmless econometrics: An empiricist's companion. Princeton university press

work page 2009

[3] [3]

Bakshy, E.; Messing, S.; and Adamic, L. A. 2015. Exposure to ideologically diverse news and opinion on Facebook. Science

work page 2015

[4] [4]

M.; Ng, A

Blei, D. M.; Ng, A. Y.; and Jordan, M. I. 2003. Latent dirichlet allocation. JMLR, 3(Jan): 993--1022

work page 2003

[5] [5]

L.; Ashokkumar, A.; Seraj, S.; and Pennebaker, J

Boyd, R. L.; Ashokkumar, A.; Seraj, S.; and Pennebaker, J. W. 2022. The development and psychometric properties of LIWC-22

work page 2022

[6] [6]

Chan, J.; Choi, F.; Saha, K.; and Chandrasekharan, E. 2025. Examining algorithmic curation on social media: An empirical audit of Reddit's r/popular feed. arXiv preprint arXiv:2502.20491

work page arXiv 2025

[7] [7]

Chan, J.; Lambert, C.; Choi, F.; Chancellor, S.; and Chandrasekharan, E. 2024. Understanding Community Resilience: Quantifying the Effects of Sudden Popularity via Algorithmic Curation. In ICWSM

work page 2024

[8] [8]

Chandrasekharan, E.; Jhaver, S.; Bruckman, A.; and Gilbert, E. 2022. Quarantined! Examining the effects of a community-wide moderation intervention on Reddit. ACM TOCHI

work page 2022

[9] [9]

Chandrasekharan, E.; Pavalanathan, U.; Srinivasan, A.; Glynn, A.; Eisenstein, J.; and Gilbert, E. 2017. You can't stay here: The efficacy of reddit's 2015 ban examined through hate speech. PACM HCI, (CSCW)

work page 2017

[10] [10]

Chandrasekharan, E.; Samory, M.; Jhaver, S.; Charvat, H.; Bruckman, A.; Lampe, C.; Eisenstein, J.; and Gilbert, E. 2018. The Internet's hidden rules: An empirical study of Reddit norm violations at micro, meso, and macro scales. PACM HCI, (CSCW)

work page 2018

[11] [11]

Choi, F.; and Chandrasekharan, E. 2025. Designing Usable Controls for Customizable Social Media Feeds. arXiv preprint arXiv:2509.19615

work page arXiv 2025

[12] [12]

A.; Saha, D.; Hasan, M

Chowdhury, F. A.; Saha, D.; Hasan, M. R.; Saha, K.; and Mueen, A. 2021. Examining factors associated with twitter account suspension following the 2020 us presidential election. In ASONAM

work page 2021

[13] [13]

A.; Mehl, M

Cohn, M. A.; Mehl, M. R.; and Pennebaker, J. W. 2004. Linguistic markers of psychological change surrounding September 11, 2001. Psychological science, 15(10): 687--693

work page 2004

[14] [14]

Danescu-Niculescu-Mizil, C.; Gamon, M.; and Dumais, S. 2011. Mark my words!: linguistic style accommodation in social media. In WWW

work page 2011

[15] [15]

Danescu-Niculescu-Mizil, C.; Sudhof, M.; Jurafsky, D.; Leskovec, J.; and Potts, C. 2013 a . A computational approach to politeness with application to social factors. In ACL

work page 2013

[16] [16]

Danescu-Niculescu-Mizil, C.; West, R.; Jurafsky, D.; Leskovec, J.; and Potts, C. 2013 b . No country for old members: User lifecycle and linguistic change in online communities. In WWW

work page 2013

[17] [17]

De Choudhury, M.; and K c man, E. 2017. The language of social support in social media and its effect on suicidal ideation risk. In ICWSM

work page 2017

[18] [18]

Egger, R.; and Yu, J. 2022. A topic modeling comparison between lda, nmf, top2vec, and bertopic to demystify twitter posts. Frontiers in sociology

work page 2022

[19] [19]

El Malki, O.; Aubin Le Qu \'e r \'e , M.; Monroy-Hern \'a ndez, A.; and Horta Ribeiro, M. 2026. Bonsai: Intentional and personalized social media feeds. In Proc. CHI

work page 2026

[20] [20]

K.; Rizvi, A

Ernala, S. K.; Rizvi, A. F.; Birnbaum, M. L.; Kane, J. M.; and De Choudhury, M. 2017. Linguistic markers indicating therapeutic outcomes of social media disclosures of schizophrenia. PACM HCI, (CSCW)

work page 2017

[21] [21]

I’m in the bluesky tonight

Failla, A.; and Rossetti, G. 2024. “I’m in the bluesky tonight”: insights from a year worth of social data. PloS one

work page 2024

[22] [22]

mainstreaming

Gerbner, G.; Gross, L.; Morgan, M.; and Signorielli, N. 1980. The “mainstreaming” of America: Violence profile number 11. Journal of communication, 30(3): 10--29

work page 1980

[23] [23]

Gerbner, G.; et al. 1978. Cultural indicators: Violence profile no. 9. Journal of communication, 28(3): 176--207

work page 1978

[24] [24]

Giles, H.; Coupland, J.; and Coupland, N. 1991. Contexts of accommodation: Developments in applied sociolinguistics

work page 1991

[25] [25]

Goel, R.; Soni, S.; Goyal, N.; Paparrizos, J.; Wallach, H.; Diaz, F.; and Eisenstein, J. 2016. The social dynamics of language change in online networks. In SocInfo

work page 2016

[26] [26]

Goffman, E.; et al. 1959. The presentation of self in everyday life

work page 1959

[27] [27]

Goyal, A.; Lambert, C.; and Chandrasekharan, E. 2025. The language of approval: Identifying the drivers of positive feedback online. arXiv preprint arXiv:2509.10370

work page arXiv 2025

[28] [28]

Grimmelmann, J. 2015. The law and ethics of experiments on social media users. Colo. Tech. LJ, 13: 219

work page 2015

[29] [29]

Grootendorst, M. 2022. BERTopic: Neural topic modeling with a class-based TF-IDF procedure. arXiv preprint arXiv:2203.05794

[30]

Hermann, E.; Morgan, M.; and Shanahan, J. 2023. Cultivation and social media: A meta-analysis. New Media & Society, 25(9)

[31]

Imbens, G. W.; and Rubin, D. B. 2015. Causal inference in statistics, social, and biomedical sciences. Cambridge University Press

[32]

Kiciman, E.; Counts, S.; and Gasser, M. 2018. Using longitudinal social media analysis to understand the effects of early college alcohol use. In ICWSM

[33]

Kleppmann, M.; Frazee, P.; Gold, J.; Graber, J.; Holmgren, D.; Ivy, D.; Johnson, J.; Newbold, B.; and Volpert, J. 2024. Bluesky and the AT Protocol: Usable decentralized social media. In ACM CoNEXT 2024

[34]

Kolden, G. G.; Klein, M. H.; Wang, C.-C.; and Austin, S. B. 2011. Congruence/genuineness. Psychotherapy, 48(1): 65

[35]

Lambert, C.; Saha, K.; and Chandrasekharan, E. 2025. Does Positive Reinforcement Work?: A Quasi-Experimental Study of the Effects of Positive Feedback on Reddit. In Proc. CHI

[36]

Lazarsfeld, P. F.; Berelson, B.; and Gaudet, H. 1968. The people’s choice: How the voter makes up his mind in a presidential campaign. Columbia University Press

[37]

Malki, O. E.; Quéré, M. A. L.; Monroy-Hernández, A.; and Ribeiro, M. H. 2025. Bonsai: Intentional and personalized social media feeds. arXiv preprint arXiv:2509.10776

[38]

McCombs, M. E.; and Shaw, D. L. 1972. The agenda-setting function of mass media. Public opinion quarterly

[39]

Metzler, H.; and Garcia, D. 2024. Social drivers and algorithmic mechanisms on digital media. Perspectives on Psychological Science

[40]

Moreno, M. A.; Goniu, N.; Moreno, P. S.; and Diekema, D. 2013. Ethics of social media research: Common concerns and practical considerations. Cyberpsychology, behavior, and social networking

[41]

Pal, O.; Goyal, A.; Chandrasekharan, E.; and Saha, K. 2026. The Hidden Toll of Social Media News: Causal Effects on Psychosocial Wellbeing. arXiv preprint arXiv:2601.13487

[42]

Pariser, E. 2011. The filter bubble: What the Internet is hiding from you. Penguin UK

[43]

Park, A.; Conway, M.; et al. 2018. Harnessing Reddit to understand the written-communication challenges experienced by individuals with mental health disorders: analysis of texts from mental health communities. Journal of Medical Internet Research, 20(4): e8219

[44]

Parshall, A. 2025. The Internet Is Making Us Fluent in Algospeak. Scientific American

[45]

Pennebaker, J. W.; Chung, C. K.; Frazee, J.; Lavergne, G. M.; and Beaver, D. I. 2014. When small words foretell academic success: The case of college admissions essays. PloS one

[46]

Pennebaker, J. W.; Mehl, M. R.; and Niederhoffer, K. G. 2003. Psychological aspects of natural language use: Our words, our selves. Annual review of psychology

[47]

Reimers, N.; and Gurevych, I. 2019. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In EMNLP-IJCNLP

[48]

Rosenbaum, P. R.; and Rubin, D. B. 1983. The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1): 41–55

[49]

Rubin, D. B. 2005. Causal inference using potential outcomes: Design, modeling, decisions. Journal of the American statistical Association

[50]

Saha, K.; Chandrasekharan, E.; and De Choudhury, M. 2019. Prevalence and psychological effects of hateful speech in online college communities. In ACM WebSci

[51]

Saha, K.; Jain, Y.; Liu, C.; Kaliappan, S.; and Karkar, R. 2025. AI vs. humans for online support: Comparing the language of responses from LLMs and online communities of Alzheimer’s disease. ACM HEALTH

[52]

Saha, K.; Liu, Y.; Vincent, N.; Chowdhury, F. A.; Neves, L.; Shah, N.; and Bos, M. W. 2021. Advertiming matters: Examining user ad consumption for effective ad allocations on social media. In CHI

[53]

Saha, K.; Sugar, B.; Torous, J.; Abrahao, B.; Kıcıman, E.; and De Choudhury, M. 2019. A Social Media Study on the Effects of Psychiatric Medication Use. In ICWSM

[54]

Saha, K.; Weber, I.; and De Choudhury, M. 2018. A Social Media Based Examination of the Effects of Counseling Recommendations After Student Deaths on College Campuses. In ICWSM

[55]

Sahneh, E. S.; Nogara, G.; DeVerna, M. R.; Liu, N.; Luceri, L.; Menczer, F.; Pierri, F.; and Giordano, S. 2024. The dawn of decentralized social media: an exploration of Bluesky’s public opening. In ASONAM

[56]

Schlessinger, J.; Garimella, K.; Jakesch, M.; and Eckles, D. 2023. Effects of Algorithmic Trend Promotion: Evidence from Coordinated Campaigns in Twitter's Trending Topics. In ICWSM

[57]

Sharma, E.; and De Choudhury, M. 2018. Mental health support and its relationship to linguistic accommodation in online communities. In CHI

[58]

Stewart, I.; Chancellor, S.; De Choudhury, M.; and Eisenstein, J. 2017. #anorexia, #anarexia, #anarexyia: Characterizing online community practices with orthographic variation. In IEEE BigData

[59]

Tausczik, Y. R.; and Pennebaker, J. W. 2010. The psychological meaning of words: LIWC and computerized text analysis methods. JLS

[60]

Thorson, K.; and Wells, C. 2016. Curated flows: A framework for mapping media exposure in the digital age. Communication theory

[61]

Yuan, Y.; Saha, K.; Keller, B.; Isometsä, E. T.; and Aledavood, T. 2023. Mental health coping stories on social media: a causal-inference study of Papageno effect. In ACM WebConf, 2677–2685

[62]

Yuan, Y.; Zhang, J.; Aledavood, T.; Zhang, R.; and Saha, K. 2026. Mental Health Impacts of AI Companions: Triangulating Social Media Quasi-Experiments, User Perspectives, and Relational Lens. In Proc. CHI

[63]

Zhu, J.; Zou, H.; Rosset, S.; Hastie, T.; et al. 2009. Multi-class AdaBoost. Statistics and Its Interface, 2(3): 349–360
