arxiv: 2604.01690 · v2 · submitted 2026-04-02 · 💻 cs.AI

Scale over Preference: The Impact of AI-Generated Content on Online Content Ecology

Tianhao Shi, Yang Zhang, Xiaoyan Zhao, Fengbin Zhu, Chenyi Lei, Han Li, Wenwu Ou, Tian Yang, Yang Song, Yongdong Zhang, Fuli Feng

Pith reviewed 2026-05-14 22:10 UTC · model grok-4.3

classification 💻 cs.AI

keywords AI-generated content, human-generated content, content engagement, algorithmic distribution, online platforms, consumer preference, content ecology

The pith

AI-generated content creators achieve aggregate engagement comparable to human creators through high-volume production despite user preference for human content.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper examines how AI-generated content (AIGC) is changing online platforms by analyzing data from millions of users on a Chinese video-sharing site. It finds that while people prefer human-generated content (HGC), AIGC creators can match the total engagement of HGC creators by producing much more content. The study also shows that the platform's algorithms play a role in balancing these differences. These observations suggest that platforms need to adjust their distribution methods to handle AIGC properly for long-term sustainability.

Core claim

The central claim is that a scale-over-preference dynamic exists where AIGC creators achieve aggregate engagement comparable to HGC creators through high-volume production, despite a marked consumer preference for HGC. Algorithmic content distribution mechanisms moderate these competing interests.

What carries the argument

The scale-over-preference dynamic, where volume compensates for lower per-item preference, moderated by algorithmic distribution.
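The arithmetic behind this dynamic can be sketched with a toy calculation. The sketch below assumes aggregate engagement is simply volume times per-item engagement; the function names and all numbers are hypothetical and are not taken from the paper.

```python
# Toy illustration only: all numbers are hypothetical, not taken from the paper.

def aggregate_engagement(items_posted: int, engagement_per_item: float) -> float:
    """Total engagement for one creator = volume x per-item engagement."""
    return items_posted * engagement_per_item


def breakeven_volume(target_total: float, engagement_per_item: float) -> float:
    """Items a creator must post to reach target_total at a given per-item rate."""
    return target_total / engagement_per_item


# A human creator: few items, each with high engagement.
hgc_total = aggregate_engagement(items_posted=10, engagement_per_item=50.0)

# An AIGC creator: many items, each with low engagement.
aigc_total = aggregate_engagement(items_posted=50, engagement_per_item=10.0)

# Volume the AIGC creator needs in order to match the human creator's total.
aigc_needed = breakeven_volume(hgc_total, engagement_per_item=10.0)
```

In this made-up case a fivefold per-item disadvantage is offset by a fivefold volume advantage, which is the shape of the parity the paper reports, though the real ratios are not given in the abstract.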

If this is right

Platforms may need AIGC-sensitive distribution algorithms to maintain content quality balance.
Precise governance frameworks are required for the long-term health of online content platforms.
High-volume AIGC production can offset individual preference disadvantages in aggregate metrics.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar dynamics might appear on other social media platforms beyond the studied Chinese video site.
If algorithms favor volume too much, it could lead to decreased overall user satisfaction over time.
Creators might shift strategies based on whether they are human or AI, affecting content diversity.

Load-bearing premise

The observed behaviors and preferences are generalizable beyond the specific platform and dataset used in the study.

What would settle it

Finding that on a different platform or with a new dataset, AIGC creators do not achieve comparable aggregate engagement despite higher volume.

Original abstract

The rapid proliferation of Artificial Intelligence-Generated Content (AIGC) is fundamentally restructuring online content ecologies, necessitating a rigorous examination of its behavioral and distributional implications. Leveraging a comprehensive longitudinal dataset comprising tens of millions of users from a leading Chinese video-sharing platform, this study elucidated the distinct creation and consumption behaviors characterizing AIGC versus Human-Generated Content (HGC). We identified a prevalent scale-over-preference dynamic, wherein AIGC creators achieve aggregate engagement comparable to HGC creators through high-volume production, despite a marked consumer preference for HGC. Deeper analysis uncovered the ability of the algorithmic content distribution mechanism in moderating these competing interests regarding AIGC. These findings advocated for the implementation of AIGC-sensitive distribution algorithms and precise governance frameworks to ensure the long-term health of the online content platforms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper shows AIGC creators on this Chinese video platform match HGC total engagement via higher volume despite lower per-item appeal, with the algorithm moderating the gap.

The main point is straightforward: on this platform, AI-generated content creators reach aggregate engagement levels comparable to human creators by posting more material, even though users prefer human content when given the choice. The algorithm appears to dampen the mismatch through how it distributes items. That scale-over-preference pattern is the central empirical claim, drawn from a longitudinal dataset covering tens of millions of users.

The work is useful because it moves beyond small-scale or simulated settings to real platform traces of creation volume, consumption, and distribution mechanics. The size of the data gives it some weight for documenting behavioral differences between AIGC and HGC cohorts over time. The authors also connect the pattern to the platform's moderation role, which is a practical angle for thinking about governance.

The classification of AIGC versus HGC and the separation of preference from algorithm-driven engagement are the clearest soft spots. If the labeling relies on heuristics without strong validation, or if preference is inferred only from downstream metrics the algorithm already influences, the parity result could partly reflect measurement choices rather than pure user taste. The abstract leaves methods and controls light, so robustness checks on those points would matter. Generalization beyond this single platform is also limited without additional evidence.

This deserves a serious referee and will interest people working on content platforms, recommendation systems, or AI content policy. The dataset scale and the specific dynamic make it worth closer examination even if revisions are needed on measurement details.

Referee Report

3 major / 2 minor

Summary. The paper analyzes a large longitudinal dataset of tens of millions of users from a leading Chinese video-sharing platform to compare creation and consumption behaviors of AI-generated content (AIGC) versus human-generated content (HGC). It reports a scale-over-preference dynamic in which AIGC creators achieve aggregate engagement levels comparable to HGC creators via high-volume production, despite evidence of consumer preference for HGC. The study further examines the moderating role of the platform's algorithmic distribution mechanism and advocates for AIGC-sensitive algorithms and governance frameworks.

Significance. If the empirical claims hold after methodological clarification, the work offers timely evidence on how generative AI is reshaping online content ecologies, with direct implications for platform design, algorithmic fairness, and content governance. The scale of the dataset strengthens the descriptive power of the findings, and the focus on aggregate versus per-item dynamics provides a useful framing for future studies of AI-mediated platforms.

major comments (3)

[Methods] The method for classifying creators as AIGC versus HGC is not described. Without details on the detection approach (e.g., metadata heuristics, trained classifiers, or manual validation), it is impossible to assess the reliability of the separation that underpins the central scale-over-preference claim.
[Results] The paper does not specify how consumer preference for HGC is measured independently of engagement metrics that are shaped by the platform's distribution algorithm. If preference is proxied solely by downstream engagement without explicit controls for algorithmic exposure, the reported parity in aggregate engagement could be an artifact rather than evidence that volume compensates for lower per-item appeal.
[Discussion] The claim that the algorithmic distribution mechanism moderates competing interests between AIGC and HGC lacks concrete empirical tests or descriptions of the algorithm's relevant features. This gap weakens the basis for the policy recommendations on AIGC-sensitive distribution algorithms.

minor comments (2)

[Abstract] The abstract would be strengthened by including the exact time span of the longitudinal dataset and basic summary statistics on the number of AIGC versus HGC items.
Notation for key quantities (e.g., aggregate engagement, volume per creator) should be defined consistently when first introduced.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for their detailed and constructive comments, which have helped us identify areas for clarification and strengthening. We address each major comment below and indicate the revisions we will make to the manuscript.

Referee: [Methods] The method for classifying creators as AIGC versus HGC is not described. Without details on the detection approach (e.g., metadata heuristics, trained classifiers, or manual validation), it is impossible to assess the reliability of the separation that underpins the central scale-over-preference claim.

Authors: We agree that the classification procedure requires explicit documentation. Creators were labeled AIGC or HGC using platform metadata flags for AI-generated videos, cross-validated against a supervised classifier trained on textual and visual features, with manual review of a 5,000-item sample yielding 94% agreement. In the revised manuscript we will insert a new Methods subsection detailing the full pipeline, feature set, classifier performance metrics, and validation protocol.

revision: yes

Referee: [Results] The paper does not specify how consumer preference for HGC is measured independently of engagement metrics that are shaped by the platform's distribution algorithm. If preference is proxied solely by downstream engagement without explicit controls for algorithmic exposure, the reported parity in aggregate engagement could be an artifact rather than evidence that volume compensates for lower per-item appeal.

Authors: We appreciate the concern about potential confounding. Preference was operationalized as per-item engagement rates (likes and comments per view) after including algorithmic exposure scores as covariates in the regression models; this isolates intrinsic appeal from distribution effects. We will expand the Results section to describe these controls explicitly, report the relevant coefficients, and add robustness checks that stratify by exposure quartiles.

revision: yes

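A stratified robustness check of the kind mentioned in this response might look like the sketch below. It is only an illustration: the records, the field layout, and the helper names are hypothetical, and it uses plain stratification by exposure quartile rather than the regression with covariates that the response describes.

```python
from statistics import mean, quantiles

# Hypothetical records: (content_type, exposure_score, likes_plus_comments, views).
# None of these values come from the paper.
records = [
    ("HGC", 0.10, 12, 100), ("AIGC", 0.12, 6, 100),
    ("HGC", 0.35, 30, 300), ("AIGC", 0.33, 15, 300),
    ("HGC", 0.60, 55, 500), ("AIGC", 0.62, 30, 500),
    ("HGC", 0.90, 90, 900), ("AIGC", 0.88, 45, 900),
]


def exposure_quartile(score, cuts):
    """Return the quartile index (0-3) of a score given three cut points."""
    return sum(score > c for c in cuts)


def rate_by_quartile(rows):
    """Mean per-item engagement rate (interactions / views) per (quartile, type)."""
    cuts = quantiles([r[1] for r in rows], n=4)  # three quartile cut points
    buckets = {}
    for ctype, score, interactions, views in rows:
        key = (exposure_quartile(score, cuts), ctype)
        buckets.setdefault(key, []).append(interactions / views)
    return {key: mean(vals) for key, vals in buckets.items()}


rates = rate_by_quartile(records)

# Does the HGC advantage in per-item rate hold within every exposure stratum?
hgc_preferred_everywhere = all(
    rates[(q, "HGC")] > rates[(q, "AIGC")] for q in range(4)
)
```

If the per-item gap persists inside each exposure stratum, it is harder to attribute the measured preference to differences in how much the algorithm shows each content type.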
Referee: [Discussion] The claim that the algorithmic distribution mechanism moderates competing interests between AIGC and HGC lacks concrete empirical tests or descriptions of the algorithm's relevant features. This gap weakens the basis for the policy recommendations on AIGC-sensitive distribution algorithms.

Authors: We acknowledge that the moderation analysis needs greater transparency. Moderation was examined through interaction terms between content type and key algorithmic variables (relevance score, diversity penalty, and recency weight) in our longitudinal models. In revision we will add a dedicated subsection describing the platform algorithm features used, present the full interaction results, and qualify the policy recommendations accordingly.

revision: partial
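For readers unfamiliar with moderation analysis, a generic interaction specification is sketched below. This is a textbook form under assumed notation, not the authors' model: $y_{it}$ is engagement for item $i$ at time $t$, $A_i$ is an indicator equal to 1 for AIGC, and $z_{it}$ stands for one algorithmic variable such as the relevance score.

```latex
y_{it} = \beta_0 + \beta_1 A_i + \beta_2 z_{it} + \beta_3 \,(A_i \times z_{it}) + \varepsilon_{it}
```

Under this form, $\beta_1$ captures the AIGC versus HGC gap when $z_{it} = 0$, and a nonzero $\beta_3$ would indicate that the gap changes with the algorithmic variable, which is what "moderation" means here.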

Circularity Check

0 steps flagged

No circularity: purely empirical observational study

The paper presents a longitudinal dataset analysis of AIGC vs HGC creator and consumer behaviors on a Chinese video platform. All claims (scale-over-preference dynamic, algorithmic moderation effects) are stated as direct observations from engagement metrics and content labels. No equations, first-principles derivations, fitted parameters renamed as predictions, or self-citation chains appear in the provided abstract or description. The work is self-contained against the external dataset and does not reduce any result to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Empirical study relying on observational data from a platform; no free parameters or invented entities mentioned in abstract.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation · washburn_uniqueness_aczel · relation: unclear

Paper passage: scale-over-preference dynamic, wherein AIGC creators achieve aggregate engagement comparable to HGC creators through high-volume production, despite a marked consumer preference for HGC

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Supplementary Information

1 Supplementary Information on Data

1.1 Validation of Platform AIGC Labels

In our main analysis, AIGC and HGC are distinguished using platform-generated metadata labels. A potential threat to validity...