Auditing Algorithmic Personalization in TikTok Comment Sections
Pith reviewed 2026-05-15 01:08 UTC · model grok-4.3
The pith
TikTok comment rankings diverge more between left-leaning and right-leaning accounts than within the same group on certain election videos.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
While the set of prominent comments remains stable across accounts, on some videos the rank order of those comments differs more between accounts trained on opposing political preferences than between accounts trained on the same preference. This between-group divergence correlates with higher comment volume, greater engagement inequality, and stronger partisan skew in the comment section itself.
What carries the argument
Trained sock-puppet accounts that maintain left- or right-leaning preferences, used to scrape and compare ranked comment lists on the same videos.
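The comparison of ranked comment lists can be sketched as a pairwise distance between accounts, averaged separately for same-group and cross-group pairs. The abstract does not name the paper's divergence metric, so the normalized Kendall tau distance used here is an assumption, and the account names, comment ids, and function names are hypothetical.

```python
from itertools import combinations
from statistics import mean

def kendall_distance(rank_a, rank_b):
    """Fraction of shared comment pairs that the two rankings order differently."""
    pos_b = {comment: i for i, comment in enumerate(rank_b)}
    shared = [c for c in rank_a if c in pos_b]  # kept in rank_a order
    pairs = list(combinations(shared, 2))
    if not pairs:
        return 0.0  # fewer than two shared comments: nothing to compare
    discordant = sum(1 for x, y in pairs if pos_b[x] > pos_b[y])
    return discordant / len(pairs)

def group_divergence(rankings, groups):
    """Mean pairwise distance within groups and between groups for one video.

    Assumes at least one same-group pair and one cross-group pair exist.
    """
    within, between = [], []
    for a, b in combinations(sorted(rankings), 2):
        d = kendall_distance(rankings[a], rankings[b])
        (within if groups[a] == groups[b] else between).append(d)
    return mean(within), mean(between)

# Hypothetical toy data: four accounts viewing the same four comments.
rankings = {
    "left_1": ["c1", "c2", "c3", "c4"],
    "left_2": ["c1", "c2", "c4", "c3"],
    "right_1": ["c3", "c4", "c1", "c2"],
    "right_2": ["c4", "c3", "c1", "c2"],
}
groups = {"left_1": "left", "left_2": "left", "right_1": "right", "right_2": "right"}

within, between = group_divergence(rankings, groups)
print(f"within={within:.3f} between={between:.3f}")
```

On this toy input every account sees the same four comments, yet the within-group mean is about 0.17 and the between-group mean is 0.75, which is the shape of result the core claim describes.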
If this is right
- Videos with high comment volume and uneven engagement are more likely to show politically differentiated comment ordering.
- Comment sections can reinforce an account's existing political lean through selective ranking even when the underlying comments are the same.
- The strength of this effect varies by video, so blanket claims about comment personalization do not hold for all content.
- Audits limited to top comments may miss personalization that occurs lower in the ranking.
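The last point above can be illustrated with an overlap profile: the same two rankings are compared at several cutoff depths, so agreement at the top does not hide disagreement further down. This is a minimal sketch using Jaccard overlap of the top-k sets, which is an assumed measure rather than one taken from the paper; the comment ids are hypothetical.

```python
def jaccard_at_k(rank_a, rank_b, k):
    """Jaccard overlap between the top-k comment sets of two rankings."""
    top_a, top_b = set(rank_a[:k]), set(rank_b[:k])
    union = top_a | top_b
    return len(top_a & top_b) / len(union) if union else 1.0

def overlap_profile(rank_a, rank_b, depths):
    """Overlap at each requested depth, keyed by depth."""
    return {k: jaccard_at_k(rank_a, rank_b, k) for k in depths}

# Hypothetical rankings that agree on the top three comments only.
account_a = ["c1", "c2", "c3", "c4", "c5", "c6"]
account_b = ["c1", "c2", "c3", "c6", "c5", "c4"]

profile = overlap_profile(account_a, account_b, depths=(3, 4, 6))
print(profile)
```

An audit that stopped at depth 3 would report perfect agreement here, while the depth-4 overlap already drops to 0.6.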
Where Pith is reading between the lines
- If the pattern scales to other platforms, comment sections could become another vector for political fragmentation beyond video recommendations.
- Platform designers could reduce the effect by down-weighting engagement inequality when ordering comments.
- Future audits might test whether the same divergence appears on non-political videos or on videos with low partisan skew.
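This summary does not say how engagement inequality is measured. One common choice, assumed here purely for illustration, is the Gini coefficient over per-comment like counts; the function name and sample counts below are hypothetical.

```python
def gini(counts):
    """Gini coefficient of non-negative counts: 0 means perfectly equal engagement."""
    values = sorted(counts)
    n, total = len(values), sum(values)
    if n == 0 or total == 0:
        return 0.0  # no comments or no engagement: treat as equal
    # Standard rank-weighted form: sum_i (2i - n - 1) * x_i / (n * sum(x)), i from 1.
    weighted = sum((2 * i - n - 1) * v for i, v in enumerate(values, start=1))
    return weighted / (n * total)

even_likes = [5, 5, 5, 5]      # every comment equally liked
skewed_likes = [0, 0, 0, 10]   # one comment holds all likes

print(gini(even_likes), gini(skewed_likes))
```

A video-level value like this could serve as the covariate that an audit correlates with ranking divergence, or as the quantity a ranking function down-weights.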
Load-bearing premise
Trained sock-puppet accounts accurately reproduce the comment personalization that real users experience without triggering platform countermeasures or introducing their own behavioral artifacts.
What would settle it
Collect comment rankings for the same videos from thousands of real accounts whose political leaning can be inferred from their own posting history and test whether the between-group ranking divergence remains larger than the within-group divergence.
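One way to run such a test is a label-permutation test on the gap between mean between-group and mean within-group distance. The sketch below assumes pairwise distances have already been computed; the paper's actual statistical test is not named in the abstract, and all names and values here are hypothetical.

```python
import random
from itertools import combinations
from statistics import mean

def gap(dist, labels):
    """Mean between-group distance minus mean within-group distance."""
    within, between = [], []
    for (a, b), d in dist.items():
        (within if labels[a] == labels[b] else between).append(d)
    return mean(between) - mean(within)

def permutation_p_value(dist, labels, n_perm=2000, seed=0):
    """One-sided p-value: how often shuffled labels give a gap at least as large."""
    rng = random.Random(seed)
    observed = gap(dist, labels)
    accounts = sorted(labels)
    values = [labels[a] for a in accounts]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(values)  # keeps group sizes fixed, reassigns membership
        if gap(dist, dict(zip(accounts, values))) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical data: eight accounts, small within-group and large between-group distances.
accounts = {f"left_{i}": "left" for i in range(4)}
accounts.update({f"right_{i}": "right" for i in range(4)})
dist = {
    (a, b): 0.1 if accounts[a] == accounts[b] else 0.6
    for a, b in combinations(sorted(accounts), 2)
}

observed, p_value = permutation_p_value(dist, accounts)
print(f"gap={observed:.2f} p={p_value:.3f}")
```

With only a handful of accounts the number of distinct relabelings is small, which bounds how low the p-value can go; that is one reason a sample of thousands of real accounts would be more decisive.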
Original abstract
Personalization algorithms are ubiquitous in modern social computing systems, yet their effects on comment sections remain underexplored. In this work, we conducted an algorithmic auditing experiment to examine comment personalization on TikTok. We trained sock-puppet accounts to exhibit left-leaning or right-leaning preferences and successfully validated 17 of them by analyzing the videos recommended on their For You Pages. We then scraped the comment sections shown to these trained partisan accounts, along with five cold-start accounts, across 65 politically neutral videos related to the 2024 U.S. presidential election that contain abundant discussions from both left-leaning and right-leaning perspectives. We find that while the composition of top comments remains largely consistent for all videos, ranking divergence between accounts from different political groups is significantly greater than that observed within the same group for some videos. This effect is strongly correlated with video-level metrics such as comment volume, engagement inequality, and partisan skew in the comment sections. Furthermore, through an exploratory case study, we find preliminary evidence that personalization can result in comment exposure aligned with an account's political leaning. However, this pattern is not universal, suggesting that the extent of politically oriented comment personalization is context-dependent.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript reports results from an algorithmic audit of comment-section personalization on TikTok. Seventeen sock-puppet accounts were trained to display left- or right-leaning preferences, validated via For You Page video recommendations, and then used to scrape comment rankings on 65 politically neutral 2024 U.S. election videos. The central empirical claims are that top-comment composition is largely consistent across accounts, yet ranking divergence between accounts from different political groups exceeds within-group divergence for some videos; this divergence correlates with video-level metrics including comment volume, engagement inequality, and partisan skew. An exploratory case study provides preliminary evidence that personalization can align comment exposure with an account’s political leaning, though the pattern is not universal.
Significance. If the methodological gaps identified below are resolved, the work would supply one of the first direct empirical measurements of political personalization inside TikTok comment sections—an underexplored domain. The observational design, use of validated sock puppets, and reported correlations with engagement metrics constitute concrete, falsifiable findings that could inform platform-auditing standards and studies of viewpoint diversity.
major comments (3)
- [Methods] Methods section: Validation of the 17 sock-puppet accounts is performed exclusively on For You Page video recommendations. No parallel validation, control condition, or divergence metric is reported for the comment-section rankings themselves. Consequently, it remains possible that observed between-group ranking differences arise from non-political behavioral traces (watch duration, like patterns, or engagement levels) introduced during training rather than from ideological signals.
- [Results] Results section: The statement that ranking divergence “is strongly correlated” with comment volume, engagement inequality, and partisan skew is presented without correlation coefficients, confidence intervals, or specification of the statistical procedure. Because these correlations are used to support the claim that personalization effects are context-dependent, the absence of quantified strength and robustness checks weakens the evidential basis.
- [Methods] Methods and Results: The selection criteria for the 65 videos are described only at a high level (“politically neutral” yet containing “abundant discussions from both sides”). No explicit operationalization of neutrality, sampling frame, or checks for selection bias are provided, leaving open the possibility that the observed divergence patterns are artifacts of the particular video sample rather than general features of TikTok comment personalization.
minor comments (2)
- [Abstract] Abstract: The phrase “significantly greater” is used without naming the divergence metric or the statistical test, which should be stated for precision even in the abstract.
- The manuscript would benefit from a short table or figure that explicitly lists the exact divergence metric, the number of accounts per political group, and the within- versus between-group comparison procedure.
Simulated Author's Rebuttal
We thank the referee for their thoughtful and constructive review. The comments identify important areas where additional methodological detail and statistical reporting will strengthen the manuscript. We address each point below and will revise the paper accordingly.
Point-by-point responses
- Referee: [Methods] Methods section: Validation of the 17 sock-puppet accounts is performed exclusively on For You Page video recommendations. No parallel validation, control condition, or divergence metric is reported for the comment-section rankings themselves. Consequently, it remains possible that observed between-group ranking differences arise from non-political behavioral traces (watch duration, like patterns, or engagement levels) introduced during training rather than from ideological signals.
  Authors: We appreciate this observation. The sock-puppet training protocol was designed to embed ideological signals through repeated interactions (e.g., selective liking, commenting, and viewing of partisan content), and the FYP validation confirms that these signals successfully shifted the recommendation distribution toward left- or right-leaning content. Because comment-section ranking is governed by the same underlying personalization engine, we view the FYP results as a reasonable proxy. Nevertheless, we acknowledge that direct validation of comment rankings was not performed and that non-ideological behavioral traces could contribute to the observed divergence. In the revised manuscript we will add an explicit limitations paragraph discussing this possibility and outlining future work, such as comment-specific validation metrics or control conditions.
  Revision: yes
- Referee: [Results] Results section: The statement that ranking divergence “is strongly correlated” with comment volume, engagement inequality, and partisan skew is presented without correlation coefficients, confidence intervals, or specification of the statistical procedure. Because these correlations are used to support the claim that personalization effects are context-dependent, the absence of quantified strength and robustness checks weakens the evidential basis.
  Authors: We agree that the current presentation lacks the necessary quantitative detail. In the revised Results section we will report the exact correlation coefficients (Pearson or Spearman, as appropriate), 95% confidence intervals, p-values, and the precise statistical procedure used. We will also include any robustness checks (e.g., sensitivity to outlier videos or alternative divergence metrics) that were performed. These additions will allow readers to assess the strength and reliability of the reported relationships directly.
  Revision: yes
- Referee: [Methods] Methods and Results: The selection criteria for the 65 videos are described only at a high level (“politically neutral” yet containing “abundant discussions from both sides”). No explicit operationalization of neutrality, sampling frame, or checks for selection bias are provided, leaving open the possibility that the observed divergence patterns are artifacts of the particular video sample rather than general features of TikTok comment personalization.
  Authors: We accept that the video-selection description requires greater precision. In the revised Methods section we will provide an explicit operational definition of “politically neutral” videos (videos from mainstream outlets without overt partisan framing, selected within a defined date range and meeting minimum view and comment thresholds), the sampling frame (e.g., stratified sampling across election-related topics), and the criteria used to confirm “abundant discussions from both sides” (e.g., keyword-based comment sentiment balance or manual coding of a subsample). We will also report any post-hoc checks for selection bias, such as comparison of engagement distributions against a larger pool of election videos.
  Revision: yes
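The correlation reporting promised in the second response could look like the following: a Spearman coefficient with a percentile-bootstrap confidence interval over videos. This is a standard-library sketch under assumed inputs; the per-video numbers are invented for illustration and are not results from the paper, and the authors may use a different procedure.

```python
import random
from statistics import mean

def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

def bootstrap_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval, resampling videos with replacement."""
    rng = random.Random(seed)
    n, stats = len(x), []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        xs, ys = [x[i] for i in idx], [y[i] for i in idx]
        if len(set(xs)) > 1 and len(set(ys)) > 1:  # skip constant resamples
            stats.append(spearman(xs, ys))
    stats.sort()
    return stats[int(alpha / 2 * len(stats))], stats[int((1 - alpha / 2) * len(stats)) - 1]

# Hypothetical per-video values: comment volume and between-group divergence.
comment_volume = [10, 20, 30, 40, 50, 60]
divergence = [1, 3, 2, 5, 4, 6]

rho = spearman(comment_volume, divergence)
low, high = bootstrap_ci(comment_volume, divergence)
print(f"rho={rho:.3f}")
```

Resampling whole videos keeps each video's volume and divergence paired, which matters because the claim under review is about video-level context.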
Circularity Check
No circularity: purely observational empirical audit
full rationale
The paper reports direct measurements from sock-puppet training, FYP validation, and comment-section scraping across 65 videos. All reported quantities (ranking divergence, correlations with comment volume and engagement metrics) are computed from raw observed rankings and metadata with no equations, fitted parameters, ansatzes, or derivations. No self-citations are invoked as load-bearing premises, and no step reduces a claimed result to its own inputs by construction. The results rest on the explicit data-collection procedures rather than on any internal derivation.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Sock-puppet accounts trained on partisan preferences will trigger measurable personalization in comment rankings that reflects real-user experience.