Auditing Algorithmic Personalization in TikTok Comment Sections

Siqi Wu; Yueru Yan

arxiv: 2603.25061 · v1 · submitted 2026-03-26 · 💻 cs.SI · cs.CY

Yueru Yan, Siqi Wu

Pith reviewed 2026-05-15 01:08 UTC · model grok-4.3

classification 💻 cs.SI cs.CY

keywords TikTok · algorithmic auditing · comment personalization · sock-puppet accounts · political polarization · ranking divergence · engagement metrics

The pith

TikTok comment rankings diverge more between left-leaning and right-leaning accounts than within the same group on certain election videos.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper trains sock-puppet accounts to show consistent left or right political preferences and then measures the comment sections these accounts see on 65 politically neutral videos about the 2024 U.S. election. It finds that the top comments themselves stay largely the same across accounts, but on some videos their order changes more sharply when the accounts belong to opposite political groups. This ranking divergence grows with video-level signals such as total comments, uneven engagement, and already partisan comment tone. A small case study also hints that some accounts may see comments that match their own leaning, though the pattern does not hold for every video.

Core claim

While the set of prominent comments remains stable, the rank order of those comments differs more between accounts trained on opposing political preferences than between accounts trained on the same preference, and this between-group divergence correlates with higher comment volume, greater engagement inequality, and stronger partisan skew in the comment section itself.
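The within-group versus between-group comparison in this claim can be sketched in a few lines. This is a minimal illustration, not the paper's method: the abstract does not name its divergence metric, so normalized Kendall tau distance is an assumption here, and the function names `kendall_distance` and `group_divergence` are hypothetical.

```python
from itertools import combinations
from statistics import mean


def kendall_distance(rank_a, rank_b):
    """Normalized Kendall tau distance over comments present in both lists.

    0.0 means identical order, 1.0 means fully reversed order.
    Assumes comment ids are unique within each list.
    """
    pos_b = {c: i for i, c in enumerate(rank_b)}
    shared = [c for c in rank_a if c in pos_b]  # kept in rank_a order
    n = len(shared)
    if n < 2:
        return 0.0
    # A pair is discordant when rank_b orders it the opposite way to rank_a.
    discordant = sum(
        1 for x, y in combinations(shared, 2) if pos_b[x] > pos_b[y]
    )
    return discordant / (n * (n - 1) / 2)


def group_divergence(rankings, groups):
    """Mean pairwise distance within and between groups for one video.

    rankings: dict account_id -> ranked list of comment ids
    groups:   dict account_id -> group label (assumed: "left" / "right")
    """
    within, between = [], []
    for a, b in combinations(sorted(rankings), 2):
        d = kendall_distance(rankings[a], rankings[b])
        (within if groups[a] == groups[b] else between).append(d)
    return mean(within), mean(between)
```

Calling `group_divergence` once per video yields a pair of numbers whose gap is the quantity the core claim is about. Restricting the distance to shared comments keeps it separate from the question of whether the set of top comments differs.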

What carries the argument

Trained sock-puppet accounts that maintain left- or right-leaning preferences, used to scrape and compare ranked comment lists on the same videos.

If this is right

Videos with high comment volume and uneven engagement are more likely to show politically differentiated comment ordering.
Comment sections can reinforce an account's existing political lean through selective ranking even when the underlying comments are the same.
The strength of this effect varies by video, so blanket claims about comment personalization do not hold for all content.
Audits limited to top comments may miss personalization that occurs lower in the ranking.
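"Uneven engagement" needs a number before it can be compared across videos. One common choice is the Gini coefficient over per-comment like counts. The sketch below assumes that operationalization; the abstract does not say how the paper measures engagement inequality, and the function name `gini` is illustrative.

```python
def gini(values):
    """Gini coefficient of non-negative values.

    Returns 0.0 when every comment has the same engagement and approaches
    1.0 when a single comment holds almost all of it.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the ascending-sorted values.
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n


# Hypothetical like counts for the comments on one video.
likes = [1200, 340, 15, 9, 4, 2, 1, 0]
print(round(gini(likes), 3))
```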

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the pattern scales to other platforms, comment sections could become another vector for political fragmentation beyond video recommendations.
Platform designers could reduce the effect by down-weighting engagement inequality when ordering comments.
Future audits might test whether the same divergence appears on non-political videos or on videos with low partisan skew.

Load-bearing premise

Trained sock-puppet accounts accurately reproduce the comment personalization that real users experience without triggering platform countermeasures or introducing their own behavioral artifacts.

What would settle it

Collect comment rankings for the same videos from thousands of real accounts whose political leaning can be inferred from their own posting history and test whether the between-group ranking divergence remains larger than the within-group divergence.
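The test described here can be run as a label-permutation test, which avoids treating the pairwise distances as independent samples. The sketch below is one way to do that under stated assumptions: `dist` is a precomputed table of pairwise ranking distances keyed by `frozenset` of two account ids, `labels` maps each account to its inferred leaning, and both names are hypothetical. It is not the procedure used in the paper.

```python
import random
from itertools import combinations
from statistics import mean


def gap(dist, labels):
    """Mean between-group distance minus mean within-group distance."""
    within, between = [], []
    for a, b in combinations(sorted(labels), 2):
        d = dist[frozenset((a, b))]
        (within if labels[a] == labels[b] else between).append(d)
    return mean(between) - mean(within)


def permutation_p_value(dist, labels, n_perm=2000, seed=0):
    """One-sided p-value for the observed gap under shuffled group labels."""
    rng = random.Random(seed)
    observed = gap(dist, labels)
    accounts = sorted(labels)
    values = [labels[a] for a in accounts]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(values)  # keeps group sizes fixed
        if gap(dist, dict(zip(accounts, values))) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)
```

With only a handful of accounts per group the number of distinct label assignments is small, so the smallest achievable p-value is bounded well above zero. That is one reason a sample of thousands of real accounts would be more decisive.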

Original abstract

Personalization algorithms are ubiquitous in modern social computing systems, yet their effects on comment sections remain underexplored. In this work, we conducted an algorithmic auditing experiment to examine comment personalization on TikTok. We trained sock-puppet accounts to exhibit left-leaning or right-leaning preferences and successfully validated 17 of them by analyzing the videos recommended on their For You Pages. We then scraped the comment sections shown to these trained partisan accounts, along with five cold-start accounts, across 65 politically neutral videos related to the 2024 U.S. presidential election that contain abundant discussions from both left-leaning and right-leaning perspectives. We find that while the composition of top comments remains largely consistent for all videos, ranking divergence between accounts from different political groups is significantly greater than that observed within the same group for some videos. This effect is strongly correlated with video-level metrics such as comment volume, engagement inequality, and partisan skew in the comment sections. Furthermore, through an exploratory case study, we find preliminary evidence that personalization can result in comment exposure aligned with an account's political leaning. However, this pattern is not universal, suggesting that the extent of politically oriented comment personalization is context-dependent.
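The abstract separates two questions: whether accounts see the same top comments, and whether they see them in the same order. The first can be checked with a set-overlap measure that ignores order. The sketch below uses Jaccard similarity on the top `k` comments; the metric, the default `k=10`, and the name `topk_jaccard` are assumptions for illustration, not details taken from the paper.

```python
def topk_jaccard(rank_a, rank_b, k=10):
    """Jaccard similarity between the top-k comment sets of two accounts.

    1.0 means both accounts see the same k comments (in any order);
    0.0 means the two top-k sets share nothing.
    """
    top_a, top_b = set(rank_a[:k]), set(rank_b[:k])
    union = top_a | top_b
    return len(top_a & top_b) / len(union) if union else 1.0


# Same comments in a different order: composition is identical.
print(topk_jaccard(["c1", "c2", "c3"], ["c3", "c1", "c2"], k=3))
```

A high overlap score combined with a large rank distance is the pattern the abstract describes: stable composition, divergent ordering.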

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper finds greater between-group ranking divergence in TikTok comments than within groups for some videos, but validation only covers For You Page recommendations.

The main thing to know is that this paper reports greater divergence in comment rankings between partisan sock-puppet accounts than within groups on some TikTok videos, tied to engagement levels. They trained accounts to lean left or right, validated 17 on the For You Page, then looked at comments on 65 neutral election videos. The composition of top comments holds steady, but the ordering differs more across political lines for videos with high comment volume or skewed engagement. There's also a case study hinting at aligned exposure, though not on every video.

This is a direct application of auditing methods to comment sections, which the abstract notes is underexplored. The data collection on real videos and the correlation with video metrics give it some grounding.

The soft spot is the validation step. It only covers video recommendations, not comment ordering, so any non-political signals from training could drive the differences instead of ideology. The modest scale and lack of detail on exact metrics or controls in the abstract make it hard to assess robustness. The concern that training artifacts, rather than ideology, explain the divergence holds up based on what's described.

This paper is for researchers doing platform audits or studying personalization effects. It would interest people working on social media and politics. It deserves peer review to get the methods fleshed out and stats checked.

Referee Report

3 major / 2 minor

Summary. The manuscript reports results from an algorithmic audit of comment-section personalization on TikTok. Sock-puppet accounts were trained to display left- or right-leaning preferences, 17 of them were validated via For You Page video recommendations, and the validated accounts, together with five cold-start accounts, were used to scrape comment rankings on 65 politically neutral 2024 U.S. election videos. The central empirical claims are that top-comment composition is largely consistent across accounts, yet ranking divergence between accounts from different political groups exceeds within-group divergence for some videos; this divergence correlates with video-level metrics including comment volume, engagement inequality, and partisan skew. An exploratory case study provides preliminary evidence that personalization can align comment exposure with an account’s political leaning, though the pattern is not universal.

Significance. If the methodological gaps identified below are resolved, the work would supply one of the first direct empirical measurements of political personalization inside TikTok comment sections—an underexplored domain. The observational design, use of validated sock puppets, and reported correlations with engagement metrics constitute concrete, falsifiable findings that could inform platform-auditing standards and studies of viewpoint diversity.

major comments (3)

[Methods] Methods section: Validation of the 17 sock-puppet accounts is performed exclusively on For You Page video recommendations. No parallel validation, control condition, or divergence metric is reported for the comment-section rankings themselves. Consequently, it remains possible that observed between-group ranking differences arise from non-political behavioral traces (watch duration, like patterns, or engagement levels) introduced during training rather than from ideological signals.
[Results] Results section: The statement that ranking divergence “is strongly correlated” with comment volume, engagement inequality, and partisan skew is presented without correlation coefficients, confidence intervals, or specification of the statistical procedure. Because these correlations are used to support the claim that personalization effects are context-dependent, the absence of quantified strength and robustness checks weakens the evidential basis.
[Methods] Methods and Results: The selection criteria for the 65 videos are described only at a high level (“politically neutral” yet containing “abundant discussions from both sides”). No explicit operationalization of neutrality, sampling frame, or checks for selection bias are provided, leaving open the possibility that the observed divergence patterns are artifacts of the particular video sample rather than general features of TikTok comment personalization.

minor comments (2)

[Abstract] Abstract: The phrase “significantly greater” is used without naming the divergence metric or the statistical test, which should be stated for precision even in the abstract.
The manuscript would benefit from a short table or figure that explicitly lists the exact divergence metric, the number of accounts per political group, and the within- versus between-group comparison procedure.
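The second major comment asks for correlation coefficients and confidence intervals. A minimal sketch of what such reporting could look like is given below: Spearman rank correlation between a per-video divergence score and a video-level metric, with a percentile bootstrap interval obtained by resampling videos. The choice of Spearman and of the bootstrap is an assumption, since the manuscript's actual procedure is not stated, and all function names are illustrative.

```python
import random


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when one variable is constant
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def bootstrap_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for Spearman's rho, resampling videos."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        r = spearman([x[i] for i in idx], [y[i] for i in idx])
        if r == r:  # drop NaN from degenerate resamples
            stats.append(r)
    stats.sort()
    lo = stats[int((alpha / 2) * len(stats))]
    hi = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lo, hi
```

With 65 videos the interval will be wide, which is itself useful information for a reader judging how strong "strongly correlated" is.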

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thoughtful and constructive review. The comments identify important areas where additional methodological detail and statistical reporting will strengthen the manuscript. We address each point below and will revise the paper accordingly.

Referee: [Methods] Methods section: Validation of the 17 sock-puppet accounts is performed exclusively on For You Page video recommendations. No parallel validation, control condition, or divergence metric is reported for the comment-section rankings themselves. Consequently, it remains possible that observed between-group ranking differences arise from non-political behavioral traces (watch duration, like patterns, or engagement levels) introduced during training rather than from ideological signals.

Authors: We appreciate this observation. The sock-puppet training protocol was designed to embed ideological signals through repeated interactions (e.g., selective liking, commenting, and viewing of partisan content), and the FYP validation confirms that these signals successfully shifted the recommendation distribution toward left- or right-leaning content. Because comment-section ranking is governed by the same underlying personalization engine, we view the FYP results as a reasonable proxy. Nevertheless, we acknowledge that direct validation of comment rankings was not performed and that non-ideological behavioral traces could contribute to the observed divergence. In the revised manuscript we will add an explicit limitations paragraph discussing this possibility, including suggestions for future work that could include comment-specific validation metrics or control conditions.
Revision: yes

Referee: [Results] Results section: The statement that ranking divergence “is strongly correlated” with comment volume, engagement inequality, and partisan skew is presented without correlation coefficients, confidence intervals, or specification of the statistical procedure. Because these correlations are used to support the claim that personalization effects are context-dependent, the absence of quantified strength and robustness checks weakens the evidential basis.

Authors: We agree that the current presentation lacks the necessary quantitative detail. In the revised Results section we will report the exact correlation coefficients (Pearson or Spearman, as appropriate), 95% confidence intervals, p-values, and the precise statistical procedure used. We will also include any robustness checks (e.g., sensitivity to outlier videos or alternative divergence metrics) that were performed. These additions will allow readers to assess the strength and reliability of the reported relationships directly.
Revision: yes

Referee: [Methods] Methods and Results: The selection criteria for the 65 videos are described only at a high level (“politically neutral” yet containing “abundant discussions from both sides”). No explicit operationalization of neutrality, sampling frame, or checks for selection bias are provided, leaving open the possibility that the observed divergence patterns are artifacts of the particular video sample rather than general features of TikTok comment personalization.

Authors: We accept that the video-selection description requires greater precision. In the revised Methods section we will provide an explicit operational definition of “politically neutral” videos (videos from mainstream outlets without overt partisan framing, selected within a defined date range and meeting minimum view and comment thresholds), the sampling frame (e.g., stratified sampling across election-related topics), and the criteria used to confirm “abundant discussions from both sides” (e.g., keyword-based comment sentiment balance or manual coding of a subsample). We will also report any post-hoc checks for selection bias, such as comparison of engagement distributions against a larger pool of election videos.
Revision: yes

Circularity Check

0 steps flagged

No circularity: purely observational empirical audit

The paper reports direct measurements from sock-puppet training, FYP validation, and comment-section scraping across 65 videos. All reported quantities (ranking divergence, correlations with comment volume/engagement) are computed from raw observed rankings and metadata with no equations, fitted parameters, ansatzes, or derivations. No self-citations are invoked as load-bearing premises, and no step reduces a claimed result to its own inputs by construction. The work is self-contained against external benchmarks via explicit data-collection procedures.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The study rests on the domain assumption that sock-puppet accounts can reliably elicit and reveal personalization without introducing artifacts, plus the representativeness of the chosen videos.

axioms (1)

Domain assumption: Sock-puppet accounts trained on partisan preferences will trigger measurable personalization in comment rankings that reflects real-user experience.
Invoked in the account training and validation steps described in the abstract.
