Machine Translation in the Wild: User Reaction to Xiaohongshu's Built-In Translation Feature

Sui He

arxiv: 2603.15922 · v2 · submitted 2026-03-16 · 💻 cs.HC · cs.CL

Pith reviewed 2026-05-15 09:29 UTC · model grok-4.3

classification 💻 cs.HC cs.CL

keywords: machine translation · user reactions · social media · Xiaohongshu · sentiment analysis · thematic analysis · translation accuracy · user testing

The pith

Xiaohongshu users reacted generally positively to its new machine translation feature, but when testing it they favored stand-alone words, slang, abbreviations, and symbolic or encoded text over everyday language.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how users responded when Xiaohongshu added a built-in machine translation tool in January 2025. The author gathered 6,723 comments from eleven official posts and applied both sentiment and thematic analysis to the reactions. Sentiment was generally positive, though users raised concerns about accuracy, functionality, and accessibility. A striking pattern appeared: people tried the tool on atypical inputs such as stand-alone words, internet slang, abbreviations, and symbolic or encoded forms, and they responded favorably when the system decoded those inputs correctly. Testing of ordinary sentences remained limited, which the author argues could encourage users to accept machine outputs without checking them in daily use.

Core claim

Analysis of 6,723 comments shows that users generally welcomed the translation feature on Xiaohongshu despite noting issues with functionality, accessibility, and accuracy. Users actively experimented with atypical inputs such as stand-alone words, abbreviations, internet slang, and symbolic or encoded forms, reacting positively when the system decoded them successfully. In contrast, testing on conventional language was limited. The paper concludes that this pattern may encourage uncritical acceptance of machine translation outputs in real-world online communication.

What carries the argument

Sentiment and thematic analysis of 6,723 user comments collected from eleven official Xiaohongshu posts announcing the translation feature, used to identify testing patterns and acceptance tendencies.

If this is right

Platform machine translation must improve accuracy on non-standard language such as slang and symbols to match user testing habits.
Limited testing of conventional text means users may miss translation errors in normal communication.
Closer collaboration among computer scientists, translation scholars, and platform designers is required to refine performance and encourage informed use.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar testing patterns may appear on other social platforms when they add translation tools.
Platforms could add simple confidence indicators for translations to prompt users to check outputs more often.
Over time, repeated success on unusual text might increase overall trust in machine translation even when it is applied to standard language.

Load-bearing premise

The 6,723 comments from the eleven official posts are representative of typical user reactions and the combined sentiment and thematic analysis captures genuine perceptions without major selection or interpretation bias.

What would settle it

A follow-up observation that users test the tool extensively on standard everyday sentences and routinely verify or correct its outputs would show that the claims of limited conventional testing and a risk of uncritical acceptance do not hold.

Original abstract

This paper examines user reactions to the launch of the machine translation (MT) feature on Xiaohongshu, a Chinese social media and e-commerce platform, in January 2025. Drawing on a dataset of 6,723 comments collected from 11 official posts promoting the translation function, this paper combines sentiment analysis with thematic analysis to investigate how users perceived and experimented with the function. Results show that reactions were generally positive, although concerns regarding functionality, accessibility, and translation accuracy were also expressed. In addition, users actively tested the function with inputs that are atypical for everyday online communication, including stand-alone words and phrases, abbreviations, internet slang, and symbolic or encoded forms. Successful decoding of these texts elicited positive responses, while testing of more conventional language remained fairly limited. This could lead to uncritical acceptance of MT outputs by users, highlighting the importance of closer collaboration among computer scientists, translation scholars, and platform designers to improve MT performance and promote informed user engagement in real-world scenarios.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated author's rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note · private letter to a colleague

The paper gives a useful snapshot of users testing Xiaohongshu's new MT feature on slang and symbols, but the sample from promotional posts limits how much we can generalize the positive reactions and testing patterns.

The main thing to know is that this work documents real user comments on a live MT rollout in January 2025 and finds that people tried the tool on stand-alone words, abbreviations, slang, and symbols more than on ordinary sentences, with successful cases getting positive feedback. That platform-specific pattern is the clearest new piece here, since earlier MT studies have not focused on this exact Chinese social-commerce site right after launch.

Referee Report

3 major / 1 minor

Summary. This paper examines user reactions to the launch of a built-in machine translation feature on the Xiaohongshu platform in January 2025. It analyzes a dataset of 6,723 comments from 11 official promotional posts using sentiment analysis combined with thematic analysis, reporting generally positive reactions alongside concerns about functionality, accessibility, and accuracy. The study further finds that users preferentially test the feature with atypical inputs such as stand-alone words, abbreviations, internet slang, and symbolic forms, with successful decoding eliciting positive responses while conventional language testing remains limited, potentially risking uncritical acceptance of MT outputs.

Significance. If the methodological and sampling issues are resolved, the work provides timely empirical evidence on real-world MT interactions in social media, particularly the tendency to experiment with non-standard language. It usefully highlights implications for user trust in MT and calls for collaboration between computer scientists, translation scholars, and designers. The observational approach on a major Chinese platform adds practical value, though generalizability remains constrained without broader validation.

major comments (3)

[Data collection] Data collection (abstract and methods): The 6,723 comments are drawn exclusively from 11 official promotional posts. This sampling frame introduces self-selection bias toward users already following the account and engaging with marketing content, systematically over-representing early adopters and tech-interested individuals. No justification or comparison to comments on ordinary user posts is provided, undermining claims that the observed preference for atypical inputs and positive reactions reflect typical user behavior.
[Methods] Methods section: The abstract states that sentiment analysis and thematic analysis were combined, yet no details are supplied on the sentiment analysis tool or lexicon, the thematic coding scheme, inter-rater reliability statistics, or the procedure for sampling or filtering the 6,723 comments. These omissions make it impossible to evaluate the reliability of the reported positive sentiment distribution or the identification of input-type categories.
[Results] Results and discussion: The central claim that testing of conventional language remained limited while atypical inputs produced positive responses, thereby risking uncritical MT acceptance, lacks quantitative support such as percentages of input types, cross-tabulations of input category by sentiment, or representative comment examples. Without these, the inference about potential over-acceptance rests on qualitative impression rather than documented patterns.
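The percentages and cross-tabulation requested above are simple to compute once each comment carries an input-type code and a sentiment code. The sketch below is a minimal illustration under that assumption; the category names (`slang`, `abbreviation`, `symbolic`, `conventional`), the toy records, and the helper names `input_type_shares` and `crosstab` are hypothetical and do not come from the paper or its dataset.

```python
from collections import Counter

# Hypothetical coded comments as (input_type, sentiment) pairs.
# Toy values only, not drawn from the paper's 6,723-comment dataset.
coded_comments = [
    ("slang", "positive"),
    ("slang", "positive"),
    ("abbreviation", "positive"),
    ("symbolic", "negative"),
    ("conventional", "neutral"),
    ("conventional", "positive"),
]


def input_type_shares(records):
    """Return the percentage of comments falling in each input-type category."""
    counts = Counter(input_type for input_type, _ in records)
    total = sum(counts.values())
    return {category: 100.0 * n / total for category, n in counts.items()}


def crosstab(records):
    """Count comments for every (input_type, sentiment) pair, one Counter per row."""
    table = {}
    for input_type, sentiment in records:
        row = table.setdefault(input_type, Counter())
        row[sentiment] += 1
    return table


if __name__ == "__main__":
    # Share of each input type across all coded comments.
    for category, share in sorted(input_type_shares(coded_comments).items()):
        print(f"{category:<13}{share:5.1f}%")
    # Sentiment breakdown within each input type.
    for category, row in sorted(crosstab(coded_comments).items()):
        print(category, dict(row))
```

Run on a full coded dataset, a table like this would show directly whether positive sentiment concentrates in the atypical-input rows, which is the pattern the referee asks the author to document.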

minor comments (1)

[Discussion] The manuscript would benefit from a dedicated limitations subsection explicitly addressing platform-specific context and sample bias.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thoughtful and constructive comments on our manuscript. We have carefully considered each point and provide detailed responses below, along with our plans for revision.

Referee: [Data collection] Data collection (abstract and methods): The 6,723 comments are drawn exclusively from 11 official promotional posts. This sampling frame introduces self-selection bias toward users already following the account and engaging with marketing content, systematically over-representing early adopters and tech-interested individuals. No justification or comparison to comments on ordinary user posts is provided, undermining claims that the observed preference for atypical inputs and positive reactions reflect typical user behavior.

Authors: We acknowledge the potential for self-selection bias in our sampling approach, as the comments are indeed drawn from official promotional posts. Our intention was to focus on the immediate user reactions to the feature launch as announced by the platform, which these posts represent. However, we recognize that this may over-represent engaged users. In the revised manuscript, we will add explicit justification for this sampling choice in the Methods section and expand the Limitations section to discuss the implications for generalizability. We will also clarify that our claims are specific to reactions on promotional content and suggest future studies on user-generated posts for comparison. revision: partial
Referee: [Methods] Methods section: The abstract states that sentiment analysis and thematic analysis were combined, yet no details are supplied on the sentiment analysis tool or lexicon, the thematic coding scheme, inter-rater reliability statistics, or the procedure for sampling or filtering the 6,723 comments. These omissions make it impossible to evaluate the reliability of the reported positive sentiment distribution or the identification of input-type categories.

Authors: We apologize for the lack of detail in the Methods section. In the revision, we will expand it to include the specific sentiment analysis approach used (including any tools or lexicons), the thematic coding scheme with examples, inter-rater reliability statistics if applicable, and the detailed procedure for sampling and filtering the comments. This will enable readers to better assess the reliability of our findings. revision: yes
Referee: [Results] Results and discussion: The central claim that testing of conventional language remained limited while atypical inputs produced positive responses, thereby risking uncritical MT acceptance, lacks quantitative support such as percentages of input types, cross-tabulations of input category by sentiment, or representative comment examples. Without these, the inference about potential over-acceptance rests on qualitative impression rather than documented patterns.

Authors: We agree that additional quantitative support would strengthen our claims. In the revised manuscript, we will include percentages for the distribution of input types, cross-tabulations of input categories by sentiment, and more representative comment examples to document the patterns observed. revision: yes
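The rebuttal commits to reporting inter-rater reliability "if applicable" without naming a statistic. For two coders assigning nominal theme or input-type codes to the same comments, one common choice is Cohen's kappa, κ = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each coder's code frequencies. The sketch below assumes exactly that two-coder setup; the function name, the coders, and the labels are hypothetical, and the paper does not state which statistic, if any, it uses.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders labelling the same items with nominal codes."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("coders must label the same non-empty set of items")
    n = len(coder_a)
    # Observed agreement: share of items given the same code by both coders.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement from each coder's marginal code frequencies.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    p_e = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if p_e == 1.0:
        # Both coders used one identical code throughout; kappa is formally
        # undefined here, and this sketch reports it as perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Toy labels from two hypothetical coders for four comments.
    a = ["slang", "slang", "symbolic", "conventional"]
    b = ["slang", "abbreviation", "symbolic", "conventional"]
    print(round(cohens_kappa(a, b), 3))  # 0.667
```

For the toy labels, the coders agree on 3 of 4 comments (p_o = 0.75) and chance agreement is p_e = 0.25, so κ ≈ 0.667. With more than two coders, a different statistic such as Fleiss' kappa or Krippendorff's alpha would be needed.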

Circularity Check

0 steps flagged

No circularity: purely observational empirical analysis

The paper collects 6,723 comments from 11 official posts and applies sentiment plus thematic analysis to report user reactions and testing behaviors. No mathematical derivations, equations, fitted parameters, predictions, or first-principles claims appear. Central results (generally positive reactions, preference for atypical inputs) are direct summaries of the collected data rather than outputs constructed from inputs by definition or self-citation. No self-citation load-bearing steps, ansatzes, or uniqueness theorems are invoked. Sampling bias is a methodological limitation but does not create circularity in any derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The study rests on standard assumptions of qualitative social media analysis without introducing new parameters or entities.

axioms (1)

domain assumption: Comments posted on official promotional posts reflect authentic user perceptions of the translation feature.
Data collection method assumes comments are genuine reactions rather than performative or bot-generated.

pith-pipeline@v0.9.0 · 5466 in / 1312 out tokens · 44887 ms · 2026-05-15T09:29:17.810037+00:00 · methodology
