arxiv: 2605.04887 · v1 · submitted 2026-05-06 · 💻 cs.CL

Sentiment Analysis and Customer Satisfaction Prediction on E-Commerce Platforms Based on YouTube Comments Using the XGBoost Algorithm

Pith reviewed 2026-05-08 16:52 UTC · model grok-4.3

classification 💻 cs.CL
keywords sentiment analysis · XGBoost · TF-IDF · customer satisfaction · YouTube comments · e-commerce · PyCaret · machine learning

The pith

XGBoost with TF-IDF on YouTube comments predicts e-commerce customer satisfaction while exposing heavy socio-political influence on polarity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds a predictive model that turns unstructured YouTube comments from e-commerce review videos into numerical features via TF-IDF, then classifies satisfaction levels with an XGBoost classifier tuned inside the PyCaret framework. It reports that this setup yields resilient classification performance on the collected comments. At the same time, lexical analysis and feature-importance inspection show that socio-political terms dominate the discourse and shift the predicted polarity of audience satisfaction. A sympathetic reader would care because manual review of comments at this volume is impractical and because the political contamination finding suggests satisfaction signals are entangled with external context rather than pure product experience.

Core claim

Using a secondary dataset of YouTube comments from e-commerce review videos, the study applies TF-IDF vectorization followed by PyCaret-optimized XGBoost classification and finds both strong predictive resilience and the infiltration of socio-political terminology that alters sentiment polarity.

What carries the argument

PyCaret-optimized XGBoost classifier operating on TF-IDF features extracted from preprocessed YouTube comment text.
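
As a rough illustration of the vectorization step, the sketch below computes TF-IDF weights in plain Python. The paper does not state its tokenizer or weighting variant, so the regex tokenizer, the smoothed IDF formula ln((1 + n) / (1 + df)) + 1, the L2 normalization, and the sample comments are all assumptions made for this example rather than the authors' configuration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of ASCII letters (an assumed, minimal tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return (sorted vocabulary, list of {term: weight}) using smoothed IDF and L2 norm."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF, so a term present in every document still gets weight 1.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        raw = {t: count * idf[t] for t, count in tf.items()}
        # Guard against empty documents, whose norm would be zero.
        norm = math.sqrt(sum(w * w for w in raw.values())) or 1.0
        vectors.append({t: w / norm for t, w in raw.items()})
    return sorted(df), vectors


# Hypothetical comments, invented for illustration only.
comments = [
    "delivery was fast and the seller was helpful",
    "delivery was late and the refund was slow",
    "the election debate ruined this review",
]
vocab, vectors = tfidf_vectors(comments)
```

In this toy corpus a word that appears in a single comment, such as "election", receives a higher weight than a word shared by every comment, such as "the". That is the property a downstream tree model would lean on when ranking terms.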

If this is right

  • Large volumes of unstructured comments can be tracked automatically instead of manually.
  • Feature-importance maps can flag when external terminology such as political language begins to dominate satisfaction signals.
  • Polarity predictions for audience satisfaction become conditional on the surrounding socio-political context captured in the comments.
  • Preprocessing and PyCaret tuning steps can be reused as a template for similar comment-based prediction tasks.
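
The feature-importance flagging described in the second bullet could be approximated as follows. This is a sketch under stated assumptions: the importance scores, the `political_lexicon` set, and both helper functions are hypothetical and do not come from the paper, which reports no term-level scores.

```python
def lexicon_importance_share(importances, lexicon):
    """Fraction of total feature importance carried by terms found in `lexicon`."""
    total = sum(importances.values())
    if total == 0:
        return 0.0
    flagged = sum(score for term, score in importances.items() if term in lexicon)
    return flagged / total


def top_terms(importances, k):
    """The k highest-importance terms, ties broken alphabetically."""
    return sorted(importances, key=lambda t: (-importances[t], t))[:k]


# Hypothetical importance scores and lexicon, invented for illustration only.
importances = {
    "election": 0.30,
    "government": 0.20,
    "delivery": 0.25,
    "refund": 0.15,
    "price": 0.10,
}
political_lexicon = {"election", "government", "minister"}

share = lexicon_importance_share(importances, political_lexicon)
leaders = top_terms(importances, 2)
```

A share that climbs over successive retraining runs would be one signal that external vocabulary is crowding out product vocabulary. The threshold for "dominates" is a judgment call the paper does not define.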

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Satisfaction models built on social video comments may need explicit context filters to separate product opinion from political overlay.
  • The same pipeline could be tested on comments from other platforms to check whether the socio-political infiltration is YouTube-specific.
  • If political terms reliably shift polarity, e-commerce platforms might monitor comment streams for early signals of external events affecting brand perception.
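
A context filter of the kind suggested in the first bullet might start as a simple lexicon split, so that a model can be scored separately on comments with and without political vocabulary. The function name, the tokenizer, the lexicon, and the comments below are hypothetical; a real filter would likely need a curated Indonesian-language term list.

```python
import re


def split_by_lexicon(comments, lexicon):
    """Partition comments into (flagged, clean) by presence of any lexicon term."""
    flagged, clean = [], []
    for comment in comments:
        tokens = set(re.findall(r"[a-z]+", comment.lower()))
        # A comment is flagged as soon as it shares one token with the lexicon.
        (flagged if tokens & lexicon else clean).append(comment)
    return flagged, clean


# Hypothetical comments and lexicon, invented for illustration only.
sample_comments = [
    "Fast delivery, great seller",
    "This is all about the election",
    "Refund took a week",
]
flagged, clean = split_by_lexicon(sample_comments, {"election", "government"})
```

Exact token matching misses inflected forms and slang, so this is better read as a lower bound on political overlay than as a measurement of it.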

Load-bearing premise

That the chosen secondary YouTube comment collection from e-commerce videos is representative of customer satisfaction, and that the PyCaret-tuned XGBoost truly outperforms alternatives, even though no detailed baseline comparisons or error analysis are presented.

What would settle it

Retrain and test the identical pipeline on a fresh, direct e-commerce review dataset that lacks socio-political terms. Materially lower classification accuracy there would falsify both the performance claim and the infiltration claim.
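
One way to make "materially lower" concrete is a two-proportion z-test on the two accuracies. The sketch below assumes independent test sets large enough for the normal approximation; the counts are hypothetical and the paper proposes no such test.

```python
import math


def two_proportion_z_test(correct_a, n_a, correct_b, n_b):
    """Two-sided z-test for a difference between two accuracies.

    Returns (z, p_value). Uses the pooled standard error.
    """
    p_a, p_b = correct_a / n_a, correct_b / n_b
    pooled = (correct_a + correct_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Both samples are all-correct or all-wrong: no measurable difference.
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # Standard normal CDF via the error function, doubled for a two-sided test.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value


# Hypothetical counts: 760 of 1000 correct on YouTube comments,
# 700 of 1000 correct on a direct review dataset.
z, p_value = two_proportion_z_test(760, 1000, 700, 1000)
```

Statistical significance is only half of "material". The size of the gap still has to matter for the application.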

Figures

Figures reproduced from arXiv: 2605.04887 by Ardika Satria, Ihsan Maulana Yusuf, Luluk Muthoharoh, Martin Clinton Tosima Manullang, Muhammad Aqil Ramadhan, Ridho Benedictus Togi Manik.

Figure 1: Methodological Architecture
Text after caption: 4 Results and Discussion; 4.1 Exploratory Data Analysis (EDA); 4.1.1 Sentiment and Emotion Dispersion. An initial examination of the sentiment labels exposes a drastically skewed distribution (…

Figure 2: Proportional Distribution of Sentiment Labels

Figure 3: Emotion Categories (Top) and Emotion-Sentiment Crosstabulation (Bottom)
Text after caption: 4.1.2 Textual Volume Characteristics. Behavioral patterns become evident when analyzing comment lengths. As illustrated in …

Figure 5: High-Frequency Unigrams (Top) and Bigrams …

Figure 8: Verification Word Cloud Post-Cleansing
Text after caption: 4.3 Predictive Performance Benchmarking; 4.3.1 Machine Learning Ensembles. To identify the premier classifier, various models were rigorously tested. Support Vector Machines (SVM) initially secured the highest traditional baseline (Accuracy 76%, F1-Score 72%). An Automated Machine Learning (AutoML) architecture powered by PyCaret further refined these parameters, maint…

Figure 6: Corpus Word Clouds: Aggregated (Top), by Sen…

Figure 9: Algorithm Benchmarks (Top) and Optimized …

Figure 7: Text Transformation Comparison
Text after caption: A secondary word cloud constructed purely from the cleansed text ensures that critical contextual markers were preserved while grammatical noise was successfully discarded …

Figure 10: LSTM Confusion Matrix (Top) and Model Con…

Original abstract

The exponential expansion of digital commerce in Indonesia has significantly shifted consumer interactions toward video-centric social networks, particularly YouTube. Consequently, the sheer volume of unstructured, multi-contextual comments poses a tremendous challenge for manual sentiment tracking. This study investigates and constructs a predictive model for customer satisfaction leveraging the Extreme Gradient Boosting (XGBoost) architecture coupled with Term Frequency-Inverse Document Frequency (TF-IDF) vectorization. By utilizing a secondary dataset of YouTube comments retrieved from e-commerce review videos, the raw text underwent rigorous preprocessing to generate normalized numerical features. The experimental results demonstrate that the PyCaret-optimized machine learning framework delivers superior classification resilience. Beyond standard performance metrics, lexical evaluations and feature-importance mapping uncover a notable phenomenon: e-commerce discourse is heavily infiltrated by socio-political terminologies, which ultimately influence the polarity of audience satisfaction.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript applies the XGBoost algorithm with TF-IDF vectorization to a secondary dataset of YouTube comments from Indonesian e-commerce review videos. It uses PyCaret for hyperparameter optimization to build a sentiment classifier for predicting customer satisfaction, claiming superior classification resilience, and uses feature-importance analysis to argue that socio-political terminology infiltrates e-commerce discourse and influences sentiment polarity.

Significance. If the performance claims were supported by concrete metrics, baselines, and validation details, the work could contribute modestly to applied NLP for consumer sentiment in social media, particularly by highlighting lexical overlaps between commercial and political language in emerging markets. The socio-political infiltration observation, if rigorously evidenced, might interest interdisciplinary researchers, but the current lack of empirical grounding limits any broader impact.

major comments (3)
  1. [Abstract] Abstract: The central claim that 'the PyCaret-optimized machine learning framework delivers superior classification resilience' is unsupported by any reported accuracy, F1, precision/recall, confusion matrix, cross-validation scores, or statistical tests. No baseline comparisons (e.g., logistic regression, SVM, or BERT) are mentioned, rendering the superiority assertion untestable.
  2. [Results] Experimental results / feature-importance section: The conclusion that 'e-commerce discourse is heavily infiltrated by socio-political terminologies, which ultimately influence the polarity of audience satisfaction' relies on lexical evaluations and feature-importance mapping, yet no top features, example terms, quantitative influence scores, or validation of this lexical effect are provided.
  3. [Methodology] Methodology / Data section: No dataset statistics (size, number of comments, class balance), labeling procedure for sentiment or satisfaction labels, collection method for the secondary YouTube dataset, or justification of its representativeness for customer satisfaction are given, undermining reproducibility and generalizability.
minor comments (1)
  1. [Abstract] The abstract and introduction could more explicitly define 'customer satisfaction' as operationalized from YouTube comments versus traditional review ratings.
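
The metrics requested in major comment 1 are cheap to compute once predictions exist. The sketch below derives a confusion table, per-class precision, recall and F1, macro F1, and a majority-class baseline in plain Python. The labels are hypothetical and chosen only to mimic a skewed class distribution.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Counter mapping (true_label, predicted_label) to a count."""
    return Counter(zip(y_true, y_pred))


def per_class_metrics(y_true, y_pred):
    """Precision, recall and F1 for each label seen in y_true or y_pred."""
    counts = confusion_counts(y_true, y_pred)
    labels = sorted(set(y_true) | set(y_pred))
    metrics = {}
    for label in labels:
        tp = counts[(label, label)]
        fp = sum(counts[(other, label)] for other in labels if other != label)
        fn = sum(counts[(label, other)] for other in labels if other != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so minority classes count equally."""
    per_class = per_class_metrics(y_true, y_pred)
    return sum(m["f1"] for m in per_class.values()) / len(per_class)


def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline_accuracy(y_true):
    """Accuracy of always predicting the most frequent label."""
    return Counter(y_true).most_common(1)[0][1] / len(y_true)


# Hypothetical labels: 8 negative and 2 positive comments, one positive missed.
y_true = ["neg"] * 8 + ["pos"] * 2
y_pred = ["neg"] * 8 + ["neg", "pos"]
report = per_class_metrics(y_true, y_pred)
```

On these toy labels the classifier reaches 0.9 accuracy against a 0.8 majority baseline while recalling only half of the minority class. That is why accuracy alone says little under class skew.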

Simulated Author's Rebuttal

3 responses · 0 unresolved

We sincerely thank the referee for the detailed and constructive feedback on our manuscript. We have carefully reviewed each major comment and will revise the paper to address the concerns regarding empirical support, reproducibility, and clarity. Our responses to the points are provided below.

point-by-point responses
  1. Referee: [Abstract] Abstract: The central claim that 'the PyCaret-optimized machine learning framework delivers superior classification resilience' is unsupported by any reported accuracy, F1, precision/recall, confusion matrix, cross-validation scores, or statistical tests. No baseline comparisons (e.g., logistic regression, SVM, or BERT) are mentioned, rendering the superiority assertion untestable.

    Authors: We agree that the abstract's claim of superior classification resilience requires explicit numerical support to be verifiable. While the results section presents performance from the PyCaret-optimized XGBoost model, these metrics were not summarized in the abstract. In the revised manuscript, we will update the abstract to include concrete metrics such as accuracy, F1-score, precision, recall, and cross-validation scores. We will also add baseline comparisons against logistic regression and SVM (and note any limitations with BERT due to computational constraints) to substantiate the performance claims. revision: yes

  2. Referee: [Results] Experimental results / feature-importance section: The conclusion that 'e-commerce discourse is heavily infiltrated by socio-political terminologies, which ultimately influence the polarity of audience satisfaction' relies on lexical evaluations and feature-importance mapping, yet no top features, example terms, quantitative influence scores, or validation of this lexical effect are provided.

    Authors: The referee is correct that the socio-political infiltration claim needs more granular evidence. The manuscript performs feature-importance analysis via XGBoost and lexical evaluation, but specific top features and examples were not listed. In the revision, we will include a table of the highest-ranked features with their importance scores, provide concrete examples of socio-political terms (e.g., political or social-issue vocabulary appearing in comments), and discuss their quantitative influence on sentiment polarity with supporting data excerpts. revision: yes

  3. Referee: [Methodology] Methodology / Data section: No dataset statistics (size, number of comments, class balance), labeling procedure for sentiment or satisfaction labels, collection method for the secondary YouTube dataset, or justification of its representativeness for customer satisfaction are given, undermining reproducibility and generalizability.

    Authors: We acknowledge the omission of these essential details, which limits reproducibility. The study uses a secondary YouTube comments dataset from Indonesian e-commerce videos, but specifics were not elaborated. The revised manuscript will add dataset statistics (total comments, class balance), a full description of the labeling procedure for sentiment and satisfaction, the collection approach, and a justification of representativeness for e-commerce customer satisfaction in the Indonesian context. revision: yes
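
The cross-validation scores promised in response 1 depend on how folds are drawn, and with an imbalanced label distribution stratified folds are the usual choice. The following is a minimal plain-Python sketch of stratified k-fold assignment. The function names, the round-robin scheme, and the labels are assumptions for illustration, not the procedure PyCaret or the authors use.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Assign each index to one of k folds, keeping class proportions roughly equal."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal indices round-robin so each fold gets a near-equal share of the class.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def train_test_splits(labels, k, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    folds = stratified_folds(labels, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield sorted(train), sorted(test)


# Hypothetical skewed labels: 8 negative and 4 positive comments.
labels = ["neg"] * 8 + ["pos"] * 4
folds = stratified_folds(labels, k=4)
```

Reporting the mean and spread of a metric across such folds, rather than a single split, would address the referee's request for validation detail.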

Circularity Check

0 steps flagged

No circularity: empirical ML pipeline with standard tuning and no derivations

full rationale

The manuscript applies XGBoost classification to a secondary YouTube comments dataset after TF-IDF vectorization and PyCaret hyperparameter search. No equations, uniqueness theorems, or first-principles derivations appear; performance claims rest on experimental outputs rather than any reduction of predictions to fitted inputs by construction. Feature-importance observations about socio-political terms are post-hoc interpretations of model results, not self-definitional or self-cited premises. Standard library usage (XGBoost, PyCaret) and external dataset sourcing introduce no load-bearing self-citation chains. The work is therefore self-contained against external benchmarks with zero circular steps.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claim rests on standard supervised learning assumptions plus the unverified claim that the chosen dataset and vectorization capture customer satisfaction. No new entities are postulated.

free parameters (1)
  • XGBoost hyperparameters
    Tuned automatically by PyCaret; specific values not stated in abstract but required for the reported performance.
axioms (2)
  • domain assumption: TF-IDF vectorization produces features that reliably encode sentiment polarity in YouTube comments
    Invoked during preprocessing step to convert raw text to numerical input for XGBoost.
  • domain assumption: The secondary YouTube comment dataset is representative of general e-commerce customer satisfaction
    Required for the prediction model to generalize beyond the collected videos.

pith-pipeline@v0.9.0 · 5474 in / 1461 out tokens · 118807 ms · 2026-05-08T16:52:12.128258+00:00 · methodology

Reference graph

Works this paper leans on

10 extracted references · 10 canonical work pages

  1. [1]

    Daza, A., et al. (2024). Sentiment Analysis on E-Commerce Product Reviews Using Machine Learning and Deep Learning Algorithms. International Journal of Information Management Data Insights

  2. [2]

    Ramadhani, W. A., & Rozi, F. (2025). Prediksi Kepuasan Pelanggan Berdasarkan Ulasan Produk di Lazada Indonesia Menggunakan Algoritma Decision Tree C4.5. Infotek

  3. [3]

    Sondakh, D. E., et al. (2024). Sentiment Analysis of Customer Satisfaction of Shopee Service Quality. 11th International Scholars Conference (ISC)

  4. [4]

    Darmawan, T. D. (2022). Analisis Sentimen Review Pelanggan E-Commerce Di Indonesia Menggunakan Algoritma Naïve Bayes Classifier

  5. [5]

    Bahri, S., & Widodo, A. M. (2024). Penerapan Algoritma Pengklasifikasi Untuk Mengukur Kepuasan Pelanggan E-Commerce (Studi Kasus: Shopee). ADIJAYA

  6. [6]

    Tribuana, D., Baharuddin, & Resky, A. M. (2025). Penerapan Algoritma XGBoost Untuk Prediksi Kepuasan Pelanggan Pada Layanan E-Commerce. JTBC

  7. [7]

    Budi, E. S., et al. (2024). Analisa Kepuasan Pelanggan Terhadap Layanan Aplikasi E-Commerce Menggunakan Algoritma C4.5. RESOLUSI

  8. [8]

    Dewi, T., Asrianda, & Afrillia, Y. (2025). Sentiment Analysis of Customer Satisfaction Towards Shopee and Lazada E-commerce Platform Using the Random Forest Algorithm Classifier. IJESIT

  9. [9]

    Amari, O. E. S., & Udayasuriyan, A. (2026). Analyzing Customer Review Sentiments using Machine Learning. IJIRE

  10. [10]

    Chen, T., & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD.