pith. sign in

arxiv: 2604.26312 · v1 · submitted 2026-04-29 · 💻 cs.CL

Classification of Public Opinion on the Free Nutritional Meal Program on YouTube Media Using the LSTM Method

Pith reviewed 2026-05-07 13:19 UTC · model grok-4.3

classification 💻 cs.CL
keywords sentiment analysisLSTMYouTube commentspublic opinionIndonesian textclass imbalancefree meal programsocial media
0
0 comments X

The pith

An LSTM model classifies sentiments in 7,733 YouTube comments on Indonesia's free nutritious meal program and reaches 89 percent accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper applies the LSTM method to sort 7,733 YouTube comments about the Free Nutritional Meal Program into positive and negative sentiments. It reports that the model reaches 89 percent accuracy overall. The work shows LSTM can handle Indonesian-language social media text for policy feedback. A key issue is that negative comments make up most of the data, which affects how well the model spots positive opinions. This approach offers a way to gauge public reaction to government programs through online comments.

Core claim

The LSTM model achieves 89% accuracy in classifying sentiments from YouTube comments on the Free Nutritious Meal Program, with an F1-score of 0.94 for negative sentiment and 0.55 for positive sentiment, primarily due to the dataset having 87.7% negative comments. This demonstrates the method's effectiveness for Indonesian text while underscoring the need to address class imbalance in sentiment analysis tasks.

What carries the argument

The Long Short-Term Memory (LSTM) neural network, which processes sequential text data to classify comments as positive or negative sentiment.

If this is right

  • Public policy makers can use similar models to evaluate community responses to nutrition programs via social media.
  • The method confirms LSTM's suitability for analyzing Indonesian-language comments on platforms like YouTube.
  • Addressing class imbalance could improve detection of minority positive opinions in future studies.
  • This contributes to using online data for real-time public opinion monitoring on government initiatives.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Balancing the dataset or using techniques like oversampling might raise the positive sentiment F1-score without losing overall accuracy.
  • Similar LSTM setups could apply to other policy areas, such as education or health programs, to gather feedback.
  • If label accuracy is high, this could scale to larger comment volumes for ongoing monitoring.

Load-bearing premise

The assumption that the sentiment labels assigned to the 7,733 comments accurately reflect true public opinion without significant noise or bias in labeling.

What would settle it

A manual review of a sample of comments by multiple independent annotators showing low agreement with the original labels would indicate that the reported accuracy does not measure true model performance.

Figures

Figures reproduced from arXiv: 2604.26312 by Ardika Satria, Berliana Enda Putri, Lisa Diani Amelia, Luluk Muthoharoh, Martin Clinton Tosima Manullang, Muhammad Zaky Zaiddan.

Figure 1
Figure 1. Figure 1: Distribution of MBG Comment Sentiments 4.2 LSTM Model Results The Long Short-Term Memory (LSTM) model was trained using data that had undergone preprocessing with a total vocabulary of 16,378 unique tokens. The training process was conducted for 20 epochs using the Adam optimizer with a learning rate of 0.0005 and categorical crossentropy loss function. The model has a total of 2,228,738 parameters. Based … view at source ↗
Figure 2
Figure 2. Figure 2: This figure illustrates how the model’s performance improved progressively throughout the training phase. view at source ↗
Figure 2
Figure 2. Figure 2: Training & Validation Loss and Accuracy of the LSTM Model 4.3 Model Evaluation Evaluation of the model on 963 test samples yielded an accuracy of 89%. For the negative class, the model achieved a precision of 0.94, recall of 0.93, and f1-score of 0.94. Meanwhile, for the positive class, the model achieved a precision of 0.54, recall of 0.57, and f1-score of 0.55. These results indicate that the LSTM model … view at source ↗
Figure 3
Figure 3. Figure 3: Confusion Matrix 6 view at source ↗
read the original abstract

Public opinion towards the Free Nutritious Meal Program (MBG) on YouTube social media reflects diverse community responses. This study applies the Long Short-Term Memory (LSTM) method to classify sentiments from 7,733 YouTube comments. The results show that the LSTM model achieves 89% accuracy, with strong performance on negative sentiment (F1-score 0.94) but weaker performance on positive sentiment (F1-score 0.55) due to class imbalance, as negative data account for 87.7% of the dataset. These findings confirm the effectiveness of LSTM for sentiment analysis of Indonesian text while highlighting the challenge of imbalanced data. This research contributes to social media-based public policy evaluation

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper applies the LSTM model to perform sentiment classification on a dataset of 7,733 YouTube comments about Indonesia's Free Nutritional Meal Program (MBG). It reports an overall accuracy of 89%, with an F1-score of 0.94 on the negative class and 0.55 on the positive class, and attributes the performance gap to severe class imbalance (87.7% negative comments). The work positions this as evidence of LSTM's suitability for Indonesian-language social media sentiment analysis and as a tool for public-policy evaluation.

Significance. If the ground-truth labels prove reliable and the experimental protocol is properly documented, the study offers a concrete case study of LSTM-based sentiment analysis on policy-related Indonesian text. It could be useful for practitioners monitoring public reaction to government programs via social media. The explicit discussion of class imbalance is a minor positive, but the absence of label validation, baselines, and standard experimental controls substantially reduces the work's contribution to the cs.CL literature.

major comments (3)
  1. [Abstract and Results] Abstract and Results section: the headline claim of 89% accuracy is only 1.3 points above the 87.7% negative-class majority rate. Without a reported train/test split, hyperparameter search procedure, or comparison against a majority-class or bag-of-words baseline, it is impossible to determine whether the LSTM has learned anything beyond the class prior.
  2. [Data collection / annotation] Data collection / annotation description (wherever it appears): the manuscript provides no information on how the 7,733 comments were labeled for sentiment (number of annotators, guidelines, adjudication process, or inter-annotator agreement such as Cohen's kappa). Because all reported F1 scores rest on these labels, the lack of validation renders the performance numbers uninterpretable.
  3. [Experimental setup] Experimental setup: no mention is made of how class imbalance was addressed (class weights, oversampling, threshold tuning) or of any external validation set. These omissions are load-bearing for the claim that LSTM is “effective” for this task.
minor comments (2)
  1. [Title and Abstract] The paper should expand the acronym MBG on first use and consistently use the full program name in the title and abstract.
  2. [Results] Figure or table presenting the confusion matrix or per-class precision/recall would make the imbalance effect clearer than the current text description.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and will revise the manuscript to incorporate additional experimental details, baselines, and documentation of the annotation process.

read point-by-point responses
  1. Referee: [Abstract and Results] Abstract and Results section: the headline claim of 89% accuracy is only 1.3 points above the 87.7% negative-class majority rate. Without a reported train/test split, hyperparameter search procedure, or comparison against a majority-class or bag-of-words baseline, it is impossible to determine whether the LSTM has learned anything beyond the class prior.

    Authors: We agree that the 89% accuracy is close to the majority baseline and that explicit comparisons are needed to demonstrate the model's value. In the revised manuscript we will add a majority-class baseline and a bag-of-words logistic regression baseline, report the 80/20 train/test split used, and describe the grid-search hyperparameter procedure performed on a validation set. These additions will show the incremental benefit of the LSTM beyond the class prior. revision: yes

  2. Referee: [Data collection / annotation] Data collection / annotation description (wherever it appears): the manuscript provides no information on how the 7,733 comments were labeled for sentiment (number of annotators, guidelines, adjudication process, or inter-annotator agreement such as Cohen's kappa). Because all reported F1 scores rest on these labels, the lack of validation renders the performance numbers uninterpretable.

    Authors: We acknowledge the omission. The comments were labeled by two annotators experienced in Indonesian social-media text using explicit positive/negative guidelines focused on expressed opinions about the program; disagreements were resolved by discussion. We will add a new subsection detailing the annotation protocol and report the observed inter-annotator agreement in the revised version. revision: yes

  3. Referee: [Experimental setup] Experimental setup: no mention is made of how class imbalance was addressed (class weights, oversampling, threshold tuning) or of any external validation set. These omissions are load-bearing for the claim that LSTM is “effective” for this task.

    Authors: No explicit imbalance-handling techniques (class weights, oversampling, or threshold tuning) were applied, which explains the lower positive-class F1-score already noted in the paper. A standard train/validation/test split was used. We will expand the experimental-setup section to state these choices explicitly, treat the lack of balancing as a limitation, and discuss implications for the effectiveness claim. revision: yes

Circularity Check

0 steps flagged

No circularity: standard empirical LSTM pipeline with no self-referential reduction

full rationale

The paper collects 7,733 YouTube comments on the MBG program, applies standard LSTM-based sentiment classification after preprocessing, and reports accuracy (89%) plus per-class F1 scores on the resulting labels. No equation, derivation step, or 'prediction' reduces to its own inputs by construction. There are no fitted parameters renamed as predictions, no self-citation load-bearing uniqueness claims, no ansatz smuggled via prior work, and no renaming of known results. The evaluation metrics are computed from model outputs versus the (externally supplied) sentiment labels; this is ordinary supervised learning rather than a tautological loop. While the labeling process itself is undescribed, that is a validity concern, not a circularity in the derivation chain. The work remains self-contained as an empirical application against its own dataset.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the assumption that the collected comments form a representative sample and that manual or rule-based labeling produced reliable ground truth; no free parameters, axioms, or invented entities are explicitly introduced in the abstract.

pith-pipeline@v0.9.0 · 5447 in / 1139 out tokens · 39820 ms · 2026-05-07T13:19:46.801343+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

10 extracted references · 10 canonical work pages

  1. [1]

    Sentiment Analysis and Opinion Mining,

    B. Liu, “Sentiment Analysis and Opinion Mining,”Synthesis Lectures on Human Language Technologies, vol. 5, no. 1, pp. 1–167, 2012

  2. [2]

    Twitter as a Corpus for Sentiment Analysis and Opinion Mining,

    A. Pak and P. Paroubek, “Twitter as a Corpus for Sentiment Analysis and Opinion Mining,” inProceedings of the International Conference on Language Resources and Evaluation (LREC), 2010, pp. 1320–1326

  3. [3]

    Sentiment Analysis Algorithms and Applications: A Survey,

    W. Medhat, A. Hassan, and H. Korashy, “Sentiment Analysis Algorithms and Applications: A Survey,”Ain Shams Engineering Journal, vol. 5, no. 4, pp. 1093–1113, Dec. 2014

  4. [4]

    Long Short-Term Memory,

    S. Hochreiter and J. Schmidhuber, “Long Short-Term Memory,”Neural Computation, vol. 9, no. 8, pp. 1735–1780, Nov. 1997

  5. [5]

    Sentiment Analysis and Topic Classification with LSTM Networks and TextRazor,

    J. Jose and S. R, “Sentiment Analysis and Topic Classification with LSTM Networks and TextRazor,”International Journal of Data Informatics and Intelligent Computing, vol. 3, no. 2, pp. 42–51, 2024

  6. [6]

    LSTM Based Sentiment Analysis,

    M. Bargavi and S. K. Rekha, “LSTM Based Sentiment Analysis,”International Journal of Trend in Scientific Research and Development, vol. 5, no. 4, pp. 728–732, 2021

  7. [7]

    Sentiment Analysis of Negative Comments on Social Media Using Long Short-Term Memory (LSTM) Method with TensorFlow Framework,

    I. G. Waluyo and Juwono, “Sentiment Analysis of Negative Comments on Social Media Using Long Short-Term Memory (LSTM) Method with TensorFlow Framework,”International Journal of Integrative Sciences, vol. 2, no. 7, pp. 4990–4998, 2023

  8. [8]

    Comparison of RNN and LSTM Classifiers for Sentiment Analysis of Airline Tweets,

    R. Y . Ahmed, N. F. Yuosif, S. A. Ahmed, and A. A. Mohammed, “Comparison of RNN and LSTM Classifiers for Sentiment Analysis of Airline Tweets,”Journal of Information Systems and Informatics, vol. 7, no. 2, pp. 1–12, 2025

  9. [9]

    A Hybrid Sentiment Analysis Model Based on Deep Learning and Ensemble Algorithms,

    A. Onan, S. Koruko ˘glu, and H. Bulut, "A Hybrid Sentiment Analysis Model Based on Deep Learning and Ensemble Algorithms," IEEE Access, vol. 4, pp. 4790–4805, 2016

  10. [10]

    Analisis Sentimen Tingkat Kepuasan Pengguna Penyedia Layanan Telekomunikasi Seluler Indonesia pada Twitter dengan Metode Support Vector Machine dan Lexicon Based Features,

    M. R. Alfina, R. S. Perdana, and M. A. Fauzi, "Analisis Sentimen Tingkat Kepuasan Pengguna Penyedia Layanan Telekomunikasi Seluler Indonesia pada Twitter dengan Metode Support Vector Machine dan Lexicon Based Features," Jurnal Pengembangan Teknologi Informasi dan Ilmu Komputer, vol. 1, no. 12, pp. 1625–1632, 2017. 7