
arxiv: 2604.27439 · v1 · submitted 2026-04-30 · 💻 cs.CL

Sentiment Analysis of AI Adoption in Indonesian Higher Education Using Machine Learning and Transformer-Based Models

Pith reviewed 2026-05-07 09:47 UTC · model grok-4.3

classification 💻 cs.CL
keywords: sentiment analysis, AI adoption, higher education, Indonesian students, DistilBERT, machine learning, transformer models, SVM

The pith

DistilBERT outperforms SVM and other models in classifying Indonesian student sentiments on AI adoption in higher education.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper compares traditional machine learning models using TF-IDF features against a fine-tuned DistilBERT transformer on a dataset of Indonesian opinions about AI in universities. It finds that the transformer model achieves the highest accuracy and F1-score, while SVM leads among the machine learning approaches. A sympathetic reader would care because understanding student views can inform how universities introduce AI tools, and knowing which analysis method works best helps scale such studies. The work applies these techniques to a specific cultural and educational context in Indonesia.

Core claim

By combining 1,154 student opinions with lexical sentiment data to form 2,295 labeled samples, the study evaluates LightGBM, Random Forest, SVM, and DistilBERT for binary sentiment classification. DistilBERT reaches 84.78% accuracy and 84.75% F1-score, surpassing SVM's 82.14% test accuracy and F1-score, indicating that transformer-based models better handle contextual information in this domain.
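As a rough illustration of how the two headline metrics relate, the sketch below computes accuracy and macro-averaged F1 from a 2x2 confusion matrix, then the gap between the two reported accuracies. The confusion-matrix counts are hypothetical and are not read from the paper's figures; the paper also does not say which F1 averaging it uses, so macro averaging here is an assumption.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return (accuracy, macro-averaged F1) for a 2x2 confusion matrix."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total

    def f1(true_pos, false_pos, false_neg):
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the class is empty.
        denom = 2 * true_pos + false_pos + false_neg
        return 2 * true_pos / denom if denom else 0.0

    f1_positive = f1(tp, fp, fn)
    # For the negative class, the roles of FP and FN are swapped.
    f1_negative = f1(tn, fn, fp)
    return accuracy, (f1_positive + f1_negative) / 2


# Hypothetical counts, NOT taken from the paper's confusion matrices.
acc, macro_f1 = binary_metrics(tp=40, fp=10, fn=10, tn=40)

# Gap between the reported accuracies, in percentage points.
gap_points = round(84.78 - 82.14, 2)
```

With the reported figures, the gap is 2.64 percentage points; whether a gap of that size is meaningful depends on the test-set size, which the summary above does not state.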

What carries the argument

Fine-tuned DistilBERT for binary sentiment classification, compared against TF-IDF vectorized machine learning models like SVM.
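For readers unfamiliar with the TF-IDF side of the comparison, here is a minimal standard-library sketch of the weighting idea: terms that appear in few documents get higher weight than terms that appear everywhere. The whitespace tokenizer, the smoothed IDF variant, and the three sample sentences are all assumptions for illustration; the paper's actual TF-IDF configuration is not reported.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (tf * smoothed idf)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vec = {}
        for term, count in counts.items():
            tf = count / len(tokens)  # relative term frequency
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0
            vec[term] = tf * idf
        vectors.append(vec)
    return vectors


# Hypothetical Indonesian sentences, not drawn from the paper's dataset.
docs = [
    "ai sangat membantu belajar",
    "ai membuat mahasiswa malas",
    "dosen mendukung ai",
]
vectors = tfidf_vectors(docs)
```

In this toy corpus "ai" occurs in every sentence, so it receives a lower weight than a rarer word such as "malas". A classifier like SVM then operates on these sparse weights and sees no word order, which is the contextual information a transformer can use.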

Load-bearing premise

The 2,295 labeled samples accurately reflect true student sentiments without significant labeling errors or bias.

What would settle it

Retraining and testing the models on a fresh, independently verified set of student opinions from Indonesian universities. The claim would be undermined if DistilBERT no longer led in accuracy on that set.

Figures

Figures reproduced from arXiv: 2604.27439 by Ahmad Sahidin Akbar, Ardika Satria, Happy Syahrul Ramadhan, Karin Yehezkiel Sinaga, Luluk Muthoharoh, Martin C.T. Manullang.

Figure 1: Research workflow.
Figure 2: Machine learning pipeline for sentiment classification.
Figure 3: DistilBERT architecture for binary sentiment classification.
Figure 4: Confusion matrix of the SVM model on the test data.
Figure 5: Confusion matrix of the DistilBERT model on the test data.
Original abstract

This study analyzes Indonesian student opinions on the adoption of artificial intelligence in higher education using two approaches: TF-IDF-based machine learning and Transformer-based deep learning. The dataset consists of 2,295 labeled samples, combining 1,154 student opinions with additional lexical sentiment data. LightGBM, Random Forest, and Support Vector Machine (SVM) are evaluated as machine learning models, while DistilBERT is fine-tuned for binary sentiment classification. The results show that SVM achieves the best performance among the machine learning models with 82.14% test accuracy and F1-score, while DistilBERT performs best overall with 84.78% accuracy and 84.75% F1-score. These findings indicate that Transformer-based models better capture contextual information, although SVM remains a competitive and efficient alternative for sentiment classification.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper evaluates TF-IDF-based machine learning classifiers (LightGBM, Random Forest, SVM) against a fine-tuned DistilBERT model for binary sentiment classification of Indonesian-language opinions on AI adoption in higher education. It uses a combined dataset of 2,295 labeled samples (1,154 student opinions plus lexical sentiment data), reports SVM as the strongest ML model at 82.14% test accuracy and F1, and DistilBERT as the overall best at 84.78% accuracy and 84.75% F1.

Significance. If the performance numbers hold under verified labeling and evaluation protocols, the work supplies a useful empirical baseline for transformer versus classical ML trade-offs on a low-resource language task in an education domain. The direct comparison of three ML models with one transformer is a practical strength.

major comments (1)
  1. [Abstract and Methods] The central performance claims (DistilBERT 84.78% accuracy / 84.75% F1; SVM 82.14% accuracy / F1) rest on the 2,295-sample dataset whose labeling process, lexicon source, inter-annotator agreement for the 1,154 student opinions, and distributional match between lexical and student-opinion components are not described. Without these details the reported metrics and model ranking cannot be interpreted or reproduced.
minor comments (2)
  1. [Results] The abstract and results give point estimates for accuracy and F1 but omit the train-test split ratio, whether stratified sampling or cross-validation was used, and any statistical significance test for the observed differences between models.
  2. [Methods] The hyperparameter search procedure, the learning-rate schedule for DistilBERT fine-tuning, and the exact TF-IDF configuration (n-gram range, vocabulary size) are not reported, limiting reproducibility.
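One common way to address the missing significance test, when two classifiers are scored on the same test set, is McNemar's test on the samples where the models disagree. The sketch below implements the exact two-sided version with the standard library only. The disagreement counts in the usage line are hypothetical, since the paper reports no per-sample predictions, and the choice of McNemar's test is a suggestion rather than anything the authors describe.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b: test samples model A classified correctly and model B incorrectly
    c: test samples model A classified incorrectly and model B correctly
    """
    n = b + c
    if n == 0:
        return 1.0  # the models never disagree, so there is no evidence
    k = min(b, c)
    # Under the null hypothesis, each disagreement favours either model
    # with probability 0.5, so the smaller count follows Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical disagreement counts, NOT taken from the paper.
p_value = mcnemar_exact(b=20, c=8)
```

Only the disagreements enter the test, so it needs paired predictions from both models on an identical test split, which is one more reason the split procedure should be reported.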

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the single major comment below and will revise the paper accordingly to improve clarity and reproducibility.

Point-by-point responses
  1. Referee: [Abstract and Methods] The central performance claims (DistilBERT 84.78% accuracy / 84.75% F1; SVM 82.14% accuracy / F1) rest on the 2,295-sample dataset whose labeling process, lexicon source, inter-annotator agreement for the 1,154 student opinions, and distributional match between lexical and student-opinion components are not described. Without these details the reported metrics and model ranking cannot be interpreted or reproduced.

    Authors: We agree that the current manuscript provides insufficient detail on dataset construction. In the revised version we will expand the Methods section with a new subsection that explicitly describes: (1) the source and curation of the lexical sentiment data, (2) the collection and labeling protocol for the 1,154 student opinions (including annotation guidelines), (3) inter-annotator agreement statistics for the student-opinion subset, and (4) a quantitative comparison of sentiment label distributions between the lexical and student-opinion components to justify their combination. These additions will directly support interpretation and reproducibility of the reported accuracy and F1 scores.

    Revision: yes
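The inter-annotator agreement statistic mentioned in this exchange is often reported as Cohen's kappa, which corrects raw agreement for the agreement two annotators would reach by chance. The sketch below is a minimal standard-library version for two annotators. The label lists are hypothetical, and nothing in the paper or the rebuttal confirms that kappa is the statistic the authors would use.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: product of each annotator's label proportions.
    labels = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[label] * freq_b[label] for label in labels) / n ** 2
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1 - expected)


# Hypothetical annotations, NOT from the paper's dataset.
annotator_1 = ["pos", "pos", "neg", "neg"]
annotator_2 = ["pos", "neg", "neg", "neg"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

In this toy case the annotators agree on 3 of 4 items (0.75 observed) against 0.5 expected by chance, giving a kappa of 0.5.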

Circularity Check

0 steps flagged

No circularity: empirical model evaluation on held-out data

The paper reports direct empirical results from training ML models (LightGBM, Random Forest, SVM) and fine-tuning DistilBERT on a 2,295-sample dataset, then measuring accuracy and F1 on a test split. No equations, derivations, or self-citations are invoked to reduce the reported metrics to quantities defined by the same fitted parameters. The performance numbers (e.g., DistilBERT 84.78% accuracy) are computed outputs from standard train/test evaluation, not predictions forced by construction from the input labels or model choices. Label quality concerns are a separate validity issue, not a circularity reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central performance claims rest on the validity of the collected labels and the assumption that standard supervised learning procedures transfer without hidden biases or data leakage.

axioms (2)
  • Domain assumption: The 2,295 samples are correctly labeled and representative of the target population of Indonesian student opinions.
    Invoked implicitly when reporting training and test accuracies as meaningful performance measures.
  • Standard math: TF-IDF features plus standard classifiers and DistilBERT fine-tuning constitute appropriate methods for the binary sentiment task.
    Background assumption drawn from prior NLP literature; no derivation supplied in the abstract.
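The data-leakage concern in this ledger can be partly checked mechanically. The sketch below flags test texts that also appear in the training set after light normalization, which matters when short lexical entries are mixed with free-text opinions. The normalization rules and the sample sentences are assumptions for illustration; the paper does not describe any such check.

```python
import re


def normalize(text):
    """Lowercase, drop punctuation, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def find_leaks(train_texts, test_texts):
    """Return test texts whose normalized form also occurs in training."""
    seen = {normalize(t) for t in train_texts}
    return [t for t in test_texts if normalize(t) in seen]


# Hypothetical samples, NOT drawn from the paper's dataset.
train = ["AI sangat membantu!", "Saya khawatir soal plagiarisme."]
test = ["ai sangat  membantu", "AI mempercepat riset"]
leaks = find_leaks(train, test)
```

Exact-match checks like this catch only verbatim or near-verbatim duplicates. Paraphrased overlap between the lexical and student-opinion components would need a similarity-based comparison.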

pith-pipeline@v0.9.0 · 5465 in / 1369 out tokens · 68638 ms · 2026-05-07T09:47:35.379873+00:00 · methodology

