Sentiment Analysis of AI Adoption in Indonesian Higher Education Using Machine Learning and Transformer-Based Models

Ahmad Sahidin Akbar; Ardika Satria; Happy Syahrul Ramadhan; Karin Yehezkiel Sinaga; Luluk Muthoharoh; Martin C.T. Manullang

arxiv: 2604.27439 · v1 · submitted 2026-04-30 · 💻 cs.CL

Pith reviewed 2026-05-07 09:47 UTC · model grok-4.3

keywords: sentiment analysis · AI adoption · higher education · Indonesian students · DistilBERT · machine learning · transformer models · SVM

The pith

DistilBERT outperforms SVM and other models in classifying Indonesian student sentiments on AI adoption in higher education.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper compares traditional machine learning models using TF-IDF features against a fine-tuned DistilBERT transformer on a dataset of Indonesian opinions about AI in universities. It finds that the transformer model achieves the highest accuracy and F1-score, while SVM leads among the machine learning approaches. A sympathetic reader would care because understanding student views can inform how universities introduce AI tools, and knowing which analysis method works best helps scale such studies. The work applies these techniques to a specific cultural and educational context in Indonesia.

Core claim

By combining 1,154 student opinions with lexical sentiment data to form 2,295 labeled samples, the study evaluates LightGBM, Random Forest, SVM, and DistilBERT for binary sentiment classification. DistilBERT reaches 84.78% accuracy and 84.75% F1-score, surpassing SVM's 82.14% test accuracy and F1-score, indicating that transformer-based models better handle contextual information in this domain.
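
As a reading aid for the two headline metrics, the sketch below derives accuracy and F1 from a 2x2 confusion matrix. The counts are hypothetical placeholders, not the paper's confusion matrices, and the averaging scheme behind the reported F1-scores is not stated in the abstract, so the macro average here is an assumption.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, per-class F1 and macro-averaged F1 from a 2x2 confusion matrix."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    # F1 = 2*TP / (2*TP + FP + FN); for the negative class TN takes the role of TP.
    pos_den = 2 * tp + fp + fn
    neg_den = 2 * tn + fp + fn
    f1_pos = 2 * tp / pos_den if pos_den else 0.0
    f1_neg = 2 * tn / neg_den if neg_den else 0.0
    return {
        "accuracy": accuracy,
        "f1_pos": f1_pos,
        "f1_neg": f1_neg,
        "macro_f1": (f1_pos + f1_neg) / 2,  # assumed averaging scheme
    }

# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=40, fp=10, fn=5, tn=45)
```

With roughly balanced classes and similar error rates per class, accuracy and averaged F1 land close together, which is the pattern the reported numbers show.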

What carries the argument

Fine-tuned DistilBERT for binary sentiment classification, compared against TF-IDF vectorized machine learning models like SVM.

Load-bearing premise

The 2,295 labeled samples accurately reflect true student sentiments without significant labeling errors or bias.

What would settle it

Retraining and testing the models on a fresh set of independently verified student opinions from Indonesian universities that shows DistilBERT no longer leading in accuracy.

Figures

Figures reproduced from arXiv: 2604.27439 by Ahmad Sahidin Akbar, Ardika Satria, Happy Syahrul Ramadhan, Karin Yehezkiel Sinaga, Luluk Muthoharoh, Martin C.T. Manullang.

**Figure 1.** Research workflow.

**Figure 2.** Machine learning pipeline for sentiment classification.

**Figure 3.** DistilBERT architecture for binary sentiment classification.

**Figure 4.** Confusion matrix of the SVM model on the test data.

**Figure 5.** Confusion matrix of the DistilBERT model on the test data.

Original abstract

This study analyzes Indonesian student opinions on the adoption of artificial intelligence in higher education using two approaches: TF-IDF-based machine learning and Transformer-based deep learning. The dataset consists of 2,295 labeled samples, combining 1,154 student opinions with additional lexical sentiment data. LightGBM, Random Forest, and Support Vector Machine (SVM) are evaluated as machine learning models, while DistilBERT is fine-tuned for binary sentiment classification. The results show that SVM achieves the best performance among the machine learning models with 82.14% test accuracy and F1-score, while DistilBERT performs best overall with 84.78% accuracy and 84.75% F1-score. These findings indicate that Transformer-based models better capture contextual information, although SVM remains a competitive and efficient alternative for sentiment classification.
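
To make the TF-IDF side of the comparison concrete, here is a minimal pure-Python sketch of the weighting step. It is not the authors' pipeline: the whitespace tokenizer, the smoothed IDF variant, the L2 normalisation, and the toy Indonesian sentences are all assumptions, since the abstract does not report the TF-IDF configuration.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per document, L2-normalised."""
    tokenised = [doc.lower().split() for doc in docs]  # assumed tokenizer
    n_docs = len(tokenised)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        # Smoothed IDF (assumed variant): ln((1 + N) / (1 + df)) + 1
        weights = {
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        }
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else {})
    return vectors

# Hypothetical toy sentences, not drawn from the paper's dataset.
docs = [
    "ai sangat membantu belajar",
    "ai membuat mahasiswa malas",
    "belajar jadi mudah",
]
vecs = tfidf(docs)
```

Terms that occur in fewer documents (here "sangat") receive a larger weight than terms shared across documents (here "ai"). Each vector treats words independently of their order, which is the limitation the abstract contrasts with a Transformer's use of context.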

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Standard application of DistilBERT and TF-IDF models to a new Indonesian dataset on AI sentiment in higher ed, but the 84.78% headline number rests on undocumented labels.

The main thing to know is that this paper applies off-the-shelf sentiment tools to Indonesian student opinions about AI in universities and reports DistilBERT slightly ahead of SVM at 84.78% accuracy. Nothing in the method is new, but the localized data could matter for people tracking education tech in that region.

They combine 1,154 student opinions with lexical sentiment data to reach 2,295 samples, run LightGBM, Random Forest, and SVM on TF-IDF features, then fine-tune DistilBERT for binary classification. The results line up with the usual pattern that the transformer picks up context better than bag-of-words models, and SVM holds its own as a fast baseline. That part is straightforward and useful as a data point.

The soft spot is the data itself. The abstract gives no information on how the labels were created, where the lexical component came from, whether anyone checked agreement between annotators, or how the added lexical samples were validated against the student opinions. If the lexical data introduces noise or a distribution shift, both the absolute scores and the ranking between models become difficult to trust. No train-test split details or significance tests are mentioned either. The paper does not claim a new algorithm or theoretical insight, so those gaps matter more than they would in a methods paper.

Readers who need empirical numbers on Indonesian higher-ed sentiment might still pull something from the dataset if it is released with better documentation. For a general NLP audience the work is too routine to stand out. I would send it to peer review rather than desk reject, but only if the authors add the missing labeling procedure and validation steps; without them the central performance claims stay hard to evaluate.

Referee Report

1 major / 2 minor

Summary. The paper evaluates TF-IDF-based machine learning classifiers (LightGBM, Random Forest, SVM) against a fine-tuned DistilBERT model for binary sentiment classification of Indonesian-language opinions on AI adoption in higher education. It uses a combined dataset of 2,295 labeled samples (1,154 student opinions plus lexical sentiment data), reports SVM as the strongest ML model at 82.14% test accuracy and F1, and DistilBERT as the overall best at 84.78% accuracy and 84.75% F1.

Significance. If the performance numbers hold under verified labeling and evaluation protocols, the work supplies a useful empirical baseline for transformer versus classical ML trade-offs on a low-resource language task in an education domain. The direct comparison of three ML models with one transformer is a practical strength.

major comments (1)

[Abstract and Methods] Abstract and Methods: the central performance claims (DistilBERT 84.78% accuracy / 84.75% F1; SVM 82.14% accuracy / F1) rest on the 2,295-sample dataset whose labeling process, lexicon source, inter-annotator agreement for the 1,154 student opinions, and distributional match between lexical and student-opinion components are not described. Without these details the reported metrics and model ranking cannot be interpreted or reproduced.
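
One common way to report the inter-annotator agreement asked for here is Cohen's kappa. The sketch below is illustrative only: the two annotation lists are hypothetical, and nothing in the source says how many annotators labeled the student opinions or which agreement statistic the authors would use.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's own label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical annotations for eight opinions.
ann_1 = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg"]
ann_2 = ["pos", "neg", "neg", "neg", "pos", "neg", "pos", "pos"]
kappa = cohen_kappa(ann_1, ann_2)
```

In this toy case the annotators agree on 6 of 8 items while chance agreement is 0.5, giving a kappa of 0.5.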

minor comments (2)

[Results] Results section: the abstract and results give point estimates for accuracy and F1 but omit the train-test split ratio, whether stratified sampling or cross-validation was used, and any statistical significance test for the observed differences between models.
[Methods] Methods: hyperparameter search procedure, learning-rate schedule for DistilBERT fine-tuning, and exact TF-IDF configuration (n-gram range, vocabulary size) are not reported, limiting reproducibility.
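
The significance test requested in the first minor comment could take the form of an exact McNemar test on the two models' paired test-set predictions. This is a sketch of one reasonable option, not a procedure the paper describes; the label and prediction lists below are hypothetical.

```python
from math import comb

def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar test on paired predictions.

    Returns (b, c, p_value): b counts items only model A classified
    correctly, c counts items only model B classified correctly.
    """
    triples = list(zip(y_true, pred_a, pred_b))
    b = sum(1 for t, pa, pb in triples if pa == t and pb != t)
    c = sum(1 for t, pa, pb in triples if pa != t and pb == t)
    n = b + c
    if n == 0:
        return b, c, 1.0  # the models never disagree on correctness
    k = min(b, c)
    # Lower tail of Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)

# Hypothetical gold labels and predictions for ten test items.
y_true    = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
pred_svm  = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]
pred_bert = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
b, c, p_value = mcnemar_exact(y_true, pred_svm, pred_bert)
```

Only the items on which the two models disagree enter the test, so a gap of a few accuracy points on a small test split can easily fail to reach significance.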

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the single major comment below and will revise the paper accordingly to improve clarity and reproducibility.

Referee: [Abstract and Methods] Abstract and Methods: the central performance claims (DistilBERT 84.78% accuracy / 84.75% F1; SVM 82.14% accuracy / F1) rest on the 2,295-sample dataset whose labeling process, lexicon source, inter-annotator agreement for the 1,154 student opinions, and distributional match between lexical and student-opinion components are not described. Without these details the reported metrics and model ranking cannot be interpreted or reproduced.

Authors: We agree that the current manuscript provides insufficient detail on dataset construction. In the revised version we will expand the Methods section with a new subsection that explicitly describes: (1) the source and curation of the lexical sentiment data, (2) the collection and labeling protocol for the 1,154 student opinions (including annotation guidelines), (3) inter-annotator agreement statistics for the student-opinion subset, and (4) a quantitative comparison of sentiment label distributions between the lexical and student-opinion components to justify their combination. These additions will directly support interpretation and reproducibility of the reported accuracy and F1 scores.

Revision: yes
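
The label-distribution comparison promised in point (4) could be as simple as a Pearson chi-square test on a 2x2 table of component versus label. The sketch below assumes that form; the rebuttal does not name a test. The per-component label counts are hypothetical, and the lexical total of 1,141 assumes the two components simply add up to 2,295.

```python
import math

def chi2_2x2(pos_a, neg_a, pos_b, neg_b):
    """Pearson chi-square test on a 2x2 table (1 degree of freedom,
    no continuity correction). Returns (chi2, p_value)."""
    table = [[pos_a, neg_a], [pos_b, neg_b]]
    row = [pos_a + neg_a, pos_b + neg_b]
    col = [pos_a + pos_b, neg_a + neg_b]
    total = row[0] + row[1]
    if 0 in row or 0 in col:
        raise ValueError("every row and column needs at least one sample")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical label counts: 1,154 student opinions vs. 1,141 lexical samples.
chi2, p_value = chi2_2x2(pos_a=600, neg_a=554, pos_b=570, neg_b=571)
```

A small p-value would indicate that the two components differ in their positive/negative balance, which is one of the ways the combined dataset could distort the reported scores. Matching label balance alone would not rule out differences in vocabulary or style between the components.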

Circularity Check

0 steps flagged

No circularity: empirical model evaluation on held-out data

The paper reports direct empirical results from training ML models (LightGBM, Random Forest, SVM) and fine-tuning DistilBERT on a 2,295-sample dataset, then measuring accuracy and F1 on a test split. No equations, derivations, or self-citations are invoked to reduce the reported metrics to quantities defined by the same fitted parameters. The performance numbers (e.g., DistilBERT 84.78% accuracy) are computed outputs from standard train/test evaluation, not predictions forced by construction from the input labels or model choices. Label quality concerns are a separate validity issue, not a circularity reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central performance claims rest on the validity of the collected labels and the assumption that standard supervised learning procedures transfer without hidden biases or data leakage.

axioms (2)

[domain assumption] The 2,295 samples are correctly labeled and representative of the target population of Indonesian student opinions. Invoked implicitly when reporting training and test accuracies as meaningful performance measures.
[standard math] TF-IDF features plus standard classifiers and DistilBERT fine-tuning constitute appropriate methods for the binary sentiment task. Background assumption drawn from prior NLP literature; no derivation supplied in the abstract.
