CoGate-LSTM: Prototype-Guided Feature-Space Gating for Mitigating Gradient Dilution in Imbalanced Toxic Comment Classification
Pith reviewed 2026-05-18 05:45 UTC · model grok-4.3
The pith
A cosine-similarity gate tied to a learned toxicity prototype lets a compact LSTM emphasize minority-class features and surpass fine-tuned BERT on imbalanced toxic comments.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that a prototype-guided cosine-similarity gate in feature space counters gradient dilution for minority toxic classes. Token embeddings are multiplied by a scalar gate equal to their cosine similarity with a learned toxicity prototype vector, which adaptively amplifies directions informative for labels such as threat and identity_hate. When this gate is combined with frozen GloVe, FastText, and BERT-CLS embeddings, a character-level BiLSTM, embedding-space SMOTE, and weighted focal loss, the resulting model reaches 0.881 macro-F1 on the Jigsaw benchmark while remaining an order of magnitude smaller than BERT.
What carries the argument
A cosine-similarity feature-gating mechanism that rescales token embeddings according to their directional alignment with a learned toxicity prototype vector.
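The gating step can be sketched in a few lines. This is a minimal illustration, assuming the gate is the raw cosine similarity applied as one scalar multiplier per token; the paper may add a nonlinearity, bias, or normalization that is not described here. The function names and the toy 2-d vectors are illustrative assumptions, not taken from the paper.

```python
import math

def cosine_similarity(u, v, eps=1e-8):
    """Cosine of the angle between u and v; eps guards against zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / max(norm_u * norm_v, eps)

def cosine_gate(token_embeddings, prototype):
    """Scale each token embedding by its cosine similarity to the prototype."""
    gated = []
    for emb in token_embeddings:
        g = cosine_similarity(emb, prototype)  # scalar in [-1, 1]
        gated.append([g * x for x in emb])
    return gated

# Toy example: hypothetical 2-d embeddings and a hypothetical prototype.
prototype = [1.0, 0.0]
tokens = [
    [2.0, 0.0],  # aligned with the prototype: gate is 1
    [0.0, 3.0],  # orthogonal to the prototype: gate is 0
    [1.0, 1.0],  # 45 degrees away: gate is about 0.707
]
gated_tokens = cosine_gate(tokens, prototype)
```

Because the gate depends only on direction, rescaling a token embedding leaves its gate value unchanged; only the angle to the prototype matters.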
If this is right
- The gating step accounts for the largest share of the performance lift, with removal causing a 4.8-point macro-F1 drop in ablations.
- Gains concentrate on the rarest labels, reaching +71 percent F1 on severe_toxic relative to fine-tuned BERT.
- The full model uses roughly 15 times fewer parameters than BERT and runs at 48 ms CPU latency while still beating both BERT and XGBoost.
- The same architecture transfers to a second abuse dataset at 0.71 macro-F1 in zero-shot mode and 0.73 after light threshold adjustment.
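The margins and ablation deltas quoted for the paper imply approximate scores for the baselines and the ablated variants. The sketch below only performs the subtraction, assuming each gap is expressed in macro-F1 points (1 point = 0.01) and that each ablation removes a single component from the full model. The derived values are implied by the quoted figures, not read from the paper's tables.

```python
# Figures quoted in the abstract.
full_macro_f1 = 0.881
margin_points = {"fine-tuned BERT": 6.9, "XGBoost": 4.7}
ablation_drop_points = {
    "no cosine gating": 4.8,
    "no multi-head attention": 2.9,
    "no character-level fusion": 2.4,
}

def implied_score(full, points):
    """Subtract a gap given in macro-F1 points from the full-model score."""
    return round(full - points / 100, 3)

implied_baselines = {k: implied_score(full_macro_f1, v) for k, v in margin_points.items()}
implied_ablations = {k: implied_score(full_macro_f1, v) for k, v in ablation_drop_points.items()}

# 7.3M parameters at "about 15x fewer" implies roughly this many for BERT (in millions).
implied_bert_params_millions = round(7.3 * 15, 1)

for name, score in {**implied_baselines, **implied_ablations}.items():
    print(f"{name}: {score:.3f}")
print(f"implied BERT size: ~{implied_bert_params_millions}M parameters")
```

The implied BERT size of roughly 110M parameters is in line with the commonly cited size of BERT-base, which suggests the 15-times figure refers to that variant; the source does not say so explicitly.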
Where Pith is reading between the lines
- The same prototype-plus-cosine gate could be inserted into other recurrent or transformer backbones for any text task where a few critical classes are outnumbered by orders of magnitude.
- Tracking how the prototype vector moves during training might expose which dimensions in the frozen embeddings are most diagnostic for different subtypes of toxicity.
- If the gate generalizes, it offers a route to keep model size small even when new, even rarer abuse categories are added to moderation pipelines.
Load-bearing premise
The learned toxicity prototype and the cosine-similarity computation will consistently identify and boost the embedding directions that matter for minority classes without overfitting to the particular distribution of the training data.
What would settle it
An ablation that replaces the learned prototype with a fixed random vector and still records the same macro-F1 on the Jigsaw test set would show that the specific prototype-guided direction selection is not required for the reported gains.
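One way to set up such a control is to draw a fixed, seeded random direction with the same dimensionality as the prototype and keep it frozen during training. The sketch below covers only the generation of that vector. The function name, the seed, and the dimensionality of 300 are illustrative assumptions, since the prototype's size is not reported here.

```python
import math
import random

def random_unit_prototype(dim, seed=0):
    """Draw a fixed random direction: i.i.d. Gaussian entries, L2-normalized."""
    rng = random.Random(seed)  # local RNG so the global random state is untouched
    vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

# Hypothetical dimensionality; the paper's prototype size is not stated here.
control_prototype = random_unit_prototype(dim=300, seed=42)
```

Since cosine similarity ignores vector length, the normalization does not change the gate values; it only keeps the control vector on a predictable scale.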
Original abstract
Toxic text classification for online moderation remains challenging under extreme class imbalance, where rare but high-risk labels such as threat and severe_toxic are consistently underdetected by conventional models. We propose CoGate-LSTM, a parameter-efficient recurrent architecture built around a novel cosine-similarity feature gating mechanism that adaptively rescales token embeddings by their directional similarity to a learned toxicity prototype. Unlike token-position attention, the gate emphasizes feature directions most informative for minority toxic classes. The model combines frozen multi-source embeddings (GloVe, FastText, and BERT-CLS), a character-level BiLSTM, embedding-space SMOTE, and weighted focal loss. On the Jigsaw Toxic Comment benchmark, CoGate-LSTM achieves 0.881 macro-F1 (95% CI: [0.873, 0.889]) and 96.0% accuracy, outperforming fine-tuned BERT by 6.9 macro-F1 points (p < 0.001) and XGBoost by 4.7, while using only 7.3M parameters (about 15× fewer than BERT) and 48 ms CPU inference latency. Gains are strongest on minority labels, with F1 improvements of +71% for severe_toxic, +33% for threat, and +28% for identity_hate relative to fine-tuned BERT. Ablations identify cosine gating as the primary driver of performance (-4.8 macro-F1 when removed), with additional benefits from character-level fusion (-2.4) and multi-head attention (-2.9). CoGate-LSTM also transfers reasonably across datasets, reaching a 0.71 macro-F1 zero-shot on the Contextual Abuse Dataset and 0.73 with lightweight threshold adaptation. These results show that direction-aware feature gating offers an effective and efficient alternative to large, fully fine-tuned transformers for classifying imbalanced toxic comments.
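The abstract names embedding-space SMOTE without describing it. A minimal sketch of the standard SMOTE interpolation step, applied to embedding vectors, is shown here for orientation. The neighbour count, the choice of which embeddings are oversampled, and all names and toy values are assumptions; the paper's exact variant may differ.

```python
import math
import random

def euclidean(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def smote_samples(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: euclidean(base, p))[:k]
        neighbour = rng.choice(neighbours)
        lam = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + lam * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Hypothetical 2-d embeddings for comments carrying a rare label.
minority_embeddings = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_samples(minority_embeddings, n_new=5)
```

Every synthetic point lies on a line segment between two real minority points, so oversampling in a continuous embedding space avoids having to invent new token sequences.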
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces CoGate-LSTM, a parameter-efficient recurrent model that uses a learned toxicity prototype vector to drive cosine-similarity feature-space gating on token embeddings. The architecture fuses frozen multi-source embeddings (GloVe, FastText, BERT-CLS), a character-level BiLSTM, embedding-space SMOTE, and weighted focal loss. On the Jigsaw Toxic Comment benchmark the model reports 0.881 macro-F1 (95% CI [0.873, 0.889]) and 96.0% accuracy, outperforming fine-tuned BERT by 6.9 macro-F1 points (p < 0.001) while using 7.3 M parameters and 48 ms CPU latency; largest gains appear on minority labels. Ablations attribute the primary improvement to the gating mechanism, and zero-shot transfer to the Contextual Abuse Dataset yields 0.71 macro-F1.
Significance. If the central empirical claims hold, the work demonstrates a lightweight, direction-aware gating alternative to full transformer fine-tuning for severely imbalanced toxic-comment tasks. The reported parameter count, inference latency, statistical significance testing, confidence intervals, and ablation deltas are clear strengths that support reproducibility and practical utility. The approach could be relevant for resource-constrained moderation pipelines, provided the prototype generalizes beyond the training distribution.
major comments (2)
- [Abstract / Results] The central performance claims rest on the learned toxicity prototype and cosine-similarity gate (abstract and methods). Because the prototype is end-to-end optimized on the Jigsaw training split, it may encode dataset-specific annotation biases or label co-occurrences rather than transferable toxicity directions; the single 0.71 zero-shot result on Contextual Abuse does not yet rule out mild overfitting that would inflate the reported +71% relative F1 lift on severe_toxic and the 6.9-point margin over BERT.
- [Ablation study] §4 (ablation study): the -4.8 macro-F1 drop when cosine gating is removed is load-bearing for the architectural claim, yet the exact configuration of the ablated baseline (whether the prototype vector is still present, how the gate is replaced, and whether focal-loss parameters remain identical) is not stated, preventing direct attribution of the gain to the proposed mechanism.
minor comments (3)
- [Methods] The manuscript should report the exact dimensionality of the toxicity prototype vector and its initialization scheme to allow replication of the gating computation.
- [Results] The table or figure presenting per-class F1 scores should include the corresponding values for all baselines (BERT, XGBoost) so that the +71%, +33%, and +28% relative gains can be verified directly.
- [Abstract / Results] The 96.0% accuracy figure is reported alongside macro-F1; given the extreme class imbalance, a brief clarification of whether this is micro-averaged or overall accuracy would avoid misinterpretation.
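The concern about accuracy under imbalance can be made concrete with a toy multi-label example. The label counts and predictions below are invented purely for illustration and have nothing to do with the paper's data; the point is only that overall accuracy can stay high while macro-F1 collapses when a rare label is never predicted.

```python
def f1(y_true, y_pred):
    """Binary F1 score; returns 0.0 when there are no positives at all."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def accuracy(y_true, y_pred):
    """Fraction of individual label decisions that are correct."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical data: 1000 comments, one common label and one rare label.
labels = {
    # Common label, predicted perfectly.
    "toxic": {"true": [1] * 100 + [0] * 900, "pred": [1] * 100 + [0] * 900},
    # Rare label, never predicted by the model.
    "threat": {"true": [1] * 3 + [0] * 997, "pred": [0] * 1000},
}

per_label_f1 = {name: f1(d["true"], d["pred"]) for name, d in labels.items()}
macro_f1 = sum(per_label_f1.values()) / len(per_label_f1)

all_true = [t for d in labels.values() for t in d["true"]]
all_pred = [p for d in labels.values() for p in d["pred"]]
overall_accuracy = accuracy(all_true, all_pred)

print(f"overall accuracy: {overall_accuracy:.4f}, macro-F1: {macro_f1:.2f}")
```

In this toy case only 3 of 2000 label decisions are wrong, yet macro-F1 is halved because the rare label contributes an F1 of zero.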
Simulated Author's Rebuttal
Thank you for the constructive feedback. We address each major comment below. We have revised the manuscript to clarify the ablation configurations and added discussion on prototype generalization.
Point-by-point responses
- Referee: [Abstract / Results] The central performance claims rest on the learned toxicity prototype and cosine-similarity gate (abstract and methods). Because the prototype is end-to-end optimized on the Jigsaw training split, it may encode dataset-specific annotation biases or label co-occurrences rather than transferable toxicity directions; the single 0.71 zero-shot result on Contextual Abuse does not yet rule out mild overfitting that would inflate the reported +71% relative F1 lift on severe_toxic and the 6.9-point margin over BERT.
  Authors: We agree the prototype is optimized on Jigsaw and could capture dataset-specific patterns. The zero-shot 0.71 macro-F1 on Contextual Abuse provides supporting evidence of transfer. Gains on minority labels align with the gating mechanism's role in addressing imbalance, as shown in ablations. We have added a limitations paragraph in the discussion acknowledging single-dataset prototype risks and outlining future multi-dataset prototype experiments.
  Revision: yes
- Referee: [Ablation study] §4 (ablation study): the -4.8 macro-F1 drop when cosine gating is removed is load-bearing for the architectural claim, yet the exact configuration of the ablated baseline (whether the prototype vector is still present, how the gate is replaced, and whether focal-loss parameters remain identical) is not stated, preventing direct attribution of the gain to the proposed mechanism.
  Authors: The referee correctly identifies insufficient detail in the original ablation description. We have revised §4 to specify that the 'no cosine gating' variant retains the prototype vector (unused for gating), replaces the cosine gate with an identity operation, and keeps embeddings, BiLSTM, SMOTE, and focal loss parameters unchanged. This isolates the gating contribution.
  Revision: yes
Circularity Check
Empirical benchmark results show no circularity in claimed performance gains
Full rationale
The paper proposes CoGate-LSTM as an architecture with a cosine-similarity gate to a learned toxicity prototype, then reports empirical results on the fixed Jigsaw Toxic Comment benchmark including macro-F1, accuracy, comparisons to BERT and XGBoost, ablations, and zero-shot transfer. These performance numbers are produced by standard training and evaluation on public data splits rather than reducing by the paper's equations to quantities defined solely by fitted parameters or self-citations. No derivation chain, uniqueness theorem, or ansatz is invoked that would make the central claims tautological; the model description and gating mechanism are the proposed method, validated externally. This is the most common honest finding for an empirical ML architecture paper.
Axiom & Free-Parameter Ledger
free parameters (2)
- toxicity prototype vector
- focal loss gamma and alpha
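For readers unfamiliar with the two focal-loss parameters, the standard binary focal loss is sketched below. The defaults gamma=2.0 and alpha=0.25 are the commonly used values from the original focal-loss formulation and are assumptions here; the paper's tuned values and its exact per-class weighting scheme are not given in this review.

```python
import math

def focal_loss(prob, target, gamma=2.0, alpha=0.25, eps=1e-12):
    """Binary focal loss for a single prediction.

    prob: predicted probability of the positive class.
    target: 0 or 1.
    gamma down-weights well-classified examples; alpha weights the positive class.
    """
    p_t = prob if target == 1 else 1.0 - prob
    alpha_t = alpha if target == 1 else 1.0 - alpha
    p_t = min(max(p_t, eps), 1.0)  # avoid log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# A confident correct prediction versus a confident miss on a positive example.
easy = focal_loss(prob=0.95, target=1)
hard = focal_loss(prob=0.10, target=1)
```

With gamma set to 0 the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy, which is a convenient sanity check when tuning.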
axioms (2)
- domain assumption: Frozen GloVe, FastText, and BERT-CLS embeddings contain sufficient semantic signal for toxicity detection when combined.
- domain assumption: Cosine similarity in embedding space corresponds to directional informativeness for minority toxic classes.
invented entities (2)
- toxicity prototype vector: no independent evidence
- CoGate feature-space gating mechanism: no independent evidence