pith.

arxiv: 2510.17018 · v2 · submitted 2025-10-19 · 💻 cs.CL · cs.LG

CoGate-LSTM: Prototype-Guided Feature-Space Gating for Mitigating Gradient Dilution in Imbalanced Toxic Comment Classification

Pith reviewed 2026-05-18 05:45 UTC · model grok-4.3

classification 💻 cs.CL cs.LG
keywords: toxic comment classification · class imbalance · feature gating · cosine similarity · prototype vector · LSTM · macro-F1 · imbalanced learning

The pith

A cosine-similarity gate tied to a learned toxicity prototype lets a compact LSTM emphasize minority-class features and surpass fine-tuned BERT on imbalanced toxic comments.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Toxic comment detection underperforms on rare but serious labels such as severe_toxic and threat because, under heavy class imbalance, standard models receive diluted training signals for those labels. The paper tests a gating layer that rescales each token embedding by its cosine similarity to a single learned prototype vector representing overall toxicity. This directional emphasis is meant to steer gradients toward the embedding directions that best distinguish minority classes, while the rest of the model stays small by freezing multi-source embeddings and using focal loss. A reader would care because the approach claims higher accuracy on dangerous content with far less compute than full transformer fine-tuning. If the mechanism works as described, direction-aware rescaling in embedding space becomes a practical substitute for model scale in safety-related text tasks.

Core claim

The paper claims that a prototype-guided cosine-similarity gate in feature space counters gradient dilution for minority toxic classes. Token embeddings are multiplied by a scalar gate equal to their cosine similarity with a learned toxicity prototype vector, which adaptively amplifies directions informative for labels such as threat and identity_hate. When this gate is combined with frozen GloVe, FastText, and BERT-CLS embeddings, a character-level BiLSTM, embedding-space SMOTE, and weighted focal loss, the resulting model reaches 0.881 macro-F1 on the Jigsaw benchmark while remaining an order of magnitude smaller than BERT.
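The core claim lists embedding-space SMOTE among the imbalance remedies, but this summary does not describe the procedure. The sketch below assumes the classic SMOTE recipe applied to fixed-length comment embeddings: pick a minority sample, then interpolate toward one of its k nearest minority-class neighbours. The function names, the brute-force neighbour search and the idea of handling one minority label at a time are all assumptions, not details taken from the paper.

```python
import random
from typing import List, Sequence


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote_embeddings(minority: Sequence[Sequence[float]], n_new: int,
                     k: int = 5, seed: int = 0) -> List[List[float]]:
    """Hypothetical helper: classic SMOTE on minority-class embedding vectors.

    Each synthetic vector lies on the segment between a randomly chosen
    minority embedding and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[List[float]] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Brute-force k-nearest-neighbour search among the other minority samples.
        others = [j for j in range(len(minority)) if j != i]
        neighbours = sorted(others, key=lambda j: _sq_dist(base, minority[j]))[:k]
        partner = minority[rng.choice(neighbours)]
        lam = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + lam * (p - b) for b, p in zip(base, partner)])
    return synthetic
```

Jigsaw is a multi-label dataset, so each synthetic vector must also receive a label set. The summary does not say how a synthetic vector inherits labels from its two parents.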

What carries the argument

Cosine-similarity feature gating: a mechanism that rescales each token embedding according to its directional alignment with a learned toxicity prototype vector.
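Here is a minimal sketch of the gate as the core claim describes it. Each token embedding e_t is multiplied by the scalar g_t = cos(e_t, p), where p is the prototype. The code is plain Python so the arithmetic stays visible. In the real model, p would be a trainable parameter and the operation would run batched on tensors. The function names are illustrative, not the paper's.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float], eps: float = 1e-8) -> float:
    """cos(u, v) = <u, v> / (||u|| * ||v||), with eps guarding against zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / max(norm_u * norm_v, eps)


def cosine_gate(tokens: Sequence[Sequence[float]],
                prototype: Sequence[float]) -> Tuple[List[List[float]], List[float]]:
    """Rescale each token embedding e_t by the scalar gate g_t = cos(e_t, p)."""
    gates = [cosine_similarity(e, prototype) for e in tokens]
    gated = [[g * x for x in e] for g, e in zip(gates, tokens)]
    return gated, gates


tokens = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
gated, gates = cosine_gate(tokens, prototype=[2.0, 0.0])
# gates == [1.0, 0.0, -1.0]: aligned, orthogonal and anti-aligned tokens
```

A raw cosine lies in [-1, 1], so the gate zeroes tokens orthogonal to p and flips the sign of anti-aligned tokens. The summary describes the gate as equal to the cosine itself. It does not say whether the paper also shifts or squashes the value, for example into [0, 1].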

If this is right

  • The gating step accounts for the largest share of the performance lift, with removal causing a 4.8-point macro-F1 drop in ablations.
  • Gains concentrate on the rarest labels, including a 71 percent relative F1 improvement on severe_toxic over fine-tuned BERT.
  • The full model has 7.3M parameters, roughly 15 times fewer than BERT, and runs at 48 ms CPU inference latency while still beating both BERT and XGBoost.
  • The same architecture transfers to a second abuse dataset at 0.71 macro-F1 in zero-shot mode and 0.73 after light threshold adjustment.
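The last bullet reports 0.73 macro-F1 after light threshold adjustment, but the summary does not say how the thresholds were chosen. One common recipe is to choose a separate decision threshold per label that maximizes F1 on a small labeled sample from the target dataset. The sketch assumes that recipe, which the source does not confirm. The helper names and the 0.05-step grid are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def f1_binary(y_true: Sequence[int], y_pred: Sequence[bool]) -> float:
    """F1 for one label; returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def tune_threshold(y_true: Sequence[int], scores: Sequence[float],
                   grid: Optional[List[float]] = None) -> Tuple[float, float]:
    """Hypothetical helper: pick the grid threshold that maximizes F1 for one label."""
    if grid is None:
        grid = [i / 100 for i in range(5, 96, 5)]  # 0.05, 0.10, ..., 0.95
    best_t, best_f1 = 0.5, -1.0
    for t in grid:
        f1 = f1_binary(y_true, [s >= t for s in scores])
        if f1 > best_f1:  # strict '>' keeps the lowest threshold among ties
            best_t, best_f1 = t, f1
    return best_t, best_f1
```

Applied per label, this adapts only the decision rule and leaves the trained weights untouched. That is one plausible reading of why the source calls the adjustment "light". Tuning and reporting on the same target examples would overstate the gain, so the tuning sample should be kept separate from the evaluation split.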

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same prototype-plus-cosine gate could be inserted into other recurrent or transformer backbones for any text task where a few critical classes are outnumbered by orders of magnitude.
  • Tracking how the prototype vector moves during training might expose which dimensions in the frozen embeddings are most diagnostic for different subtypes of toxicity.
  • If the gate generalizes, it offers a route to keep model size small even when new, even rarer abuse categories are added to moderation pipelines.

Load-bearing premise

The learned toxicity prototype and the cosine-similarity computation will consistently identify and boost the embedding directions that matter for minority classes without overfitting to the particular distribution of the training data.

What would settle it

An ablation that replaces the learned prototype with a fixed random vector and still records the same macro-F1 on the Jigsaw test set would show that the specific prototype-guided direction selection is not required for the reported gains.
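The proposed control needs a prototype that carries no learned direction. A minimal sketch follows, assuming the control arm samples one random unit vector, freezes it, and otherwise trains exactly like the full model. The dimension 300 is a hypothetical choice, because the summary does not report the prototype's size.

```python
import math
import random
from typing import List, Sequence


def random_unit_vector(dim: int, seed: int = 0) -> List[float]:
    """Fixed random direction: isotropic Gaussian sample normalised to unit length."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Hypothetical control arm: this vector is never updated during training.
control_prototype = random_unit_vector(dim=300, seed=1234)
```

Consider input scale when designing this control. For any fixed embedding, its cosine with a uniformly random direction has mean 0 and variance 1/d. In 300 dimensions, a random-prototype gate therefore typically shrinks token magnitudes to the order of 1/sqrt(d), about 0.06. Matching macro-F1 under this control would be strong evidence that the learned direction does not matter. A drop, however, could partly reflect the change in input scale rather than the loss of direction.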

Figures

Figures reproduced from arXiv: 2510.17018 by Noor Islam S. Mohammad.

Figure 1: Multi-task transformer-based architecture for toxic span prediction. The model jointly …
Figure 2: Schematic of the proposed Multichannel Convolutional BiLSTM (MCBLSTM). Each …
Figure 3: Neural architecture for text normalization, convolutional feature extraction, and attention …
Figure 4 illustrates the architectural diversity of proposed memory-augmented LSTM extensions. The hgLSTM dynamically constructs a learnable hypergraph over recurrent states, enabling message passing and sparse connectivity modulation through an adjacency-MLP mechanism. The noLSTM integrates rhythmic phase–amplitude coupling into its recurrent dynamics, capturing periodic temporal dependencies through oscillatory g…
Figure 5: Architecture of the Proposed Multiview Sequential Forecasting Network (MSFN).
Figure 6: Dataset overview showing (a) Toxic Comment, (b) Label frequency distribution, (c) Class …
Figure 7: Model Training and Evaluation. Text extracted alongside this figure (7.3.1 Deep Learning Baselines): BiLSTM + GloVe: A bidirectional LSTM trained on pretrained GloVe embeddings, enhanced with multi-head attention (8 heads) but without cosine gating or character-level fusion. This architecture isolates the contribution of the proposed gating and fusion mechanisms within xLSTM. It consists of two stacked BiLSTM layers (256 hidden units each), optim…
Figure 8: Model Performance Metrics
Figure 9: Confusion Matrix
Original abstract

Toxic text classification for online moderation remains challenging under extreme class imbalance, where rare but high-risk labels such as threat and severe_toxic are consistently underdetected by conventional models. We propose CoGate-LSTM, a parameter-efficient recurrent architecture built around a novel cosine-similarity feature gating mechanism that adaptively rescales token embeddings by their directional similarity to a learned toxicity prototype. Unlike token-position attention, the gate emphasizes feature directions most informative for minority toxic classes. The model combines frozen multi-source embeddings (GloVe, FastText, and BERT-CLS), a character-level BiLSTM, embedding-space SMOTE, and weighted focal loss. On the Jigsaw Toxic Comment benchmark, CoGate-LSTM achieves 0.881 macro-F1 (95% CI: [0.873, 0.889]) and 96.0% accuracy, outperforming fine-tuned BERT by 6.9 macro-F1 points (p < 0.001) and XGBoost by 4.7, while using only 7.3M parameters (about 15× fewer than BERT) and 48 ms CPU inference latency. Gains are strongest on minority labels, with F1 improvements of +71% for severe_toxic, +33% for threat, and +28% for identity_hate relative to fine-tuned BERT. Ablations identify cosine gating as the primary driver of performance (-4.8 macro-F1 when removed), with additional benefits from character-level fusion (-2.4) and multi-head attention (-2.9). CoGate-LSTM also transfers reasonably across datasets, reaching a 0.71 macro-F1 zero-shot on the Contextual Abuse Dataset and 0.73 with lightweight threshold adaptation. These results show that direction-aware feature gating offers an effective and efficient alternative to large, fully fine-tuned transformers for classifying imbalanced toxic comments.
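The abstract names weighted focal loss but does not give its form, and the ledger below lists gamma and alpha as free parameters. The sketch assumes the standard alpha-balanced focal loss of Lin et al. (2017), FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t). It applies that loss to each label of a multi-label output, with a separate alpha per label. The per-label weighting and the default gamma = 2 are assumptions, not the paper's reported settings.

```python
import math
from typing import Sequence


def weighted_focal_loss(probs: Sequence[float], targets: Sequence[int],
                        alpha: Sequence[float], gamma: float = 2.0,
                        eps: float = 1e-7) -> float:
    """Mean alpha-balanced binary focal loss over the labels of one example.

    probs:   predicted P(label = 1) for each label
    targets: 0/1 ground truth for each label
    alpha:   per-label weight on the positive class (hypothetical weighting scheme)
    """
    total = 0.0
    for p, y, a in zip(probs, targets, alpha):
        p = min(max(p, eps), 1.0 - eps)      # clamp to keep log() finite
        p_t = p if y == 1 else 1.0 - p       # probability assigned to the true outcome
        a_t = a if y == 1 else 1.0 - a       # class-balancing weight
        total += -a_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)
```

With gamma = 0 the expression reduces to alpha-weighted binary cross-entropy. Raising gamma shrinks the contribution of confidently correct predictions, which under heavy imbalance are mostly majority-class ones. This down-weighting is what the paper relies on to give rare labels more weight in training.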

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 3 minor

Summary. The manuscript introduces CoGate-LSTM, a parameter-efficient recurrent model that uses a learned toxicity prototype vector to drive cosine-similarity feature-space gating on token embeddings. The architecture fuses frozen multi-source embeddings (GloVe, FastText, BERT-CLS), a character-level BiLSTM, embedding-space SMOTE, and weighted focal loss. On the Jigsaw Toxic Comment benchmark the model reports 0.881 macro-F1 (95% CI [0.873, 0.889]) and 96.0% accuracy, outperforming fine-tuned BERT by 6.9 macro-F1 points (p < 0.001) while using 7.3 M parameters and 48 ms CPU latency; largest gains appear on minority labels. Ablations attribute the primary improvement to the gating mechanism, and zero-shot transfer to the Contextual Abuse Dataset yields 0.71 macro-F1.

Significance. If the central empirical claims hold, the work demonstrates a lightweight, direction-aware gating alternative to full transformer fine-tuning for severely imbalanced toxic-comment tasks. The reported parameter count, inference latency, statistical significance testing, confidence intervals, and ablation deltas are clear strengths that support reproducibility and practical utility. The approach could be relevant for resource-constrained moderation pipelines, provided the prototype generalizes beyond the training distribution.
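The report credits the paper's confidence intervals, but neither the summary nor the abstract says how the 95% CI on macro-F1 was obtained. A percentile bootstrap over test comments is one common choice, and the sketch below assumes that method. The function names are hypothetical. The sketch also follows a convention that a label with no true positives in a resample scores F1 = 0.

```python
import random
from typing import List, Sequence, Tuple

Rows = Sequence[Sequence[int]]  # one row of 0/1 labels per comment


def macro_f1(y_true: Rows, y_pred: Rows) -> float:
    """Unweighted mean of per-label F1 scores (F1 = 0 when a label has no true positives)."""
    n_labels = len(y_true[0])
    scores: List[float] = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] and p[j])
        fp = sum(1 for t, p in zip(y_true, y_pred) if not t[j] and p[j])
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] and not p[j])
        scores.append(0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn))
    return sum(scores) / n_labels


def bootstrap_ci(y_true: Rows, y_pred: Rows, n_boot: int = 1000,
                 alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI: resample comments with replacement, recompute macro-F1."""
    rng = random.Random(seed)
    n = len(y_true)
    stats: List[float] = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(macro_f1([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

Rare labels such as threat contribute few positives to each resample. Their per-label F1 therefore varies a lot across resamples, and that variation tends to dominate the width of a macro-F1 interval.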

major comments (2)
  1. [Abstract / Results] The central performance claims rest on the learned toxicity prototype and cosine-similarity gate (abstract and methods). Because the prototype is end-to-end optimized on the Jigsaw training split, it may encode dataset-specific annotation biases or label co-occurrences rather than transferable toxicity directions; the single 0.71 zero-shot result on Contextual Abuse does not yet rule out mild overfitting that would inflate the reported +71% relative F1 lift on severe_toxic and the 6.9-point margin over BERT.
  2. [Ablation study] §4 (ablation study): the -4.8 macro-F1 drop when cosine gating is removed is load-bearing for the architectural claim, yet the exact configuration of the ablated baseline (whether the prototype vector is still present, how the gate is replaced, and whether focal-loss parameters remain identical) is not stated, preventing direct attribution of the gain to the proposed mechanism.
minor comments (3)
  1. [Methods] The manuscript should report the exact dimensionality of the toxicity prototype vector and its initialization scheme to allow replication of the gating computation.
  2. [Results] The table or figure presenting per-class F1 scores should include the corresponding values for all baselines (BERT, XGBoost) so that the +71%, +33%, and +28% relative gains can be verified directly.
  3. [Abstract / Results] The 96.0% accuracy figure is reported alongside macro-F1; given the extreme class imbalance, a brief clarification of whether this is micro-averaged or overall accuracy would avoid misinterpretation.
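Minor comment 3 matters because multi-label accuracy has more than one common definition. Label-wise (Hamming) accuracy averages over every comment-label cell, while exact-match (subset) accuracy requires all labels of a comment to be correct. The toy data below is hypothetical and not from the paper. It shows how both definitions can stay high while macro-F1 exposes a rare label that is never detected.

```python
from typing import Sequence

Rows = Sequence[Sequence[int]]


def label_f1(y_true: Rows, y_pred: Rows, j: int) -> float:
    """F1 for label j (0.0 when there are no true positives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] and p[j])
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t[j] and p[j])
    fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] and not p[j])
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def hamming_accuracy(y_true: Rows, y_pred: Rows) -> float:
    """Fraction of individual (comment, label) cells predicted correctly."""
    cells = [t == p for tr, pr in zip(y_true, y_pred) for t, p in zip(tr, pr)]
    return sum(cells) / len(cells)


def subset_accuracy(y_true: Rows, y_pred: Rows) -> float:
    """Fraction of comments whose entire label vector is predicted correctly."""
    return sum(1 for tr, pr in zip(y_true, y_pred) if list(tr) == list(pr)) / len(y_true)


# Hypothetical toy data: 100 comments, a common label (30 positives) and a rare one (2 positives).
y_true = [[1 if i < 30 else 0, 1 if i in (0, 50) else 0] for i in range(100)]
# A classifier that is perfect on the common label and never fires on the rare one.
y_pred = [[row[0], 0] for row in y_true]

macro = (label_f1(y_true, y_pred, 0) + label_f1(y_true, y_pred, 1)) / 2
print(hamming_accuracy(y_true, y_pred), subset_accuracy(y_true, y_pred), macro)
# prints: 0.99 0.98 0.5
```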

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the constructive feedback. We address each major comment below. We have revised the manuscript to clarify the ablation configurations and added discussion on prototype generalization.

Point-by-point responses
  1. Referee: [Abstract / Results] The central performance claims rest on the learned toxicity prototype and cosine-similarity gate (abstract and methods). Because the prototype is end-to-end optimized on the Jigsaw training split, it may encode dataset-specific annotation biases or label co-occurrences rather than transferable toxicity directions; the single 0.71 zero-shot result on Contextual Abuse does not yet rule out mild overfitting that would inflate the reported +71% relative F1 lift on severe_toxic and the 6.9-point margin over BERT.

    Authors: We agree the prototype is optimized on Jigsaw and could capture dataset-specific patterns. The zero-shot 0.71 macro-F1 on Contextual Abuse provides supporting evidence of transfer. Gains on minority labels align with the gating mechanism's role in addressing imbalance, as shown in ablations. We have added a limitations paragraph in the discussion acknowledging single-dataset prototype risks and outlining future multi-dataset prototype experiments. revision: yes

  2. Referee: [Ablation study] §4 (ablation study): the -4.8 macro-F1 drop when cosine gating is removed is load-bearing for the architectural claim, yet the exact configuration of the ablated baseline (whether the prototype vector is still present, how the gate is replaced, and whether focal-loss parameters remain identical) is not stated, preventing direct attribution of the gain to the proposed mechanism.

    Authors: The referee correctly identifies insufficient detail in the original ablation description. We have revised §4 to specify that the 'no cosine gating' variant retains the prototype vector (unused for gating), replaces the cosine gate with an identity operation, and keeps embeddings, BiLSTM, SMOTE, and focal loss parameters unchanged. This isolates the gating contribution. revision: yes

Circularity Check

0 steps flagged

Empirical benchmark results show no circularity in claimed performance gains

full rationale

The paper proposes CoGate-LSTM as an architecture with a cosine-similarity gate to a learned toxicity prototype, then reports empirical results on the fixed Jigsaw Toxic Comment benchmark including macro-F1, accuracy, comparisons to BERT and XGBoost, ablations, and zero-shot transfer. These performance numbers are produced by standard training and evaluation on public data splits rather than reducing by the paper's equations to quantities defined solely by fitted parameters or self-citations. No derivation chain, uniqueness theorem, or ansatz is invoked that would make the central claims tautological; the model description and gating mechanism are the proposed method, validated externally. This is the most common honest finding for an empirical ML architecture paper.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 2 invented entities

The model rests on standard pre-trained embedding quality, the existence of a single directional prototype that separates toxic from non-toxic features, and the effectiveness of focal loss plus SMOTE for imbalance; the prototype vector itself is a fitted entity without external validation.

free parameters (2)
  • toxicity prototype vector
    Learned vector that defines the target direction for gating; its value is fitted during training and central to the gating operation.
  • focal loss gamma and alpha
    Hyperparameters controlling focus on hard minority examples; chosen to handle imbalance.
axioms (2)
  • domain assumption: Frozen GloVe, FastText, and BERT-CLS embeddings contain sufficient semantic signal for toxicity detection when combined.
    The architecture treats these as fixed inputs without further adaptation.
  • domain assumption: Cosine similarity in embedding space corresponds to directional informativeness for minority toxic classes.
    Core justification for the gating mechanism.
invented entities (2)
  • toxicity prototype vector (no independent evidence)
    purpose: Serves as reference direction for cosine-based feature gating.
    Newly introduced learned entity whose independent evidence is limited to performance gains on the training distribution.
  • CoGate feature-space gating mechanism (no independent evidence)
    purpose: Adaptively rescales token embeddings based on directional similarity to the prototype.
    Core novel component proposed in the paper.

pith-pipeline@v0.9.0 · 5887 in / 1739 out tokens · 43224 ms · 2026-05-18T05:45:52.959131+00:00 · methodology

