super hub Mixed citations

DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

Julien Chaumond, Lysandre Debut, Thomas Wolf, Victor Sanh · 2019 · cs.CL · arXiv 1910.01108

Mixed citation behavior. Most common role is background (62%).

217 Pith papers citing it

Background 62% of classified citations

open full Pith review browse 217 citing papers more from Julien Chaumond arXiv PDF

abstract

As Transfer Learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP), operating these large models in on-the-edge and/or under constrained computational training or inference budgets remains challenging. In this work, we propose a method to pre-train a smaller general-purpose language representation model, called DistilBERT, which can then be fine-tuned with good performances on a wide range of tasks like its larger counterparts. While most prior work investigated the use of distillation for building task-specific models, we leverage knowledge distillation during the pre-training phase and show that it is possible to reduce the size of a BERT model by 40%, while retaining 97% of its language understanding capabilities and being 60% faster. To leverage the inductive biases learned by larger models during pre-training, we introduce a triple loss combining language modeling, distillation and cosine-distance losses. Our smaller, faster and lighter model is cheaper to pre-train and we demonstrate its capabilities for on-device computations in a proof-of-concept experiment and a comparative on-device study.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 18 method 11

citation-polarity summary

background 18 use method 11

claims ledger

abstract As Transfer Learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP), operating these large models in on-the-edge and/or under constrained computational training or inference budgets remains challenging. In this work, we propose a method to pre-train a smaller general-purpose language representation model, called DistilBERT, which can then be fine-tuned with good performances on a wide range of tasks like its larger counterparts. While most prior work investigated the use of distillation for building task-specific models, we leverage knowledge di

authors

Julien Chaumond Lysandre Debut Thomas Wolf Victor Sanh

co-cited works

representative citing papers

Behavior Cloning is Not All You Need: The Optimality of On-Policy Distillation for Noisy Expert Feedback

cs.LG · 2026-06-29 · unverdicted · novelty 8.0

Noisy expert imitation learning requires exponential samples for offline methods but polynomial for a variant of on-policy distillation under a noise condition.

Recalling Too Well: Sycophancy Evaluation and Mitigation in Memory-Augmented Models

cs.AI · 2026-06-09 · conditional · novelty 8.0

Memory augmentation in LLMs amplifies sycophancy up to 25x compared to in-context baselines due to lossy memory extraction, with two lightweight mitigations that reduce the effect while preserving recall.

Canonical Regularisation of Wide Feature-Learning Neural Networks

stat.ML · 2026-05-18 · unverdicted · novelty 8.0

Derives geodesic ridge regularization and Riemannian Gibbs Process prior for feature-learning wide neural networks, generalizing kernel-regime results via function-space axiomatization.

Steering Without Breaking: Mechanistically Informed Interventions for Discrete Diffusion Language Models

cs.LG · 2026-05-08 · unverdicted · novelty 8.0

Adaptive scheduling of interventions in discrete diffusion language models, timed to attribute-specific commitment schedules discovered with sparse autoencoders, delivers precise multi-attribute steering up to 93% strength while preserving generation quality.

Learning the Signature of Memorization in Autoregressive Language Models

cs.CL · 2026-04-03 · accept · novelty 8.0

A classifier trained only on transformer fine-tuning data detects an invariant memorization signature that transfers to Mamba, RWKV-4, and RecurrentGemma with AUCs of 0.963, 0.972, and 0.936.

TinyStories: How Small Can Language Models Be and Still Speak Coherent English?

cs.CL · 2023-05-12 · conditional · novelty 8.0

Tiny language models under 10M parameters trained on a synthetic children's story dataset generate fluent, consistent, multi-paragraph English text with near-perfect grammar and reasoning.

Language Models are Few-Shot Learners

cs.CL · 2020-05-28 · accept · novelty 8.0

GPT-3 shows that scaling an autoregressive language model to 175 billion parameters enables strong few-shot performance across diverse NLP tasks via in-context prompting without fine-tuning.

Learning from Acquisition: Metadata-driven Multimodal Pre-training for Cardiac MRI

cs.CV · 2026-06-27 · unverdicted · novelty 7.0

MetaCLIP-CMR applies CLIP-style contrastive learning to cardiac MRI by treating acquisition metadata as text labels, delivering 86.8% modality and 86.5% view accuracy plus top Dice scores on ACDC/M&Ms segmentation with far less pre-training data than recent large-scale CMR models.

Toward Calibrated Mixture-of-Experts Under Distribution Shift

cs.AI · 2026-06-18 · unverdicted · novelty 7.0

Expert calibration suffices for MoE calibration under distribution shifts in hard-routed models but not soft-routed ones; adversarial reweighting improves the accuracy-calibration tradeoff across models and shifts.

Zone of Proximal Policy Optimization: Teacher in Prompts, Not Gradients

cs.CL · 2026-06-16 · unverdicted · novelty 7.0

ZPPO improves distillation to small vision-language models by using binary and negative candidate prompts plus a replay buffer for hard questions, outperforming standard distillation and GRPO on a 31-benchmark suite with largest gains at the 0.8B scale.

Otters++: A Time-to-first-spike Based Energy Efficient Optical Spiking Transformer

cs.AI · 2026-06-11 · unverdicted · novelty 7.0

Otters++ realizes TTFS via measured device decay in optical synapses, uses hybrid QNN-equivalent training with noise awareness, and reports 84.17% average GLUE score with energy gains over prior spiking transformers.

Beyond Absolute Imitation: Anchored Residual Guidance for Privileged On-Policy Distillation

cs.LG · 2026-06-09 · unverdicted · novelty 7.0

AR-OPD disentangles privileged supervision via anchored residual guidance to reduce hindsight leakage in on-policy distillation, reporting gains of 2.3 points over full privileged OPD and 7.9 over SFT on reasoning tasks.

Leveraging Machine-Learned Advice in Strategic Interactions with No-Regret Learners

cs.GT · 2026-06-09 · unverdicted · novelty 7.0

Introduces a pseudo-metric to quantify advice usefulness and shows reliable advice enables efficient approximate Stackelberg strategies while unreliable advice blocks simultaneous near-Stackelberg and no-regret guarantees but permits weak dominance in some correlated equilibria.

OPRD: On-Policy Representation Distillation

cs.LG · 2026-06-04 · unverdicted · novelty 7.0

OPRD performs distillation in hidden-state space on on-policy data for deterministic gradients and better math benchmark performance, plus OPRD-Bridge for cross-architecture transfer via low-rank projectors.

Toward Calibrated, Fair, and accurate Deepfake Detection

cs.LG · 2026-06-03 · unverdicted · novelty 7.0

Face-Feature Tuning is a label-free logit remapping method that reduces FPR/TPR gaps across groups in deepfake detection while preserving overall accuracy.

Bounded Behavioral Indistinguishability for Black-Box LLM Distillation

cs.LG · 2026-05-28 · unverdicted · novelty 7.0

Introduces (ε,q,t,A)-behavioral indistinguishability and shows via Qwen/Llama experiments that LoRA distillation boosts semantic similarity but leaves detectable behavioral differences under adversarial evaluation.

MATCHA: Matching Text via Contrastive Semantic Alignment

cs.CL · 2026-05-26 · unverdicted · novelty 7.0

MATCHA introduces a dual-view contrastive metric measuring proximity to gold text and distance from adversarial contradictions, outperforming ROUGE and BERTScore by up to 20% on TruthfulQA and other NLP benchmarks.

Patch Hierarchical Attention Transformer for Efficient Particle Jet Tagging

hep-ex · 2026-05-20 · unverdicted · novelty 7.0

PHAT-JeT combines geometric message-passing with hierarchical patch attention to reach state-of-the-art accuracy and background rejection among resource-constrained jet tagging models on four benchmarks.

Distribution-free root cause analysis

stat.ME · 2026-05-20 · unverdicted · novelty 7.0

CROC constructs finite-sample valid confidence sets for the root-cause index in multi-stream change detection using conformal p-values under independence and exchangeability assumptions.

AIGaitor: Privacy-preserving and cloud-free motion analysis for everyone, using edge computing

cs.CV · 2026-05-20 · unverdicted · novelty 7.0 · 2 refs

AIGaitor is the first claimed end-to-end on-device monocular motion-capture and deep-learning gait analysis pipeline demonstrated on consumer smartphones.

Layer-wise Token Compression for Efficient Document Reranking

cs.IR · 2026-05-20 · unverdicted · novelty 7.0 · 2 refs

Layer-wise Token Compression applies adaptive token pooling at middle transformer layers for cross-encoder rerankers, preserving MS MARCO ranking quality while raising QPS up to 25% on passages and 116% on documents, with added gains on listwise LLM rerankers and a regularizer effect for long inputs

TIDAL: Recovering Temporal Phase for Cloud Block Storage Placement from LLM-Derived Semantics

cs.OS · 2026-05-18 · unverdicted · novelty 7.0

TIDAL recovers temporal phase signals from LLM-derived semantics of provisioning metadata to enable complementary CVD placement, reducing overload frequency by 79.1% on production traces.

Semantic Reranking at Inference Time for Hard Examples in Rhetorical Role Labeling

cs.CL · 2026-05-18 · unverdicted · novelty 7.0

RISE is an inference-time semantic reranking framework that refines low-confidence predictions in rhetorical role labeling using contrastively learned label representations, delivering an average +9.15 macro-F1 gain on hard examples across eight datasets and seven models.

Differentially Private Motif-Preserving Multi-modal Hashing

cs.IR · 2026-05-14 · unverdicted · novelty 7.0

DMP-MH clips degrees to control triangle sensitivity, synthesizes an edge-DP graph with Noisy Mirror Descent, and distills it into dual-stream hash networks, beating private baselines by up to 11.4 mAP on MIRFlickr-25K and NUS-WIDE while keeping 92.5% of non-private performance.

citing papers explorer

Showing 50 of 66 citing papers after filters.

Learning the Signature of Memorization in Autoregressive Language Models cs.CL · 2026-04-03 · accept · none · ref 18 · internal anchor
A classifier trained only on transformer fine-tuning data detects an invariant memorization signature that transfers to Mamba, RWKV-4, and RecurrentGemma with AUCs of 0.963, 0.972, and 0.936.
TinyStories: How Small Can Language Models Be and Still Speak Coherent English? cs.CL · 2023-05-12 · conditional · none · ref 25 · internal anchor
Tiny language models under 10M parameters trained on a synthetic children's story dataset generate fluent, consistent, multi-paragraph English text with near-perfect grammar and reasoning.
Language Models are Few-Shot Learners cs.CL · 2020-05-28 · accept · none · ref 71 · internal anchor
GPT-3 shows that scaling an autoregressive language model to 175 billion parameters enables strong few-shot performance across diverse NLP tasks via in-context prompting without fine-tuning.
Zone of Proximal Policy Optimization: Teacher in Prompts, Not Gradients cs.CL · 2026-06-16 · unverdicted · none · ref 19 · internal anchor
ZPPO improves distillation to small vision-language models by using binary and negative candidate prompts plus a replay buffer for hard questions, outperforming standard distillation and GRPO on a 31-benchmark suite with largest gains at the 0.8B scale.
MATCHA: Matching Text via Contrastive Semantic Alignment cs.CL · 2026-05-26 · unverdicted · none · ref 2 · internal anchor
MATCHA introduces a dual-view contrastive metric measuring proximity to gold text and distance from adversarial contradictions, outperforming ROUGE and BERTScore by up to 20% on TruthfulQA and other NLP benchmarks.
Semantic Reranking at Inference Time for Hard Examples in Rhetorical Role Labeling cs.CL · 2026-05-18 · unverdicted · none · ref 129 · internal anchor
RISE is an inference-time semantic reranking framework that refines low-confidence predictions in rhetorical role labeling using contrastively learned label representations, delivering an average +9.15 macro-F1 gain on hard examples across eight datasets and seven models.
A Multi-View Media Profiling Suite: Resources, Evaluation, and Analysis cs.CL · 2026-05-02 · unverdicted · none · ref 55 · internal anchor
Presents MBFC-2025 dataset and multi-view embeddings with fusion methods for media bias and factuality, reporting SOTA results on ACL-2020 and new benchmarks on MBFC-2025.
RoLegalGEC: Legal Domain Grammatical Error Detection and Correction Dataset for Romanian cs.CL · 2026-04-21 · unverdicted · none · ref 13 · internal anchor
RoLegalGEC is the first Romanian legal-domain dataset for grammatical error detection and correction, consisting of 350,000 examples, with evaluations of several neural models.
Multilingual Multi-Label Emotion Classification at Scale with Synthetic Data cs.CL · 2026-04-14 · conditional · none · ref 8 · internal anchor
Synthetic data of 1M+ multi-label samples across 23 languages trains models that match or exceed English-only specialists on zero-shot benchmarks for emotion classification.
Demystifying OPD: Length Inflation and Stabilization Strategies for Large Language Models cs.CL · 2026-04-09 · unverdicted · none · ref 17 · internal anchor
OPD for LLMs suffers length inflation and repetition collapse; StableOPD uses reference divergence and rollout mixing to prevent it and improve math reasoning performance by 7.2% on average.
Kathleen: Oscillator-Based Byte-Level Text Classification Without Tokenization or Attention cs.CL · 2026-04-09 · unverdicted · none · ref 7 · 2 links · internal anchor
Kathleen performs byte-level text classification via recurrent oscillator banks, FFT wavetable encoding, and phase harmonics, matching pretrained baselines on standard benchmarks with 36% fewer parameters.
Explainable Semantic Textual Similarity via Dissimilar Span Detection cs.CL · 2026-03-22 · unverdicted · none · ref 15 · internal anchor
Introduces the Dissimilar Span Detection task and Span Similarity Dataset to explain semantic textual similarity by identifying differing spans between text pairs.
Accelerating Large Language Model Decoding with Speculative Sampling cs.CL · 2023-02-02 · accept · none · ref 16 · internal anchor
Speculative sampling accelerates LLM decoding 2-2.5x by letting a draft model propose short sequences that the target model scores in parallel, then applies modified rejection sampling to keep the exact target distribution.
Using Text-Based Causal Inference to Disentangle Factors Influencing Online Review Ratings cs.CL · 2026-06-02 · unverdicted · none · ref 13 · internal anchor
Enhances CausalBERT with temperature scaling, hyperparameter optimization, and interpretability to identify administration and benchmark performance as key drivers of school review ratings from over 600K reviews.
MixSD: Mixed Contextual Self-Distillation for Knowledge Injection cs.CL · 2026-05-16 · unverdicted · none · ref 3 · internal anchor
MixSD uses dynamic mixing of the model's expert and naive conditionals to create distribution-aligned supervision that improves the memorization-retention tradeoff over standard SFT.
From Text to Voice: A Reproducible and Verifiable Framework for Evaluating Tool Calling LLM Agents cs.CL · 2026-05-14 · unverdicted · none · ref 85 · internal anchor
A dataset-agnostic framework converts text tool-calling benchmarks to paired audio evaluations via TTS, speaker variation and noise, then evaluates seven omni-modal models showing model- and task-dependent performance with small text-to-voice gaps.
Distribution Corrected Offline Data Distillation for Large Language Models cs.CL · 2026-05-13 · unverdicted · none · ref 32 · internal anchor
A distribution-correction framework for offline LLM reasoning distillation improves accuracy on math benchmarks by adaptively aligning teacher supervision with the student's inference-time distribution.
Rethinking Dense Sequential Chains: Reasoning Language Models Can Extract Answers from Sparse, Order-Shuffling Chain-of-Thoughts cs.CL · 2026-05-08 · conditional · none · ref 6 · internal anchor
Reasoning language models extract answers from sparse, order-shuffled chain-of-thought traces with little accuracy loss.
Reliable Answers for Recurring Questions: Boosting Text-to-SQL Accuracy with Template Constrained Decoding cs.CL · 2026-04-30 · unverdicted · none · ref 26 · internal anchor
TeCoD improves Text-to-SQL execution accuracy by up to 36% over in-context learning and cuts latency 2.2x on matched queries by extracting templates from historical pairs and enforcing them with constrained decoding.
ADE: Adaptive Dictionary Embeddings -- Scaling Multi-Anchor Representations to Large Language Models cs.CL · 2026-04-27 · unverdicted · none · ref 12 · internal anchor
ADE scales multi-anchor word representations to transformers via Vocabulary Projection, Grouped Positional Encoding, and context-aware reweighting, achieving 98.7% fewer trainable parameters than DeBERTa-v3-base while matching or exceeding it on two text-classification benchmarks and compressing the
RouteNLP: Closed-Loop LLM Routing with Conformal Cascading and Distillation Co-Optimization cs.CL · 2026-04-26 · unverdicted · none · ref 4 · internal anchor
RouteNLP is a closed-loop LLM routing framework using conformal cascading and distillation co-optimization that cut inference costs by 58% in an 8-week enterprise deployment while preserving 91% acceptance and high quality on benchmarks.
LaScA: Language-Conditioned Scalable Modelling of Affective Dynamics cs.CL · 2026-04-08 · unverdicted · none · ref 30 · internal anchor
A framework converts interpretable facial and acoustic features into language descriptions, feeds them to a pretrained LM for semantic embeddings, and uses those embeddings as priors to improve valence and arousal change prediction on Aff-Wild2 and SEWA while remaining transparent.
Dual-Space Knowledge Distillation with Key-Query Matching for Large Language Models with Vocabulary Mismatch cs.CL · 2026-03-23 · unverdicted · none · ref 16 · internal anchor
The authors introduce DSKD-CMA-GA using generative adversarial learning to fix key-query distribution mismatches in cross-tokenizer knowledge distillation, reporting modest average ROUGE-L gains of 0.37 especially on out-of-distribution data.
Don't Ignore the Tail: Decoupling top-K Probabilities for Efficient Language Model Distillation cs.CL · 2026-02-24 · unverdicted · none · ref 15 · internal anchor
A modified divergence decouples top-K teacher probabilities from the distribution tail during distillation, yielding competitive performance on decoder models with standard compute.
User eXperience Perception Insights Dataset (UXPID): Synthetic User Feedback from Public Industrial Forums cs.CL · 2025-09-15 · unverdicted · none · ref 15 · internal anchor
The paper introduces UXPID, a new dataset of 7130 LLM-annotated synthetic user feedback branches from industrial forums to support UX analysis and NLP tasks in software engineering.
MiniMax-01: Scaling Foundation Models with Lightning Attention cs.CL · 2025-01-14 · unverdicted · none · ref 34 · internal anchor
MiniMax-01 models match GPT-4o and Claude-3.5-Sonnet performance while providing 20-32 times longer context windows through lightning attention and MoE scaling.
GigaCheck: Detecting LLM-generated Content via Object-Centric Span Localization cs.CL · 2024-10-31 · unverdicted · none · ref 63 · internal anchor
GigaCheck detects LLM-generated text at both document and span levels by combining fine-tuned language-model embeddings with a DETR-like architecture that treats generated intervals as detectable objects.
Retrieval-Augmented Generation for Natural Language Processing: A Survey cs.CL · 2024-07-18 · accept · none · ref 152 · internal anchor
The survey organizes RAG methods via a taxonomy of query-based, logits-based, latent, and parametric fusion with comparisons on accessibility, efficiency, applications, and challenges.
Chain-of-Verification Reduces Hallucination in Large Language Models cs.CL · 2023-09-20 · unverdicted · none · ref 154 · internal anchor
Chain-of-Verification reduces hallucinations in large language models by drafting responses, planning independent verification questions, answering them separately, and generating a final verified output.
MiniLLM: On-Policy Distillation of Large Language Models cs.CL · 2023-06-14 · conditional · none · ref 17 · internal anchor
MiniLLM distills large language models into smaller ones via reverse KL divergence and on-policy optimization, yielding higher-quality responses with lower exposure bias than standard KD baselines.
ALFWorld: Aligning Text and Embodied Environments for Interactive Learning cs.CL · 2020-10-08 · conditional · none · ref 3 · internal anchor
ALFWorld aligns text-based and embodied visual environments so agents can learn abstract policies in TextWorld that transfer to better performance on ALFRED tasks than visual-only training.
HuggingFace's Transformers: State-of-the-art Natural Language Processing cs.CL · 2019-10-09 · accept · none · ref 180 · internal anchor
Hugging Face releases an open-source Python library that supplies a unified API and pretrained weights for major Transformer architectures used in natural language processing.
Evaluating Pluralism in LLMs through Latent Perspectives cs.CL · 2026-06-11 · unverdicted · none · ref 32 · internal anchor
A domain-agnostic framework extracts perspectives from book reviews showing LLMs underrepresent rarer viewpoints relative to human text.
ProbeScale: Probing Analysis to Optimize Neural Scaling Laws for Efficient Small Language Model Inference cs.CL · 2026-06-01 · unverdicted · none · ref 7 · internal anchor
ProbScale finds layer subsets in SLMs like RoBERTa-Large and T5-Base that cut parameters 5-10x while retaining 95-98% of original task performance by maximizing aggregated probe scores under a budget.
When Meaning Travels: A Granular Lens on Hybrid-MoE's Role in Idiomatic Understanding for Language Models cs.CL · 2026-06-01 · unverdicted · none · ref 54 · internal anchor
HybridMoE with controlled hybridization and idiomatic property signals yields 5-6% gains in figurative language representation for multilingual vision-language models.
Reliable Multilingual Orthopedic Decision Support from Clinical Narratives: Language-Aware Adaptation and Verification-Guided Deferral cs.CL · 2026-05-29 · unverdicted · none · ref 38 · internal anchor
IndicBERT-HPA with language-aware adapters and verification-guided deferral outperforms baselines on multilingual orthopedic note classification, reaching 0.8792 Macro-F1 overall and 84.4% selective accuracy at 72.3% coverage.
Grammatically-Guided Sparse Attention for Efficient and Interpretable Transformers cs.CL · 2026-05-23 · unverdicted · none · ref 11 · internal anchor
Grammatically-Guided Sparse Attention uses POS tags to generate hard or soft masks that constrain self-attention, achieving 0.8200 and 0.8165 accuracy on SST-2 versus 0.8200 for full attention in a DistilBERT-like model.
PromptRad: Knowledge-Enhanced Multi-Label Prompt-Tuning for Low-Resource Radiology Report Labeling cs.CL · 2026-05-19 · unverdicted · none · ref 16 · 2 links · internal anchor
PromptRad reformulates multi-label radiology report classification as masked language modeling and enriches verbalizers with UMLS synonyms, outperforming baselines with only 32 training examples.
Response-free item difficulty modelling for multiple-choice items with fine-tuned transformers: Component-wise representation and multi-task learning cs.CL · 2026-05-16 · conditional · none · ref 157 · internal anchor
Fine-tuned transformers with multi-task learning recover substantial wording-derived signal for item difficulty at small sample sizes typical in applied testing.
Sakura at BEA 2026 Shared Task 1: What Makes Vocabulary Difficult? cs.CL · 2026-05-14 · accept · none · ref 26 · 2 links · internal anchor
Fine-tuned LLM and explainable models predict vocabulary difficulty with correlations r > 0.91 and r > 0.77, showing spelling difficulty and test item construction as key influences in addition to word production difficulty.
ReAD: Reinforcement-Guided Capability Distillation for Large Language Models cs.CL · 2026-05-11 · unverdicted · none · ref 25 · internal anchor
ReAD applies a contextual bandit to allocate fixed-token distillation budget across interdependent LLM capabilities, yielding higher task utility and fewer negative spillovers than standard methods.
G-Loss: Graph-Guided Fine-Tuning of Language Models cs.CL · 2026-04-28 · unverdicted · none · ref 32 · internal anchor
G-Loss builds a document-similarity graph and uses semi-supervised label propagation to guide fine-tuning of language models, yielding higher accuracy than standard losses on five classification benchmarks.
Structural Pruning of Large Vision Language Models: A Comprehensive Study on Pruning Dynamics, Recovery, and Data Efficiency cs.CL · 2026-04-27 · conditional · none · ref 35 · internal anchor
Widthwise pruning of LVLM language backbones combined with supervised finetuning and hidden-state distillation recovers over 95% performance using just 5% of data across 3B-7B models.
A Graph-Enhanced Defense Framework for Explainable Fake News Detection with LLM cs.CL · 2026-04-08 · unverdicted · none · ref 59 · internal anchor
G-Defense builds claim-centered graphs from sub-claims, applies RAG for evidence and competing explanations, then uses graph inference to detect fake news veracity and generate intuitive explanation graphs, claiming SOTA results.
A Transformer-Based Cross-Platform Analysis of Public Discourse on the 15-Minute City Paradigm cs.CL · 2025-09-14 · unverdicted · none · ref 24 · internal anchor
Benchmarks five compressed transformer models for multi-platform sentiment classification on 15-minute city discourse, reporting DistilRoBERTa highest F1 of 0.8292 and platform-specific performance differences.
ALIEN: Aligned Entropy Head for Improving Uncertainty Estimation of LLMs cs.CL · 2025-05-21 · conditional · none · ref 33 · internal anchor
ALIEN trains a lightweight uncertainty head initialized to model entropy and refined via supervised regularization to improve detection of incorrect predictions and calibration on classification and NER tasks.
Social media polarization during conflict: Insights from an ideological stance dataset on Israel-Palestine Reddit comments cs.CL · 2025-02-01 · unverdicted · none · ref 30 · internal anchor
A new labeled dataset of 9,969 Israel-Palestine Reddit comments is created and used to compare stance classification methods, with a specific Mixtral prompt achieving the highest performance.
Remember what you did so you know what to do next cs.CL · 2023-10-30 · unverdicted · none · ref 23 · internal anchor
GPT-J with full action history achieves 3.5x improvement over RL in ScienceWorld and matches a two-stage system using 29x larger models.
From Sentiment to Actionable Insights: A Data-Driven Public Sentiment Analysis of Advanced Air Mobility cs.CL · 2026-06-18 · unverdicted · none · ref 83 · internal anchor
Applies standard sentiment classifiers and topic modeling to a large AAM discussion corpus, identifies six clusters of public concern, and lists strategies to address them.
Long Live Fine-Tuning: Task-Specific Transformers Outperform Zero-Shot LLMs for Misinformation Response Classification on Reddit cs.CL · 2026-06-02 · unverdicted · none · ref 30 · internal anchor
Fine-tuned RoBERTa achieves 0.62 macro-F1 on 900 Reddit comments, outperforming best zero-shot LLM at 0.50, with largest gap on detecting belief propagation.

DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

hub tools

citation-role summary

citation-polarity summary

claims ledger

authors

co-cited works

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer