DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

Julien Chaumond, Lysandre Debut, Thomas Wolf, Victor Sanh · 2019 · cs.CL · arXiv 1910.01108

Mixed citation behavior. The most common role is background (62% of classified citations).

185 Pith papers citing it

abstract

As Transfer Learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP), operating these large models in on-the-edge and/or under constrained computational training or inference budgets remains challenging. In this work, we propose a method to pre-train a smaller general-purpose language representation model, called DistilBERT, which can then be fine-tuned with good performances on a wide range of tasks like its larger counterparts. While most prior work investigated the use of distillation for building task-specific models, we leverage knowledge distillation during the pre-training phase and show that it is possible to reduce the size of a BERT model by 40%, while retaining 97% of its language understanding capabilities and being 60% faster. To leverage the inductive biases learned by larger models during pre-training, we introduce a triple loss combining language modeling, distillation and cosine-distance losses. Our smaller, faster and lighter model is cheaper to pre-train and we demonstrate its capabilities for on-device computations in a proof-of-concept experiment and a comparative on-device study.
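
The triple loss named in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the temperature value, and the weights `alpha_ce`, `alpha_mlm`, and `alpha_cos` are hypothetical, and the distillation term is written here as a cross-entropy between temperature-softened teacher and student distributions for a single token position.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy of the student against the softened teacher distribution.

    The temperature default is an assumption chosen for illustration.
    """
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher, student))


def mlm_loss(student_logits, target_index):
    """Language modeling term: negative log-likelihood of the true token."""
    return -math.log(softmax(student_logits)[target_index])


def cosine_loss(student_hidden, teacher_hidden):
    """Cosine-distance term: 1 - cos(student, teacher).

    Assumes both hidden-state vectors are non-zero.
    """
    dot = sum(a * b for a, b in zip(student_hidden, teacher_hidden))
    norm_s = math.sqrt(sum(a * a for a in student_hidden))
    norm_t = math.sqrt(sum(b * b for b in teacher_hidden))
    return 1.0 - dot / (norm_s * norm_t)


def triple_loss(student_logits, teacher_logits, target_index,
                student_hidden, teacher_hidden,
                alpha_ce=1.0, alpha_mlm=1.0, alpha_cos=1.0):
    """Weighted sum of the three terms; the weights are hypothetical."""
    return (alpha_ce * distillation_loss(student_logits, teacher_logits)
            + alpha_mlm * mlm_loss(student_logits, target_index)
            + alpha_cos * cosine_loss(student_hidden, teacher_hidden))


if __name__ == "__main__":
    # Toy values for one token position over a 3-word vocabulary.
    loss = triple_loss(
        student_logits=[0.5, 1.5, 0.2],
        teacher_logits=[0.3, 2.0, 0.1],
        target_index=1,
        student_hidden=[0.1, 0.4, -0.2],
        teacher_hidden=[0.2, 0.5, -0.1],
    )
    print(f"combined loss: {loss:.4f}")
```

In a real training loop these terms would be averaged over masked positions and batches and computed with a tensor library; the plain-Python version only shows how the three signals combine into one objective.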

citation-role summary

background 18 · method 11

citation-polarity summary

background 18 · use method 11

authors

Julien Chaumond, Lysandre Debut, Thomas Wolf, Victor Sanh

representative citing papers

Canonical Regularisation of Wide Feature-Learning Neural Networks

stat.ML · 2026-05-18 · unverdicted · novelty 8.0

Derives geodesic ridge regularization and Riemannian Gibbs Process prior for feature-learning wide neural networks, generalizing kernel-regime results via function-space axiomatization.

Steering Without Breaking: Mechanistically Informed Interventions for Discrete Diffusion Language Models

cs.LG · 2026-05-08 · unverdicted · novelty 8.0

Adaptive scheduling of interventions in discrete diffusion language models, timed to attribute-specific commitment schedules discovered with sparse autoencoders, delivers precise multi-attribute steering up to 93% strength while preserving generation quality.

Learning the Signature of Memorization in Autoregressive Language Models

cs.CL · 2026-04-03 · accept · novelty 8.0

A classifier trained only on transformer fine-tuning data detects an invariant memorization signature that transfers to Mamba, RWKV-4, and RecurrentGemma with AUCs of 0.963, 0.972, and 0.936.

TinyStories: How Small Can Language Models Be and Still Speak Coherent English?

cs.CL · 2023-05-12 · conditional · novelty 8.0

Tiny language models under 10M parameters trained on a synthetic children's story dataset generate fluent, consistent, multi-paragraph English text with near-perfect grammar and reasoning.

Language Models are Few-Shot Learners

cs.CL · 2020-05-28 · accept · novelty 8.0

GPT-3 shows that scaling an autoregressive language model to 175 billion parameters enables strong few-shot performance across diverse NLP tasks via in-context prompting without fine-tuning.

Learning from Acquisition: Metadata-driven Multimodal Pre-training for Cardiac MRI

cs.CV · 2026-06-27 · unverdicted · novelty 7.0

MetaCLIP-CMR applies CLIP-style contrastive learning to cardiac MRI by treating acquisition metadata as text labels, delivering 86.8% modality and 86.5% view accuracy plus top Dice scores on ACDC/M&Ms segmentation with far less pre-training data than recent large-scale CMR models.

Toward Calibrated, Fair, and accurate Deepfake Detection

cs.LG · 2026-06-03 · unverdicted · novelty 7.0

Face-Feature Tuning is a label-free logit remapping method that reduces FPR/TPR gaps across groups in deepfake detection while preserving overall accuracy.

Bounded Behavioral Indistinguishability for Black-Box LLM Distillation

cs.LG · 2026-05-28 · unverdicted · novelty 7.0

Introduces (ε,q,t,A)-behavioral indistinguishability and shows via Qwen/Llama experiments that LoRA distillation boosts semantic similarity but leaves detectable behavioral differences under adversarial evaluation.

MATCHA: Matching Text via Contrastive Semantic Alignment

cs.CL · 2026-05-26 · unverdicted · novelty 7.0

MATCHA introduces a dual-view contrastive metric measuring proximity to gold text and distance from adversarial contradictions, outperforming ROUGE and BERTScore by up to 20% on TruthfulQA and other NLP benchmarks.

Patch Hierarchical Attention Transformer for Efficient Particle Jet Tagging

hep-ex · 2026-05-20 · unverdicted · novelty 7.0

PHAT-JeT combines geometric message-passing with hierarchical patch attention to reach state-of-the-art accuracy and background rejection among resource-constrained jet tagging models on four benchmarks.

Distribution-free root cause analysis

stat.ME · 2026-05-20 · unverdicted · novelty 7.0

CROC constructs finite-sample valid confidence sets for the root-cause index in multi-stream change detection using conformal p-values under independence and exchangeability assumptions.

Layer-wise Token Compression for Efficient Document Reranking

cs.IR · 2026-05-20 · unverdicted · novelty 7.0 · 2 refs

Layer-wise Token Compression applies adaptive token pooling at middle transformer layers for cross-encoder rerankers, preserving MS MARCO ranking quality while raising QPS up to 25% on passages and 116% on documents, with added gains on listwise LLM rerankers and a regularizer effect for long inputs.

TIDAL: Recovering Temporal Phase for Cloud Block Storage Placement from LLM-Derived Semantics

cs.OS · 2026-05-18 · unverdicted · novelty 7.0

TIDAL recovers temporal phase signals from LLM-derived semantics of provisioning metadata to enable complementary CVD placement, reducing overload frequency by 79.1% on production traces.

Semantic Reranking at Inference Time for Hard Examples in Rhetorical Role Labeling

cs.CL · 2026-05-18 · unverdicted · novelty 7.0

RISE is an inference-time semantic reranking framework that refines low-confidence predictions in rhetorical role labeling using contrastively learned label representations, delivering an average +9.15 macro-F1 gain on hard examples across eight datasets and seven models.

Differentially Private Motif-Preserving Multi-modal Hashing

cs.IR · 2026-05-14 · unverdicted · novelty 7.0

DMP-MH clips degrees to control triangle sensitivity, synthesizes an edge-DP graph with Noisy Mirror Descent, and distills it into dual-stream hash networks, beating private baselines by up to 11.4 mAP on MIRFlickr-25K and NUS-WIDE while keeping 92.5% of non-private performance.

When More Parameters Hurt: Foundation Model Priors Amplify Worst-Client Disparity Under Extreme Federated Heterogeneity

cs.LG · 2026-05-09 · unverdicted · novelty 7.0

Foundation model priors amplify worst-client disparity under extreme federated heterogeneity, creating a fairness paradox where larger models perform worse for disadvantaged clients.

Switchcraft: AI Model Router for Agentic Tool Calling

cs.AI · 2026-05-08 · unverdicted · novelty 7.0

Switchcraft routes agentic tool-calling queries to the lowest-cost model that preserves correctness, reaching 82.9% accuracy and 84% cost reduction on five benchmarks.

TRACE: Transport Alignment Conformal Prediction via Diffusion and Flow Matching Models

stat.ML · 2026-05-08 · unverdicted · novelty 7.0

TRACE creates valid conformal prediction sets for complex generative models by scoring outputs via averaged denoising or velocity errors along stochastic transport paths instead of likelihoods.

A Multi-View Media Profiling Suite: Resources, Evaluation, and Analysis

cs.CL · 2026-05-02 · unverdicted · novelty 7.0

Presents MBFC-2025 dataset and multi-view embeddings with fusion methods for media bias and factuality, reporting SOTA results on ACL-2020 and new benchmarks on MBFC-2025.

VOW: Verifiable and Oblivious Watermark Detection for Large Language Models

cs.CR · 2026-04-30 · unverdicted · novelty 7.0

VOW formulates LLM watermark detection as a secure two-party computation using a Verifiable Oblivious Pseudorandom Function to achieve private and cryptographically verifiable detection.

Homogeneous Stellar Parameters from Heterogeneous Spectra with Deep Learning

astro-ph.GA · 2026-04-28 · unverdicted · novelty 7.0

A single end-to-end Transformer model unifies stellar labels from heterogeneous spectroscopic surveys into a self-consistent scale without post-hoc recalibration.

AgentPulse: A Continuous Multi-Signal Framework for Evaluating AI Agents in Deployment

cs.AI · 2026-04-27 · conditional · novelty 7.0

AgentPulse is a continuous multi-signal framework that scores AI agents on benchmark performance, adoption, sentiment and ecosystem health, showing these factors are complementary and that benchmark-plus-sentiment predicts external adoption metrics.

RoLegalGEC: Legal Domain Grammatical Error Detection and Correction Dataset for Romanian

cs.CL · 2026-04-21 · unverdicted · novelty 7.0

RoLegalGEC is the first Romanian legal-domain dataset for grammatical error detection and correction, consisting of 350,000 examples, with evaluations of several neural models.

GuardPhish: Securing Open-Source LLMs from Phishing Abuse

cs.CR · 2026-04-19 · unverdicted · novelty 7.0

Open-source LLMs detect phishing intent at high rates but still generate actionable phishing content, and GuardPhish supplies a dataset plus modular classifiers to close the gap.

citing papers explorer

Showing 50 of 185 citing papers.

MiniLLM: On-Policy Distillation of Large Language Models cs.CL · 2023-06-14 · conditional · none · ref 17 · internal anchor
MiniLLM distills large language models into smaller ones via reverse KL divergence and on-policy optimization, yielding higher-quality responses with lower exposure bias than standard KD baselines.
FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance cs.LG · 2023-05-09 · accept · none · ref 18 · internal anchor
FrugalGPT learns query-specific cascades across heterogeneous LLM APIs to match or exceed top-model accuracy at far lower cost.
R3M: A Universal Visual Representation for Robot Manipulation cs.RO · 2022-03-23 · unverdicted · none · ref 71 · internal anchor
A visual encoder pre-trained on diverse human videos with contrastive and language objectives improves simulated robot manipulation success by over 20% versus training from scratch and enables real Franka arm tasks from 20 demonstrations.
Knowledge Distillation in Iterative Generative Models for Improved Sampling Speed cs.LG · 2021-01-07 · unverdicted · none · ref 32 · internal anchor
Denoising Student distills the multi-step denoising process of score-based and diffusion models into a single forward pass, matching GAN sampling speed while producing comparable sample quality on CIFAR-10, CelebA, and 256x256 LSUN.
ALFWorld: Aligning Text and Embodied Environments for Interactive Learning cs.CL · 2020-10-08 · conditional · none · ref 3 · internal anchor
ALFWorld aligns text-based and embodied visual environments so agents can learn abstract policies in TextWorld that transfer to better performance on ALFRED tasks than visual-only training.
Linformer: Self-Attention with Linear Complexity cs.LG · 2020-06-08 · conditional · none · ref 16 · internal anchor
Linformer approximates self-attention with a low-rank projection to achieve O(n) time and space complexity while matching Transformer accuracy on standard NLP tasks.
HuggingFace's Transformers: State-of-the-art Natural Language Processing cs.CL · 2019-10-09 · accept · none · ref 180 · internal anchor
Hugging Face releases an open-source Python library that supplies a unified API and pretrained weights for major Transformer architectures used in natural language processing.
DOPD: Dual On-policy Distillation cs.AI · 2026-06-29 · unverdicted · none · ref 32 · internal anchor
DOPD is an advantage-aware dual distillation method that dynamically assigns token supervision from either privileged teacher or student to transfer capability while mitigating non-replicable information asymmetry in on-policy distillation.
SurrogateShield: Beyond Redaction for High-Utility, Privacy-Preserving LLM Interactions cs.CR · 2026-06-28 · unverdicted · none · ref 25 · internal anchor
SurrogateShield replaces detected PII with device-local surrogates before LLM API calls and restores originals afterward, achieving 98.87% F1 detection and 13.26 pp higher BERTScore than placeholder redaction while blocking real PII transmission.
Late-Layer Fusion is Enough: Dual-Path Vision Token Routing for Multimodal Large Language Models under Visual Saturation cs.AI · 2026-06-08 · unverdicted · none · ref 15 · internal anchor
DPVR-LF routes saturated vision tokens into a one-layer side branch after layer 4, runs text-only processing through layers 5-17, and performs late fusion at the final layer to reduce visual computation while preserving multimodal performance.
Reliable Multilingual Orthopedic Decision Support from Clinical Narratives: Language-Aware Adaptation and Verification-Guided Deferral cs.CL · 2026-05-29 · unverdicted · none · ref 38 · internal anchor
IndicBERT-HPA with language-aware adapters and verification-guided deferral outperforms baselines on multilingual orthopedic note classification, reaching 0.8792 Macro-F1 overall and 84.4% selective accuracy at 72.3% coverage.
SemStruct: Contextualizing Semantic Embeddings with Structural Information for Schema Matching cs.LG · 2026-05-29 · unverdicted · none · ref 29 · internal anchor
SemStruct models tables as heterogeneous graphs with GNNs on frozen PLM embeddings to incorporate row co-occurrences for schema matching and reports SOTA results on Valentine and SOTAB-SM benchmarks.
BitTP: The Lightweight Trajectory Prediction Model with BitLLM for Edge-Devices cs.AI · 2026-05-28 · unverdicted · none · ref 43 · internal anchor
BitTP applies weight-only 1.58-bit quantization to LLM trajectory predictors, claiming improved ADE/FDE over BF16 baseline with reduced resource demands on edge devices.
From Learning Resources to Competencies: LLM-Based Tagging with Evidence and Graph Constraints cs.AI · 2026-05-27 · unverdicted · none · ref 24 · internal anchor
An LLM+BM25+graph pipeline tags learning resources to competencies with evidence spans, reaching 0.57 micro-F1 and 0.50 macro-F1 at fragment level on a 22-competency university dataset while outperforming baselines.
When Mean CE Fails: Median CE Can Better Track Language Model Quality cs.AI · 2026-05-23 · unverdicted · none · ref 8 · internal anchor
Median cross-entropy tracks language model task performance more reliably than mean cross-entropy during synthetic fact-learning SFT and top-K distillation.
Grammatically-Guided Sparse Attention for Efficient and Interpretable Transformers cs.CL · 2026-05-23 · unverdicted · none · ref 11 · internal anchor
Grammatically-Guided Sparse Attention uses POS tags to generate hard or soft masks that constrain self-attention, achieving 0.8200 and 0.8165 accuracy on SST-2 versus 0.8200 for full attention in a DistilBERT-like model.
FAME: Failure-Aware Mixture-of-Experts for Message-Level Log Anomaly Detection cs.SE · 2026-05-21 · unverdicted · none · ref 22 · internal anchor
FAME achieves F1 of 98.16 on BGL and 99.95 on Thunderbird for message-level log anomaly detection using at most K=100 labels per template, reducing annotation effort by 76x while detecting anomalies from unseen EventIDs.
Post-Training is About States, Not Tokens: A State Distribution View of SFT, RL, and On-Policy Distillation cs.LG · 2026-05-21 · unverdicted · none · ref 23 · internal anchor
A state distribution view of post-training shows that on-policy supervision from the learner itself can outperform fixed-dataset SFT and preserve retention better than aggressive supervised updates.
Effective User-defined Keyword Spotting with Dual-stage Matching, Multi-modal Enrollment, and Continual Adaptation eess.AS · 2026-05-21 · unverdicted · none · ref 47 · internal anchor
DMA-KWS achieves 97.85% AUC and 6.13% EER on LibriPhrase Hard via dual-stage CTC/QbyT matching, multi-modal enrollment, and lightweight continual adaptation with 187k parameters.
OSCToM: RL-Guided Adversarial Generation for High-Order Theory of Mind cs.AI · 2026-05-19 · unverdicted · none · ref 25 · internal anchor
OSCToM uses RL-guided generation with an extended DSL and surrogate models to create nested belief conflict tasks, raising FANToM accuracy from 0.2% to 76% while being 6x more efficient.
PromptRad: Knowledge-Enhanced Multi-Label Prompt-Tuning for Low-Resource Radiology Report Labeling cs.CL · 2026-05-19 · unverdicted · none · ref 16 · 2 links · internal anchor
PromptRad reformulates multi-label radiology report classification as masked language modeling and enriches verbalizers with UMLS synonyms, outperforming baselines with only 32 training examples.
Traditional statistical representations outperform generative AI in identifying expert peer reviewers cs.IR · 2026-05-18 · unverdicted · none · ref 114 · internal anchor
TF-IDF identifies labeled experts in the top 25 recommendations 79.5% of the time versus 51.5% for GPT-4o mini on an astronomy observatory dataset.
DyGRO-VLA: Cross-Task Scaling of Vision-Language-Action Models via Dynamic Grouped Residual Optimization cs.RO · 2026-05-17 · unverdicted · none · ref 73 · internal anchor
DyGRO-VLA is a two-stage optimization framework for cross-task scaling of Vision-Language-Action models via dynamic grouped residual optimization in RL.
Response-free item difficulty modelling for multiple-choice items with fine-tuned transformers: Component-wise representation and multi-task learning cs.CL · 2026-05-16 · conditional · none · ref 157 · internal anchor
Fine-tuned transformers with multi-task learning recover substantial wording-derived signal for item difficulty at small sample sizes typical in applied testing.
Sakura at BEA 2026 Shared Task 1: What Makes Vocabulary Difficult? cs.CL · 2026-05-14 · accept · none · ref 26 · 2 links · internal anchor
Fine-tuned LLM and explainable models predict vocabulary difficulty with correlations r > 0.91 and r > 0.77, showing spelling difficulty and test item construction as key influences in addition to word production difficulty.
Rethinking Layer Relevance in Large Language Models Beyond Cosine Similarity cs.LG · 2026-05-13 · unverdicted · none · ref 50 · internal anchor
Cosine similarity poorly predicts performance degradation from layer removal in LLMs, making direct accuracy-drop ablation a more reliable relevance metric.
PhishSigma++: Malicious Email Detection with Typed Entity Relations cs.CR · 2026-05-12 · unverdicted · none · ref 22 · internal anchor
PhishSigma++ reaches 0.9675 F1 on clean data and holds 0.9579 F1 under adversarial text padding by modeling typed entity relations in emails, outperforming text-only baselines that drop sharply.
ReAD: Reinforcement-Guided Capability Distillation for Large Language Models cs.CL · 2026-05-11 · unverdicted · none · ref 25 · internal anchor
ReAD applies a contextual bandit to allocate fixed-token distillation budget across interdependent LLM capabilities, yielding higher task utility and fewer negative spillovers than standard methods.
Graph Representation Learning Augmented Model Manipulation on Federated Fine-Tuning of LLMs cs.LG · 2026-05-08 · unverdicted · none · ref 41 · internal anchor
Graph representation learning plus iterative augmented Lagrangian optimization creates stronger, harder-to-detect model manipulation attacks on federated LLM fine-tuning, cutting global accuracy by up to 26%.
PRIMED: Adaptive Modality Suppression for Referring Audio-Visual Segmentation via Biased Competition cs.CV · 2026-05-08 · unverdicted · none · ref 36 · internal anchor
PRIMED improves referring audio-visual segmentation by using a modality prior decoder and competition-aware fusion to adaptively suppress irrelevant modalities.
Near-Policy: Accelerating On-Policy Distillation via Asynchronous Generation and Selective Packing cs.LG · 2026-05-07 · unverdicted · none · ref 15 · internal anchor
NPD speeds up on-policy distillation by 8.1x over baselines by using asynchronous SFT with Δ-IFD filtering, outperforming standard SFT and enabling a 1B model to reach a SOTA score of 68.73%.
Standing on the Shoulders of Giants: Stabilized Knowledge Distillation for Cross-Language Code Clone Detection cs.AI · 2026-05-04 · unverdicted · none · ref 38 · internal anchor
Reasoning-oriented knowledge distillation from DeepSeek-R1 plus response stabilization improves reliability and often performance of compact models for cross-language code clone detection on pairs like Python-Java and Rust-Java.
eDySec: A Deep Learning-based Explainable Dynamic Analysis Framework for Detecting Malicious Packages in PyPI Ecosystem cs.CR · 2026-04-29 · unverdicted · none · ref 54 · internal anchor
eDySec is a deep learning-based framework that detects malicious PyPI packages through dynamic analysis, halving feature dimensionality, reducing false positives by 82%, false negatives by 79%, and boosting accuracy by 3% with near-perfect stability.
G-Loss: Graph-Guided Fine-Tuning of Language Models cs.CL · 2026-04-28 · unverdicted · none · ref 32 · internal anchor
G-Loss builds a document-similarity graph and uses semi-supervised label propagation to guide fine-tuning of language models, yielding higher accuracy than standard losses on five classification benchmarks.
Structural Pruning of Large Vision Language Models: A Comprehensive Study on Pruning Dynamics, Recovery, and Data Efficiency cs.CL · 2026-04-27 · conditional · none · ref 35 · internal anchor
Widthwise pruning of LVLM language backbones combined with supervised finetuning and hidden-state distillation recovers over 95% performance using just 5% of data across 3B-7B models.
ESsEN: Training Compact Discriminative Vision-Language Transformers in a Low-Resource Setting cs.CV · 2026-04-20 · unverdicted · none · ref 40 · internal anchor
ESsEN is a parameter-efficient two-tower vision-language transformer that matches larger models on discriminative tasks after training end-to-end with limited data and resources.
GIST: Multimodal Knowledge Extraction and Spatial Grounding via Intelligent Semantic Topology cs.AI · 2026-04-16 · unverdicted · none · ref 17 · internal anchor
GIST extracts a semantically annotated 2D navigation topology from consumer mobile point clouds to improve spatial grounding for embodied AI in dense environments.
A Graph-Enhanced Defense Framework for Explainable Fake News Detection with LLM cs.CL · 2026-04-08 · unverdicted · none · ref 59 · internal anchor
G-Defense builds claim-centered graphs from sub-claims, applies RAG for evidence and competing explanations, then uses graph inference to detect fake news veracity and generate intuitive explanation graphs, claiming SOTA results.
ProtoSiTex: Learning Semi-Interpretable Prototypes for Multi-label Text Classification cs.AI · 2025-10-14 · unverdicted · none · ref 46 · internal anchor
ProtoSiTex introduces dual-phase prototype learning with hierarchical consistency loss for semi-interpretable multi-label text classification on a new subsentence-annotated hotel review dataset.
A Transformer-Based Cross-Platform Analysis of Public Discourse on the 15-Minute City Paradigm cs.CL · 2025-09-14 · unverdicted · none · ref 24 · internal anchor
Benchmarks five compressed transformer models for multi-platform sentiment classification on 15-minute city discourse, reporting DistilRoBERTa highest F1 of 0.8292 and platform-specific performance differences.
Model Compression vs. Adversarial Robustness: An Empirical Study on Language Models for Code cs.SE · 2025-08-05 · unverdicted · none · ref 64 · internal anchor
Empirical tests show compressed code language models retain task performance but suffer markedly lower robustness under four standard adversarial attacks.
Mitigating Watermark Forgery in Generative Models via Randomized Key Selection cs.CR · 2025-07-10 · unverdicted · none · ref 33 · internal anchor
Randomized per-query key selection with single-key detection acceptance bounds forgery success rate independently of collected samples while preserving model utility.
ALIEN: Aligned Entropy Head for Improving Uncertainty Estimation of LLMs cs.CL · 2025-05-21 · conditional · none · ref 33 · internal anchor
ALIEN trains a lightweight uncertainty head initialized to model entropy and refined via supervised regularization to improve detection of incorrect predictions and calibration on classification and NER tasks.
Social media polarization during conflict: Insights from an ideological stance dataset on Israel-Palestine Reddit comments cs.CL · 2025-02-01 · unverdicted · none · ref 30 · internal anchor
A new labeled dataset of 9,969 Israel-Palestine Reddit comments is created and used to compare stance classification methods, with a specific Mixtral prompt achieving the highest performance.
Efficient Model Repository for Entity Resolution: Construction, Search, and Integration cs.DB · 2024-12-12 · unverdicted · none · ref 37 · internal anchor
MoRER builds an ER model repository via feature distribution clustering of tasks, achieving competitive results with limited labels versus active learning, transfer learning, and self-supervised methods on three multi-source datasets.
Machine Unlearning on Pre-trained Models by Residual Feature Alignment Using LoRA cs.LG · 2024-11-13 · unverdicted · none · ref 40 · internal anchor
Proposes a LoRA-based residual feature alignment method for efficient machine unlearning on pre-trained models, targeting zero residuals on retained data and shifted residuals on unlearned data.
Remember what you did so you know what to do next cs.CL · 2023-10-30 · unverdicted · none · ref 23 · internal anchor
GPT-J with full action history achieves 3.5x improvement over RL in ScienceWorld and matches a two-stage system using 29x larger models.
Revisiting Sentiment Analysis for Software Engineering in the Era of Large Language Models cs.SE · 2023-10-17 · unverdicted · none · ref 52 · internal anchor
bLLMs achieve state-of-the-art results on limited and imbalanced SE sentiment datasets even in zero-shot settings, but fine-tuned sLLMs outperform when ample balanced training data is available.
From Sentiment to Actionable Insights: A Data-Driven Public Sentiment Analysis of Advanced Air Mobility cs.CL · 2026-06-18 · unverdicted · none · ref 83 · internal anchor
Applies standard sentiment classifiers and topic modeling to a large AAM discussion corpus, identifies six clusters of public concern, and lists strategies to address them.
Rec-Distill: An Industrial Distillation Pipeline for Large-Scale Recommendation Models cs.IR · 2026-05-28 · unverdicted · none · ref 17 · internal anchor
Rec-Distill is an industrial distillation pipeline that transfers substantial performance from large-scale recommendation models to efficient students, reporting over 60% transferability and measurable business gains.
