DeBERTa: Decoding-enhanced BERT with Disentangled Attention

Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen · 2020 · cs.CL · arXiv 2006.03654

69 Pith papers cite this work. Polarity classification is still indexing.

abstract

Recent progress in pre-trained neural language models has significantly improved the performance of many natural language processing (NLP) tasks. In this paper we propose a new model architecture, DeBERTa (Decoding-enhanced BERT with disentangled attention), that improves the BERT and RoBERTa models using two novel techniques. The first is the disentangled attention mechanism, where each word is represented using two vectors that encode its content and position, respectively, and the attention weights among words are computed using disentangled matrices on their contents and relative positions, respectively. Second, an enhanced mask decoder is used to incorporate absolute positions in the decoding layer to predict the masked tokens in model pre-training. In addition, a new virtual adversarial training method is used for fine-tuning to improve models' generalization. We show that these techniques significantly improve the efficiency of model pre-training and the performance of both natural language understanding (NLU) and natural language generation (NLG) downstream tasks. Compared to RoBERTa-Large, a DeBERTa model trained on half of the training data performs consistently better on a wide range of NLP tasks, achieving improvements on MNLI by +0.9% (90.2% vs. 91.1%), on SQuAD v2.0 by +2.3% (88.4% vs. 90.7%) and RACE by +3.6% (83.2% vs. 86.8%). Notably, we scale up DeBERTa by training a larger version that consists of 48 Transformer layers with 1.5 billion parameters. The significant performance boost makes the single DeBERTa model surpass human performance on the SuperGLUE benchmark (Wang et al., 2019a) for the first time in terms of macro-average score (89.9 versus 89.8), and the ensemble DeBERTa model sits atop the SuperGLUE leaderboard as of January 6, 2021, outperforming the human baseline by a decent margin (90.3 versus 89.8).
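
The disentangled attention idea in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the three-term score (content-to-content, content-to-position, position-to-content), the clipped relative-distance bucket, and the 1/sqrt(3d) scaling follow the formulation as commonly described for the paper, while every function name, the random placeholder weights, and the toy sizes are assumptions made only for illustration.

```python
import math
import random


def relative_bucket(i, j, k):
    """Clip the relative distance i - j into the index range [0, 2k - 1]."""
    d = i - j
    if d <= -k:
        return 0
    if d >= k:
        return 2 * k - 1
    return d + k


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(W, x):
    # W is a list of rows; returns W @ x.
    return [dot(row, x) for row in W]


def random_matrix(rows, cols, rng):
    # Placeholder weights (assumption): a trained model would learn these.
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def disentangled_attention(content, rel_emb, Wq_c, Wk_c, Wq_r, Wk_r, k):
    """Return an n x n matrix of attention weights (each row sums to 1)."""
    d = len(Wq_c)
    q_c = [matvec(Wq_c, h) for h in content]   # content queries
    k_c = [matvec(Wk_c, h) for h in content]   # content keys
    q_r = [matvec(Wq_r, p) for p in rel_emb]   # relative-position queries
    k_r = [matvec(Wk_r, p) for p in rel_emb]   # relative-position keys
    scale = math.sqrt(3 * d)                   # three score terms are summed
    n = len(content)
    weights = []
    for i in range(n):
        row = []
        for j in range(n):
            c2c = dot(q_c[i], k_c[j])                         # content-to-content
            c2p = dot(q_c[i], k_r[relative_bucket(i, j, k)])  # content-to-position
            p2c = dot(k_c[j], q_r[relative_bucket(j, i, k)])  # position-to-content
            row.append((c2c + c2p + p2c) / scale)
        weights.append(softmax(row))
    return weights


# Toy usage: 5 tokens, hidden size 4, maximum relative distance k = 2.
rng = random.Random(0)
n, d, k = 5, 4, 2
content = random_matrix(n, d, rng)       # one content vector per token
rel_emb = random_matrix(2 * k, d, rng)   # one vector per relative-distance bucket
Wq_c, Wk_c, Wq_r, Wk_r = (random_matrix(d, d, rng) for _ in range(4))
weights = disentangled_attention(content, rel_emb, Wq_c, Wk_c, Wq_r, Wk_r, k)
```

Position information enters only through the shared table of relative-distance vectors, so token content and token position are projected by separate matrices; that separation is the "disentangled" part the abstract refers to. The enhanced mask decoder and the virtual adversarial training step are not covered by this sketch.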

citation-role summary

background 2 baseline 1 method 1

citation-polarity summary

background 2 baseline 1 use method 1

representative citing papers

ViLegalNLI: Natural Language Inference for Vietnamese Legal Texts

cs.CL · 2026-04-30 · accept · novelty 8.0

ViLegalNLI is the first 42k-pair Vietnamese legal NLI dataset built via semi-automatic LLM-assisted generation and validation.

Discovering Latent Knowledge in Language Models Without Supervision

cs.CL · 2022-12-07 · conditional · novelty 8.0

An unsupervised technique extracts latent yes-no knowledge from language model activations by locating a direction that satisfies logical consistency properties, outperforming zero-shot accuracy by 4% on average across models and datasets.

RoFormer: Enhanced Transformer with Rotary Position Embedding

cs.CL · 2021-04-20 · accept · novelty 8.0

RoFormer introduces rotary position embeddings that encode absolute positions via rotation matrices and relative dependencies in attention, outperforming prior position methods on long text classification tasks.

Visual Semantic Entropy: Do Vision Language Models Recognize Visual Ambiguity?

cs.CV · 2026-06-30 · unverdicted · novelty 7.0

VSE perturbs images only to probe visual ambiguity in VLMs, clusters outputs into semantic prototypes, and computes mass-weighted dispersion, outperforming prior entropy methods on five VQA benchmarks across five models.

Anchors that Don't Lift: Understanding Supply Chain Driven Kernel Lock-In and Governance-Mediated Mitigation Strategies in SOHO Devices

cs.CR · 2026-06-09 · conditional · novelty 7.0

Kernel lock-in from SoC SDKs creates inherited vulnerability debt in SOHO devices, with SoC vendor community engagement as the viable mitigation strategy.

Remember with Confidence: Uncertainty Quantification for Spatio-temporal Memory with Probabilistic Guarantees

cs.CV · 2026-06-06 · unverdicted · novelty 7.0

Introduces object-level semantic uncertainty for VLM memory, the UQ-DAAAM refinement system, and probabilistic guarantees that selected high-quality views reduce uncertainty more effectively.

SlotGCG: Exploiting the Positional Vulnerability in LLMs for Jailbreak Attacks

cs.CR · 2026-06-04 · unverdicted · novelty 7.0

SlotGCG uses Vulnerable Slot Score (VSS) to identify and target the most vulnerable prompt positions for adversarial token insertion, delivering 14% higher ASR than standard GCG and 42% higher against defenses.

RWGBench: Evaluating Scholarly Positioning in Related Work Generation

cs.DL · 2026-05-30 · unverdicted · novelty 7.0

RWGBench is a citation-centric benchmark for related work generation built from 40k CS papers and a 100-paper test set, with multi-dimensional metrics that better match human expert judgment than standard similarity scores.

Teaching Language Models to Check Grounded Claim Factuality with Human Test-Taking Strategies

cs.CL · 2026-05-28 · unverdicted · novelty 7.0

Prompting LLMs with test-taking strategies for true/false factuality checks reduces tokens by over 80%, matches strong baselines on two benchmarks with SOTA on one, and enables fine-tuned SLMs to perform similarly at low cost with rationales.

Semantic Reranking at Inference Time for Hard Examples in Rhetorical Role Labeling

cs.CL · 2026-05-18 · unverdicted · novelty 7.0

RISE is an inference-time semantic reranking framework that refines low-confidence predictions in rhetorical role labeling using contrastively learned label representations, delivering an average +9.15 macro-F1 gain on hard examples across eight datasets and seven models.

Graphs of Research: Citation Evolution Graphs as Supervision for Research Idea Generation

cs.CL · 2026-05-14 · unverdicted · novelty 7.0

GoR extracts citation DAGs using position, frequency, predecessor links and time, then fine-tunes Qwen2.5-7B on 498 seed papers to generate ideas, claiming SOTA over gpt-4o baselines via LLM judges.

Directed Social Regard: Surfacing Targeted Advocacy, Opposition, Aid, Harms, and Victimization in Online Media

cs.CL · 2026-05-01 · unverdicted · novelty 7.0

DSR uses transformer models to detect sentiment targets in text and score them along three theory-motivated axes, with validation showing correlations to existing social science datasets.

RSAT: Structured Attribution Makes Small Language Models Faithful Table Reasoners

cs.CL · 2026-04-30 · conditional · novelty 7.0

RSAT uses SFT on verified traces followed by GRPO with NLI faithfulness rewards to make 1-8B models produce verifiable table reasoning with cell citations, raising faithfulness 3.7x to 0.826.

Just Pass Twice: Efficient Token Classification with LLMs for Zero-Shot NER

cs.CL · 2026-04-06 · unverdicted · novelty 7.0

JPT enables bidirectional token classification in causal LLMs for zero-shot NER via input concatenation plus definition-guided embeddings, delivering +7.9 F1 gains and over 20x speedup on benchmarks.

The Indra Representation Hypothesis for Multimodal Alignment

cs.CV · 2026-04-06 · unverdicted · novelty 7.0

Unimodal model representations converge to a relational structure captured by the Indra representation via V-enriched Yoneda embedding, which is unique and structure-preserving and improves cross-model and cross-modal robustness when instantiated with angular distance.

Group Representational Position Encoding

cs.LG · 2025-12-08 · unverdicted · novelty 7.0

GRAPE unifies RoPE and ALiBi as special cases of group actions on positions, providing a principled design space for positional encodings via SO(d) rotations and GL unipotent transformations.

When to Trust the Answer: Question-Aligned Semantic Nearest Neighbor Entropy for Safer Surgical VQA

cs.CV · 2025-11-03 · conditional · novelty 7.0

QA-SNNE adds question-answer alignment via bilateral gating to semantic nearest neighbor entropy, yielding higher AUROC for uncertainty detection in surgical VQA models under both standard and rephrased questions.

Know Your Source: A Public Knowledge Store for Media Background Checks

cs.CL · 2026-07-02 · unverdicted · novelty 6.0

MEDIAREF is a publicly available knowledge store of documents from 200 media sources that enables low-cost, reproducible evaluation of media background check generation for fact-checking systems.

DPPE: Rethinking Camera-Based Positional Encoding for Scaling Multi-View Transformers

cs.CV · 2026-06-30 · unverdicted · novelty 6.0

DPPE decouples rotation and translation in camera positional encodings for multi-view transformers to resolve late-stage training stagnation and improve generalization in novel view synthesis.

Multi-Source Transfer Learning of Sparse Single-Index Models

stat.ME · 2026-06-28 · unverdicted · novelty 6.0

Proposes a source-data-free transfer learning framework for sparse single-index models that transfers generalized Stein's lemma summaries and uses a guided MLP for nonlinear adaptation.

ITNet: A Learnable Integral Transform That Subsumes Convolution, Attention, and Recurrence

cs.AI · 2026-06-17 · unverdicted · novelty 6.0 · 2 refs

ITNet frames convolution, attention, and recurrence as special cases of one learnable integral transform with an MLP kernel and shows a single shared operator plus modality encoders matches specialized models on ImageNet-1K, GLUE, ModelNet40, VQA v2, and NLVR2.

Selection Integrity for LLM Graph Memory: An Accumulability Criterion for Information-Flow-Blind Retrieval

cs.CR · 2026-06-10 · unverdicted · novelty 6.0

Provenance checks in graph memory are blind to structural attacks that reallocate top-k membership; authselect prevents this by enforcing selection on the authenticated subgraph only.

Constrained Paraphrase Consistency for LLM Hallucination Detection

cs.CL · 2026-06-06 · unverdicted · novelty 6.0

CCHD formulates hallucination detector training as constrained optimization with paraphrase-consistency and label-preservation rules solved via gradient descent-ascent, outperforming baselines on factuality benchmarks.

Functional Entropy: Predicting Functional Correctness in LLM-Generated Code with Uncertainty Quantification

cs.CL · 2026-05-27 · unverdicted · novelty 6.0

Introduces functional equivalence methods and functional entropy to predict functional correctness of LLM-generated code via uncertainty quantification, outperforming NLI-based baselines in most tested settings.

citing papers explorer

Showing 19 of 69 citing papers.

Revisiting Semantic Role Labeling: Efficient Structured Inference with Dependency-Informed Analysis

cs.CL · 2026-05-04 · unverdicted · none · ref 49 · internal anchor

A new encoder-based SRL system with dependency-informed analysis delivers 10x faster inference and comparable or better F1 scores using BERT, RoBERTa, and DeBERTa while supporting multilingual projection.

VerifAI: A Verifiable Open-Source Search Engine for Biomedical Question Answering

cs.IR · 2026-01-16 · unverdicted · none · ref 56 · internal anchor

VerifAI is an open-source biomedical QA system that decomposes generated answers into claims and verifies them with a fine-tuned NLI engine to reduce hallucinations and provide traceable citations.

TokUR: Token-Level Uncertainty Estimation for Large Language Model Reasoning

cs.LG · 2025-05-16 · unverdicted · none · ref 16 · internal anchor

TokUR estimates token-level uncertainty via low-rank weight perturbations in LLMs, aggregates signals to correlate with correctness, and uses them to improve reasoning performance on math tasks.

Toward General and Robust LLM-enhanced Text-attributed Graph Learning

cs.LG · 2025-04-03 · unverdicted · none · ref 6 · internal anchor

UltraTAG organizes LLM-GNN methods for text-attributed graphs; UltraTAG-S adds LLM text propagation, augmentation, PageRank node selection, and edge reconfiguration to improve robustness on sparse data, with reported gains of 2.12% and 17.47%.

Semantic Embeddings of Chemical Elements for Enhanced Materials Inference and Discovery

cs.CL · 2025-02-19 · unverdicted · none · ref 36 · internal anchor

ElementBERT generates literature-derived semantic embeddings for chemical elements that outperform empirical descriptors in alloy property prediction and optimization tasks with up to 23% accuracy gains.

CoVStream: Edge-Cloud Collaboration for Understanding of Long Video Streams

cs.CV · 2026-06-22 · unverdicted · none · ref 59 · internal anchor

CoVStream is an edge-cloud system that distills long videos into features and captions to cut bandwidth 87.6% while retaining 99.2% of full-cloud accuracy on LVBench.

From Sentiment to Actionable Insights: A Data-Driven Public Sentiment Analysis of Advanced Air Mobility

cs.CL · 2026-06-18 · unverdicted · none · ref 39 · internal anchor

Applies standard sentiment classifiers and topic modeling to a large AAM discussion corpus, identifies six clusters of public concern, and lists strategies to address them.

Opir: Efficient Multi-Task Safety Classification for Toxicity, Jailbreaks, Hate Speech, and Harmful Content

cs.LG · 2026-05-28 · unverdicted · none · ref 16 · internal anchor

Opir introduces efficient multi-task encoder models trained on a 996-category safety taxonomy that match or exceed larger baselines on most safety benchmarks while using under 100M parameters for edge variants.

MIPIAD: Multilingual Indirect Prompt Injection Attack Defense with Qwen -- TF-IDF Hybrid and Meta-Ensemble Learning

cs.CL · 2026-05-08 · unverdicted · none · ref 4 · internal anchor

MIPIAD reports a hybrid Qwen-TF-IDF ensemble defense that reaches F1 0.9205 and reduces the English-Bangla performance gap on a 1.43-million-sample synthetic benchmark derived from BIPIA templates.

BiMind: A Dual-Head Reasoning Model with Attention-Geometry Adapter for Incorrect Information Detection

cs.CL · 2026-04-07 · unverdicted · none · ref 4 · internal anchor

BiMind outperforms existing methods in incorrect information detection by disentangling content and knowledge reasoning with attention geometry adaptation and self-retrieval.

Attribution-Driven Explainable Intrusion Detection with Encoder-Based Large Language Models

cs.CR · 2026-04-07 · unverdicted · none · ref 41 · internal anchor

Encoder-based LLMs detect SDN intrusions with decisions driven by meaningful traffic behaviors, as validated by attribution analysis aligning with established intrusion principles.

LLMs Struggle with Abstract Meaning Comprehension More Than Expected

cs.CL · 2026-04-13 · unverdicted · none · ref 11 · internal anchor

LLMs struggle with abstract meaning comprehension on SemEval-2021 Task 4 more than fine-tuned models, and a new bidirectional attention classifier yields small accuracy gains of 3-4%.

Predicting User Satisfaction in Online Education Platforms: A Large Language Model Based Multi-Modal Review Mining Framework

cs.GR · 2026-04-13 · unverdicted · none · ref 7 · internal anchor

An LLM multi-modal system integrates topic modeling, transformer sentiment, and behavioral features to predict MOOC learner satisfaction more accurately than single-modality baselines.

Large Language Models: A Survey

cs.CL · 2024-02-09 · accept · none · ref 26 · internal anchor

The paper surveys key large language models, their training methods, datasets, evaluation benchmarks, and future research directions in the field.

Findings of the Counter Turing Test: AI-Generated Text Detection

cs.CL · 2026-05-20 · unverdicted · none · ref 22 · 2 links · internal anchor

Shared task findings show near-perfect binary detection of AI-generated text but greater difficulty in attributing outputs to particular language models.

Bridging Language Models and Financial Analysis

q-fin.ST · 2025-03-14 · unverdicted · none · ref 37 · internal anchor

A survey synthesizing recent LLM research and assessing its applicability to financial data analysis.

RouterWise: Joint Resource Allocation and Routing for Latency-Aware Multi-Model LLM Serving

cs.NI · 2026-04-13 · unreviewed · ref 18 · 2 links · internal anchor

Indirect Question Answering in English, German and Bavarian: A Challenging Task for High- and Low-Resource Languages Alike

cs.CL · 2026-03-16 · unreviewed · ref 13 · internal anchor

Clotho: Measuring Task-Specific Pre-Generation Test Adequacy for LLM Inputs

cs.SE · 2025-09-22 · unreviewed · ref 16 · internal anchor
