ViLegalNLI is the first 42k-pair Vietnamese legal NLI dataset built via semi-automatic LLM-assisted generation and validation.
DeBERTa: Decoding-enhanced BERT with Disentangled Attention
60 Pith papers cite this work.
abstract
Recent progress in pre-trained neural language models has significantly improved the performance of many natural language processing (NLP) tasks. In this paper, we propose a new model architecture, DeBERTa (Decoding-enhanced BERT with disentangled attention), that improves the BERT and RoBERTa models using two novel techniques. The first is the disentangled attention mechanism, where each word is represented using two vectors that encode its content and position, respectively, and the attention weights among words are computed using disentangled matrices on their contents and relative positions, respectively. Second, an enhanced mask decoder is used to incorporate absolute positions in the decoding layer to predict the masked tokens in model pre-training. In addition, a new virtual adversarial training method is used for fine-tuning to improve models' generalization. We show that these techniques significantly improve the efficiency of model pre-training and the performance of both natural language understanding (NLU) and natural language generation (NLG) downstream tasks. Compared to RoBERTa-Large, a DeBERTa model trained on half of the training data performs consistently better on a wide range of NLP tasks, achieving improvements on MNLI by +0.9% (90.2% vs. 91.1%), on SQuAD v2.0 by +2.3% (88.4% vs. 90.7%) and on RACE by +3.6% (83.2% vs. 86.8%). Notably, we scale up DeBERTa by training a larger version that consists of 48 Transformer layers with 1.5 billion parameters. The significant performance boost makes the single DeBERTa model surpass the human performance on the SuperGLUE benchmark (Wang et al., 2019a) for the first time in terms of macro-average score (89.9 versus 89.8), and the ensemble DeBERTa model sits atop the SuperGLUE leaderboard as of January 6, 2021, outperforming the human baseline by a decent margin (90.3 versus 89.8).
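The disentangled attention score described in the abstract can be illustrated with a minimal, single-head NumPy sketch, assuming the formulation given in the DeBERTa paper: each score A[i, j] is the sum of a content-to-content term, a content-to-position term and a position-to-content term, where relative distances i - j are shifted by k and clipped into 2k buckets (the function delta), and the summed score is scaled by 1/sqrt(3d). The function names, the parameter names (W_qc, W_kc, W_vc, W_qr, W_kr, k) and the random weights are hypothetical and chosen for illustration. Masking, multiple heads, the enhanced mask decoder and virtual adversarial fine-tuning are omitted, and this is not the official implementation.

```python
import numpy as np


def relative_position_index(n, k):
    """delta[i, j]: relative distance i - j shifted by k and clipped to [0, 2k - 1]."""
    pos = np.arange(n)
    return np.clip(pos[:, None] - pos[None, :] + k, 0, 2 * k - 1)


def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def disentangled_attention(H, P, W_qc, W_kc, W_vc, W_qr, W_kr, k):
    """Single-head disentangled attention (hypothetical sketch: no masking, no dropout).

    H: (n, d_model) content vectors; P: (2k, d_model) relative position embeddings.
    Returns the (n, d) output and the (n, n) attention weights.
    """
    n, d = H.shape[0], W_qc.shape[1]
    Qc, Kc, Vc = H @ W_qc, H @ W_kc, H @ W_vc  # content projections, (n, d)
    Qr, Kr = P @ W_qr, P @ W_kr                # position projections, (2k, d)
    delta = relative_position_index(n, k)      # (n, n) integer indices into P

    c2c = Qc @ Kc.T                                         # Qc_i . Kc_j
    c2p = np.take_along_axis(Qc @ Kr.T, delta, axis=1)      # Qc_i . Kr_delta(i, j)
    p2c = np.take_along_axis(Kc @ Qr.T, delta, axis=1).T    # Kc_j . Qr_delta(j, i)

    A = c2c + c2p + p2c
    weights = softmax(A / np.sqrt(3 * d), axis=-1)  # three score terms, hence sqrt(3d)
    return weights @ Vc, weights


# Usage with random, untrained weights.
rng = np.random.default_rng(0)
n, d_model, d, k = 6, 16, 8, 3
H = rng.normal(size=(n, d_model))
P = rng.normal(size=(2 * k, d_model))
W_qc, W_kc, W_vc, W_qr, W_kr = (rng.normal(size=(d_model, d)) for _ in range(5))
out, weights = disentangled_attention(H, P, W_qc, W_kc, W_vc, W_qr, W_kr, k)
print(out.shape)  # (6, 8)
```

Gathering the relative-position terms from (n, 2k) score tables with `np.take_along_axis`, instead of building an (n, n, d) tensor of per-pair position vectors, is one way to keep the memory cost of the two position terms close to that of ordinary attention.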
representative citing papers
An unsupervised technique extracts latent yes-no knowledge from language model activations by locating a direction that satisfies logical consistency properties, outperforming zero-shot accuracy by 4% on average across models and datasets.
RoFormer introduces rotary position embeddings that encode absolute positions via rotation matrices and relative dependencies in attention, outperforming prior position methods on long text classification tasks.
VSE perturbs images only to probe visual ambiguity in VLMs, clusters outputs into semantic prototypes, and computes mass-weighted dispersion, outperforming prior entropy methods on five VQA benchmarks across five models.
RWGBench is a citation-centric benchmark for related work generation built from 40k CS papers and a 100-paper test set, with multi-dimensional metrics that better match human expert judgment than standard similarity scores.
Prompting LLMs with test-taking strategies for true/false factuality checks reduces tokens by over 80%, matches strong baselines on two benchmarks with SOTA on one, and enables fine-tuned SLMs to perform similarly at low cost with rationales.
RISE is an inference-time semantic reranking framework that refines low-confidence predictions in rhetorical role labeling using contrastively learned label representations, delivering an average +9.15 macro-F1 gain on hard examples across eight datasets and seven models.
GoR extracts citation DAGs using position, frequency, predecessor links and time, then fine-tunes Qwen2.5-7B on 498 seed papers to generate ideas, claiming SOTA over GPT-4o baselines as judged by LLMs.
DSR uses transformer models to detect sentiment targets in text and score them along three theory-motivated axes, with validation showing correlations to existing social science datasets.
RSAT uses SFT on verified traces followed by GRPO with NLI faithfulness rewards to make 1-8B models produce verifiable table reasoning with cell citations, raising faithfulness 3.7x to 0.826.
JPT enables bidirectional token classification in causal LLMs for zero-shot NER via input concatenation plus definition-guided embeddings, delivering +7.9 F1 gains and over 20x speedup on benchmarks.
Unimodal model representations converge to a relational structure captured by the Indra representation via V-enriched Yoneda embedding, which is unique and structure-preserving and improves cross-model and cross-modal robustness when instantiated with angular distance.
GRAPE unifies RoPE and ALiBi as special cases of group actions on positions, providing a principled design space for positional encodings via SO(d) rotations and GL unipotent transformations.
QA-SNNE adds question-answer alignment via bilateral gating to semantic nearest neighbor entropy, yielding higher AUROC for uncertainty detection in surgical VQA models under both standard and rephrased questions.
Clotho ranks LLM test inputs by failure likelihood using pre-generation hidden states and GMMs, achieving 0.716 ROC-AUC after labeling 5.4% of inputs on average across eight tasks and three models, with transfer to proprietary models.
DPPE decouples rotation and translation in camera positional encodings for multi-view transformers to resolve late-stage training stagnation and improve generalization in novel view synthesis.
Proposes a source-data-free transfer learning framework for sparse single-index models that transfers generalized Stein's lemma summaries and uses a guided MLP for nonlinear adaptation.
ITNet frames convolution, attention, and recurrence as special cases of one learnable integral transform with an MLP kernel and shows a single shared operator plus modality encoders matches specialized models on ImageNet-1K, GLUE, ModelNet40, VQA v2, and NLVR2.
Introduces functional equivalence methods and functional entropy to predict functional correctness of LLM-generated code via uncertainty quantification, outperforming NLI-based baselines in most tested settings.
GHI introduces an incidence-based structural reasoning layer using Graphormer on conditioned hypergraphs for ABSA, reporting outperformance on SemEval benchmarks, near-parity with 11B models at 247M parameters, and robustness on ARTS.
A dataset-agnostic framework converts text tool-calling benchmarks to paired audio evaluations via TTS, speaker variation and noise, then evaluates seven omni-modal models showing model- and task-dependent performance with small text-to-voice gaps.
Generative AI enables scalable, context-aware spear phishing by extracting profiles from public social media, producing emails that outperform real-world phishing samples in personalization and lower recipient suspicion.
InfoPDF uses mutual information to suppress noise in LLM-generated synthetic propagation graphs and adaptively fuse them with real data, yielding more discriminative representations for fake news detection.
TwinGate deploys a stateful dual-encoder system with asymmetric contrastive learning to detect decompositional jailbreaks in untraceable LLM traffic at high recall and low false-positive rate with negligible latency.