super hub Mixed citations

BERT : Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, Kenton Lee, Kristina Toutanova, Ming-Wei Chang · 2019 · Proceedings of the 2019 Conference of the North · DOI 10.18653/v1/n19-1423

Mixed citation behavior. Most common role is background (68%).

279 Pith papers citing it

6,639 external citations · Crossref

Background 68% of classified citations

open at publisher browse 279 citing papers more from Jacob Devlin

hub tools

JSON dossier citing papers JSON publisher DOI

citation-role summary

background 25 method 7 dataset 1 other 1

citation-polarity summary

background 23 use method 7 unclear 3 use dataset 1

claims ledger

background The retrieval system only manages to fetch informationabout Fleming's professional achievements in the discoveryof penicillin. However, the document does not provide informa-tion about his educational background, thus the model generates ahallucinatory answer. inappropriately activated, blindly retrieving inaccurate information and consequently leading to an undesirable response. Consequently, several studies [75, 204, 228, 378] have proposed to make a shift from passive retrieval to adaptive re

authors

Jacob Devlin Kenton Lee Kristina Toutanova Ming-Wei Chang

co-cited works

representative citing papers

FigSIM: A Dataset for Fine-grained Suicide Severity and Figurative Language in Suicide Memes

cs.CL · 2026-06-01 · conditional · novelty 8.0

FigSIM is the first annotated dataset for fine-grained suicide severity and figurative language in suicide memes, accompanied by benchmarks on 16 unimodal and multimodal models.

Reachability and asymptotics of Gaussian Transformer dynamics

cs.LG · 2026-05-29 · unverdicted · novelty 8.0

Gaussian distributions are invariant under the mean-field Transformer flow, reducing infinite-dimensional dynamics to a bilinear control system on mean and covariance with explicit reachability and stability results.

QSTRBench: a New Benchmark to Evaluate the Ability of Language Models to Reason with Qualitative Spatial and Temporal Calculi

cs.AI · 2026-05-18 · accept · novelty 8.0

QSTRBench is a new benchmark evaluating LLMs on compositional reasoning, converse relations, and conceptual neighbourhoods across QSTR calculi including a newly published RCC-22 CN, showing models exceed chance but fail to achieve consistent correctness.

Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution

cs.CL · 2023-09-28 · unverdicted · novelty 8.0

Promptbreeder evolves both task prompts and the mutation prompts that improve them using LLMs, outperforming Chain-of-Thought and Plan-and-Solve on arithmetic and commonsense reasoning benchmarks.

Locating and Editing Factual Associations in GPT

cs.CL · 2022-02-10 · accept · novelty 8.0

Factual associations in autoregressive transformers are localized to mid-layer feed-forward modules and can be edited via rank-one model editing while preserving both specificity and generalization on counterfactual tests.

SimCSE: Simple Contrastive Learning of Sentence Embeddings

cs.CL · 2021-04-18 · conditional · novelty 8.0

SimCSE achieves 76.3% unsupervised and 81.6% supervised Spearman's correlation on STS tasks with BERT-base, improving prior best results by 4.2% and 2.2% via simple contrastive learning.

SpheRoPE: Zero-Shot Optimization-Free 360 Panorama Generation with Spherical RoPE

cs.CV · 2026-06-30 · unverdicted · novelty 7.0

SpheRoPE modifies rotary position embeddings in diffusion transformers to enforce spherical topology for zero-shot 360 panorama generation across multiple backbones.

Evaluation Pitfalls and Challenges in Multimedia Event Extraction

cs.CL · 2026-06-25 · unverdicted · novelty 7.0

A systematic analysis of evaluation practices in multimedia event extraction reveals that minor methodological choices cause large performance swings and overestimation of cross-modal grounding ability.

Structure Before Collapse: Transient semantic geometry in next-token prediction

cs.LG · 2026-06-25 · unverdicted · novelty 7.0

Semantic geometry emerges transiently early in next-token prediction training before collapsing to Neural Collapse symmetry in synthetic settings with latent semantic factors.

Understanding Parallel Samplers in Masked Diffusion via Random Walks on Graphs

cs.LG · 2026-06-22 · unverdicted · novelty 7.0

Graph random walks provide a verifiable sandbox for diagnosing parallel samplers in masked diffusion models, showing performance depends on graph structure and introducing a new exact bisection sampler.

How Does Research Evolve? Tracing Cross-Domain Trajectories in NLP, ML, and CV with Claim-Grounded Typed Citations

cs.CL · 2026-06-21 · unverdicted · novelty 7.0

SciTraj is the first claim-grounded typed citation graph with 32,559 papers and 573,126 edges across six relation types, plus a temporally split link-prediction benchmark.

OVIG: Optimistic Verification of AI Training Integrity via Gradient Signals

cs.CR · 2026-06-19 · unverdicted · novelty 7.0

OVIG introduces an optimistic gradient-based verification framework for outsourced AI post-training that uses stride-sampled interval checks against an honest-replay boundary to achieve 0% attack success rate with low overhead.

Structured Inference with Large Language Gibbs

cs.LG · 2026-06-17 · unverdicted · novelty 7.0

Large Language Gibbs uses LLM next-token conditionals as MCMC transition operators for iterative resampling of structured variables, aiming to produce a stationary distribution that compromises across all local conditionals.

CheckMIABench: Firm Foundations For Membership Inference Attacks on Language Models

cs.LG · 2026-06-16 · conditional · novelty 7.0

CheckMIABench converts LLMs with intermediate checkpoints into clean MIA testbeds by using pre- and post-checkpoint training data from the same distribution and evaluates published attacks on Pythia and OLMo models while releasing an open-source library.

Applicability Condition Extraction for Therapeutic Drug-Disease Relations

cs.AI · 2026-06-12 · unverdicted · novelty 7.0

Introduces applicability condition extraction for therapeutic drug-disease relations, creates first annotated dataset of 1,119 pairs, and proposes enhanced LoRA method outperforming baselines.

AfriSUD: A Dependency Treebank Collection for Evaluating Models on African Languages

cs.CL · 2026-06-10 · unverdicted · novelty 7.0

AfriSUD supplies new SUD-annotated dependency treebanks for nine Sub-Saharan African languages and demonstrates that existing models exhibit clear limitations on their syntax.

WorldReasoner: Evaluating Whether Language Model Agents Forecast Events with Valid Reasoning

cs.CL · 2026-06-10 · unverdicted · novelty 7.0

WorldReasoner supplies 345 resolved forecasting tasks built from 14,141 articles to score LM agents on outcome quality, evidence quality, and reasoning quality against time-bounded evidence and hindsight graphs.

Continuous Language Diffusion as a Decoder-Interface Problem

cs.CL · 2026-06-07 · unverdicted · novelty 7.0

Continuous language diffusion works by entering high-margin decoder basins where frozen T5 embeddings recover 93-96% of native decisions and linear readouts reach 97.9% agreement, implying models should be evaluated as representation-decoder systems.

Fast LLM-Based Semantic Filtering: From a Unified Framework to an Adaptive Two-Phase Method

cs.DB · 2026-06-06 · unverdicted · novelty 7.0

An adaptive two-phase semantic filter using clustering then a hybrid proxy trained on LLM confidence achieves 1.6-2.0x speedup over prior methods at 90% accuracy on 10K document corpora.

Many Circuits, One Mechanism: Input Variation and Evaluation Granularity in Circuit Discovery

cs.CL · 2026-06-04 · unverdicted · novelty 7.0

Structurally distinct circuits for literal sequence copying across token frequency bands implement the same computation, shown by broad transfer of band-specific edges, a shared core recovering 99% performance, and interchangeable representations via causal interventions.

Multilingual Coreference Resolution via Cycle-Consistent Machine Translation

cs.CL · 2026-06-03 · unverdicted · novelty 7.0

A cycle-consistent MT pipeline generates and similarity-weights training data for coreference resolution, producing gains on four low-resource languages and enabling the task where no corpora existed.

ClinicalMC: A Benchmark for Multi-Course Clinical Decision-Making with Large Language Models

cs.AI · 2026-06-02 · unverdicted · novelty 7.0

ClinicalMC is a benchmark of 1,275 Chinese and 5,804 English multi-course clinical samples across four stages, evaluated via a multi-agent framework on closed-source, open-source, and medical LLMs in static and dynamic settings.

EURO-5K: When Does Domain Pretraining Matter? Benchmarking Transformers for EU Reporting Obligation Extraction

cs.CL · 2026-06-02 · unverdicted · novelty 7.0

Introduces EURO-5K dataset from 136 EU acts and benchmarks full fine-tuning vs QLoRA for BERT and LLM models on reporting obligation extraction, reporting 0.89 F1 with limited gains from legal pretraining except under parameter-efficient adaptation.

Learning Coherent Representations: A Topological Approach to Interpretability

cs.LG · 2026-06-01 · unverdicted · novelty 7.0

Introduces coherence as a topological constraint on representations and the Coh objective to enforce geometric clustering for interpretability in neural networks.

citing papers explorer

Showing 50 of 279 citing papers.

To MRL or not to MRL: Text Embeddings are Robust to Truncation Without Matryoshka Learning, Except In Heavy Truncation Scenarios cs.LG · 2026-05-15 · unverdicted · none · ref 31
Truncated embeddings from non-MRL models perform comparably to or better than MRL-trained models for most truncation levels, except heavy truncation of 80% or more.
From Sparsity to Simplicity: Enabling Simpler Sequential Replacements via Sparse Attention Distillation cs.LG · 2026-05-15 · unverdicted · none · ref 5
Sparsity-guided distillation enables replacing attention layers in ViTs with simpler sequential modules, with sparser layers showing smaller performance drops.
Edit-level Majority Voting Mitigates Over-Correction in LLM-based Grammatical Error Correction cs.CL · 2026-05-13 · unverdicted · none · ref 9
Edit-level majority voting on multiple LLM-generated candidates reduces over-correction in grammatical error correction and outperforms greedy and MBR decoding on nine multilingual benchmarks while remaining stable to prompt variations.
The Efficiency Gap in Byte Modeling cs.LG · 2026-05-13 · unverdicted · none · ref 51
Byte modeling incurs greater scaling overhead for masked diffusion than autoregressive models because the diffusion objective destroys local byte contiguity needed to resolve semantics.
Generating Query-Focused Summarization Datasets from Query-Free Summarization Datasets cs.CL · 2026-05-06 · unverdicted · none · ref 23
An evidence-based model generates queries from query-free datasets, yielding summaries with competitive ROUGE scores to those using original queries.
A Hybrid Method for Low-Resource Named Entity Recognition cs.CE · 2026-05-06 · unverdicted · none · ref 3
The hybrid method with LLM-augmented data achieves F1 improvements of 7-24 points over baselines on five Vietnamese domain datasets.
Dependency Parsing Across the Resource Spectrum: Evaluating Architectures on High and Low-Resource Languages cs.CL · 2026-05-04 · unverdicted · none · ref 71
Biaffine LSTM outperforms transformer parsers like AfroXLMR and RemBERT in low-resource dependency parsing, with transformers gaining advantage as data increases and morphological complexity as a secondary predictor.
Revisiting Semantic Role Labeling: Efficient Structured Inference with Dependency-Informed Analysis cs.CL · 2026-05-04 · unverdicted · none · ref 31
A new encoder-based SRL system with dependency-informed analysis delivers 10x faster inference and comparable or better F1 scores using BERT, RoBERTa, and DeBERTa while supporting multilingual projection.
A Multimodal Text- and Graph-Based Approach for Open-Domain Event Extraction from Documents cs.CL · 2026-04-23 · unverdicted · none · ref 253
MODEE is a multimodal system that integrates graphs with LLM embeddings to outperform prior open-domain event extraction methods on large datasets.
GS-Quant: Granular Semantic and Generative Structural Quantization for Knowledge Graph Completion cs.AI · 2026-04-23 · unverdicted · none · ref 122
GS-Quant generates coarse-to-fine discrete codes for KG entities via semantic hierarchy injection and causal sequence reconstruction, enabling LLMs to perform knowledge graph completion by treating the codes as vocabulary tokens.
LaplacianFormer:Rethinking Linear Attention with Laplacian Kernel cs.CV · 2026-04-22 · unverdicted · none · ref 43
LaplacianFormer uses a Laplacian kernel with an injective feature map and efficient approximations to achieve linear attention that preserves mid-range interactions better than Gaussian-based linear attention in vision transformers.
Commonsense Knowledge with Negation: A Resource to Enhance Negation Understanding cs.CL · 2026-04-21 · unverdicted · none · ref 37
Augmenting commonsense knowledge corpora with negation produces over 2M new triples that benefit LLM negation understanding when used for pre-training.
Towards Better Static Code Analysis Reports: Sentence Transformer-based Filtering of Non-Actionable Alerts cs.SE · 2026-04-20 · conditional · none · ref 11
STAF applies sentence embeddings from transformers to classify SCA findings, reaching 89% F1 and beating prior filters by 11% within projects and 6% across projects.
Exploring Concreteness Through a Figurative Lens cs.CL · 2026-04-20 · unverdicted · none · ref 77
LLMs compress concreteness into a consistent 1D direction in mid-to-late layers that separates literal from figurative noun uses and supports efficient classification plus steering.
Linear-Time and Constant-Memory Text Embeddings Based on Recurrent Language Models cs.CL · 2026-04-20 · unverdicted · none · ref 9
Fine-tuned recurrent models like Mamba2 produce competitive text embeddings with linear-time constant-memory inference via vertical chunking, outperforming transformers in memory use.
Modeling Human Perspectives with Socio-Demographic Representations cs.CL · 2026-04-20 · unverdicted · none · ref 6
Socio-Contrastive Learning jointly learns socio-demographic representations and textual features via contrastive objectives to predict annotator perspectives more accurately than concatenation baselines.
Language, Place, and Social Media: Geographic Dialect Alignment in New Zealand cs.CL · 2026-04-17 · unverdicted · none · ref 75
New Zealand Reddit users link language to place and form contiguous speech communities with complex geographic alignment; Word2Vec embeddings reveal semantic variations and shifts in NZ English on a 4.26 billion word corpus.
ToxiShield: Promoting Inclusive Developer Communication through Real-Time Toxicity Filtering cs.SE · 2026-04-15 · unverdicted · none · ref 7
ToxiShield delivers a real-time GitHub extension with a BERT toxicity detector at 98% accuracy, a Claude-based coach, and a fine-tuned Llama reframer at 95% style transfer accuracy, validated by a 10-person TAM study.
InsightFlow: LLM-Driven Synthesis of Patient Narratives for Mental Health into Causal Models cs.CL · 2026-04-14 · unverdicted · none · ref 7
LLMs generate 5P causal graphs from 46 psychotherapy intake transcripts that match human expert graphs in structure and meaning, with moderate clinical usefulness ratings.
RefineRAG: Word-Level Poisoning Attacks via Retriever-Guided Text Refinement cs.CR · 2026-04-08 · unverdicted · none · ref 7
RefineRAG achieves 90% attack success on NQ by generating toxic seeds then optimizing them via retriever-in-the-loop word refinement, outperforming prior methods on effectiveness and naturalness.
From Large Language Model Predicates to Logic Tensor Networks: Neurosymbolic Offer Validation in Regulated Procurement cs.AI · 2026-04-07 · unverdicted · none · ref 6
A neurosymbolic pipeline extracts predicates from offer texts with an LLM and validates them via Logic Tensor Networks, delivering performance comparable to standard models plus built-in interpretability on a real corpus.
Shiny Stories, Hidden Struggles: Investigating the Representation of Disability Through the Lens of LLMs cs.CL · 2026-04-02 · unverdicted · none · ref 12
LLMs produce overly positive idealized depictions of disability in simulated social media posts that do not match real posts by people with disabilities and show topic bias favoring nondisabled people.
ESGLens: An LLM-Based RAG Framework for Interactive ESG Report Analysis and Score Prediction cs.CL · 2026-03-29 · conditional · none · ref 38
ESGLens applies RAG and LLM embeddings to extract GRI-aligned information from ESG reports and achieves 0.48 Pearson correlation when regressing environmental scores on 300 company reports.
Sentiment Classification of Gaza War Headlines: A Comparative Analysis of Large Language Models and Arabic Fine-Tuned BERT Models cs.CL · 2026-03-18 · unverdicted · none · ref 8
LLMs classify Gaza War headlines as strongly negative while fine-tuned Arabic BERT models favor neutral labels, producing measurable non-random divergences in sentiment distributions.
Can Semantic Methods Enhance Team Sports Tactics? A Methodology for Football with Broader Applications cs.AI · 2026-01-01 · unverdicted · none · ref 3
A semantic vector-space model is proposed for encoding football tactics and team profiles to compute alignment and strategy recommendations using distance metrics.
LLM4Delay: Flight Delay Prediction via Cross-Modality Adaptation of Large Language Models and Aircraft Trajectory Representation cs.LG · 2025-10-24 · unverdicted · none · ref 29
LLM4Delay improves flight delay prediction accuracy by using instance-level projection to adapt LLMs for integrating textual aeronautical information with multiple aircraft trajectories.
Search-R3: Unifying Reasoning and Embedding in Large Language Models cs.CL · 2025-10-08 · unverdicted · none · ref 15
Search-R3 trains LLMs to output search embeddings as a direct product of step-by-step reasoning via supervised pre-training and a specialized RL environment that avoids full corpus re-encoding.
Model Internal Sleuthing: Finding Lexical Identity and Inflectional Features in Modern Language Models cs.CL · 2025-06-02 · unverdicted · none · ref 9
Inflectional features stay linearly decodable across all layers while lexical identity weakens with depth in modern transformers.
MSMO-ABSA: Multi-Scale and Multi-Objective Optimization for Cross-Lingual Aspect-Based Sentiment Analysis cs.CL · 2025-02-19 · unverdicted · none · ref 11
MSMO framework achieves claimed SOTA cross-lingual ABSA via sentence- and aspect-level alignment, code-switching, consistency training, and knowledge distillation.
A Comprehensive Survey of Agents for Computer Use: Foundations, Challenges, and Future Directions cs.AI · 2025-01-27 · unverdicted · none · ref 12
A survey of 87 agents for computer use and 33 datasets that introduces a three-dimensional taxonomy across domain, interaction, and agent perspectives and identifies six research gaps.
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference cs.CL · 2024-12-18 · unverdicted · none · ref 133
ModernBERT is a new bidirectional encoder model achieving SOTA performance on diverse classification and retrieval benchmarks while offering superior speed and memory efficiency for long-context inference.
Exploring Cross-lingual Latent Transplantation: Mutual Opportunities and Open Challenges cs.CL · 2024-12-17 · unverdicted · none · ref 15
XTransplant empirically shows that cross-lingual latent transplantation yields mutual benefits for multilingual capability and cultural adaptability in LLMs, especially low-resource ones, while revealing underutilized model potential.
Efficient Model Repository for Entity Resolution: Construction, Search, and Integration cs.DB · 2024-12-12 · unverdicted · none · ref 7
MoRER builds an ER model repository via feature distribution clustering of tasks, achieving competitive results with limited labels versus active learning, transfer learning, and self-supervised methods on three multi-source datasets.
Mitigating Extrinsic Gender Bias for Bangla Classification Tasks cs.CL · 2024-11-16 · unverdicted · none · ref 9
Constructs gender-perturbed Bangla classification benchmarks and proposes RandSymKL debiasing that reduces extrinsic gender bias in pretrained models.
A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions cs.CL · 2023-11-09 · unverdicted · none · ref 75
The paper surveys hallucination in LLMs with an innovative taxonomy, factors, detection methods, benchmarks, mitigation strategies, and open research directions.
LoRA-FA: Efficient and Effective Low Rank Representation Fine-tuning cs.CL · 2023-08-07 · unverdicted · none · ref 20
LoRA-FA freezes LoRA's A matrix and trains only B with gradient corrections to approximate full fine-tuning gradients more closely.
Towards General Text Embeddings with Multi-stage Contrastive Learning cs.CL · 2023-08-07 · unverdicted · none · ref 101
GTE_base is a compact text embedding model using multi-stage contrastive learning on diverse data that outperforms OpenAI's API and 10x larger models on massive benchmarks and works for code as text.
BioBLP: A Modular Framework for Learning on Multimodal Biomedical Knowledge Graphs cs.AI · 2023-06-06 · unverdicted · none · ref 38
BioBLP is a modular embedding framework for multimodal biomedical KGs supporting heterogeneous attributes and missing data, with a pretraining strategy that improves results on drug-protein interaction prediction especially for low-degree entities.
StarCoder: may the source be with you! cs.CL · 2023-05-09 · accept · none · ref 119
StarCoderBase matches or beats OpenAI's code-cushman-001 on multi-language code benchmarks; the Python-fine-tuned StarCoder reaches 40% pass@1 on HumanEval while retaining other-language performance.
Text Embeddings by Weakly-Supervised Contrastive Pre-training cs.CL · 2022-12-07 · unverdicted · none · ref 18
E5 text embeddings trained with weakly-supervised contrastive pre-training on CCPairs outperform BM25 on BEIR zero-shot and achieve top results on MTEB, beating much larger models.
Galactica: A Large Language Model for Science cs.CL · 2022-11-16 · unverdicted · none · ref 5
Galactica, a science-specialized LLM, reports higher scores than GPT-3, Chinchilla, and PaLM on LaTeX knowledge, mathematical reasoning, and medical QA benchmarks while outperforming general models on BIG-bench.
A Comparative Study on Affective Cues in Text Embeddings Across Psychological Emotion Theories cs.CL · 2026-06-27 · unverdicted · none · ref 12
Open-weight instruction-aware encoders capture equal or greater affective information than proprietary models at word level across emotion theories, while task-tuned and proprietary encoders perform best on sentence-level classification.
NEURON-Fabric: Architecture-Runtime Co-Design for Controlled Low-Bit Gradient Communication cs.DC · 2026-06-24 · unverdicted · none · ref 8
NEURON-Fabric provides a profile-guided runtime for controlled low-bit gradient communication that preserves accuracy near full-precision levels while reducing modeled communication traffic across vision, transformer, and language model workloads.
AI-PAVE-Br: Leveraging Large Language Models for Enhanced Product Attribute Value Extraction through a Golden Set Approach cs.CL · 2026-06-23 · unverdicted · none · ref 31
AI-PAVE-Br applies LLMs with prompt engineering to outperform NER baselines on Portuguese product attribute extraction and releases the Golden Set as a new benchmark dataset.
IHUBERT: Vector-Based Semantic Deduplication and Domain-Balanced Pretraining for Persian Resources cs.CL · 2026-06-18 · unverdicted · none · ref 1
Trains a 125M-parameter Persian PLM on a curated 45GB corpus using vector semantic deduplication for domain balance, topping QA and NLI benchmarks while remaining competitive on NER and classification.
Long Live Fine-Tuning: Task-Specific Transformers Outperform Zero-Shot LLMs for Misinformation Response Classification on Reddit cs.CL · 2026-06-02 · unverdicted · none · ref 10
Fine-tuned RoBERTa achieves 0.62 macro-F1 on 900 Reddit comments, outperforming best zero-shot LLM at 0.50, with largest gap on detecting belief propagation.
AI-Based KPI Prediction Methods in Future 6G Networks: A Survey eess.SY · 2026-06-01 · unverdicted · none · ref 61
A systematic literature survey that classifies data-driven KPI prediction methods for 6G networks across KPI type, data source, protocol stack layer, horizon, model family, and objective.
The Neuromorphic Supremacy q-bio.NC · 2026-06-01 · unverdicted · none · ref 68
Hybrid neuromorphic-ANN models outperform standard deep learning on few-shot benchmarks and under occlusion/impulse noise via astrocytic modulation and spiking dynamics.
Evaluation of ML Resource Utilization Requires Model Life Cycle Assessment cs.LG · 2026-05-31 · unverdicted · none · ref 170
The paper calls for life cycle assessment to capture embodied hardware costs and full pipeline operational costs in AI development and deployment.
Better heads do not guarantee better binarized constituency parsing cs.CL · 2026-05-27 · unverdicted · none · ref 8
Learned heads improve intrinsic head prediction but fail to deliver consistent punctuation-sensitive parsing gains after binarization and debinarization.

BERT : Pre-training of Deep Bidirectional Transformers for Language Understanding

hub tools

citation-role summary

citation-polarity summary

claims ledger

authors

co-cited works

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer