super hub Mixed citations

BERT : Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, Kenton Lee, Kristina Toutanova, Ming-Wei Chang · 2019 · Proceedings of the 2019 Conference of the North · DOI 10.18653/v1/n19-1423

Mixed citation behavior. Most common role is background (68%).

260 Pith papers citing it

6,639 external citations · Crossref

Background 68% of classified citations

open at publisher browse 260 citing papers more from Jacob Devlin

hub tools

JSON dossier citing papers JSON publisher DOI

citation-role summary

background 25 method 7 dataset 1 other 1

citation-polarity summary

background 23 use method 7 unclear 3 use dataset 1

claims ledger

background The retrieval system only manages to fetch informationabout Fleming's professional achievements in the discoveryof penicillin. However, the document does not provide informa-tion about his educational background, thus the model generates ahallucinatory answer. inappropriately activated, blindly retrieving inaccurate information and consequently leading to an undesirable response. Consequently, several studies [75, 204, 228, 378] have proposed to make a shift from passive retrieval to adaptive re

authors

Jacob Devlin Kenton Lee Kristina Toutanova Ming-Wei Chang

co-cited works

representative citing papers

FigSIM: A Dataset for Fine-grained Suicide Severity and Figurative Language in Suicide Memes

cs.CL · 2026-06-01 · conditional · novelty 8.0

FigSIM is the first annotated dataset for fine-grained suicide severity and figurative language in suicide memes, accompanied by benchmarks on 16 unimodal and multimodal models.

Reachability and asymptotics of Gaussian Transformer dynamics

cs.LG · 2026-05-29 · unverdicted · novelty 8.0

Gaussian distributions are invariant under the mean-field Transformer flow, reducing infinite-dimensional dynamics to a bilinear control system on mean and covariance with explicit reachability and stability results.

QSTRBench: a New Benchmark to Evaluate the Ability of Language Models to Reason with Qualitative Spatial and Temporal Calculi

cs.AI · 2026-05-18 · accept · novelty 8.0

QSTRBench is a new benchmark evaluating LLMs on compositional reasoning, converse relations, and conceptual neighbourhoods across QSTR calculi including a newly published RCC-22 CN, showing models exceed chance but fail to achieve consistent correctness.

Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution

cs.CL · 2023-09-28 · unverdicted · novelty 8.0

Promptbreeder evolves both task prompts and the mutation prompts that improve them using LLMs, outperforming Chain-of-Thought and Plan-and-Solve on arithmetic and commonsense reasoning benchmarks.

Locating and Editing Factual Associations in GPT

cs.CL · 2022-02-10 · accept · novelty 8.0

Factual associations in autoregressive transformers are localized to mid-layer feed-forward modules and can be edited via rank-one model editing while preserving both specificity and generalization on counterfactual tests.

SimCSE: Simple Contrastive Learning of Sentence Embeddings

cs.CL · 2021-04-18 · conditional · novelty 8.0

SimCSE achieves 76.3% unsupervised and 81.6% supervised Spearman's correlation on STS tasks with BERT-base, improving prior best results by 4.2% and 2.2% via simple contrastive learning.

Evaluation Pitfalls and Challenges in Multimedia Event Extraction

cs.CL · 2026-06-25 · unverdicted · novelty 7.0

A systematic analysis of evaluation practices in multimedia event extraction reveals that minor methodological choices cause large performance swings and overestimation of cross-modal grounding ability.

Structure Before Collapse: Transient semantic geometry in next-token prediction

cs.LG · 2026-06-25 · unverdicted · novelty 7.0

Semantic geometry emerges transiently early in next-token prediction training before collapsing to Neural Collapse symmetry in synthetic settings with latent semantic factors.

Understanding Parallel Samplers in Masked Diffusion via Random Walks on Graphs

cs.LG · 2026-06-22 · unverdicted · novelty 7.0

Graph random walks provide a verifiable sandbox for diagnosing parallel samplers in masked diffusion models, showing performance depends on graph structure and introducing a new exact bisection sampler.

How Does Research Evolve? Tracing Cross-Domain Trajectories in NLP, ML, and CV with Claim-Grounded Typed Citations

cs.CL · 2026-06-21 · unverdicted · novelty 7.0

SciTraj is the first claim-grounded typed citation graph with 32,559 papers and 573,126 edges across six relation types, plus a temporally split link-prediction benchmark.

OVIG: Optimistic Verification of AI Training Integrity via Gradient Signals

cs.CR · 2026-06-19 · unverdicted · novelty 7.0

OVIG introduces an optimistic gradient-based verification framework for outsourced AI post-training that uses stride-sampled interval checks against an honest-replay boundary to achieve 0% attack success rate with low overhead.

Structured Inference with Large Language Gibbs

cs.LG · 2026-06-17 · unverdicted · novelty 7.0

Large Language Gibbs uses LLM next-token conditionals as MCMC transition operators for iterative resampling of structured variables, aiming to produce a stationary distribution that compromises across all local conditionals.

CheckMIABench: Firm Foundations For Membership Inference Attacks on Language Models

cs.LG · 2026-06-16 · conditional · novelty 7.0

CheckMIABench converts LLMs with intermediate checkpoints into clean MIA testbeds by using pre- and post-checkpoint training data from the same distribution and evaluates published attacks on Pythia and OLMo models while releasing an open-source library.

Applicability Condition Extraction for Therapeutic Drug-Disease Relations

cs.AI · 2026-06-12 · unverdicted · novelty 7.0

Introduces applicability condition extraction for therapeutic drug-disease relations, creates first annotated dataset of 1,119 pairs, and proposes enhanced LoRA method outperforming baselines.

AfriSUD: A Dependency Treebank Collection for Evaluating Models on African Languages

cs.CL · 2026-06-10 · unverdicted · novelty 7.0

AfriSUD supplies new SUD-annotated dependency treebanks for nine Sub-Saharan African languages and demonstrates that existing models exhibit clear limitations on their syntax.

WorldReasoner: Evaluating Whether Language Model Agents Forecast Events with Valid Reasoning

cs.CL · 2026-06-10 · unverdicted · novelty 7.0

WorldReasoner supplies 345 resolved forecasting tasks built from 14,141 articles to score LM agents on outcome quality, evidence quality, and reasoning quality against time-bounded evidence and hindsight graphs.

Continuous Language Diffusion as a Decoder-Interface Problem

cs.CL · 2026-06-07 · unverdicted · novelty 7.0

Continuous language diffusion works by entering high-margin decoder basins where frozen T5 embeddings recover 93-96% of native decisions and linear readouts reach 97.9% agreement, implying models should be evaluated as representation-decoder systems.

Fast LLM-Based Semantic Filtering: From a Unified Framework to an Adaptive Two-Phase Method

cs.DB · 2026-06-06 · unverdicted · novelty 7.0

An adaptive two-phase semantic filter using clustering then a hybrid proxy trained on LLM confidence achieves 1.6-2.0x speedup over prior methods at 90% accuracy on 10K document corpora.

Many Circuits, One Mechanism: Input Variation and Evaluation Granularity in Circuit Discovery

cs.CL · 2026-06-04 · unverdicted · novelty 7.0

Structurally distinct circuits for literal sequence copying across token frequency bands implement the same computation, shown by broad transfer of band-specific edges, a shared core recovering 99% performance, and interchangeable representations via causal interventions.

Multilingual Coreference Resolution via Cycle-Consistent Machine Translation

cs.CL · 2026-06-03 · unverdicted · novelty 7.0

A cycle-consistent MT pipeline generates and similarity-weights training data for coreference resolution, producing gains on four low-resource languages and enabling the task where no corpora existed.

ClinicalMC: A Benchmark for Multi-Course Clinical Decision-Making with Large Language Models

cs.AI · 2026-06-02 · unverdicted · novelty 7.0

ClinicalMC is a benchmark of 1,275 Chinese and 5,804 English multi-course clinical samples across four stages, evaluated via a multi-agent framework on closed-source, open-source, and medical LLMs in static and dynamic settings.

EURO-5K: When Does Domain Pretraining Matter? Benchmarking Transformers for EU Reporting Obligation Extraction

cs.CL · 2026-06-02 · unverdicted · novelty 7.0

Introduces EURO-5K dataset from 136 EU acts and benchmarks full fine-tuning vs QLoRA for BERT and LLM models on reporting obligation extraction, reporting 0.89 F1 with limited gains from legal pretraining except under parameter-efficient adaptation.

Learning Coherent Representations: A Topological Approach to Interpretability

cs.LG · 2026-06-01 · unverdicted · novelty 7.0

Introduces coherence as a topological constraint on representations and the Coh objective to enforce geometric clustering for interpretability in neural networks.

Repetition Mismatch: Why Data Mixture Experiments Don't Scale and How to Fix Them

cs.LG · 2026-05-29 · conditional · novelty 7.0

Repetition rate mismatch between small-scale proxies and target budgets is the main reason data mixture experiments do not scale; a subsampling procedure that equalizes repetition rates recovers optimal mixtures from 1/16-scale experiments.

citing papers explorer

Showing 1 of 1 citing paper after filters.

BadSKP: Backdoor Attacks on Knowledge Graph-Enhanced LLMs with Soft Prompts cs.AI · 2026-05-12 · conditional · none · ref 40
BadSKP poisons graph node embeddings to steer soft prompts in KG-enhanced LLMs, achieving high attack success rates where text-channel backdoors fail due to semantic anchoring.

BERT : Pre-training of Deep Bidirectional Transformers for Language Understanding

hub tools

citation-role summary

citation-polarity summary

claims ledger

authors

co-cited works

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer