super hub Mixed citations

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, Kenton Lee, Kristina Toutanova, Ming-Wei Chang · 2019 · Proceedings of the 2019 Conference of the North · DOI 10.18653/v1/n19-1423

Mixed citation behavior. Most common role is background (68%).

279 Pith papers citing it

6,639 external citations · Crossref

Background 68% of classified citations

citation-role summary

background: 25 · method: 7 · dataset: 1 · other: 1

citation-polarity summary

background: 23 · use method: 7 · unclear: 3 · use dataset: 1
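
As a quick arithmetic check, the Python sketch below computes each label's share of the classified citations from the polarity counts above. It assumes the percentages on this page are simple shares of the classified total, rounded to whole numbers; the page does not state how its 68% figure is derived, and the names `polarity_counts` and `shares` are illustrative only.

```python
# Counts copied from the citation-polarity summary above.
polarity_counts = {
    "background": 23,
    "use method": 7,
    "unclear": 3,
    "use dataset": 1,
}


def shares(counts):
    """Return each label's share of the total as a whole-number percentage."""
    total = sum(counts.values())
    return {label: round(100 * n / total) for label, n in counts.items()}


polarity_shares = shares(polarity_counts)
print(polarity_shares)
```

Under that assumption, the background share works out to 23 of 34 classified citations. Because each share is rounded independently, the rounded percentages need not sum to exactly 100.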

claims ledger

background The retrieval system only manages to fetch information about Fleming's professional achievements in the discovery of penicillin. However, the document does not provide information about his educational background, thus the model generates a hallucinatory answer. inappropriately activated, blindly retrieving inaccurate information and consequently leading to an undesirable response. Consequently, several studies [75, 204, 228, 378] have proposed to make a shift from passive retrieval to adaptive re

authors

Jacob Devlin, Kenton Lee, Kristina Toutanova, Ming-Wei Chang

representative citing papers

FigSIM: A Dataset for Fine-grained Suicide Severity and Figurative Language in Suicide Memes

cs.CL · 2026-06-01 · conditional · novelty 8.0

FigSIM is the first annotated dataset for fine-grained suicide severity and figurative language in suicide memes, accompanied by benchmarks on 16 unimodal and multimodal models.

Reachability and asymptotics of Gaussian Transformer dynamics

cs.LG · 2026-05-29 · unverdicted · novelty 8.0

Gaussian distributions are invariant under the mean-field Transformer flow, reducing infinite-dimensional dynamics to a bilinear control system on mean and covariance with explicit reachability and stability results.

QSTRBench: a New Benchmark to Evaluate the Ability of Language Models to Reason with Qualitative Spatial and Temporal Calculi

cs.AI · 2026-05-18 · accept · novelty 8.0

QSTRBench is a new benchmark evaluating LLMs on compositional reasoning, converse relations, and conceptual neighbourhoods across QSTR calculi including a newly published RCC-22 CN, showing models exceed chance but fail to achieve consistent correctness.

Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution

cs.CL · 2023-09-28 · unverdicted · novelty 8.0

Promptbreeder evolves both task prompts and the mutation prompts that improve them using LLMs, outperforming Chain-of-Thought and Plan-and-Solve on arithmetic and commonsense reasoning benchmarks.

Locating and Editing Factual Associations in GPT

cs.CL · 2022-02-10 · accept · novelty 8.0

Factual associations in autoregressive transformers are localized to mid-layer feed-forward modules and can be edited via rank-one model editing while preserving both specificity and generalization on counterfactual tests.

SimCSE: Simple Contrastive Learning of Sentence Embeddings

cs.CL · 2021-04-18 · conditional · novelty 8.0

SimCSE achieves 76.3% unsupervised and 81.6% supervised Spearman's correlation on STS tasks with BERT-base, improving prior best results by 4.2% and 2.2% via simple contrastive learning.

SpheRoPE: Zero-Shot Optimization-Free 360 Panorama Generation with Spherical RoPE

cs.CV · 2026-06-30 · unverdicted · novelty 7.0

SpheRoPE modifies rotary position embeddings in diffusion transformers to enforce spherical topology for zero-shot 360 panorama generation across multiple backbones.

Evaluation Pitfalls and Challenges in Multimedia Event Extraction

cs.CL · 2026-06-25 · unverdicted · novelty 7.0

A systematic analysis of evaluation practices in multimedia event extraction reveals that minor methodological choices cause large performance swings and overestimation of cross-modal grounding ability.

Structure Before Collapse: Transient semantic geometry in next-token prediction

cs.LG · 2026-06-25 · unverdicted · novelty 7.0

Semantic geometry emerges transiently early in next-token prediction training before collapsing to Neural Collapse symmetry in synthetic settings with latent semantic factors.

Understanding Parallel Samplers in Masked Diffusion via Random Walks on Graphs

cs.LG · 2026-06-22 · unverdicted · novelty 7.0

Graph random walks provide a verifiable sandbox for diagnosing parallel samplers in masked diffusion models, showing performance depends on graph structure and introducing a new exact bisection sampler.

How Does Research Evolve? Tracing Cross-Domain Trajectories in NLP, ML, and CV with Claim-Grounded Typed Citations

cs.CL · 2026-06-21 · unverdicted · novelty 7.0

SciTraj is the first claim-grounded typed citation graph with 32,559 papers and 573,126 edges across six relation types, plus a temporally split link-prediction benchmark.

OVIG: Optimistic Verification of AI Training Integrity via Gradient Signals

cs.CR · 2026-06-19 · unverdicted · novelty 7.0

OVIG introduces an optimistic gradient-based verification framework for outsourced AI post-training that uses stride-sampled interval checks against an honest-replay boundary to achieve 0% attack success rate with low overhead.

Structured Inference with Large Language Gibbs

cs.LG · 2026-06-17 · unverdicted · novelty 7.0

Large Language Gibbs uses LLM next-token conditionals as MCMC transition operators for iterative resampling of structured variables, aiming to produce a stationary distribution that compromises across all local conditionals.

CheckMIABench: Firm Foundations For Membership Inference Attacks on Language Models

cs.LG · 2026-06-16 · conditional · novelty 7.0

CheckMIABench converts LLMs with intermediate checkpoints into clean MIA testbeds by using pre- and post-checkpoint training data from the same distribution and evaluates published attacks on Pythia and OLMo models while releasing an open-source library.

Applicability Condition Extraction for Therapeutic Drug-Disease Relations

cs.AI · 2026-06-12 · unverdicted · novelty 7.0

Introduces applicability condition extraction for therapeutic drug-disease relations, creates first annotated dataset of 1,119 pairs, and proposes enhanced LoRA method outperforming baselines.

AfriSUD: A Dependency Treebank Collection for Evaluating Models on African Languages

cs.CL · 2026-06-10 · unverdicted · novelty 7.0

AfriSUD supplies new SUD-annotated dependency treebanks for nine Sub-Saharan African languages and demonstrates that existing models exhibit clear limitations on their syntax.

WorldReasoner: Evaluating Whether Language Model Agents Forecast Events with Valid Reasoning

cs.CL · 2026-06-10 · unverdicted · novelty 7.0

WorldReasoner supplies 345 resolved forecasting tasks built from 14,141 articles to score LM agents on outcome quality, evidence quality, and reasoning quality against time-bounded evidence and hindsight graphs.

Continuous Language Diffusion as a Decoder-Interface Problem

cs.CL · 2026-06-07 · unverdicted · novelty 7.0

Continuous language diffusion works by entering high-margin decoder basins where frozen T5 embeddings recover 93-96% of native decisions and linear readouts reach 97.9% agreement, implying models should be evaluated as representation-decoder systems.

Fast LLM-Based Semantic Filtering: From a Unified Framework to an Adaptive Two-Phase Method

cs.DB · 2026-06-06 · unverdicted · novelty 7.0

An adaptive two-phase semantic filter using clustering then a hybrid proxy trained on LLM confidence achieves 1.6-2.0x speedup over prior methods at 90% accuracy on 10K document corpora.

Many Circuits, One Mechanism: Input Variation and Evaluation Granularity in Circuit Discovery

cs.CL · 2026-06-04 · unverdicted · novelty 7.0

Structurally distinct circuits for literal sequence copying across token frequency bands implement the same computation, shown by broad transfer of band-specific edges, a shared core recovering 99% performance, and interchangeable representations via causal interventions.

Multilingual Coreference Resolution via Cycle-Consistent Machine Translation

cs.CL · 2026-06-03 · unverdicted · novelty 7.0

A cycle-consistent MT pipeline generates and similarity-weights training data for coreference resolution, producing gains on four low-resource languages and enabling the task where no corpora existed.

ClinicalMC: A Benchmark for Multi-Course Clinical Decision-Making with Large Language Models

cs.AI · 2026-06-02 · unverdicted · novelty 7.0

ClinicalMC is a benchmark of 1,275 Chinese and 5,804 English multi-course clinical samples across four stages, evaluated via a multi-agent framework on closed-source, open-source, and medical LLMs in static and dynamic settings.

EURO-5K: When Does Domain Pretraining Matter? Benchmarking Transformers for EU Reporting Obligation Extraction

cs.CL · 2026-06-02 · unverdicted · novelty 7.0

Introduces EURO-5K dataset from 136 EU acts and benchmarks full fine-tuning vs QLoRA for BERT and LLM models on reporting obligation extraction, reporting 0.89 F1 with limited gains from legal pretraining except under parameter-efficient adaptation.

Learning Coherent Representations: A Topological Approach to Interpretability

cs.LG · 2026-06-01 · unverdicted · novelty 7.0

Introduces coherence as a topological constraint on representations and the Coh objective to enforce geometric clustering for interpretability in neural networks.

citing papers explorer

Showing 50 of 279 citing papers.

CoGate-LSTM: Prototype-Guided Feature-Space Gating for Mitigating Gradient Dilution in Imbalanced Toxic Comment Classification cs.CL · 2025-10-19 · unverdicted · none · ref 44
CoGate-LSTM adds prototype-guided cosine feature-space gating to a character-level BiLSTM with multi-source embeddings and focal loss, reaching 0.881 macro-F1 on Jigsaw toxic comments while using 7.3M parameters and outperforming fine-tuned BERT by 6.9 points on minority labels.
Curriculum-guided multimodal representation learning enables generalizable prediction of nanomaterial-protein interactions cs.LG · 2025-07-18 · conditional · none · ref 18
CuMMI applies curriculum learning across progressively complex biofluids to a multimodal model integrating protein sequence, structure, and 37 experimental features, achieving mean classification metrics above 0.75 on temporal, nanomaterial-held-out, and protein-held-out tests.
Should We Still Pretrain Encoders with Masked Language Modeling? cs.CL · 2025-07-01 · accept · none · ref 10
Controlled ablations of 38 models find MLM superior to CLM on representation benchmarks while CLM offers better data efficiency and stability; a biphasic CLM-then-MLM schedule is optimal under fixed compute and improves when initialized from pretrained CLM models.
Expressive yet Efficient Feature Expansion with Adaptive Cross-Hadamard Products cs.CV · 2025-05-28 · unverdicted · none · ref 6
Proposes ACH module with differentiable sampling and softsign normalization for efficient feature expansion, integrated via NAS into Hadaptive-Net to claim SOTA accuracy/speed trade-offs on image classification.
Rotary Masked Autoencoders are Versatile Learners cs.LG · 2025-05-26 · unverdicted · none · ref 15
RoMAE applies rotary positional embeddings to masked autoencoders to enable representation learning and interpolation on continuous positional data across irregular time-series, images, and audio without modality-specific modifications.
Can Large Language Models Really Recognize Your Name? cs.CR · 2025-05-20 · unverdicted · none · ref 18
LLMs exhibit 20-40% lower recall on ambiguous human names for PII detection, worsening under prompt injections, as shown via the new AmBench benchmark.
Why These Documents? Explainable Generative Retrieval with Hierarchical Category Paths cs.IR · 2024-11-08 · unverdicted · none · ref 7
HyPE improves generative retrieval by first generating hierarchical category paths for explainability and then using path-aware ranking to boost performance.
How Good is Your Wikipedia? Auditing Data Quality for Low-resource and Multilingual NLP cs.CL · 2024-11-08 · unverdicted · none · ref 18
The study filters non-English Wikipedia, reveals quality problems, proposes a 4-level ranking, and shows filtered data matches or beats raw data in language modeling with largest gains for lower-quality editions.
Llemma: An Open Language Model For Mathematics cs.CL · 2023-10-16 · unverdicted · none · ref 61
Continued pretraining of Code Llama on Proof-Pile-2 yields Llemma, an open math-specialized LLM that beats known open base models on MATH and supports tool use plus formal proving out of the box.
Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs cs.CL · 2023-10-03 · conditional · none · ref 75
FastGen adaptively compresses LLM KV caches via lightweight attention profiling: evicting long-range contexts on local heads, non-special tokens on special-token heads, and retaining full caches on broad-attention heads, yielding substantial memory savings with negligible quality loss.
Demystifying CLIP Data cs.CV · 2023-09-28 · accept · none · ref 174
MetaCLIP curates balanced 400M-pair subsets from CommonCrawl that outperform CLIP data, reaching 70.8% zero-shot ImageNet accuracy on ViT-B versus CLIP's 68.3%.
DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models cs.CL · 2023-09-07 · conditional · none · ref 75
DoLa reduces hallucinations in LLMs by contrasting logits from later versus earlier layers during decoding, improving truthfulness on TruthfulQA by 12-17 absolute points without fine-tuning or retrieval.
Template-assisted Contrastive Learning of Task-oriented Dialogue Sentence Embeddings cs.CL · 2023-05-23 · unverdicted · none · ref 7
TaDSE learns dialogue sentence embeddings via template-guided self-supervised contrastive learning plus synthetic slot-filling augmentation and reports gains on five downstream benchmarks.
Enhancing Chat Language Models by Scaling High-quality Instructional Conversations cs.CL · 2023-05-23 · conditional · none · ref 35
UltraChat supplies 1.5 million high-quality multi-turn dialogues that, when used to fine-tune LLaMA, produce UltraLLaMA, which outperforms prior open-source chat models including Vicuna.
AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models cs.CL · 2023-04-13 · accept · none · ref 52
AGIEval shows GPT-4 exceeding average human scores on SAT Math at 95% and Chinese college entrance English at 92.5%, while revealing weaker results on complex reasoning tasks.
BloombergGPT: A Large Language Model for Finance cs.LG · 2023-03-30 · conditional · none · ref 29
BloombergGPT is a 50B parameter LLM trained on a 708B token mixed financial and general dataset that outperforms prior models on financial benchmarks while preserving general LLM performance.
AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning cs.CL · 2023-03-18 · unverdicted · none · ref 5
AdaLoRA uses SVD-based pruning to allocate the parameter budget for low-rank fine-tuning updates according to per-matrix importance scores, yielding better performance than uniform allocation especially under tight budgets.
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them cs.CL · 2022-10-17 · accept · none · ref 8
Chain-of-thought prompting enables large language models to surpass average human performance on 17 of 23 challenging BIG-Bench tasks.
Efficient Training of Language Models to Fill in the Middle cs.CL · 2022-07-28 · unverdicted · none · ref 105
Autoregressive language models trained on data with middle spans relocated to the end learn infilling without degrading left-to-right perplexity or sampling quality.
PaLM: Scaling Language Modeling with Pathways cs.CL · 2022-04-05 · accept · none · ref 36
PaLM 540B demonstrates continued scaling benefits by setting new few-shot SOTA results on hundreds of benchmarks and outperforming humans on BIG-bench.
Linguistic Frameworks Go Toe-to-Toe at Neuro-Symbolic Language Modeling cs.CL · 2021-12-15 · unverdicted · none · ref 12
Semantic constituency graphs outperform syntactic constituency and dependency structures from seven formalisms when added to a Transformer for language modeling.
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation cs.CL · 2021-08-27 · unverdicted · none · ref 6
ALiBi enables transformers trained on length-1024 sequences to extrapolate to length-2048 with the same perplexity as a sinusoidal model trained on 2048, while training 11% faster and using 11% less memory.
Inductive Entity Representations from Text via Link Prediction cs.CL · 2020-10-07 · unverdicted · none · ref 9
Entity representations learned from text via link prediction generalize to unseen entities and transfer to classification and retrieval with reported gains of 22% MRR, 16% accuracy, and 8.8% NDCG@10.
X-LogSMask: Expand Transformer for Graph-Structured Data cs.LG · 2026-07-02 · unverdicted · none · ref 4
X-LogSMask injects per-head powers of the normalized adjacency matrix via a logarithmic transform into Transformer attention, achieving SOTA results on 13 of 20 graph benchmarks while remaining competitive in a one-layer setup.
From Pixels to Temporal Correlations: Learning Informative Representations for Reinforcement Learning Pre-training cs.LG · 2026-07-01 · unverdicted · none · ref 5
MTCL learns multi-scale temporal correlations in videos via contrastive learning to produce more informative representations that improve sample efficiency and performance in downstream RL tasks.
Mixture-of-Control: State-Aware Fine-Tuning for Transformer-based Models cs.LG · 2026-06-30 · unverdicted · none · ref 20
Mixture-of-Control adaptively combines local and global control states in transformer fine-tuning by treating per-block states as experts in a sparse MoE setup to improve cross-block communication while keeping memory and compute costs comparable to prior state-based methods.
Persona-Trained Monte Carlo: Estimating Market-Outcome Distributions via Swarms of Persona-Conditioned Neural Policy Bots in a Limit Order Book cs.LG · 2026-06-28 · unverdicted · none · ref 51
PTMC is a proposed Monte Carlo estimator that generates market-outcome distributions by simulating continuous double-auction interactions among persona-conditioned neural-policy bots whose heterogeneity is drawn from a learned distribution.
Learning Interpretable Text Signals for Structured Responses stat.AP · 2026-06-24 · unverdicted · none · ref 3
Joint NMF and binomial regression learns response-relevant text signals with competitive performance on simulations and review data.
Learning Moral Diversity: Modelling Individual Perspectives in Moral Classification of Texts cs.CL · 2026-06-22 · unverdicted · none · ref 13
Extending language models with annotator-specific layers improves individual moral annotation predictions and reveals perspective variations hidden by label aggregation.
Objective-Behavior Alignment: Diagnostics for MORL Policy Selection cs.LG · 2026-06-19 · unverdicted · none · ref 83
Proposes an exploratory diagnostic workflow to highlight behavioral variation along MORL Pareto fronts not captured by objective values, with validation on grid and continuous control tasks.
Toten: A Knowledge-Based System For Structure-Preserving Representation Of Physical Quantities And Technical Notation In Brazilian Portuguese cs.AI · 2026-06-17 · unverdicted · none · ref 15
TOTEN is a knowledge-based system for structure-preserving representation of physical quantities and technical notation in Brazilian Portuguese using an ontology of engineering entities and external authorities, outperforming statistical baselines in atomicity and reconstruction.
Aligning Implied Statements for Implicit Hate Speech Generalizability with Context-Bounded Semi-hard Negative Mining cs.CL · 2026-06-17 · unverdicted · none · ref 28
ImpSH improves cross-domain generalization in implicit hate speech classification by aligning posts with implied statements and applying context-bounded semi-hard negative mining within a triplet learning setup.
PEC-Home: Interpretation of Progressively Elliptical Commands in Smart Homes cs.CL · 2026-06-17 · unverdicted · none · ref 23
Presents PEC-Home dataset for elliptical smart-home commands and shows LLMs achieve lower execution accuracy on elliptical inputs than complete commands even with dialogue history access.
Conservation Laws for Modern Neural Architectures cs.LG · 2026-06-16 · unverdicted · none · ref 13
Unified framework characterizes conservation laws for gradient flow in feedforward networks with GELU/SiLU/SwiGLU, multihead attention with positional encodings, and MoE models under various gating.
AAbAAC: An Annotated Corpus for Autoimmunity Information Extraction cs.AI · 2026-06-11 · unverdicted · none · ref 3
The authors created and released AAbAAC, an annotated corpus of 115 abstracts for autoimmunity information extraction, and showed NER performance gains after fine-tuning models on it.
Analysis of the Neglect-Zero Effect in Large Language Models cs.CL · 2026-06-04 · unverdicted · none · ref 13
LLMs do not exhibit the neglect-zero effect in structural priming tasks unlike humans.
Rethinking the Idiomaticity Decomposability Hypothesis: Evidence from Distributional Learning cs.CL · 2026-06-02 · unverdicted · none · ref 23
Language models show idiom decomposability correlates weakly with human judgments, negatively with syntactic flexibility, and contributes most strongly to representation stabilization during training alongside surprisal and frequency.
Sample-Size Scaling of the African Languages NLI Evaluation cs.CL · 2026-06-02 · unverdicted · none · ref 14
Scaling NLI performance with sample size in African languages is language-dependent and frequently non-monotonic, with saturation or declines observed in some cases.
Linguistic Productivity in Large Language Models: Models Coerce, but do not Preempt cs.CL · 2026-06-01 · unverdicted · none · ref 2
Larger LLMs reproduce constructional productivity via entrenchment in coercion cases with nonce words but fail to use statistical preemption to avoid overgeneralizing semantically plausible but unobserved patterns.
ROGLE: Robust Global-Local Alignment with Automated Region Supervision for Text-Based Person Search cs.CV · 2026-06-01 · unverdicted · none · ref 174 · 2 links
ROGLE introduces automated pseudo region-sentence pairs via RSM and multi-granular learning to boost fine-grained alignment in text-based person search, plus the P-VLG benchmark with over 100k annotated regions.
Sequence models reveal diagnosis accumulation pathways beyond comorbidity burden in population-scale hospital data physics.soc-ph · 2026-05-29 · unverdicted · none · ref 26
Sequence embeddings from diagnosis histories improve prediction of 93 of 131 incident disease blocks and event-free survival beyond age, sex, and comorbidity burden in large-scale hospital data.
Eliot: Interactively Exploring Fast-Changing Scientific Literature Trends with Online Data and Learning cs.IR · 2026-05-26 · unverdicted · none · ref 11
Eliot is a query-time clustering and temporal visualization system for arXiv literature, evaluated via offline metrics on eight domains and a user survey showing 85% meaningful cluster labels.
DuoGesture: Neuro-Inspired and Biomechanically Informed Dual-Stream Co-Speech Gesture Generation cs.CV · 2026-05-25 · unverdicted · none · ref 10
DuoGesture introduces a dual-stream architecture for co-speech gesture generation that decouples semantic and beat streams via a stochastic gate and biomechanical regularization, claiming better performance than holistic baselines.
Hide to Guide: Learning via Semantic Masking cs.LG · 2026-05-24 · unverdicted · none · ref 41
SMEPO applies fine-grained semantic masking to expert guidance in RLVR, turning hard problems into fill-in-the-blank tasks while preserving structure, yielding up to 3.2 point accuracy gains and 4.2x faster training.
Any2Any: Efficient Cross-Embodiment Transfer for Humanoid Whole-Body Tracking cs.RO · 2026-05-22 · unverdicted · none · ref 40 · 2 links
Any2Any transfers humanoid whole-body tracking models across embodiments via kinematic alignment followed by targeted PEFT, matching full-training performance with 1% of the data and compute on tested platforms.
GradeLegal: Automated Grading for German Legal Cases cs.CL · 2026-05-20 · unverdicted · none · ref 13
Reasoning-oriented LLMs reach up to 0.91 quadratic weighted kappa agreement with experts on public law cases when given sample solutions and grading rubrics, but only 0.60 on criminal law cases.
Toto 2.0: Time Series Forecasting Enters the Scaling Era cs.LG · 2026-05-19 · unverdicted · none · ref 15 · 2 links
Time series foundation models scale under a single training recipe, with forecast quality improving from 4M to 2.5B parameters and new SOTA results on BOOM, GIFT-Eval, and TIME benchmarks.
PromptRad: Knowledge-Enhanced Multi-Label Prompt-Tuning for Low-Resource Radiology Report Labeling cs.CL · 2026-05-19 · unverdicted · none · ref 45 · 2 links
PromptRad reformulates multi-label radiology report classification as masked language modeling and enriches verbalizers with UMLS synonyms, outperforming baselines with only 32 training examples.
$M^3 QuestionIng$: Multi-modal Multi-span Medical Question Answering cs.IR · 2026-05-19 · unverdicted · none · ref 11
Proposes a multi-modal multi-span medical QA framework and new dataset that outputs answers containing both text and relevant images.
TextClusterLab: An Integrated Framework for Reliable Text Clustering Studies cs.IR · 2026-05-17 · unverdicted · none · ref 10
TextClusterLab introduces an LLM-driven generator for synthetic text clustering datasets with tunable attributes and a suitability benchmark for evaluation.
