super hub Mixed citations

BERT : Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, Kenton Lee, Kristina Toutanova, Ming-Wei Chang · 2019 · Proceedings of the 2019 Conference of the North · DOI 10.18653/v1/n19-1423

Mixed citation behavior. Most common role is background (68%).

281 Pith papers citing it

6,639 external citations · Crossref

Background 68% of classified citations

open at publisher browse 281 citing papers more from Jacob Devlin

hub tools

JSON dossier citing papers JSON publisher DOI

citation-role summary

background 25 method 7 dataset 1 other 1

citation-polarity summary

background 23 use method 7 unclear 3 use dataset 1

claims ledger

background The retrieval system only manages to fetch informationabout Fleming's professional achievements in the discoveryof penicillin. However, the document does not provide informa-tion about his educational background, thus the model generates ahallucinatory answer. inappropriately activated, blindly retrieving inaccurate information and consequently leading to an undesirable response. Consequently, several studies [75, 204, 228, 378] have proposed to make a shift from passive retrieval to adaptive re

authors

Jacob Devlin Kenton Lee Kristina Toutanova Ming-Wei Chang

co-cited works

representative citing papers

FigSIM: A Dataset for Fine-grained Suicide Severity and Figurative Language in Suicide Memes

cs.CL · 2026-06-01 · conditional · novelty 8.0

FigSIM is the first annotated dataset for fine-grained suicide severity and figurative language in suicide memes, accompanied by benchmarks on 16 unimodal and multimodal models.

Reachability and asymptotics of Gaussian Transformer dynamics

cs.LG · 2026-05-29 · unverdicted · novelty 8.0

Gaussian distributions are invariant under the mean-field Transformer flow, reducing infinite-dimensional dynamics to a bilinear control system on mean and covariance with explicit reachability and stability results.

QSTRBench: a New Benchmark to Evaluate the Ability of Language Models to Reason with Qualitative Spatial and Temporal Calculi

cs.AI · 2026-05-18 · accept · novelty 8.0

QSTRBench is a new benchmark evaluating LLMs on compositional reasoning, converse relations, and conceptual neighbourhoods across QSTR calculi including a newly published RCC-22 CN, showing models exceed chance but fail to achieve consistent correctness.

Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution

cs.CL · 2023-09-28 · unverdicted · novelty 8.0

Promptbreeder evolves both task prompts and the mutation prompts that improve them using LLMs, outperforming Chain-of-Thought and Plan-and-Solve on arithmetic and commonsense reasoning benchmarks.

Locating and Editing Factual Associations in GPT

cs.CL · 2022-02-10 · accept · novelty 8.0

Factual associations in autoregressive transformers are localized to mid-layer feed-forward modules and can be edited via rank-one model editing while preserving both specificity and generalization on counterfactual tests.

SimCSE: Simple Contrastive Learning of Sentence Embeddings

cs.CL · 2021-04-18 · conditional · novelty 8.0

SimCSE achieves 76.3% unsupervised and 81.6% supervised Spearman's correlation on STS tasks with BERT-base, improving prior best results by 4.2% and 2.2% via simple contrastive learning.

SpheRoPE: Zero-Shot Optimization-Free 360 Panorama Generation with Spherical RoPE

cs.CV · 2026-06-30 · unverdicted · novelty 7.0

SpheRoPE modifies rotary position embeddings in diffusion transformers to enforce spherical topology for zero-shot 360 panorama generation across multiple backbones.

Evaluation Pitfalls and Challenges in Multimedia Event Extraction

cs.CL · 2026-06-25 · unverdicted · novelty 7.0

A systematic analysis of evaluation practices in multimedia event extraction reveals that minor methodological choices cause large performance swings and overestimation of cross-modal grounding ability.

Structure Before Collapse: Transient semantic geometry in next-token prediction

cs.LG · 2026-06-25 · unverdicted · novelty 7.0

Semantic geometry emerges transiently early in next-token prediction training before collapsing to Neural Collapse symmetry in synthetic settings with latent semantic factors.

Understanding Parallel Samplers in Masked Diffusion via Random Walks on Graphs

cs.LG · 2026-06-22 · unverdicted · novelty 7.0

Graph random walks provide a verifiable sandbox for diagnosing parallel samplers in masked diffusion models, showing performance depends on graph structure and introducing a new exact bisection sampler.

How Does Research Evolve? Tracing Cross-Domain Trajectories in NLP, ML, and CV with Claim-Grounded Typed Citations

cs.CL · 2026-06-21 · unverdicted · novelty 7.0

SciTraj is the first claim-grounded typed citation graph with 32,559 papers and 573,126 edges across six relation types, plus a temporally split link-prediction benchmark.

OVIG: Optimistic Verification of AI Training Integrity via Gradient Signals

cs.CR · 2026-06-19 · unverdicted · novelty 7.0

OVIG introduces an optimistic gradient-based verification framework for outsourced AI post-training that uses stride-sampled interval checks against an honest-replay boundary to achieve 0% attack success rate with low overhead.

Structured Inference with Large Language Gibbs

cs.LG · 2026-06-17 · unverdicted · novelty 7.0

Large Language Gibbs uses LLM next-token conditionals as MCMC transition operators for iterative resampling of structured variables, aiming to produce a stationary distribution that compromises across all local conditionals.

CheckMIABench: Firm Foundations For Membership Inference Attacks on Language Models

cs.LG · 2026-06-16 · conditional · novelty 7.0

CheckMIABench converts LLMs with intermediate checkpoints into clean MIA testbeds by using pre- and post-checkpoint training data from the same distribution and evaluates published attacks on Pythia and OLMo models while releasing an open-source library.

Applicability Condition Extraction for Therapeutic Drug-Disease Relations

cs.AI · 2026-06-12 · unverdicted · novelty 7.0

Introduces applicability condition extraction for therapeutic drug-disease relations, creates first annotated dataset of 1,119 pairs, and proposes enhanced LoRA method outperforming baselines.

AfriSUD: A Dependency Treebank Collection for Evaluating Models on African Languages

cs.CL · 2026-06-10 · unverdicted · novelty 7.0

AfriSUD supplies new SUD-annotated dependency treebanks for nine Sub-Saharan African languages and demonstrates that existing models exhibit clear limitations on their syntax.

WorldReasoner: Evaluating Whether Language Model Agents Forecast Events with Valid Reasoning

cs.CL · 2026-06-10 · unverdicted · novelty 7.0

WorldReasoner supplies 345 resolved forecasting tasks built from 14,141 articles to score LM agents on outcome quality, evidence quality, and reasoning quality against time-bounded evidence and hindsight graphs.

Continuous Language Diffusion as a Decoder-Interface Problem

cs.CL · 2026-06-07 · unverdicted · novelty 7.0

Continuous language diffusion works by entering high-margin decoder basins where frozen T5 embeddings recover 93-96% of native decisions and linear readouts reach 97.9% agreement, implying models should be evaluated as representation-decoder systems.

Fast LLM-Based Semantic Filtering: From a Unified Framework to an Adaptive Two-Phase Method

cs.DB · 2026-06-06 · unverdicted · novelty 7.0

An adaptive two-phase semantic filter using clustering then a hybrid proxy trained on LLM confidence achieves 1.6-2.0x speedup over prior methods at 90% accuracy on 10K document corpora.

Many Circuits, One Mechanism: Input Variation and Evaluation Granularity in Circuit Discovery

cs.CL · 2026-06-04 · unverdicted · novelty 7.0

Structurally distinct circuits for literal sequence copying across token frequency bands implement the same computation, shown by broad transfer of band-specific edges, a shared core recovering 99% performance, and interchangeable representations via causal interventions.

Multilingual Coreference Resolution via Cycle-Consistent Machine Translation

cs.CL · 2026-06-03 · unverdicted · novelty 7.0

A cycle-consistent MT pipeline generates and similarity-weights training data for coreference resolution, producing gains on four low-resource languages and enabling the task where no corpora existed.

ClinicalMC: A Benchmark for Multi-Course Clinical Decision-Making with Large Language Models

cs.AI · 2026-06-02 · unverdicted · novelty 7.0

ClinicalMC is a benchmark of 1,275 Chinese and 5,804 English multi-course clinical samples across four stages, evaluated via a multi-agent framework on closed-source, open-source, and medical LLMs in static and dynamic settings.

EURO-5K: When Does Domain Pretraining Matter? Benchmarking Transformers for EU Reporting Obligation Extraction

cs.CL · 2026-06-02 · unverdicted · novelty 7.0

Introduces EURO-5K dataset from 136 EU acts and benchmarks full fine-tuning vs QLoRA for BERT and LLM models on reporting obligation extraction, reporting 0.89 F1 with limited gains from legal pretraining except under parameter-efficient adaptation.

Learning Coherent Representations: A Topological Approach to Interpretability

cs.LG · 2026-06-01 · unverdicted · novelty 7.0

Introduces coherence as a topological constraint on representations and the Coh objective to enforce geometric clustering for interpretability in neural networks.

citing papers explorer

Showing 31 of 281 citing papers.

Evaluation of ML Resource Utilization Requires Model Life Cycle Assessment cs.LG · 2026-05-31 · unverdicted · none · ref 170
The paper calls for life cycle assessment to capture embodied hardware costs and full pipeline operational costs in AI development and deployment.
Better heads do not guarantee better binarized constituency parsing cs.CL · 2026-05-27 · unverdicted · none · ref 8
Learned heads improve intrinsic head prediction but fail to deliver consistent punctuation-sensitive parsing gains after binarization and debinarization.
Mitigating Data Scarcity in Psychological Defense Classification with Context-Aware Synthetic Augmentation cs.CL · 2026-05-14 · unverdicted · none · ref 12
A context-aware synthetic augmentation framework with a hybrid clinical-language model improves psychological defense mechanism classification to 58.26% accuracy and 24.62% macro-F1 in low-resource conditions, outperforming the DMRS Co-Pilot baseline.
A Formative Study of Brief Affective Text as a Complement to Wearable Sensing for Longitudinal Student Health Monitoring cs.HC · 2026-05-14 · unverdicted · none · ref 15
Ultra-brief student concern texts analyzed with NLP associate with lower physical activity during academic concern weeks and poorer sleep plus lower heart rate variability during emotional exhaustion weeks, complementing wearable sensing.
Quality-Conditioned Agreement in Automated Short Answer Scoring: Mid-Range Degradation and the Impact of Task-Specific Adaptation cs.CL · 2026-05-08 · unverdicted · none · ref 39
AI models for automated short answer scoring show substantial mid-range quality degradation in expert agreement that improves with greater task-specific adaptation.
Gyan: An Explainable Neuro-Symbolic Language Model cs.CL · 2026-05-06 · unverdicted · none · ref 1 · 2 links
Gyan is a novel explainable non-transformer language model that achieves SOTA results on multiple datasets by mimicking human-like compositional context and world models.
A Systematic Exploration of Text Decomposition and Budget Distribution in Differentially Private Text Obfuscation cs.CL · 2026-05-01 · unverdicted · none · ref 44
Systematic experiments show that text decomposition methods and privacy budget allocation strategies produce significantly different privacy-utility trade-offs even under comparable total epsilon budgets.
K-CARE: Knowledge-driven Symmetrical Contextual Anchoring and Analogical Prototype Reasoning for E-commerce Relevance cs.IR · 2026-04-28 · unverdicted · none · ref 4
K-CARE uses behavior-derived anchoring and expert prototype analogies to ground LLMs and improve relevance on knowledge-intensive e-commerce cases.
Human-Machine Co-Boosted Bug Report Identification with Mutualistic Neural Active Learning cs.SE · 2026-04-20 · unverdicted · none · ref 17
MNAL reduces human effort in bug report labeling by up to 95.8% for readability and 196% for identifiability while improving identification performance and working with various neural models.
A Compact Hybrid Convolution--Frequency State Space Network for Learned Image Compression cs.CV · 2025-11-25 · unverdicted · none · ref 25
HCFSSNet uses convolutional layers plus a Vision Frequency State Space block with omni-directional scanning and frequency reweighting to reach competitive rate-distortion performance in learned image compression.
Investigating the Representation of Backchannels and Fillers in Fine-tuned Language Models cs.CL · 2025-09-24 · unverdicted · none · ref 15
Fine-tuning on annotated English and Japanese dialogues improves clustering of backchannels and fillers and makes generated utterances closer to human ones.
The Master-Slave Encoder Model for Improving Patent Text Summarization: A New Approach to Combining Specifications and Claims cs.CL · 2024-11-21 · unverdicted · none · ref 32
MSEA uses a master-slave encoder architecture on patent specifications and claims, enhanced with pointer networks and repetition suppression, to generate better summaries as measured by small ROUGE score gains.
AI Safety Landscape for Large Language Models: Taxonomy, State-of-the-art, and Future Directions cs.AI · 2024-08-23 · unverdicted · none · ref 176
The paper introduces a taxonomy of AI safety for LLMs organized into Trustworthy AI, Responsible AI, and Safe AI perspectives, accompanied by a review of state-of-the-art methods, challenges, and future directions.
PaliGemma: A versatile 3B VLM for transfer cs.CV · 2024-07-10 · unverdicted · none · ref 37
PaliGemma is an open 3B VLM based on SigLIP and Gemma that achieves strong performance on nearly 40 diverse open-world tasks including benchmarks, remote-sensing, and segmentation.
ANGOFA: Leveraging OFA Embedding Initialization and Synthetic Data for Angolan Language Model cs.CL · 2024-04-03 · unverdicted · none · ref 7
Four MAFT-based PLMs for Angolan languages report 12.3-point gains over AfroXLMR-base and 3.8-point gains over OFA baselines on downstream tasks.
Evaluating Chunking Strategies for Retrieval-Augmented Generation on Academic Texts cs.IR · 2026-07-02 · unverdicted · none · ref 58
Cluster-based semantic chunking does not outperform fixed-size or recursive chunking for RAG on academic theses, and RAGAs faithfulness shows limited reliability in this setup.
Heterogeneous Neural Predictivity from Language Models During Naturalistic Comprehension cs.CL · 2026-06-25 · unverdicted · none · ref 13
Frozen language model features predict held-out neural activity in naturalistic comprehension across multiple brain recording modalities with gains over low-level baselines in many but not all sources, after extensive controls.
PortBERT: Navigating the Depths of Portuguese Language Models cs.CL · 2026-06-01 · unverdicted · none · ref 65
PortBERT releases two RoBERTa models for Portuguese that match or beat prior monolingual and multilingual models on translated GLUE/SuperGLUE tasks while reporting training and inference times.
Mathematical Reasoning in Large Language Models: Benchmarks, Architectures, Evaluation, and Open Challenges cs.CL · 2026-05-19 · unverdicted · none · ref 16
A literature survey synthesizing benchmarks, architectures, training strategies, and evaluation methods for mathematical reasoning in LLMs, based on roughly 120 papers.
ICT-NLP at SemEval-2026 Task 3: Less Is More -- Multilingual Encoder with Joint Training and Adaptive Ensemble for Dimensional Aspect Sentiment Regression cs.CL · 2026-05-11 · unverdicted · none · ref 3
A lightweight multilingual encoder system with joint training and adaptive ensemble achieves top-half rankings across datasets in SemEval-2026 dimensional aspect sentiment regression.
CLaC at SemEval-2026 Task 6: Response Clarity Detection in Political Discourse cs.CL · 2026-05-04 · unverdicted · none · ref 19
An LLM ensemble reached 80 macro-F1 on 3-class clarity detection and 59 on 9-class evasion detection, with partial layer unfreezing and multilingual ensembles improving encoder results while enriched context helped only LLMs.
SG-UniBuc-NLP at SemEval-2026 Task 6: Multi-Head RoBERTa with Chunking for Long-Context Evasion Detection cs.CL · 2026-04-29 · unverdicted · none · ref 8
A multi-head RoBERTa model with overlapping chunking and max-pooling achieves Macro-F1 of 0.80 on 3-way clarity classification and 0.51 on 9-way evasion strategy detection, ranking 11th in both subtasks of SemEval-2026 Task 6.
A comprehensive evaluation of spatial co-execution on GPUs using MPS and MIG technologies cs.DC · 2026-04-24 · unverdicted · none · ref 32
MPS can boost performance up to 30% and cut energy 20% with careful provisioning but degrades sharply under memory contention, whereas MIG delivers steadier gains through hardware isolation at the cost of higher overhead and occasional performance losses.
Multilingual Vision-Language Models, A Survey cs.CL · 2025-09-26 · accept · none · ref 39
The survey identifies a key tension in multilingual vision-language models between language neutrality via contrastive learning and cultural awareness via diverse data, with most benchmarks relying on translation-based evaluation.
Looking Beyond the Obvious: A Survey on Abstract Concept Recognition for Video Understanding cs.CV · 2025-08-28 · unverdicted · none · ref 153
A literature survey on abstract concept recognition in videos that catalogs prior tasks and datasets while advocating for foundation models and reuse of decades of community experience.
Benchmark Data Contamination of Large Language Models: A Survey cs.CL · 2024-06-06 · unverdicted · none · ref 34
A survey reviewing benchmark data contamination in LLMs, its impact on evaluation, and alternative assessment approaches.
Probing Classifiers: Promises, Shortcomings, and Advances cs.CL · 2021-02-24 · unverdicted · none · ref 59
Probing classifiers are a common but limited method for analyzing linguistic knowledge in neural NLP models, and this review outlines their promises, methodological shortcomings, and recent advances.
YEZE at SemEval-2026 Task 9: Detecting Multilingual, Multicultural and Multievent Online Polarization via Heterogeneous Ensembling cs.CL · 2026-05-07 · unverdicted · none · ref 36 · 2 links
A heterogeneous ensemble of XLM-RoBERTa-large and mDeBERTa-v3-base with independent task modeling and class weighting is reported as effective for multilingual, multicultural, and multievent online polarization detection.
Sentiment Analysis of AI Adoption in Indonesian Higher Education Using Machine Learning and Transformer-Based Models cs.CL · 2026-04-30 · unverdicted · none · ref 6
DistilBERT achieves 84.78% accuracy and 84.75% F1-score on binary sentiment classification of Indonesian student opinions about AI in higher education, outperforming SVM at 82.14%.
Using reasoning LLMs to extract SDOH events from clinical notes cs.CL · 2026-04-15 · unreviewed · ref 11
Efficient Black-Box Fault Localization for System-Level Test Code Using Large Language Models cs.SE · 2025-06-23 · unreviewed · ref 25

BERT : Pre-training of Deep Bidirectional Transformers for Language Understanding

hub tools

citation-role summary

citation-polarity summary

claims ledger

authors

co-cited works

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer