super hub Mixed citations

RoBERTa: A Robustly Optimized BERT Pretraining Approach

Danqi Chen, Jingfei Du, Mandar Joshi, Myle Ott, Naman Goyal, Yinhan Liu · 2019 · cs.CL · arXiv 1907.11692

Mixed citation behavior. Most common role is background (65%).

416 Pith papers citing it

Background 65% of classified citations

open full Pith review browse 416 citing papers more from Danqi Chen arXiv PDF

abstract

Language model pretraining has led to significant performance gains but careful comparison between different approaches is challenging. Training is computationally expensive, often done on private datasets of different sizes, and, as we will show, hyperparameter choices have significant impact on the final results. We present a replication study of BERT pretraining (Devlin et al., 2019) that carefully measures the impact of many key hyperparameters and training data size. We find that BERT was significantly undertrained, and can match or exceed the performance of every model published after it. Our best model achieves state-of-the-art results on GLUE, RACE and SQuAD. These results highlight the importance of previously overlooked design choices, and raise questions about the source of recently reported improvements. We release our models and code.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 48 method 12 baseline 5 dataset 3

citation-polarity summary

background 44 use method 12 baseline 5 support 3 use dataset 3 unclear 1

claims ledger

abstract Language model pretraining has led to significant performance gains but careful comparison between different approaches is challenging. Training is computationally expensive, often done on private datasets of different sizes, and, as we will show, hyperparameter choices have significant impact on the final results. We present a replication study of BERT pretraining (Devlin et al., 2019) that carefully measures the impact of many key hyperparameters and training data size. We find that BERT was significantly undertrained, and can match or exceed the performance of every model published after it

authors

Danqi Chen Jingfei Du Mandar Joshi Myle Ott Naman Goyal Yinhan Liu

co-cited works

representative citing papers

Bridging Language and Items for Retrieval and Recommendation: Benchmarking LLMs as Semantic Encoders

cs.IR · 2024-03-06 · unverdicted · novelty 8.0

BLaIR is a new benchmark and 570M-review dataset showing that LLM performance rankings on recommendation tasks have little correlation with rankings on general embedding benchmarks like MTEB.

Discovering Latent Knowledge in Language Models Without Supervision

cs.CL · 2022-12-07 · conditional · novelty 8.0

An unsupervised technique extracts latent yes-no knowledge from language model activations by locating a direction that satisfies logical consistency properties, outperforming zero-shot accuracy by 4% on average across models and datasets.

Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?

cs.CL · 2022-02-25 · accept · novelty 8.0

Randomly replacing labels in in-context demonstrations barely hurts performance, showing that label space, input distribution, and sequence format drive in-context learning more than ground-truth labels.

SimCSE: Simple Contrastive Learning of Sentence Embeddings

cs.CL · 2021-04-18 · conditional · novelty 8.0

SimCSE achieves 76.3% unsupervised and 81.6% supervised Spearman's correlation on STS tasks with BERT-base, improving prior best results by 4.2% and 2.2% via simple contrastive learning.

The Pile: An 800GB Dataset of Diverse Text for Language Modeling

cs.CL · 2020-12-31 · conditional · novelty 8.0

The Pile is a newly constructed 825 GiB dataset from 22 diverse sources that enables language models to achieve better performance on academic, professional, and cross-domain tasks than models trained on Common Crawl variants.

Measuring Massive Multitask Language Understanding

cs.CY · 2020-09-07 · accept · novelty 8.0

Introduces the MMLU benchmark of 57 tasks and shows that current models, including GPT-3, achieve low accuracy far below expert level across academic and professional domains.

Language Models are Few-Shot Learners

cs.CL · 2020-05-28 · accept · novelty 8.0

GPT-3 shows that scaling an autoregressive language model to 175 billion parameters enables strong few-shot performance across diverse NLP tasks via in-context prompting without fine-tuning.

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

cs.CL · 2020-03-23 · conditional · novelty 8.0

ELECTRA replaces masked language modeling with replaced token detection, yielding contextual representations that outperform BERT at equal compute and match larger models like RoBERTa with far less compute.

REALM: Retrieval-Augmented Language Model Pre-Training

cs.CL · 2020-02-10 · accept · novelty 8.0

REALM augments language-model pre-training with an unsupervised retriever over Wikipedia documents and reports 4-16% absolute gains on open-domain QA benchmarks over prior implicit and explicit knowledge methods.

Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

cs.CL · 2019-08-27 · unverdicted · novelty 8.0

Sentence-BERT adapts BERT with siamese and triplet networks to produce sentence embeddings for efficient cosine-similarity comparisons, cutting computation time from hours to seconds on similarity search while matching BERT accuracy.

Continuous Language Diffusion as a Decoder-Interface Problem

cs.CL · 2026-06-07 · unverdicted · novelty 7.0

Continuous language diffusion works by entering high-margin decoder basins where frozen T5 embeddings recover 93-96% of native decisions and linear readouts reach 97.9% agreement, implying models should be evaluated as representation-decoder systems.

Detector-Evasive LLM Paraphrasing via Constrained Policy Optimization

cs.LG · 2026-05-29 · unverdicted · novelty 7.0

DEPO formulates detector-evasive paraphrasing as a constrained MDP and solves it via Lagrangian primal-dual RL with GRPO-style updates to achieve evasion while satisfying a semantic-preservation constraint.

Bounded Behavioral Indistinguishability for Black-Box LLM Distillation

cs.LG · 2026-05-28 · unverdicted · novelty 7.0

Introduces (ε,q,t,A)-behavioral indistinguishability and shows via Qwen/Llama experiments that LoRA distillation boosts semantic similarity but leaves detectable behavioral differences under adversarial evaluation.

GRUFF: LLM Pronoun Fidelity, Reasoning, and Biases in German

cs.CL · 2026-05-28 · unverdicted · novelty 7.0

GRUFF dataset shows LLMs agree well with masculine and feminine German pronouns but fail on neopronouns and distractors, with occupational stereotypes poorly correlated across cases.

Towards Cost-effective LLMs Routing with Batch Prompting

cs.DB · 2026-05-27 · unverdicted · novelty 7.0

RoBatch is a two-stage framework that formulates and solves the joint Route with Batching Problem via a batch-aware proxy utility model and greedy scheduling, outperforming separate routing or batching baselines on six benchmarks.

Less Effort, Shorter Proofs: Reinforcement Learning for Security Protocol Analysis in Tamarin

cs.CR · 2026-05-22 · unverdicted · novelty 7.0

An RL-guided MCTS proof search for Tamarin finds more and shorter proofs than standard search across 16 protocol models.

Semantic Reranking at Inference Time for Hard Examples in Rhetorical Role Labeling

cs.CL · 2026-05-18 · unverdicted · novelty 7.0

RISE is an inference-time semantic reranking framework that refines low-confidence predictions in rhetorical role labeling using contrastively learned label representations, delivering an average +9.15 macro-F1 gain on hard examples across eight datasets and seven models.

Scaling Laws from Sequential Feature Recovery: A Solvable Hierarchical Model

stat.ML · 2026-05-14 · accept · novelty 7.0

A solvable hierarchical model with power-law feature strengths yields explicit power-law scaling of prediction error through sequential recovery of latent directions by a layer-wise spectral algorithm.

BOOKMARKS: Efficient Active Storyline Memory for Role-playing

cs.CL · 2026-05-13 · unverdicted · novelty 7.0

BOOKMARKS introduces searchable bookmarks as reusable answers to storyline questions, enabling active initialization and passive synchronization for more consistent role-playing agent memory than recurrent summarization.

Fin-Bias: Comprehensive Evaluation for LLM Decision-Making under human bias in Finance Domain

cs.CL · 2026-05-09 · unverdicted · novelty 7.0

LLMs copy biased analyst ratings in investment decisions but a new detection method encourages independent reasoning and can improve stock return predictions beyond human levels.

Chain-based Distillation for Effective Initialization of Variable-Sized Small Language Models

cs.CL · 2026-05-08 · unverdicted · novelty 7.0

Chain-based Distillation constructs a sequence of anchor models to enable efficient initialization of variable-sized SLMs through interpolation, with bridge distillation for cross-architecture transfer, yielding better performance than scratch training.

Is She Even Relevant? When BERT Ignores Explicit Gender Cues

cs.CL · 2026-05-08 · conditional · novelty 7.0

A Dutch BERT model encodes gender linearly by epoch 20 but does not dynamically update its representations when explicit female cues contradict learned stereotypical associations in short sentence templates.

Beyond Factor Aggregation: Gauge-Aware Low-Rank Server Representations for Federated LoRA

cs.LG · 2026-05-07 · unverdicted · novelty 7.0

GLoRA replaces raw factor averaging with gauge-aware aggregation in a consensus subspace estimated from client projectors, enabling consistent low-rank federated LoRA under heterogeneity.

Evaluating Non-English Developer Support in Machine Learning for Software Engineering

cs.SE · 2026-05-07 · unverdicted · novelty 7.0

Code LLMs generate substantially worse comments outside English, and no tested automatic metric or LLM judge reliably matches human assessment of those outputs.

citing papers explorer

Showing 50 of 53 citing papers after filters.

Discovering Latent Knowledge in Language Models Without Supervision cs.CL · 2022-12-07 · conditional · none · ref 18 · internal anchor
An unsupervised technique extracts latent yes-no knowledge from language model activations by locating a direction that satisfies logical consistency properties, outperforming zero-shot accuracy by 4% on average across models and datasets.
SimCSE: Simple Contrastive Learning of Sentence Embeddings cs.CL · 2021-04-18 · conditional · none · ref 101 · internal anchor
SimCSE achieves 76.3% unsupervised and 81.6% supervised Spearman's correlation on STS tasks with BERT-base, improving prior best results by 4.2% and 2.2% via simple contrastive learning.
The Pile: An 800GB Dataset of Diverse Text for Language Modeling cs.CL · 2020-12-31 · conditional · none · ref 47 · internal anchor
The Pile is a newly constructed 825 GiB dataset from 22 diverse sources that enables language models to achieve better performance on academic, professional, and cross-domain tasks than models trained on Common Crawl variants.
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators cs.CL · 2020-03-23 · conditional · none · ref 6 · internal anchor
ELECTRA replaces masked language modeling with replaced token detection, yielding contextual representations that outperform BERT at equal compute and match larger models like RoBERTa with far less compute.
Is She Even Relevant? When BERT Ignores Explicit Gender Cues cs.CL · 2026-05-08 · conditional · none · ref 60 · internal anchor
A Dutch BERT model encodes gender linearly by epoch 20 but does not dynamically update its representations when explicit female cues contradict learned stereotypical associations in short sentence templates.
Follow My Eyes: Backdoor Attacks on VLM-based Scanpath Prediction cs.CR · 2026-04-09 · conditional · none · ref 46 · internal anchor
Backdoor attacks on VLM-based scanpath predictors can redirect fixations toward chosen objects or inflate durations using input-conditioned triggers that evade cluster detection, and no tested defense blocks them without hurting clean accuracy.
Sell More, Play Less: Benchmarking LLM Realistic Selling Skill cs.CL · 2026-04-08 · conditional · none · ref 24 · internal anchor
SalesLLM provides an automatic evaluation framework for LLM sales dialogues that correlates 0.98 with human experts and shows top models approaching human performance while weaker ones lag.
The Curse of Recursion: Training on Generated Data Makes Models Forget cs.LG · 2023-05-27 · conditional · none · ref 7 · internal anchor
Use of model-generated content in training causes irreversible loss of distribution tails, termed model collapse, in VAEs, GMMs, and LLMs.
QLoRA: Efficient Finetuning of Quantized LLMs cs.LG · 2023-05-23 · conditional · none · ref 38 · internal anchor
QLoRA finetunes 4-bit quantized LLMs via LoRA adapters to match full-precision performance while using far less memory, enabling 65B-scale training on single GPUs and producing Guanaco models near ChatGPT level.
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention cs.CV · 2023-03-28 · conditional · none · ref 56 · internal anchor
LLaMA-Adapter turns frozen LLaMA 7B into a capable instruction follower using only 1.2M new parameters and zero-init attention, matching Alpaca while extending to image-conditioned reasoning on ScienceQA and COCO.
Language Is Not All You Need: Aligning Perception with Language Models cs.CL · 2023-02-27 · conditional · none · ref 14 · internal anchor
Kosmos-1 shows strong zero-shot and few-shot results on language tasks, image captioning, visual QA, OCR-free document understanding, and image recognition guided by text instructions.
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale cs.LG · 2022-08-15 · conditional · none · ref 146 · internal anchor
LLM.int8() performs 8-bit inference for transformers up to 175B parameters with no accuracy loss by combining vector-wise quantization for most features with 16-bit mixed-precision handling of systematic outlier dimensions.
An Explanation of In-context Learning as Implicit Bayesian Inference cs.CL · 2021-11-03 · conditional · none · ref 24 · internal anchor
In-context learning emerges as implicit Bayesian inference of latent concepts when pretraining data has long-range coherence, proven for mixture-of-HMM distributions and replicated on the synthetic GINC dataset.
Prefix-Tuning: Optimizing Continuous Prompts for Generation cs.CL · 2021-01-01 · conditional · none · ref 18 · internal anchor
Prefix-tuning matches or exceeds fine-tuning on NLG tasks by optimizing a continuous prefix using 0.1% of parameters while keeping the LM frozen.
Constructing A Multi-hop QA Dataset for Comprehensive Evaluation of Reasoning Steps cs.CL · 2020-11-02 · conditional · none · ref 9 · internal anchor
Introduces 2WikiMultiHopQA, a multi-hop QA dataset with explicit evidence chains generated via templates and Wikidata logical rules to force and evaluate multi-hop reasoning.
Unsupervised Cross-lingual Representation Learning at Scale cs.CL · 2019-11-05 · conditional · none · ref 5 · internal anchor
XLM-R, pretrained on 100 languages with 2TB of CommonCrawl data, improves average XNLI accuracy by 14.6 points and MLQA F1 by 13 points over mBERT while matching strong monolingual models on GLUE.
Assisted Counterspeech Writing at the Crossroads of Hate Speech and Misinformation cs.CL · 2026-05-21 · conditional · none · ref 195 · internal anchor
LLMs generate adequate counterspeech for co-occurring hate and misinformation in 40% of cases, with a mixed knowledge strategy from fact-checkers and NGOs proving most effective after expert revision.
ChunkFT: Byte-Streamed Optimization for Memory-Efficient Full Fine-Tuning cs.LG · 2026-05-20 · conditional · none · ref 31 · internal anchor
ChunkFT enables full-parameter fine-tuning of Llama 3-8B on one 24 GB GPU and Llama 3-70B on two 80 GB GPUs by streaming gradients over dynamically activated sub-tensors.
Learning When to Adapt cs.LG · 2026-05-18 · conditional · none · ref 29 · internal anchor
DISeL augments standard LoRA with per-input gates over rank-one updates to reduce catastrophic forgetting during fine-tuning while adding few parameters.
A Scalable Tool for Measuring Manner and Result Verbs in Developmental Language Research cs.CL · 2026-05-15 · conditional · none · ref 26 · internal anchor
A RoBERTa classifier trained on LLM-generated manner/result verb annotations from extended VerbNet data reaches up to 89.6% accuracy on held-out gold-standard sets.
Structure-Aware Masking for Protein Representation Learning cs.LG · 2026-05-15 · conditional · none · ref 4 · internal anchor
Bucket Masking improves protein fitness prediction by up to 14% over random masking by preferentially masking structurally coupled residue groups on four downstream tasks.
Context-Aware Spear Phishing: Generative AI-Enabled Attacks Against Individuals via Public Social Media Data cs.CR · 2026-05-11 · conditional · none · ref 69 · internal anchor
Generative AI enables scalable, context-aware spear phishing by extracting profiles from public social media, producing emails that outperform real-world phishing samples in personalization and lower recipient suspicion.
Compared to What? Baselines and Metrics for Counterfactual Prompting cs.CL · 2026-05-01 · conditional · none · ref 155 · internal anchor
Counterfactual prompting effects on LLMs are often indistinguishable from those caused by meaning-preserving paraphrases, causing most previously reported demographic sensitivities to disappear under proper statistical comparison.
Adapting TrOCR for Printed Tigrinya Text Recognition: Word-Aware Loss Weighting for Cross-Script Transfer Learning cs.CV · 2026-04-22 · conditional · none · ref 18 · internal anchor
First TrOCR adaptation for Tigrinya achieves 0.22% CER and 97.2% exact match using tokenizer extension plus Word-Aware Loss Weighting on 5000 synthetic GLOCR images.
MetaDent: Labeling Clinical Images for Vision-Language Models in Dentistry cs.CV · 2026-04-16 · conditional · none · ref 46 · internal anchor
MetaDent supplies a new annotated dental image dataset and shows that current VLMs achieve only moderate accuracy with inconsistent descriptions on fine-grained intraoral scenes.
Continual Distillation of Teachers from Different Domains cs.LG · 2026-04-10 · conditional · none · ref 21 · internal anchor
SE2D stabilizes continual distillation across heterogeneous teachers by preserving logits on external unlabeled data to mitigate unseen knowledge forgetting.
Entities as Retrieval Signals: A Systematic Study of Coverage, Supervision, and Evaluation in Entity-Oriented Ranking cs.IR · 2026-04-06 · conditional · none · ref 10 · internal anchor
Entity signals cover only 19.7% of relevant documents on Robust04 and no configuration among 443 systems improves MAP by more than 0.05 in open-world evaluation, despite gains when entities are pre-restricted.
Conversational Control with Ontologies for Large Language Models: A Lightweight Framework for Constrained Generation cs.CL · 2026-04-06 · conditional · none · ref 17 · internal anchor
Ontology-based constraints combined with hybrid fine-tuning enable consistent control over LLM conversational outputs on proficiency and polarity tasks, outperforming baselines across seven models.
Flow Map Language Models: One-step Language Modeling via Continuous Denoising cs.CL · 2026-02-18 · conditional · none · ref 68 · 2 links · internal anchor
Continuous flows on token embeddings with flow-map distillation produce one-step language models whose quality exceeds recent 8-step discrete diffusion baselines on LM1B and OpenWebText.
Frozen LVLMs for Micro-Video Recommendation: A Systematic Study of Feature Extraction and Fusion cs.IR · 2025-12-26 · conditional · none · ref 34 · internal anchor
Intermediate decoder hidden states from frozen LVLMs fused with ID embeddings outperform caption representations and deliver state-of-the-art micro-video recommendation performance on two real-world benchmarks.
TriagerX: Dual Transformers for Bug Triaging Tasks with Content and Interaction Based Rankings cs.SE · 2025-08-23 · conditional · none · ref 69 · internal anchor
TriagerX combines dual-transformer content rankings with developer interaction history to improve top-k accuracy for developer and component recommendations in bug triaging across five datasets.
Causal2Vec: Improving Decoder-only LLMs as Embedding Models through a Contextual Token cs.CL · 2025-07-31 · conditional · none · ref 9 · internal anchor
Causal2Vec prepends a BERT-generated contextual token to decoder-only LLMs and pools its hidden state with the EOS token to reach new SOTA on MTEB among public-data-trained embedding models.
Memory-Efficient Differentially Private Training with Gradient Random Projection cs.LG · 2025-06-18 · conditional · none · ref 15 · internal anchor
DP-GRAPE reduces memory in differentially private neural network training by using random Gaussian projections on gradients instead of SVD, achieving comparable privacy-utility tradeoffs to DP-SGD and scaling to 6.7B parameter models.
AuralSAM2: Enabling SAM2 Hear Through Pyramid Audio-Visual Feature Prompting cs.CV · 2025-06-01 · conditional · none · ref 36 · internal anchor
AuralSAM2 fuses audio-visual features via a pyramid-based AuralFuser module and audio-guided contrastive loss to improve promptable segmentation accuracy in SAM2 with minimal efficiency impact.
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models cs.CL · 2023-09-21 · conditional · none · ref 42 · internal anchor
Bootstrapping math questions via rewriting creates MetaMathQA; fine-tuning LLaMA-2 on it yields 66.4% on GSM8K for 7B and 82.3% for 70B, beating prior same-size models by large margins.
MiniLLM: On-Policy Distillation of Large Language Models cs.CL · 2023-06-14 · conditional · none · ref 14 · internal anchor
MiniLLM distills large language models into smaller ones via reverse KL divergence and on-policy optimization, yielding higher-quality responses with lower exposure bias than standard KD baselines.
CodeT5+: Open Code Large Language Models for Code Understanding and Generation cs.CL · 2023-05-13 · conditional · none · ref 21 · internal anchor
CodeT5+ is a flexible encoder-decoder LLM family for code pretrained with diverse objectives on multilingual corpora and initialized from existing LLMs, achieving state-of-the-art results on code generation, completion, math programming, and retrieval tasks including new SoTA on HumanEval with the 1
A General Language Assistant as a Laboratory for Alignment cs.CL · 2021-12-01 · conditional · none · ref 87 · internal anchor
Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.
CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation cs.CL · 2021-09-02 · conditional · none · ref 19 · internal anchor
CodeT5 adds identifier-aware pre-training and bimodal dual generation to a T5-style encoder-decoder, yielding better results on defect detection, clone detection, and code-to-text, text-to-code, and code-to-code tasks than prior encoder-only or decoder-only models.
Aligning AI With Shared Human Values cs.CY · 2020-08-05 · conditional · none · ref 21 · internal anchor
Introduces ETHICS benchmark showing current language models have promising but incomplete ability to predict basic human ethical judgments on text scenarios.
Linformer: Self-Attention with Linear Complexity cs.LG · 2020-06-08 · conditional · none · ref 8 · internal anchor
Linformer approximates self-attention with a low-rank projection to achieve O(n) time and space complexity while matching Transformer accuracy on standard NLP tasks.
Towards Better Static Code Analysis Reports: Sentence Transformer-based Filtering of Non-Actionable Alerts cs.SE · 2026-04-20 · conditional · none · ref 29 · internal anchor
STAF applies sentence embeddings from transformers to classify SCA findings, reaching 89% F1 and beating prior filters by 11% within projects and 6% across projects.
SUMMIR: A Hallucination-Aware Framework for Ranking Sports Insights from LLMs cs.IR · 2026-03-30 · conditional · none · ref 27 · internal anchor
SUMMIR is a multimetric ranking model that orders LLM-generated sports insights by importance while incorporating hallucination detection to improve factual reliability across cricket, soccer, basketball, and baseball articles.
ESGLens: An LLM-Based RAG Framework for Interactive ESG Report Analysis and Score Prediction cs.CL · 2026-03-29 · conditional · none · ref 27 · internal anchor
ESGLens applies RAG and LLM embeddings to extract GRI-aligned information from ESG reports and achieves 0.48 Pearson correlation when regressing environmental scores on 300 company reports.
Decoupled Multimodal Fusion for User Interest Modeling in Click-Through Rate Prediction cs.IR · 2025-10-13 · conditional · none · ref 34 · internal anchor
DMF adds target-aware bridging features and an inference-optimized decoupled attention layer to combine modality-centric and modality-enriched user interest modeling for CTR prediction.
BEFT: Bias-Efficient Fine-Tuning of Language Models in Low-Data Regimes cs.CL · 2025-09-19 · conditional · none · ref 18 · internal anchor
Directly fine-tuning the value bias (b_v) in transformer projections outperforms fine-tuning b_q or b_k for downstream performance in low-data regimes across multiple LLM architectures.
ALIEN: Aligned Entropy Head for Improving Uncertainty Estimation of LLMs cs.CL · 2025-05-21 · conditional · none · ref 25 · internal anchor
ALIEN trains a lightweight uncertainty head initialized to model entropy and refined via supervised regularization to improve detection of incorrect predictions and calibration on classification and NER tasks.
Beyond the Basics: Leveraging Large Language Model for Fine-Grained Medical Entity Recognition cs.AI · 2026-04-19 · conditional · none · ref 19 · internal anchor
Fine-tuned LLaMA3 with LoRA reaches 81.24% F1 on 18-category fine-grained medical entity recognition, beating zero-shot by 63.11% and few-shot by 35.63%.
A Semi-Automated Annotation Workflow for Paediatric Histopathology Reports Using Small Language Models cs.CL · 2026-04-05 · conditional · none · ref 39 · internal anchor
Small language models extract structured information from paediatric renal biopsy reports at up to 84.3% accuracy on CPU hardware with minimal clinician review.
Semantic Superiority vs. Forensic Efficiency: A Comparative Analysis of Deep Learning and Psycholinguistics for Business Email Compromise Detection cs.LG · 2025-11-26 · conditional · none · ref 27 · internal anchor
DistilBERT reaches AUC 1.0000 and F1 0.9981 for BEC detection while CatBoost reaches AUC 0.9860 and F1 0.9382, with a three-way cost-sensitive policy optimizing under a 1:5167 false-negative to false-positive ratio.

RoBERTa: A Robustly Optimized BERT Pretraining Approach

hub tools

citation-role summary

citation-polarity summary

claims ledger

authors

co-cited works

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer