Quantifying attention flow in transformers

URL https://api · 2019 · DOI 10.18653/v1/2020

Canonical reference. 85% of citing Pith papers cite this work as background.

32 Pith papers citing it

Background: 85% of classified citations

citation-role summary

background 12 · method 1

citation-polarity summary

background 11 · extend 1 · unclear 1

representative citing papers

Pretraining Exposure Explains Popularity Judgments in Large Language Models

cs.CL · 2026-05-12 · unverdicted · novelty 8.0

LLM popularity judgments align more closely with pretraining data exposure counts than with Wikipedia popularity, with stronger effects in pairwise comparisons and larger models.

Parameter-Efficient Neuroevolution for Diverse LLM Generation: Quality-Diversity Optimization via Prompt Embedding Evolution

cs.NE · 2026-05-10 · unverdicted · novelty 7.0

QD-LLM evolves prompt embeddings via neuroevolution in a quality-diversity framework, delivering 46% higher coverage and 41% higher QD-score than prior methods on coding and writing benchmarks.

Linguistically Informed Multimodal Fusion for Vietnamese Scene-Text Image Captioning: Dataset, Graph Framework, and Phonological Attention

cs.CV · 2026-04-30 · unverdicted · novelty 7.0

Introduces ViTextCaps dataset and PhonoSTFG phonological graph fusion framework for Vietnamese scene-text image captioning, showing cross-modal graph edges harm performance.

LASQ: A Low-resource Aspect-based Sentiment Quadruple Extraction Dataset

cs.CL · 2026-04-12 · unverdicted · novelty 7.0

LASQ is a new quadruple extraction dataset for Uzbek and Uyghur that includes a syntax-aware model showing gains over baselines on the task.

Scaling Laws for Cross-Encoder Reranking

cs.IR · 2026-03-05 · unverdicted · novelty 7.0

Cross-encoder reranker performance scales predictably via power laws with model size and training exposure, allowing accurate forecasts for 400M and 1B models and data-heavy compute allocation.

Clotho: Measuring Task-Specific Pre-Generation Test Adequacy for LLM Inputs

cs.SE · 2025-09-22 · unverdicted · novelty 7.0

Clotho ranks LLM test inputs by failure likelihood using pre-generation hidden states and GMMs, achieving 0.716 ROC-AUC after labeling 5.4% of inputs on average across eight tasks and three models, with transfer to proprietary models.

The Challenge and Reward of Fair Play in Narrative: A Computational Approach

cs.CL · 2025-07-18 · unverdicted · novelty 7.0

Develops an information-theoretic framework showing surprise and coherence trade off in single reader models but coexist via pre- and post-revelation modes, operationalized as reference-less LLM metrics for fair play and validated on generated stories plus classic detective fiction.

Accelerating Large Language Model Decoding with Speculative Sampling

cs.CL · 2023-02-02 · accept · novelty 7.0

Speculative sampling accelerates LLM decoding 2-2.5x by letting a draft model propose short sequences that the target model scores in parallel, then applying modified rejection sampling to preserve the exact target distribution.

Multitask Prompted Training Enables Zero-Shot Task Generalization

cs.LG · 2021-10-15 · conditional · novelty 7.0

Multitask fine-tuning of an encoder-decoder model on prompted datasets produces zero-shot generalization that often beats models up to 16 times larger on standard benchmarks.

Task-Adaptive Embedding Refinement via Test-time LLM Guidance

cs.CL · 2026-05-12 · unverdicted · novelty 6.0

Test-time LLM feedback refines query embeddings to deliver up to 25% relative gains on zero-shot literature search, intent detection, and related benchmarks.

Retrieval from Within: An Intrinsic Capability of Attention-Based Models

cs.LG · 2026-05-07 · unverdicted · novelty 6.0 · 2 refs

Attention-based models can retrieve evidence intrinsically by using decoder attention to score and reuse their own pre-encoded chunks, outperforming separate retrieval pipelines on QA benchmarks.

The Geopolitics of AI Safety: A Causal Analysis of Regional LLM Bias

cs.AI · 2026-05-06 · unverdicted · novelty 6.0

Causal analysis of LLMs finds standard bias metrics overestimate demographic effects due to context toxicity, with Western models showing higher refusal rates for certain groups and Eastern models showing targeted regional sensitivities.

When AI reviews science: Can we trust the referee?

cs.AI · 2026-04-26 · unverdicted · novelty 6.0

AI peer review systems are vulnerable to prompt injections, prestige biases, assertion strength effects, and contextual poisoning, as demonstrated by a new attack taxonomy and causal experiments on real conference submissions.

CIR: Lightweight Container Image for Cross-Platform Deployment

cs.DC · 2026-04-12 · unverdicted · novelty 6.0

CIR is a cross-platform container image format for Python/R-style apps that defers dependency assembly to deployment, cutting image size by 95% and deployment time by 40-60% versus traditional bundled images.

Context-Aware Dialectal Arabic Machine Translation with Interactive Region and Register Selection

cs.CL · 2026-04-07 · unverdicted · novelty 6.0

A metadata-conditioned mT5 model trained on rule-augmented dialectal Arabic data produces translations that better match intended regional varieties than high-resource baselines, despite lower BLEU scores.

JU'A -- A Benchmark for Information Retrieval in Brazilian Legal Text Collections

cs.IR · 2026-04-07 · accept · novelty 6.0

JU'A is a new heterogeneous benchmark for Brazilian legal IR that distinguishes retrieval methods and shows domain-adapted models excel on aligned subsets while BM25 stays competitive elsewhere.

TimelineReasoner: Advancing Timeline Summarization with Large Reasoning Models

cs.CL · 2026-04-03 · unverdicted · novelty 6.0

TimelineReasoner applies large reasoning models in a Global Cognition plus Detail Exploration loop to produce more accurate, complete, and coherent timelines from news than prior LLM-based methods.

Flow Map Language Models: One-step Language Modeling via Continuous Denoising

cs.CL · 2026-02-18 · conditional · novelty 6.0 · 2 refs

Continuous flows on token embeddings with flow-map distillation produce one-step language models whose quality exceeds recent 8-step discrete diffusion baselines on LM1B and OpenWebText.

SHINE: A Scalable In-Context Hypernetwork for Mapping Context to LoRA in a Single Pass

cs.CL · 2026-02-06 · unverdicted · novelty 6.0

SHINE trains a scalable in-context hypernetwork to generate high-quality LoRA adapters from contexts in one pass, enabling efficient LLM adaptation that saves time and compute compared to standard fine-tuning.

Deep sequence models tend to memorize geometrically; it is unclear why

cs.LG · 2025-10-30 · unverdicted · novelty 6.0

Deep sequence models develop geometric memory in embeddings that encodes novel global relationships, transforming l-fold composition tasks into 1-step navigation via a natural spectral bias connected to Node2Vec.

CoSpaDi: Compressing LLMs via Calibration-Guided Sparse Dictionary Learning

cs.CL · 2025-09-26 · conditional · novelty 6.0

CoSpaDi introduces a training-free sparse dictionary learning framework for post-training LLM compression that optimizes functional reconstruction error via activation-derived orthonormalization and achieves improved accuracy-compression trade-offs over SVD and pruning baselines.

CLIF: Concept-Level Influence Functions for Transparent Bottleneck Models

cs.CL · 2026-05-19 · unverdicted · novelty 5.0

CLIF applies influence functions to pinpoint influential training samples and key concepts in Concept Bottleneck Models, enabling data debugging and behavioral insights on CEBaB and Yelp datasets.

A Decomposed Retrieval-Edit-Rerank Framework for Chord Generation

cs.SD · 2026-05-08 · unverdicted · novelty 5.0

The RER framework decomposes chord generation into retrieval, editing, and reranking stages to outperform end-to-end models in balancing stylistic diversity with music-theoretic feasibility.

pAI/MSc: ML Theory Research with Humans on the Loop

cs.AI · 2026-04-22 · unverdicted · novelty 5.0

pAI/MSc is a customizable multi-agent system that reduces human steering by orders of magnitude when turning a hypothesis into a literature-grounded, mathematically established, experimentally supported manuscript draft in ML theory.

citing papers explorer

Showing 32 of 32 citing papers.

Pretraining Exposure Explains Popularity Judgments in Large Language Models cs.CL · 2026-05-12 · unverdicted · none · ref 29
LLM popularity judgments align more closely with pretraining data exposure counts than with Wikipedia popularity, with stronger effects in pairwise comparisons and larger models.
Parameter-Efficient Neuroevolution for Diverse LLM Generation: Quality-Diversity Optimization via Prompt Embedding Evolution cs.NE · 2026-05-10 · unverdicted · none · ref 15
QD-LLM evolves prompt embeddings via neuroevolution in a quality-diversity framework, delivering 46% higher coverage and 41% higher QD-score than prior methods on coding and writing benchmarks.
Linguistically Informed Multimodal Fusion for Vietnamese Scene-Text Image Captioning: Dataset, Graph Framework, and Phonological Attention cs.CV · 2026-04-30 · unverdicted · none · ref 44
Introduces ViTextCaps dataset and PhonoSTFG phonological graph fusion framework for Vietnamese scene-text image captioning, showing cross-modal graph edges harm performance.
LASQ: A Low-resource Aspect-based Sentiment Quadruple Extraction Dataset cs.CL · 2026-04-12 · unverdicted · none · ref 56
LASQ is a new quadruple extraction dataset for Uzbek and Uyghur that includes a syntax-aware model showing gains over baselines on the task.
Scaling Laws for Cross-Encoder Reranking cs.IR · 2026-03-05 · unverdicted · none · ref 28
Cross-encoder reranker performance scales predictably via power laws with model size and training exposure, allowing accurate forecasts for 400M and 1B models and data-heavy compute allocation.
Clotho: Measuring Task-Specific Pre-Generation Test Adequacy for LLM Inputs cs.SE · 2025-09-22 · unverdicted · none · ref 10
Clotho ranks LLM test inputs by failure likelihood using pre-generation hidden states and GMMs, achieving 0.716 ROC-AUC after labeling 5.4% of inputs on average across eight tasks and three models, with transfer to proprietary models.
The Challenge and Reward of Fair Play in Narrative: A Computational Approach cs.CL · 2025-07-18 · unverdicted · none · ref 25
Develops an information-theoretic framework showing surprise and coherence trade off in single reader models but coexist via pre- and post-revelation modes, operationalized as reference-less LLM metrics for fair play and validated on generated stories plus classic detective fiction.
Accelerating Large Language Model Decoding with Speculative Sampling cs.CL · 2023-02-02 · accept · none · ref 10
Speculative sampling accelerates LLM decoding 2-2.5x by letting a draft model propose short sequences that the target model scores in parallel, then applying modified rejection sampling to preserve the exact target distribution.
Multitask Prompted Training Enables Zero-Shot Task Generalization cs.LG · 2021-10-15 · conditional · none · ref 22
Multitask fine-tuning of an encoder-decoder model on prompted datasets produces zero-shot generalization that often beats models up to 16 times larger on standard benchmarks.
Task-Adaptive Embedding Refinement via Test-time LLM Guidance cs.CL · 2026-05-12 · unverdicted · none · ref 6
Test-time LLM feedback refines query embeddings to deliver up to 25% relative gains on zero-shot literature search, intent detection, and related benchmarks.
Retrieval from Within: An Intrinsic Capability of Attention-Based Models cs.LG · 2026-05-07 · unverdicted · none · ref 16 · 2 links
Attention-based models can retrieve evidence intrinsically by using decoder attention to score and reuse their own pre-encoded chunks, outperforming separate retrieval pipelines on QA benchmarks.
The Geopolitics of AI Safety: A Causal Analysis of Regional LLM Bias cs.AI · 2026-05-06 · unverdicted · none · ref 10
Causal analysis of LLMs finds standard bias metrics overestimate demographic effects due to context toxicity, with Western models showing higher refusal rates for certain groups and Eastern models showing targeted regional sensitivities.
When AI reviews science: Can we trust the referee? cs.AI · 2026-04-26 · unverdicted · none · ref 31
AI peer review systems are vulnerable to prompt injections, prestige biases, assertion strength effects, and contextual poisoning, as demonstrated by a new attack taxonomy and causal experiments on real conference submissions.
CIR: Lightweight Container Image for Cross-Platform Deployment cs.DC · 2026-04-12 · unverdicted · none · ref 59
CIR is a cross-platform container image format for Python/R-style apps that defers dependency assembly to deployment, cutting image size by 95% and deployment time by 40-60% versus traditional bundled images.
Context-Aware Dialectal Arabic Machine Translation with Interactive Region and Register Selection cs.CL · 2026-04-07 · unverdicted · none · ref 9
A metadata-conditioned mT5 model trained on rule-augmented dialectal Arabic data produces translations that better match intended regional varieties than high-resource baselines, despite lower BLEU scores.
JU'A -- A Benchmark for Information Retrieval in Brazilian Legal Text Collections cs.IR · 2026-04-07 · accept · none · ref 10
JU'A is a new heterogeneous benchmark for Brazilian legal IR that distinguishes retrieval methods and shows domain-adapted models excel on aligned subsets while BM25 stays competitive elsewhere.
TimelineReasoner: Advancing Timeline Summarization with Large Reasoning Models cs.CL · 2026-04-03 · unverdicted · none · ref 11
TimelineReasoner applies large reasoning models in a Global Cognition plus Detail Exploration loop to produce more accurate, complete, and coherent timelines from news than prior LLM-based methods.
Flow Map Language Models: One-step Language Modeling via Continuous Denoising cs.CL · 2026-02-18 · conditional · none · ref 66 · 2 links
Continuous flows on token embeddings with flow-map distillation produce one-step language models whose quality exceeds recent 8-step discrete diffusion baselines on LM1B and OpenWebText.
SHINE: A Scalable In-Context Hypernetwork for Mapping Context to LoRA in a Single Pass cs.CL · 2026-02-06 · unverdicted · none · ref 8
SHINE trains a scalable in-context hypernetwork to generate high-quality LoRA adapters from contexts in one pass, enabling efficient LLM adaptation that saves time and compute compared to standard fine-tuning.
Deep sequence models tend to memorize geometrically; it is unclear why cs.LG · 2025-10-30 · unverdicted · none · ref 150
Deep sequence models develop geometric memory in embeddings that encodes novel global relationships, transforming l-fold composition tasks into 1-step navigation via a natural spectral bias connected to Node2Vec.
CoSpaDi: Compressing LLMs via Calibration-Guided Sparse Dictionary Learning cs.CL · 2025-09-26 · conditional · none · ref 7
CoSpaDi introduces a training-free sparse dictionary learning framework for post-training LLM compression that optimizes functional reconstruction error via activation-derived orthonormalization and achieves improved accuracy-compression trade-offs over SVD and pruning baselines.
CLIF: Concept-Level Influence Functions for Transparent Bottleneck Models cs.CL · 2026-05-19 · unverdicted · none · ref 7
CLIF applies influence functions to pinpoint influential training samples and key concepts in Concept Bottleneck Models, enabling data debugging and behavioral insights on CEBaB and Yelp datasets.
A Decomposed Retrieval-Edit-Rerank Framework for Chord Generation cs.SD · 2026-05-08 · unverdicted · none · ref 6
The RER framework decomposes chord generation into retrieval, editing, and reranking stages to outperform end-to-end models in balancing stylistic diversity with music-theoretic feasibility.
pAI/MSc: ML Theory Research with Humans on the Loop cs.AI · 2026-04-22 · unverdicted · none · ref 76
pAI/MSc is a customizable multi-agent system that reduces human steering by orders of magnitude when turning a hypothesis into a literature-grounded, mathematically established, experimentally supported manuscript draft in ML theory.
Shiny Stories, Hidden Struggles: Investigating the Representation of Disability Through the Lens of LLMs cs.CL · 2026-04-02 · unverdicted · none · ref 26
LLMs produce overly positive idealized depictions of disability in simulated social media posts that do not match real posts by people with disabilities and show topic bias favoring nondisabled people.
Automated Self-Testing as a Quality Gate: Evidence-Driven Release Management for LLM Applications cs.SE · 2026-03-13 · unverdicted · none · ref 20
An automated self-testing framework with evidence-based quality gates for LLM application releases was evaluated in a longitudinal case study of a multi-agent conversational AI system, identifying rollback builds and supporting stable quality over four weeks.
Calibrate-Then-Act: Cost-Aware Exploration in LLM Agents cs.CL · 2026-02-18 · unverdicted · none · ref 6
Calibrate-Then-Act supplies LLM agents with priors on latent environment states to enable explicit cost-uncertainty reasoning, producing more optimal strategies than standard approaches in retrieval QA and file-reading coding tasks.
Always Tell Me The Odds: Fine-grained Conditional Probability Estimation cs.CL · 2025-05-02 · unverdicted · none · ref 5
New LLM-based models for fine-grained conditional probability estimation outperform prior fine-tuned and prompting methods through enhanced data creation and supervision.
A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions cs.CL · 2023-11-09 · unverdicted · none · ref 157
The paper surveys hallucination in LLMs with an innovative taxonomy, factors, detection methods, benchmarks, mitigation strategies, and open research directions.
Carbon-Taxed Transformers: A Green Compression Pipeline for Overgrown Language Models cs.SE · 2026-04-28 · unverdicted · none · ref 13
CTT is a compression pipeline for LLMs that achieves up to 49x memory reduction, 10x faster inference, 81% lower CO2 emissions, and retains 68-98% accuracy on code clone detection, summarization, and generation tasks.
Matched-Learning-Rate Analysis of Attention Drift and Transfer Retention in Fine-Tuned CLIP cs.LG · 2026-04-01 · unverdicted · none · ref 1
Matched learning-rate experiments show LoRA retains substantially higher zero-shot transfer (45% vs 11% on EuroSAT, 58% vs 9% on Pets) than Full FT in CLIP adaptation.
Decoding the Multimodal Maze: A Systematic Review on the Adoption of Explainability in Multimodal Attention-based Models cs.LG · 2025-08-06 · unverdicted · none · ref 81
A systematic literature review of explainability in multimodal attention models finds most studies focus on vision-language tasks with attention-based explanations, but evaluation methods lack consistency and modality-specific considerations.
