hub

Rating: [[...]] Analysis

Wenhao Wu, Yizhong Wang, Guangxuan Xiao, Hao Peng, Yao Fu · 2024 · arXiv 2404.15574

14 Pith papers cite this work. Polarity classification is still indexing.

14 Pith papers citing it

read on arXiv browse 14 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

The Readout Shortcut: Positional Number Copying Dominates Arithmetic CoT Readout in Small Language Models

cs.LG · 2026-05-20 · unverdicted · novelty 7.0

In 1-3B instruction-tuned LMs on GSM8K, arithmetic CoT readout is dominated by positional copying of the trailing number before the answer delimiter, accounting for 54-92 percentage points of accuracy.

Soft Head Selection for Injecting ICL-Derived Task Embeddings

cs.CL · 2025-07-28 · conditional · novelty 7.0

SITE applies soft gradient-based head selection to inject ICL-derived task embeddings, outperforming prior embedding adaptation and few-shot ICL across generation, reasoning, and NLU tasks on 12 LLMs from 4B to 70B parameters.

AB-Sparse: Sparse Attention with Adaptive Block Size for Accurate and Efficient Long-Context Inference

cs.DC · 2026-05-12 · unverdicted · novelty 6.0

AB-Sparse adaptively allocates per-head block sizes for sparse attention, adds lossless centroid quantization and custom variable-block GPU kernels, and reports up to 5.43% accuracy gain over fixed-block baselines with no throughput loss.

CAST: Mitigating Object Hallucination in Large Vision-Language Models via Caption-Guided Visual Attention Steering

cs.CV · 2026-05-06 · unverdicted · novelty 6.0

CAST reduces object hallucination in LVLMs by 6.03% on average across five models and five benchmarks by identifying caption-sensitive attention heads and applying optimized steering directions to their outputs, with negligible added inference cost.

Understanding the Mechanism of Altruism in Large Language Models

econ.GN · 2026-04-21 · unverdicted · novelty 6.0

A small set of sparse autoencoder features in LLMs drives shifts between generous and selfish allocations in dictator games, with causal patching and steering confirming their role and generalization to other social games.

Adversarial Humanities Benchmark: Results on Stylistic Robustness in Frontier Model Safety

cs.CL · 2026-04-20 · unverdicted · novelty 6.0

Stylistic rewrites of harmful prompts raise attack success rates from 3.84% to 36.8-65% across 31 frontier models, indicating weak generalization in safety refusals.

SAGE: Selective Attention-Guided Extraction for Token-Efficient Document Indexing

cs.DB · 2026-04-16 · unverdicted · novelty 6.0

SAGE is a training-free context reduction method that converts attention signals from a small LLM into a differential relevance heatmap to select top units for downstream QA, achieving competitive accuracy at 10% token budget on benchmarks like QuALITY-hard.

One Token per Highly Selective Frame: Towards Extreme Compression for Long Video Understanding

cs.CV · 2026-04-15 · unverdicted · novelty 6.0

XComp reaches extreme video compression (one token per selective frame) via learnable progressive token compression and question-conditioned frame selection, lifting LVBench accuracy from 42.9 percent to 46.2 percent after tuning on 2.5 percent of standard data.

BLASST: Dynamic BLocked Attention Sparsity via Softmax Thresholding

cs.CL · 2025-12-12 · unverdicted · novelty 6.0

BLASST dynamically sparsifies attention by thresholding softmax scores to skip blocks, delivering 1.5x speedups at 70%+ sparsity while preserving benchmark accuracy.

PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling

cs.CL · 2024-06-04 · conditional · novelty 6.0

PyramidKV dynamically compresses KV cache across layers following pyramidal information funneling, matching full performance at 12% retention and outperforming alternatives at 0.7% retention with up to 20.5 accuracy gains.

Flux Attention: Context-Aware Hybrid Attention for Efficient LLMs Inference

cs.LG · 2026-04-08 · unverdicted · novelty 5.0

Flux Attention uses a context-aware Layer Router to dynamically assign full or sparse attention to each LLM layer, achieving up to 2.8x prefill and 2.0x decode speedups with competitive performance on long-context and reasoning tasks.

ART: Attention Replacement Technique to Improve Factuality in LLMs

cs.CL · 2026-04-07 · unverdicted · novelty 5.0

ART replaces uniform attention in shallow LLM layers with local attention patterns to reduce hallucinations across multiple model architectures.

Are Finer Citations Always Better? Rethinking Granularity for Attributed Generation

cs.CL · 2026-04-01 · unverdicted · novelty 5.0

Enforcing sentence-level citations degrades LLM attribution quality by 16-276% versus paragraph-level, with larger models penalized more due to disrupted semantic synthesis.

RAT+: Train Dense, Infer Sparse -- Recurrence Augmented Attention for Dilated Inference

cs.LG · 2026-02-20 · 2 refs

citing papers explorer

Showing 14 of 14 citing papers.

The Readout Shortcut: Positional Number Copying Dominates Arithmetic CoT Readout in Small Language Models cs.LG · 2026-05-20 · unverdicted · none · ref 38
In 1-3B instruction-tuned LMs on GSM8K, arithmetic CoT readout is dominated by positional copying of the trailing number before the answer delimiter, accounting for 54-92 percentage points of accuracy.
Soft Head Selection for Injecting ICL-Derived Task Embeddings cs.CL · 2025-07-28 · conditional · none · ref 21
SITE applies soft gradient-based head selection to inject ICL-derived task embeddings, outperforming prior embedding adaptation and few-shot ICL across generation, reasoning, and NLU tasks on 12 LLMs from 4B to 70B parameters.
AB-Sparse: Sparse Attention with Adaptive Block Size for Accurate and Efficient Long-Context Inference cs.DC · 2026-05-12 · unverdicted · none · ref 19
AB-Sparse adaptively allocates per-head block sizes for sparse attention, adds lossless centroid quantization and custom variable-block GPU kernels, and reports up to 5.43% accuracy gain over fixed-block baselines with no throughput loss.
CAST: Mitigating Object Hallucination in Large Vision-Language Models via Caption-Guided Visual Attention Steering cs.CV · 2026-05-06 · unverdicted · none · ref 62
CAST reduces object hallucination in LVLMs by 6.03% on average across five models and five benchmarks by identifying caption-sensitive attention heads and applying optimized steering directions to their outputs, with negligible added inference cost.
Understanding the Mechanism of Altruism in Large Language Models econ.GN · 2026-04-21 · unverdicted · none · ref 195
A small set of sparse autoencoder features in LLMs drives shifts between generous and selfish allocations in dictator games, with causal patching and steering confirming their role and generalization to other social games.
Adversarial Humanities Benchmark: Results on Stylistic Robustness in Frontier Model Safety cs.CL · 2026-04-20 · unverdicted · none · ref 18
Stylistic rewrites of harmful prompts raise attack success rates from 3.84% to 36.8-65% across 31 frontier models, indicating weak generalization in safety refusals.
SAGE: Selective Attention-Guided Extraction for Token-Efficient Document Indexing cs.DB · 2026-04-16 · unverdicted · none · ref 46
SAGE is a training-free context reduction method that converts attention signals from a small LLM into a differential relevance heatmap to select top units for downstream QA, achieving competitive accuracy at 10% token budget on benchmarks like QuALITY-hard.
One Token per Highly Selective Frame: Towards Extreme Compression for Long Video Understanding cs.CV · 2026-04-15 · unverdicted · none · ref 66
XComp reaches extreme video compression (one token per selective frame) via learnable progressive token compression and question-conditioned frame selection, lifting LVBench accuracy from 42.9 percent to 46.2 percent after tuning on 2.5 percent of standard data.
BLASST: Dynamic BLocked Attention Sparsity via Softmax Thresholding cs.CL · 2025-12-12 · unverdicted · none · ref 20
BLASST dynamically sparsifies attention by thresholding softmax scores to skip blocks, delivering 1.5x speedups at 70%+ sparsity while preserving benchmark accuracy.
PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling cs.CL · 2024-06-04 · conditional · none · ref 22
PyramidKV dynamically compresses KV cache across layers following pyramidal information funneling, matching full performance at 12% retention and outperforming alternatives at 0.7% retention with up to 20.5 accuracy gains.
Flux Attention: Context-Aware Hybrid Attention for Efficient LLMs Inference cs.LG · 2026-04-08 · unverdicted · none · ref 43
Flux Attention uses a context-aware Layer Router to dynamically assign full or sparse attention to each LLM layer, achieving up to 2.8x prefill and 2.0x decode speedups with competitive performance on long-context and reasoning tasks.
ART: Attention Replacement Technique to Improve Factuality in LLMs cs.CL · 2026-04-07 · unverdicted · none · ref 10
ART replaces uniform attention in shallow LLM layers with local attention patterns to reduce hallucinations across multiple model architectures.
Are Finer Citations Always Better? Rethinking Granularity for Attributed Generation cs.CL · 2026-04-01 · unverdicted · none · ref 3
Enforcing sentence-level citations degrades LLM attribution quality by 16-276% versus paragraph-level, with larger models penalized more due to disrupted semantic synthesis.
RAT+: Train Dense, Infer Sparse -- Recurrence Augmented Attention for Dilated Inference cs.LG · 2026-02-20 · unreviewed · ref 30 · 2 links

Rating: [[...]] Analysis

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer