Self-Attention with Relative Position Representations

Peter Shaw, Jakob Uszkoreit, Ashish Vaswani · 2018 · DOI 10.18653/v1/n18-2074

10 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

QCFuse: Query-Aware Cache Fusion via Compressed View for Efficient RAG Serving

cs.AI · 2026-06-04 · unverdicted · novelty 7.0

QCFuse achieves full-prefill quality in RAG with 1.7x average prefill speedup over full prefill and 1.5x over ProphetKV via compressed query-aware cache fusion.

Equivariant Reinforcement Learning for Clifford Quantum Circuit Synthesis

quant-ph · 2026-05-11 · unverdicted · novelty 7.0

Equivariant RL agent synthesizes near-optimal Clifford circuits up to 30 qubits with lower two-qubit gate counts than Qiskit baselines.

Retrieval-Augmented Personalization with Foundation Models for Wearable Stress Detection

cs.LG · 2026-06-23 · unverdicted · novelty 6.0

Retrieval from out-of-domain foundation models enables personalization of a lightweight transformer for stress detection, yielding +3.92% accuracy and +4.76% F1 gains on WESAD without user labels.

Functional Equivalence in Attention: A Comprehensive Study with Applications to Linear Mode Connectivity

cs.LG · 2026-06-16 · unverdicted · novelty 6.0

Rotary positional encodings reduce the symmetry group of functional equivalence in attention compared to sinusoidal encodings, increasing expressivity and altering linear mode connectivity patterns.

Progress Ratio Embeddings: An Impatience Signal for Robust Length Control in Neural Text Generation

cs.CL · 2025-12-07 · unverdicted · novelty 6.0

Progress Ratio Embeddings use a trigonometric progress-ratio signal to deliver stable length control in transformers that generalizes to unseen target lengths.

Learning Spatial-Preserving Hierarchical Representations for Digital Pathology

cs.CV · 2024-06-13 · unverdicted · novelty 6.0

SPAN is a hierarchical attention framework that constructs multi-scale pyramid representations from single-scale patch inputs for WSI classification and segmentation while preserving spatial relationships.

Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation

cs.CL · 2021-08-27 · unverdicted · novelty 6.0

ALiBi enables transformers trained on length-1024 sequences to extrapolate to length-2048 with the same perplexity as a sinusoidal model trained on 2048, while training 11% faster and using 11% less memory.

Encoding Database Schemas with Relation-Aware Self-Attention for Text-to-SQL Parsers

cs.LG · 2019-06-27 · unverdicted · novelty 6.0

Relation-aware self-attention encodes schema structure for text-to-SQL, raising exact-match accuracy on Spider from 18.96% to 42.94%.

A-THENA: Early Intrusion Detection for IoT with Time-Aware Hybrid Encoding and Network-Specific Augmentation

cs.CR · 2026-04-23 · unverdicted · novelty 5.0

A-THENA improves averaged IoT intrusion detection accuracy by 3.69-6.88 percentage points over baselines on three datasets using time-aware hybrid encoding and network-specific augmentation, with near-zero false alarms and real-time deployment on Raspberry Pi Zero 2 W.

A Survey of Large Language Models

cs.CL · 2023-03-31 · accept · novelty 3.0

This survey reviews the background, key techniques, and evaluation methods for large language models, emphasizing emergent abilities that appear at large scales.

citing papers explorer

QCFuse: Query-Aware Cache Fusion via Compressed View for Efficient RAG Serving

cs.AI · 2026-06-04 · unverdicted · none · ref 50

QCFuse achieves full-prefill quality in RAG with 1.7x average prefill speedup over full prefill and 1.5x over ProphetKV via compressed query-aware cache fusion.

Equivariant Reinforcement Learning for Clifford Quantum Circuit Synthesis

quant-ph · 2026-05-11 · unverdicted · none · ref 42

Equivariant RL agent synthesizes near-optimal Clifford circuits up to 30 qubits with lower two-qubit gate counts than Qiskit baselines.

Retrieval-Augmented Personalization with Foundation Models for Wearable Stress Detection

cs.LG · 2026-06-23 · unverdicted · none · ref 65

Retrieval from out-of-domain foundation models enables personalization of a lightweight transformer for stress detection, yielding +3.92% accuracy and +4.76% F1 gains on WESAD without user labels.

Functional Equivalence in Attention: A Comprehensive Study with Applications to Linear Mode Connectivity

cs.LG · 2026-06-16 · unverdicted · none · ref 15

Rotary positional encodings reduce the symmetry group of functional equivalence in attention compared to sinusoidal encodings, increasing expressivity and altering linear mode connectivity patterns.

A-THENA: Early Intrusion Detection for IoT with Time-Aware Hybrid Encoding and Network-Specific Augmentation

cs.CR · 2026-04-23 · unverdicted · none · ref 61

A-THENA improves averaged IoT intrusion detection accuracy by 3.69-6.88 percentage points over baselines on three datasets using time-aware hybrid encoding and network-specific augmentation, with near-zero false alarms and real-time deployment on Raspberry Pi Zero 2 W.
