Self-attention with relative position representations

Peter Shaw, Jakob Uszkoreit, Ashish Vaswani · 2018 · DOI 10.18653/v1/n18-2074

7 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

Equivariant Reinforcement Learning for Clifford Quantum Circuit Synthesis

quant-ph · 2026-05-11 · unverdicted · novelty 7.0

Equivariant RL agent synthesizes near-optimal Clifford circuits up to 30 qubits with lower two-qubit gate counts than Qiskit baselines.

Progress Ratio Embeddings: An Impatience Signal for Robust Length Control in Neural Text Generation

cs.CL · 2025-12-07 · unverdicted · novelty 6.0

Progress Ratio Embeddings use a trigonometric progress-ratio signal to deliver stable length control in transformers that generalizes to unseen target lengths.

Learning Spatial-Preserving Hierarchical Representations for Digital Pathology

cs.CV · 2024-06-13 · unverdicted · novelty 6.0

SPAN is a hierarchical attention framework that constructs multi-scale pyramid representations from single-scale patch inputs for WSI classification and segmentation while preserving spatial relationships.

Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation

cs.CL · 2021-08-27 · unverdicted · novelty 6.0

ALiBi enables transformers trained on length-1024 sequences to extrapolate to length-2048 with the same perplexity as a sinusoidal model trained on 2048, while training 11% faster and using 11% less memory.

Encoding Database Schemas with Relation-Aware Self-Attention for Text-to-SQL Parsers

cs.LG · 2019-06-27 · unverdicted · novelty 6.0

Relation-aware self-attention encodes schema structure for text-to-SQL, raising exact-match accuracy on Spider from 18.96% to 42.94%.

A-THENA: Early Intrusion Detection for IoT with Time-Aware Hybrid Encoding and Network-Specific Augmentation

cs.CR · 2026-04-23 · unverdicted · novelty 5.0

A-THENA improves averaged IoT intrusion detection accuracy by 3.69-6.88 percentage points over baselines on three datasets using time-aware hybrid encoding and network-specific augmentation, with near-zero false alarms and real-time deployment on Raspberry Pi Zero 2 W.

A Survey of Large Language Models

cs.CL · 2023-03-31 · accept · novelty 3.0

This survey reviews the background, key techniques, and evaluation methods for large language models, emphasizing emergent abilities that appear at large scales.

citing papers explorer

Showing 7 of 7 citing papers.

Equivariant Reinforcement Learning for Clifford Quantum Circuit Synthesis · quant-ph · 2026-05-11 · unverdicted · none · ref 42
Progress Ratio Embeddings: An Impatience Signal for Robust Length Control in Neural Text Generation · cs.CL · 2025-12-07 · unverdicted · none · ref 27
Learning Spatial-Preserving Hierarchical Representations for Digital Pathology · cs.CV · 2024-06-13 · unverdicted · none · ref 33
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation · cs.CL · 2021-08-27 · unverdicted · none · ref 36
Encoding Database Schemas with Relation-Aware Self-Attention for Text-to-SQL Parsers · cs.LG · 2019-06-27 · unverdicted · none · ref 12
A-THENA: Early Intrusion Detection for IoT with Time-Aware Hybrid Encoding and Network-Specific Augmentation · cs.CR · 2026-04-23 · unverdicted · none · ref 61
A Survey of Large Language Models · cs.CL · 2023-03-31 · accept · none · ref 297
