Self-Attention with Relative Position Representations
10 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background 3
citation-polarity summary: background 3
representative citing papers
Equivariant RL agent synthesizes near-optimal Clifford circuits up to 30 qubits with lower two-qubit gate counts than Qiskit baselines.
Retrieval from out-of-domain foundation models enables personalization of a lightweight transformer for stress detection, yielding +3.92% accuracy and +4.76% F1 gains on WESAD without user labels.
Rotary positional encodings reduce the symmetry group of functional equivalence in attention compared to sinusoidal encodings, increasing expressivity and altering linear mode connectivity patterns.
Progress Ratio Embeddings use a trigonometric progress-ratio signal to deliver stable length control in transformers that generalizes to unseen target lengths.
SPAN is a hierarchical attention framework that constructs multi-scale pyramid representations from single-scale patch inputs for WSI classification and segmentation while preserving spatial relationships.
ALiBi enables transformers trained on length-1024 sequences to extrapolate to length-2048 with the same perplexity as a sinusoidal model trained on 2048, while training 11% faster and using 11% less memory.
Relation-aware self-attention encodes schema structure for text-to-SQL, raising exact-match accuracy on Spider from 18.96% to 42.94%.
A-THENA improves averaged IoT intrusion detection accuracy by 3.69-6.88 percentage points over baselines on three datasets using time-aware hybrid encoding and network-specific augmentation, with near-zero false alarms and real-time deployment on Raspberry Pi Zero 2 W.
This survey reviews the background, key techniques, and evaluation methods for large language models, emphasizing emergent abilities that appear at large scales.
citing papers explorer
- QCFuse: Query-Aware Cache Fusion via Compressed View for Efficient RAG Serving
  QCFuse achieves full-prefill quality in RAG with 1.7x average prefill speedup over full prefill and 1.5x over ProphetKV via compressed query-aware cache fusion.
- Equivariant Reinforcement Learning for Clifford Quantum Circuit Synthesis
  Equivariant RL agent synthesizes near-optimal Clifford circuits up to 30 qubits with lower two-qubit gate counts than Qiskit baselines.
- Retrieval-Augmented Personalization with Foundation Models for Wearable Stress Detection
  Retrieval from out-of-domain foundation models enables personalization of a lightweight transformer for stress detection, yielding +3.92% accuracy and +4.76% F1 gains on WESAD without user labels.
- Functional Equivalence in Attention: A Comprehensive Study with Applications to Linear Mode Connectivity
  Rotary positional encodings reduce the symmetry group of functional equivalence in attention compared to sinusoidal encodings, increasing expressivity and altering linear mode connectivity patterns.
- A-THENA: Early Intrusion Detection for IoT with Time-Aware Hybrid Encoding and Network-Specific Augmentation
  A-THENA improves averaged IoT intrusion detection accuracy by 3.69-6.88 percentage points over baselines on three datasets using time-aware hybrid encoding and network-specific augmentation, with near-zero false alarms and real-time deployment on Raspberry Pi Zero 2 W.