Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background 1
citation-polarity summary
fields: cs.LG 2
years: 2026 2
verdicts: UNVERDICTED 2
roles: background 1
polarities: background 1
representative citing papers
LEAP adds a layer-wise exit-aware constraint to standard distillation, reconciling it with early-exit mechanisms and delivering 1.61x wall-clock speedup on MiniLM at 0.95 threshold with 91.9% early exits by layer 7.
citing papers explorer
- PACT: Peak-Aware Cross-Attention Graph Transformers for Efficient Storm-Surge Emulation
  PACT introduces a peak-aware cross-attention graph transformer that emulates station-level storm surges more accurately than prior graph neural network baselines while running in seconds after training.
- LEAP: Layer-wise Exit-Aware Pretraining for Efficient Transformer Inference
  LEAP adds a layer-wise exit-aware constraint to standard distillation, reconciling it with early-exit mechanisms and delivering 1.61x wall-clock speedup on MiniLM at 0.95 threshold with 91.9% early exits by layer 7.