Two stones hit one bird: Bilevel positional encoding for better length extrapolation

Zhenyu He, Guhao Feng, Shengjie Luo, Kai Yang, Liwei Wang, Jingjing Xu, Zhi Zhang, Hongxia Yang, Di He · 2024 · arXiv 2401.16421

2 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

Give it Space! Explicit Disentangling of Positional and Semantic Representations in Encoders

cs.CL · 2026-05-28 · unverdicted · novelty 6.0

Explicitly disentangling semantic and positional streams in a Transformer encoder reveals that absolute positional representations collapse to a 2D document-structure manifold, attention heads specialize by role, and the approach improves linguistic probing performance on 49 of 65 phenomena.

A Measure-Theoretic Analysis of Reasoning: Structural Generalization and Approximation Limits

cs.LG · 2026-05-19 · unverdicted · novelty 5.0

Applies optimal transport to bound OOD generalization error in Transformers via Lipschitz continuity and TC^0 circuit depth lower bounds for Dyck-k backtracking, supported by evaluations on 54 configurations.

citing papers explorer


Give it Space! Explicit Disentangling of Positional and Semantic Representations in Encoders cs.CL · 2026-05-28 · unverdicted · none · ref 10
A Measure-Theoretic Analysis of Reasoning: Structural Generalization and Approximation Limits cs.LG · 2026-05-19 · unverdicted · none · ref 11
