The impact of positional encoding on length generalization in transformers

Kazemnejad, A · 2023 · arXiv 2305.19466

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

read on arXiv browse 4 citing papers

citation-role summary

method 1

citation-polarity summary

use method 1

representative citing papers

Dual Triangle Attention: Effective Bidirectional Attention Without Positional Embeddings

q-bio.QM · 2026-04-09 · unverdicted · novelty 7.0

Dual Triangle Attention achieves effective bidirectional attention with built-in positional inductive bias via dual triangular masks, outperforming standard bidirectional attention on position-sensitive tasks and showing strong masked language modeling results with or without positional embeddings.

YaRN: Efficient Context Window Extension of Large Language Models

cs.CL · 2023-08-31 · unverdicted · novelty 6.0

YaRN extends the context window of RoPE-based LLMs like LLaMA more efficiently than prior methods, using 10x fewer tokens and 2.5x fewer steps while surpassing state-of-the-art performance and enabling extrapolation beyond fine-tuning lengths.

A Measure-Theoretic Analysis of Reasoning: Structural Generalization and Approximation Limits

cs.LG · 2026-05-19 · unverdicted · novelty 5.0

Applies optimal transport to bound OOD generalization error in Transformers via Lipschitz continuity and TC^0 circuit depth lower bounds for Dyck-k backtracking, supported by evaluations on 54 configurations.

Positional Encoding in Transformer-Based Time Series Models: A Survey

cs.LG · 2025-02-17 · unverdicted · novelty 3.0

A survey of positional encoding methods in transformer-based time series models that evaluates fixed, learnable, relative, and hybrid approaches on classification tasks and links effectiveness to data characteristics.

citing papers explorer

Showing 4 of 4 citing papers.

Dual Triangle Attention: Effective Bidirectional Attention Without Positional Embeddings q-bio.QM · 2026-04-09 · unverdicted · none · ref 39
Dual Triangle Attention achieves effective bidirectional attention with built-in positional inductive bias via dual triangular masks, outperforming standard bidirectional attention on position-sensitive tasks and showing strong masked language modeling results with or without positional embeddings.
YaRN: Efficient Context Window Extension of Large Language Models cs.CL · 2023-08-31 · unverdicted · none · ref 9
YaRN extends the context window of RoPE-based LLMs like LLaMA more efficiently than prior methods, using 10x fewer tokens and 2.5x fewer steps while surpassing state-of-the-art performance and enabling extrapolation beyond fine-tuning lengths.
A Measure-Theoretic Analysis of Reasoning: Structural Generalization and Approximation Limits cs.LG · 2026-05-19 · unverdicted · none · ref 12
Applies optimal transport to bound OOD generalization error in Transformers via Lipschitz continuity and TC^0 circuit depth lower bounds for Dyck-k backtracking, supported by evaluations on 54 configurations.
Positional Encoding in Transformer-Based Time Series Models: A Survey cs.LG · 2025-02-17 · unverdicted · none · ref 28
A survey of positional encoding methods in transformer-based time series models that evaluates fixed, learnable, relative, and hybrid approaches on classification tasks and links effectiveness to data characteristics.

The impact of positional encoding on length generalization in transformers

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer