Mean-pooled cosine similarity grows with sequence length in anisotropic transformer embeddings independent of content, while CKA shows far less length dependence across code, translation, and vision tasks.
Do multilingual large language models think in english? In International Conference on Learning Representations (ICLR), 2025
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CL 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Mean-Pooled Cosine Similarity is Not Length-Invariant: Theory and Cross-Domain Evidence for a Length-Invariant Alternative
Mean-pooled cosine similarity grows with sequence length in anisotropic transformer embeddings independent of content, while CKA shows far less length dependence across code, translation, and vision tasks.