pith. sign in

Scaling laws of rope-based extrapolation

8 Pith papers cite this work. Polarity classification is still indexing.

8 Pith papers citing it

citation-role summary

background 2

citation-polarity summary

roles

background 2

polarities

background 2

representative citing papers

FocuSFT: Bilevel Optimization for Dilution-Aware Long-Context Fine-Tuning

cs.CL · 2026-05-11 · unverdicted · novelty 6.0

FocuSFT uses an inner optimization loop to adapt fast-weight parameters into a parametric memory that sharpens attention on relevant content, then conditions outer-loop supervised fine-tuning on this representation, yielding gains on long-context benchmarks.

HyMem: Hybrid Memory Architecture with Dynamic Retrieval Scheduling

cs.AI · 2026-02-15 · unverdicted · novelty 6.0

HyMem introduces dual-granular memory storage with a lightweight summary module for fast responses and selective activation of a deep LLM module for complex queries, outperforming full-context baselines by 92.6% lower computational cost on LOCOMO and LongMemEval benchmarks.

Positional Encoding via Token-Aware Phase Attention

cs.CL · 2025-09-16 · unverdicted · novelty 6.0

TAPA adds a learnable phase function to attention to preserve long-range token interactions, enabling direct continual pretraining, length extrapolation, lower perplexity, and stronger retrieval than RoPE-style methods.

InternLM2 Technical Report

cs.CL · 2024-03-26 · unverdicted · novelty 5.0

InternLM2 is a new open-source LLM that outperforms prior versions on 30 benchmarks and long-context tasks through scaled pre-training to 32k tokens and a conditional online RLHF alignment strategy.

citing papers explorer

Showing 8 of 8 citing papers.

  • RoPE Distinguishes Neither Positions Nor Tokens in Long Contexts, Provably cs.CL · 2026-05-15 · conditional · none · ref 19

    Proves that RoPE attention loses locality bias and token distinction in long contexts, approaching random behavior independent of content.

  • LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens cs.CL · 2024-02-21 · unverdicted · none · ref 8

    LongRoPE extends LLM context windows to 2048k tokens via search for non-uniform positional interpolation, progressive fine-tuning from 256k, and short-context readjustment.

  • Remember to Forget: Gated Adaptive Positional Encoding cs.LG · 2026-05-11 · unverdicted · none · ref 15

    GAPE augments RoPE with query- and key-dependent gates to stabilize attention and improve long-context performance in language models.

  • FocuSFT: Bilevel Optimization for Dilution-Aware Long-Context Fine-Tuning cs.CL · 2026-05-11 · unverdicted · none · ref 22

    FocuSFT uses an inner optimization loop to adapt fast-weight parameters into a parametric memory that sharpens attention on relevant content, then conditions outer-loop supervised fine-tuning on this representation, yielding gains on long-context benchmarks.

  • HyMem: Hybrid Memory Architecture with Dynamic Retrieval Scheduling cs.AI · 2026-02-15 · unverdicted · none · ref 27

    HyMem introduces dual-granular memory storage with a lightweight summary module for fast responses and selective activation of a deep LLM module for complex queries, outperforming full-context baselines by 92.6% lower computational cost on LOCOMO and LongMemEval benchmarks.

  • Positional Encoding via Token-Aware Phase Attention cs.CL · 2025-09-16 · unverdicted · none · ref 10

    TAPA adds a learnable phase function to attention to preserve long-range token interactions, enabling direct continual pretraining, length extrapolation, lower perplexity, and stronger retrieval than RoPE-style methods.

  • MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent cs.CL · 2025-07-03 · unverdicted · none · ref 16

    MemAgent uses multi-conversation RL to train a memory agent that reads text in segments and overwrites memory, extrapolating from 8K training to 3.5M token QA with under 5% loss and 95%+ on 512K RULER.

  • InternLM2 Technical Report cs.CL · 2024-03-26 · unverdicted · none · ref 152

    InternLM2 is a new open-source LLM that outperforms prior versions on 30 benchmarks and long-context tasks through scaled pre-training to 32k tokens and a conditional online RLHF alignment strategy.