{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2024:HAMJZSQVKL4VHWOAINCHUE26II","short_pith_number":"pith:HAMJZSQV","schema_version":"1.0","canonical_sha256":"38189cca1552f953d9c043447a135e420636aa9e5d0a13a0ca99322e4750d280","source":{"kind":"arxiv","id":"2402.13753","version":1},"attestation_state":"computed","paper":{"title":"LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens","license":"http://creativecommons.org/licenses/by-nc-nd/4.0/","headline":"LongRoPE extends pre-trained LLMs to 2048k token contexts via targeted non-uniform positional interpolation and a two-stage fine-tuning process.","cross_cats":[],"primary_cat":"cs.CL","authors_text":"Chengruidong Zhang, Fan Yang, Jiahang Xu, Li Lyna Zhang, Mao Yang, Ning Shang, Yiran Ding, Yuanyuan Xu","submitted_at":"2024-02-21T12:30:33Z","abstract_excerpt":"Large context window is a desirable feature in large language models (LLMs). However, due to high fine-tuning costs, scarcity of long texts, and catastrophic values introduced by new token positions, current extended context windows are limited to around 128k tokens. This paper introduces LongRoPE that, for the first time, extends the context window of pre-trained LLMs to an impressive 2048k tokens, with up to only 1k fine-tuning steps at within 256k training lengths, while maintaining performance at the original short context window. This is achieved by three key innovations: (i) we identify "},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":true},"canonical_record":{"source":{"id":"2402.13753","kind":"arxiv","version":1},"metadata":{"license":"http://creativecommons.org/licenses/by-nc-nd/4.0/","primary_cat":"cs.CL","submitted_at":"2024-02-21T12:30:33Z","cross_cats_sorted":[],"title_canon_sha256":"46e4af97439ff2209ef39f593912fe97c0c8c33415282cb5b0df3aafaf04abf8","abstract_canon_sha256":"0379a2839dbbe711aa7363393e5effbbca314a73f949a18ec961afaed67dfb0f"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:38:53.019823Z","signature_b64":"p9LZId8qixz4wed1Z7pJXaxf/8BRnyoaETcm8ARr6k5FxWyQfWF/N16iogOPGzhILUglFHIiH6Xj4VELzxTHCw==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"38189cca1552f953d9c043447a135e420636aa9e5d0a13a0ca99322e4750d280","last_reissued_at":"2026-05-17T23:38:53.019275Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:38:53.019275Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens","license":"http://creativecommons.org/licenses/by-nc-nd/4.0/","headline":"LongRoPE extends pre-trained LLMs to 2048k token contexts via targeted non-uniform positional interpolation and a two-stage fine-tuning process.","cross_cats":[],"primary_cat":"cs.CL","authors_text":"Chengruidong Zhang, Fan Yang, Jiahang Xu, Li Lyna Zhang, Mao Yang, Ning Shang, Yiran Ding, Yuanyuan Xu","submitted_at":"2024-02-21T12:30:33Z","abstract_excerpt":"Large context window is a desirable feature in large language models (LLMs). However, due to high fine-tuning costs, scarcity of long texts, and catastrophic values introduced by new token positions, current extended context windows are limited to around 128k tokens. This paper introduces LongRoPE that, for the first time, extends the context window of pre-trained LLMs to an impressive 2048k tokens, with up to only 1k fine-tuning steps at within 256k training lengths, while maintaining performance at the original short context window. This is achieved by three key innovations: (i) we identify "},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"LongRoPE extends the context window of pre-trained LLMs to an impressive 2048k tokens, with up to only 1k fine-tuning steps at within 256k training lengths, while maintaining performance at the original short context window.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The two forms of non-uniformities in positional interpolation identified via efficient search are generalizable across models and tasks and provide a stable initialization that does not overfit to the search data.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"LongRoPE extends LLM context windows to 2048k tokens via search for non-uniform positional interpolation, progressive fine-tuning from 256k, and short-context readjustment.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"LongRoPE extends pre-trained LLMs to 2048k token contexts via targeted non-uniform positional interpolation and a two-stage fine-tuning process.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"ae9240a7ec5d1b526adafcf866f0ababd68b9c70aaf16e645576e810486a721a"},"source":{"id":"2402.13753","kind":"arxiv","version":1},"verdict":{"id":"5f8907da-40cb-4089-9569-b9f66b67b1ef","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T08:26:12.371361Z","strongest_claim":"LongRoPE extends the context window of pre-trained LLMs to an impressive 2048k tokens, with up to only 1k fine-tuning steps at within 256k training lengths, while maintaining performance at the original short context window.","one_line_summary":"LongRoPE extends LLM context windows to 2048k tokens via search for non-uniform positional interpolation, progressive fine-tuning from 256k, and short-context readjustment.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The two forms of non-uniformities in positional interpolation identified via efficient search are generalizable across models and tasks and provide a stable initialization that does not overfit to the search data.","pith_extraction_headline":"LongRoPE extends pre-trained LLMs to 2048k token contexts via targeted non-uniform positional interpolation and a two-stage fine-tuning process."},"references":{"count":15,"sample":[{"doi":"","year":null,"title":"Extending Context Window of Large Language Models via Positional Interpolation","work_id":"c8b6df85-e7da-4bd8-90a4-d309cc2a0f60","ref_index":1,"cited_arxiv_id":"2306.15595","is_internal_anchor":true},{"doi":"","year":null,"title":"The Pile: An 800GB Dataset of Diverse Text for Language Modeling","work_id":"9b10667a-da61-4358-aceb-10578234d45d","ref_index":2,"cited_arxiv_id":"2101.00027","is_internal_anchor":true},{"doi":"","year":2020,"title":"Single path one-shot neural architecture search with uniform sampling","work_id":"ff23255d-764c-4c95-b81c-58af1380a665","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Lm-infinite: Simple on-the-fly length generalization for large language models","work_id":"62f73354-8046-4134-b4e3-1bfab8df2d99","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2009,"title":"Measuring Massive Multitask Language Understanding","work_id":"e87ec49a-544b-4ec8-8991-75298c64ff5e","ref_index":5,"cited_arxiv_id":"2009.03300","is_internal_anchor":true}],"resolved_work":15,"snapshot_sha256":"3b493439f6cdafeea298cb4f5c718ef82b2281f1301109427045f495d9c668bc","internal_anchors":6},"formal_canon":{"evidence_count":2,"snapshot_sha256":"a63ce17c0d4a95df8d33c6089c52946c3f888c31ef5a4860fb117ea08d23d5ca"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2402.13753","created_at":"2026-05-17T23:38:53.019377+00:00"},{"alias_kind":"arxiv_version","alias_value":"2402.13753v1","created_at":"2026-05-17T23:38:53.019377+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2402.13753","created_at":"2026-05-17T23:38:53.019377+00:00"},{"alias_kind":"pith_short_12","alias_value":"HAMJZSQVKL4V","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_16","alias_value":"HAMJZSQVKL4VHWOA","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_8","alias_value":"HAMJZSQV","created_at":"2026-05-18T12:33:37.589309+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":29,"internal_anchor_count":29,"sample":[{"citing_arxiv_id":"2502.14541","citing_title":"LLM-based User Profile Management for Recommender System","ref_index":19,"is_internal_anchor":true},{"citing_arxiv_id":"2605.04217","citing_title":"Jordan-RoPE: Non-Semisimple Relative Positional Encoding via Complex Jordan Blocks","ref_index":5,"is_internal_anchor":true},{"citing_arxiv_id":"2605.22668","citing_title":"SEGA: Spectral-Energy Guided Attention for Resolution Extrapolation in Diffusion Transformers","ref_index":32,"is_internal_anchor":true},{"citing_arxiv_id":"2605.07731","citing_title":"Benchmarking EngGPT2-16B-A3B against Comparable Italian and International Open-source LLMs","ref_index":16,"is_internal_anchor":true},{"citing_arxiv_id":"2506.17310","citing_title":"PaceLLM: Brain-Inspired Large Language Models for Long-Context Understanding","ref_index":6,"is_internal_anchor":true},{"citing_arxiv_id":"2509.11206","citing_title":"Evalet: Evaluating Large Language Models through Functional Fragmentation","ref_index":17,"is_internal_anchor":true},{"citing_arxiv_id":"2509.12635","citing_title":"Positional Encoding via Token-Aware Phase Attention","ref_index":4,"is_internal_anchor":true},{"citing_arxiv_id":"2510.09608","citing_title":"StreamingVLM: Real-Time Understanding for Infinite Video Streams","ref_index":4,"is_internal_anchor":true},{"citing_arxiv_id":"2511.22599","citing_title":"DisCEdge: Distributed Context Management for Large Language Models at the Edge","ref_index":10,"is_internal_anchor":true},{"citing_arxiv_id":"2604.03263","citing_title":"LPC-SM: Local Predictive Coding and Sparse Memory for Long-Context Language Modeling","ref_index":5,"is_internal_anchor":true},{"citing_arxiv_id":"2605.14589","citing_title":"EndPrompt: Efficient Long-Context Extension via Terminal Anchoring","ref_index":14,"is_internal_anchor":true},{"citing_arxiv_id":"2605.13831","citing_title":"Training Long-Context Vision-Language Models Effectively with Generalization Beyond 128K Context","ref_index":26,"is_internal_anchor":true},{"citing_arxiv_id":"2406.02069","citing_title":"PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling","ref_index":6,"is_internal_anchor":true},{"citing_arxiv_id":"2406.16852","citing_title":"Long Context Transfer from Language to Vision","ref_index":20,"is_internal_anchor":true},{"citing_arxiv_id":"2605.10544","citing_title":"Where Does Long-Context Supervision Actually Go? Effective-Context Exposure Balancing","ref_index":9,"is_internal_anchor":true},{"citing_arxiv_id":"2605.09932","citing_title":"FocuSFT: Bilevel Optimization for Dilution-Aware Long-Context Fine-Tuning","ref_index":10,"is_internal_anchor":true},{"citing_arxiv_id":"2605.10414","citing_title":"Remember to Forget: Gated Adaptive Positional Encoding","ref_index":10,"is_internal_anchor":true},{"citing_arxiv_id":"2605.09304","citing_title":"Generating Complex Code Analyzers from Natural Language Questions","ref_index":9,"is_internal_anchor":true},{"citing_arxiv_id":"2605.10268","citing_title":"MemReread: Enhancing Agentic Long-Context Reasoning via Memory-Guided Rereading","ref_index":30,"is_internal_anchor":true},{"citing_arxiv_id":"2604.24162","citing_title":"Defusing the Trigger: Plug-and-Play Defense for Backdoored LLMs via Tail-Risk Intrinsic Geometric Smoothing","ref_index":7,"is_internal_anchor":true},{"citing_arxiv_id":"2605.00968","citing_title":"Adaptive 3D-RoPE: Physics-Aligned Rotary Positional Encoding for Wireless Foundation Models","ref_index":35,"is_internal_anchor":true},{"citing_arxiv_id":"2604.08932","citing_title":"From Indiscriminate to Targeted: Efficient RTL Verification via Functionally Key Signal-Driven LLM Assertion Generation","ref_index":10,"is_internal_anchor":true},{"citing_arxiv_id":"2604.07766","citing_title":"Sensitivity-Positional Co-Localization in GQA Transformers","ref_index":9,"is_internal_anchor":true},{"citing_arxiv_id":"2605.06850","citing_title":"How to Compress KV Cache in RL Post-Training? Shadow Mask Distillation for Memory-Efficient Alignment","ref_index":8,"is_internal_anchor":true},{"citing_arxiv_id":"2404.06654","citing_title":"RULER: What's the Real Context Size of Your Long-Context Language Models?","ref_index":11,"is_internal_anchor":true}]},"formal_canon":{"evidence_count":2,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/HAMJZSQVKL4VHWOAINCHUE26II","json":"https://pith.science/pith/HAMJZSQVKL4VHWOAINCHUE26II.json","graph_json":"https://pith.science/api/pith-number/HAMJZSQVKL4VHWOAINCHUE26II/graph.json","events_json":"https://pith.science/api/pith-number/HAMJZSQVKL4VHWOAINCHUE26II/events.json","paper":"https://pith.science/paper/HAMJZSQV"},"agent_actions":{"view_html":"https://pith.science/pith/HAMJZSQVKL4VHWOAINCHUE26II","download_json":"https://pith.science/pith/HAMJZSQVKL4VHWOAINCHUE26II.json","view_paper":"https://pith.science/paper/HAMJZSQV","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2402.13753&json=true","fetch_graph":"https://pith.science/api/pith-number/HAMJZSQVKL4VHWOAINCHUE26II/graph.json","fetch_events":"https://pith.science/api/pith-number/HAMJZSQVKL4VHWOAINCHUE26II/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/HAMJZSQVKL4VHWOAINCHUE26II/action/timestamp_anchor","attest_storage":"https://pith.science/pith/HAMJZSQVKL4VHWOAINCHUE26II/action/storage_attestation","attest_author":"https://pith.science/pith/HAMJZSQVKL4VHWOAINCHUE26II/action/author_attestation","sign_citation":"https://pith.science/pith/HAMJZSQVKL4VHWOAINCHUE26II/action/citation_signature","submit_replication":"https://pith.science/pith/HAMJZSQVKL4VHWOAINCHUE26II/action/replication_record"}},"created_at":"2026-05-17T23:38:53.019377+00:00","updated_at":"2026-05-17T23:38:53.019377+00:00"}