{"paper":{"title":"River-LLM: Large Language Model Seamless Exit Based on KV Share","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"River-LLM enables token-level early exit in decoder LLMs by generating missing KV caches through layer sharing.","cross_cats":[],"primary_cat":"cs.CL","authors_text":"An Zou, Yingtao Shen","submitted_at":"2026-04-20T15:20:17Z","abstract_excerpt":"Large Language Models (LLMs) have demonstrated exceptional performance across diverse domains but are increasingly constrained by high inference latency. Early Exit has emerged as a promising solution to accelerate inference by dynamically bypassing redundant layers. However, in decoder-only architectures, the efficiency of Early Exit is severely bottlenecked by the KV Cache Absence problem, where skipped layers fail to provide the necessary historical states for subsequent tokens. Existing solutions, such as recomputation or masking, either introduce significant latency overhead or incur seve"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"River-LLM, a training-free framework that enables seamless token-level Early Exit... \nachieves 1.71 to 2.16 times of practical speedup while maintaining high generation quality.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That state transition similarity within decoder blocks can reliably predict cumulative KV errors and guide exit decisions without causing noticeable quality degradation or error drift over sequences, especially since no training is used to calibrate the predictor.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"River-LLM enables seamless token-level early exit in decoder-only LLMs via a KV-shared river mechanism and similarity-based error prediction, delivering 1.71-2.16x practical speedup on reasoning tasks while preserving generation quality.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"River-LLM enables token-level early exit in decoder LLMs by generating missing KV caches through layer sharing.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"ac4f46de3ec4a6881749d2b32a6d9d6be67f23854eb1084452c9b89b14c7b4fd"},"source":{"id":"2604.18396","kind":"arxiv","version":3},"verdict":{"id":"c3e0a701-0e6b-451f-bc61-f548ef300038","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-10T04:32:56.900436Z","strongest_claim":"River-LLM, a training-free framework that enables seamless token-level Early Exit... \nachieves 1.71 to 2.16 times of practical speedup while maintaining high generation quality.","one_line_summary":"River-LLM enables seamless token-level early exit in decoder-only LLMs via a KV-shared river mechanism and similarity-based error prediction, delivering 1.71-2.16x practical speedup on reasoning tasks while preserving generation quality.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That state transition similarity within decoder blocks can reliably predict cumulative KV errors and guide exit decisions without causing noticeable quality degradation or error drift over sequences, especially since no training is used to calibrate the predictor.","pith_extraction_headline":"River-LLM enables token-level early exit in decoder LLMs by generating missing KV caches through layer sharing."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2604.18396/integrity.json","findings":[],"available":true,"detectors_run":[{"name":"doi_compliance","ran_at":"2026-05-20T04:02:35.194352Z","status":"completed","version":"1.0.0","findings_count":0}],"snapshot_sha256":"fd586c39cfafb6e1673e312dcf365be302101231670cfb0882dbe3207839fbea"},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}