Training-free long-context scaling of large language models

Chenxin An, Fei Huang, Jun Zhang, Shansan Gong, Xipeng Qiu, Chang Zhou, Lingpeng Kong · 2024 · arXiv 2402.17463

8 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

unclear 1

representative citing papers

ArgBench: Benchmarking LLMs on Computational Argumentation Tasks

cs.CL · 2026-04-19 · unverdicted · novelty 8.0

ArgBench unifies 33 existing datasets into a standardized benchmark for testing LLMs across 46 argumentation tasks and analyzes the impact of prompting techniques and model factors on performance.

A Queueing-Theoretic Framework for Stability Analysis of LLM Inference with KV Cache Memory Constraints

cs.LG · 2026-05-06 · unverdicted · novelty 6.0

A queueing model derives stability conditions for LLM inference services under combined compute and KV cache memory limits, with experimental validation showing typical deviations under 10%.

HyMem: Hybrid Memory Architecture with Dynamic Retrieval Scheduling

cs.AI · 2026-02-15 · unverdicted · novelty 6.0

HyMem introduces dual-granular memory storage, pairing a lightweight summary module for fast responses with selective activation of a deep LLM module for complex queries, and outperforms full-context baselines at 92.6% lower computational cost on the LOCOMO and LongMemEval benchmarks.

MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent

cs.CL · 2025-07-03 · unverdicted · novelty 6.0

MemAgent uses multi-conversation RL to train a memory agent that reads text in segments and overwrites its memory, extrapolating from an 8K training context to 3.5M-token QA with under 5% performance loss and scoring above 95% on the 512K RULER test.

Qwen2.5-1M Technical Report

cs.CL · 2025-01-26 · accept · novelty 6.0

Qwen2.5-1M models reach 1M token context with improved long-context performance, no short-context loss, and 3-7x prefill speedup via open inference optimizations.

Qwen3 Technical Report

cs.CL · 2025-05-14 · unverdicted · novelty 5.0

Pith review generated a malformed one-line summary.

Multi-Model Synthetic Training for Mission-Critical Small Language Models

cs.CL · 2025-09-16 · unverdicted · novelty 4.0

Fine-tunes Qwen2.5-7B on 21,543 synthetic maritime Q&A pairs generated from 3.2B AIS records by GPT-4o and o3-mini, reaching 75% accuracy at 261x lower inference cost than larger models.

Qwen2.5 Technical Report

cs.CL · 2024-12-19 · unverdicted · novelty 3.0

Qwen2.5 LLMs scale pre-training data to 18 trillion tokens and apply multistage reinforcement learning, achieving benchmark performance competitive with models up to 5 times larger.

citing papers explorer

Showing 8 of 8 citing papers.

ArgBench: Benchmarking LLMs on Computational Argumentation Tasks

cs.CL · 2026-04-19 · unverdicted · none · ref 91

A Queueing-Theoretic Framework for Stability Analysis of LLM Inference with KV Cache Memory Constraints

cs.LG · 2026-05-06 · unverdicted · none · ref 118

HyMem: Hybrid Memory Architecture with Dynamic Retrieval Scheduling

cs.AI · 2026-02-15 · unverdicted · none · ref 26

MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent

cs.CL · 2025-07-03 · unverdicted · none · ref 15

Qwen2.5-1M Technical Report

cs.CL · 2025-01-26 · accept · none · ref 1

Qwen3 Technical Report

cs.CL · 2025-05-14 · unverdicted · none · ref 2

Multi-Model Synthetic Training for Mission-Critical Small Language Models

cs.CL · 2025-09-16 · unverdicted · none · ref 23

Qwen2.5 Technical Report

cs.CL · 2024-12-19 · unverdicted · none · ref 4
