
LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression

Huiqiang Jiang, Qianhui Wu, Xufang Luo, Dongsheng Li, Chin-Yew Lin, Yuqing Yang, Lili Qiu · 2024 · arXiv 2310.06839

Canonical reference. Cited by 19 Pith papers; 75% of the classified citations cite this work as background.


citation-role summary

background 7 · baseline 1

citation-polarity summary

background 6 · baseline 1 · unclear 1

representative citing papers

RULER: What's the Real Context Size of Your Long-Context Language Models?

cs.CL · 2024-04-09 · accept · novelty 8.0

RULER shows most long-context LMs drop sharply in performance on complex tasks as length and difficulty increase, with only half maintaining results at 32K tokens.

Tool-Schema Compression Enables Agentic RAG Under Constrained Context Budgets

cs.SE · 2026-05-24 · unverdicted · novelty 7.0

Tool schema compression by 44-50% enables agentic RAG at 8K context where uncompressed schemas fail, with +20.5 pp exact match lift across models and scaling to over 800 tools.

TSCG: Deterministic Tool-Schema Compilation for Agentic LLM Deployments

cs.SE · 2026-05-04 · unverdicted · novelty 7.0

TSCG compiles JSON tool schemas into token-efficient structured text, raising tool-use accuracy for small LLMs from 0% to 84.4% on benchmarks while cutting tokens by 52-57%.

Grounded Cache Routing for Retrieval-Augmented Generation: When Is It Safe to Reuse an Answer?

cs.CR · 2026-05-26 · unverdicted · novelty 6.0

GroundedCache reduces unsafe-served rate in RAG answer caching to 0-1.5% (vs 15-51.5% naive) via four validation gates while keeping p50 latency within 1.07x of no-cache baseline.

SemanticZip: A Pilot Framework for Lossy Text Compression with LLMs as Semantic Decompressors

cs.LG · 2026-05-23 · unverdicted · novelty 6.0

SemanticZip is a pilot framework introducing LLM-mediated lossy text compression with an experimental interface evaluating six representation regimes on five diagnostic cases for semantic atom recovery and token efficiency.

On the Effectiveness of Context Compression for Repository-Level Tasks: An Empirical Investigation

cs.SE · 2026-04-15 · unverdicted · novelty 6.0

Continuous latent-vector compression improves BLEU scores on repository-level code tasks by up to 28.3% at 4x compression while cutting inference latency.

Compressed-Sensing-Guided, Inference-Aware Structured Reduction for Large Language Models

cs.CL · 2026-03-22 · unverdicted · novelty 6.0

A unified compressed-sensing framework enables dynamic, task- and token-adaptive structured reduction of LLMs with formal sample-complexity bounds.

R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning

cs.AI · 2025-03-07 · unverdicted · novelty 6.0

R1-Searcher uses two-stage outcome-based RL to train LLMs to invoke external search systems for better reasoning without process rewards or distillation.

CODEPROMPTZIP: Code-specific Prompt Compression for Retrieval-Augmented Generation in Coding Tasks with LMs

cs.SE · 2025-02-19 · unverdicted · novelty 6.0

CodePromptZip builds a code compressor via type-aware ablation-ranked training samples and a copy-augmented small LM, reporting 23.4-28.7% gains over baselines on three RAG coding tasks.

Search-o1: Agentic Search-Enhanced Large Reasoning Models

cs.AI · 2025-01-09 · unverdicted · novelty 6.0

Search-o1 integrates agentic retrieval-augmented generation and a Reason-in-Documents module into large reasoning models to dynamically supply missing knowledge and improve performance on complex science, math, coding, and QA tasks.

Talk Less, Fly Lighter: Autonomous Semantic Compression for UAV Swarm Communication via LLMs

cs.RO · 2025-08-16 · unverdicted · novelty 5.0

LLM-based autonomous semantic compression in four 2D UAV swarm simulations shows potential for efficient collaborative communication under bandwidth constraints.

E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning

cs.CL · 2024-09-10 · unverdicted · novelty 5.0

E2LLM uses encoder-based soft prompt compression for long contexts to improve LLM reasoning on tasks like summarization and QA while maintaining efficiency.

AdaComp: Extractive Context Compression with Adaptive Predictor for Retrieval-Augmented Large Language Models

cs.CL · 2024-09-03 · unverdicted · novelty 5.0

AdaComp trains a compression-rate predictor on annotated minimum top-k data to adaptively retain only the documents needed for each RAG query.

Byte-Exact Deduplication in Retrieval-Augmented Generation: A Three-Regime Empirical Analysis Across Public Benchmarks

cs.CL · 2026-05-10 · unverdicted · novelty 4.0

Byte-exact deduplication reduces RAG context size by 0.16% to 80.34% across three regimes with zero measurable quality regression per multi-vendor LLM evaluation.

Supplement Generation Training for Enhancing Agentic Task Performance

cs.LG · 2026-04-22 · unverdicted · novelty 4.0

SGT trains a lightweight model to generate task-specific supplemental text that improves performance of a larger frozen LLM on agentic tasks without modifying the large model.

A Survey on Efficient Inference for Large Language Models

cs.CL · 2024-04-22 · accept · novelty 3.0

The paper surveys techniques to speed up and reduce the resource needs of LLM inference, organized by data-level, model-level, and system-level changes, with comparative experiments on representative methods.

Retrieval-Augmented Generation for Large Language Models: A Survey

cs.CL · 2023-12-18 · unverdicted · novelty 3.0

A survey of RAG paradigms, components, benchmarks, and challenges for improving LLMs on knowledge-intensive tasks.

Cooperative Memory Paging with Keyword Bookmarks for Long-Horizon LLM Conversations

cs.CL · 2026-04-14

LIFT: A Novel Framework for Enhancing Long-Context Understanding of LLMs via Long Input Fine-Tuning

cs.CL · 2025-02-20

