LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression

2023 · arXiv 2310.06839

Canonical reference. 75% of citing Pith papers cite this work as background.

16 Pith papers citing it

Background: 75% of classified citations

citation-role summary

background 7 · baseline 1

citation-polarity summary

background 6 · baseline 1 · unclear 1

representative citing papers

RULER: What's the Real Context Size of Your Long-Context Language Models?

cs.CL · 2024-04-09 · accept · novelty 8.0

RULER shows most long-context LMs drop sharply in performance on complex tasks as length and difficulty increase, with only half maintaining results at 32K tokens.

TSCG: Deterministic Tool-Schema Compilation for Agentic LLM Deployments

cs.SE · 2026-05-04 · unverdicted · novelty 7.0

TSCG compiles JSON tool schemas into token-efficient structured text, raising tool-use accuracy for small LLMs from 0% to 84.4% on benchmarks while cutting tokens by 52-57%.

On the Effectiveness of Context Compression for Repository-Level Tasks: An Empirical Investigation

cs.SE · 2026-04-15 · unverdicted · novelty 6.0

Continuous latent-vector compression improves BLEU scores on repository-level code tasks by up to 28.3% at 4x compression while cutting inference latency.

Compressed-Sensing-Guided, Inference-Aware Structured Reduction for Large Language Models

cs.CL · 2026-03-22 · unverdicted · novelty 6.0

A unified compressed-sensing framework enables dynamic, task- and token-adaptive structured reduction of LLMs with formal sample-complexity bounds.

R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning

cs.AI · 2025-03-07 · unverdicted · novelty 6.0

R1-Searcher uses two-stage outcome-based RL to train LLMs to invoke external search systems for better reasoning without process rewards or distillation.

LIFT: A Novel Framework for Enhancing Long-Context Understanding of LLMs via Long Input Fine-Tuning

cs.CL · 2025-02-20 · unverdicted · novelty 6.0

LIFT fine-tunes short-context LLMs on long inputs with synthetic tasks to absorb information into parameters, enabling answers without the input present at inference.

CODEPROMPTZIP: Code-specific Prompt Compression for Retrieval-Augmented Generation in Coding Tasks with LMs

cs.SE · 2025-02-19 · unverdicted · novelty 6.0

CodePromptZip builds a code compressor via type-aware ablation-ranked training samples and a copy-augmented small LM, reporting 23.4-28.7% gains over baselines on three RAG coding tasks.

Search-o1: Agentic Search-Enhanced Large Reasoning Models

cs.AI · 2025-01-09 · unverdicted · novelty 6.0

Search-o1 integrates agentic retrieval-augmented generation and a Reason-in-Documents module into large reasoning models to dynamically supply missing knowledge and improve performance on complex science, math, coding, and QA tasks.

Talk Less, Fly Lighter: Autonomous Semantic Compression for UAV Swarm Communication via LLMs

cs.RO · 2025-08-16 · unverdicted · novelty 5.0

LLM-based autonomous semantic compression in four 2D UAV swarm simulations shows potential for efficient collaborative communication under bandwidth constraints.

E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning

cs.CL · 2024-09-10 · unverdicted · novelty 5.0

E2LLM uses encoder-based soft prompt compression for long contexts to improve LLM reasoning on tasks like summarization and QA while maintaining efficiency.

AdaComp: Extractive Context Compression with Adaptive Predictor for Retrieval-Augmented Large Language Models

cs.CL · 2024-09-03 · unverdicted · novelty 5.0

AdaComp trains a compression-rate predictor on annotated minimum top-k data to adaptively retain only the documents needed for each RAG query.

Byte-Exact Deduplication in Retrieval-Augmented Generation: A Three-Regime Empirical Analysis Across Public Benchmarks

cs.CL · 2026-05-10 · unverdicted · novelty 4.0

Byte-exact deduplication reduces RAG context size by 0.16% to 80.34% across three regimes with zero measurable quality regression per multi-vendor LLM evaluation.

Supplement Generation Training for Enhancing Agentic Task Performance

cs.LG · 2026-04-22 · unverdicted · novelty 4.0

SGT trains a lightweight model to generate task-specific supplemental text that improves performance of a larger frozen LLM on agentic tasks without modifying the large model.

A Survey on Efficient Inference for Large Language Models

cs.CL · 2024-04-22 · accept · novelty 3.0

The paper surveys techniques to speed up and reduce the resource needs of LLM inference, organized by data-level, model-level, and system-level changes, with comparative experiments on representative methods.

Retrieval-Augmented Generation for Large Language Models: A Survey

cs.CL · 2023-12-18 · unverdicted · novelty 3.0

A survey of RAG paradigms, components, benchmarks, and challenges for improving LLMs on knowledge-intensive tasks.

Cooperative Memory Paging with Keyword Bookmarks for Long-Horizon LLM Conversations

cs.CL · 2026-04-14

citing papers explorer

RULER: What's the Real Context Size of Your Long-Context Language Models? · cs.CL · 2024-04-09 · accept · none · ref 20
TSCG: Deterministic Tool-Schema Compilation for Agentic LLM Deployments · cs.SE · 2026-05-04 · unverdicted · none · ref 11
On the Effectiveness of Context Compression for Repository-Level Tasks: An Empirical Investigation · cs.SE · 2026-04-15 · unverdicted · none · ref 18
Compressed-Sensing-Guided, Inference-Aware Structured Reduction for Large Language Models · cs.CL · 2026-03-22 · unverdicted · none · ref 5
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning · cs.AI · 2025-03-07 · unverdicted · none · ref 26
LIFT: A Novel Framework for Enhancing Long-Context Understanding of LLMs via Long Input Fine-Tuning · cs.CL · 2025-02-20 · unverdicted · none · ref 3
CODEPROMPTZIP: Code-specific Prompt Compression for Retrieval-Augmented Generation in Coding Tasks with LMs · cs.SE · 2025-02-19 · unverdicted · none · ref 5
Search-o1: Agentic Search-Enhanced Large Reasoning Models · cs.AI · 2025-01-09 · unverdicted · none · ref 24
Talk Less, Fly Lighter: Autonomous Semantic Compression for UAV Swarm Communication via LLMs · cs.RO · 2025-08-16 · unverdicted · none · ref 15
E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning · cs.CL · 2024-09-10 · unverdicted · none · ref 26
AdaComp: Extractive Context Compression with Adaptive Predictor for Retrieval-Augmented Large Language Models · cs.CL · 2024-09-03 · unverdicted · none · ref 10
Byte-Exact Deduplication in Retrieval-Augmented Generation: A Three-Regime Empirical Analysis Across Public Benchmarks · cs.CL · 2026-05-10 · unverdicted · none · ref 4
Supplement Generation Training for Enhancing Agentic Task Performance · cs.LG · 2026-04-22 · unverdicted · none · ref 30
A Survey on Efficient Inference for Large Language Models · cs.CL · 2024-04-22 · accept · none · ref 43
Retrieval-Augmented Generation for Large Language Models: A Survey · cs.CL · 2023-12-18 · unverdicted · none · ref 101
Cooperative Memory Paging with Keyword Bookmarks for Long-Horizon LLM Conversations · cs.CL · 2026-04-14 · unreviewed · ref 1
