
arxiv: 2411.05928 · v1 · pith:7PYMHGBQnew · submitted 2024-11-08 · 💻 cs.CL

Reducing Distraction in Long-Context Language Models by Focused Learning

classification 💻 cs.CL
keywords long, contexts, learning, llms, relevant, context, contrastive, distraction

Recent advancements in Large Language Models (LLMs) have significantly enhanced their capacity to process long contexts. However, effectively utilizing this long context remains a challenge due to the issue of distraction, where irrelevant information dominates lengthy contexts, causing LLMs to lose focus on the most relevant segments. To address this, we propose a novel training method that enhances LLMs' ability to discern relevant information through a unique combination of retrieval-based data augmentation and contrastive learning. Specifically, during fine-tuning with long contexts, we employ a retriever to extract the most relevant segments, serving as augmented inputs. We then introduce an auxiliary contrastive learning objective to explicitly ensure that outputs from the original context and the retrieved sub-context are closely aligned. Extensive experiments on long single-document and multi-document QA benchmarks demonstrate the effectiveness of our proposed method.
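The two ingredients named in the abstract, retrieval-based augmentation and a contrastive alignment objective, can be illustrated with a minimal sketch. The abstract does not specify the retriever, the representations being aligned, or the exact form of the loss, so everything below is an assumption made for illustration: cosine similarity stands in for the retriever's scoring, an InfoNCE-style loss with in-batch negatives stands in for the contrastive objective, and the names `retrieve_top_k`, `info_nce`, and `temperature` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve_top_k(query_vec, segment_vecs, k):
    """Return indices of the k segments most similar to the query.

    Assumption: segments and query are already embedded as vectors.
    Indices are returned in original document order so the retrieved
    sub-context reads like a shortened version of the full context.
    """
    ranked = sorted(
        range(len(segment_vecs)),
        key=lambda i: cosine(query_vec, segment_vecs[i]),
        reverse=True,
    )
    return sorted(ranked[:k])


def info_nce(full_reprs, focused_reprs, temperature=0.1):
    """Mean InfoNCE-style loss over a batch (an assumed loss form).

    full_reprs[i]    : representation of example i computed from the full context
    focused_reprs[i] : representation of example i computed from the retrieved sub-context
    Pair (i, i) is the positive; pairs (i, j) with j != i act as in-batch negatives.
    """
    losses = []
    for i, anchor in enumerate(full_reprs):
        logits = [cosine(anchor, f) / temperature for f in focused_reprs]
        # Log-sum-exp with the max subtracted for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Toy 2-D embeddings: pick the 2 segments closest to the query.
    query = [1.0, 0.0]
    segments = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [-1.0, 0.0]]
    print("retrieved segment indices:", retrieve_top_k(query, segments, k=2))

    # The loss is small when full-context and sub-context representations agree.
    full = [[1.0, 0.0], [0.0, 1.0]]
    print("aligned loss:   ", info_nce(full, [[1.0, 0.0], [0.0, 1.0]]))
    print("misaligned loss:", info_nce(full, [[0.0, 1.0], [1.0, 0.0]]))
```

In a fine-tuning setup along these lines, a term like this would typically be added to the usual language-modeling loss with a weighting coefficient. That weighting, like the rest of the sketch, is an assumption and is not taken from the abstract.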



Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. EASE-TTT: Evidence-Aligned Selective Test-Time Training for Long-Context Question Answering

    cs.CL 2026-06 unverdicted novelty 5.0

    EASE-TTT creates a soft attention target from evidence chunks to guide query-side test-time adaptation, yielding higher macro-average scores than full-context, retrieval-only, and standard qTTT baselines on six LongBe...

  2. DisCEdge: Distributed Context Management for Large Language Models at the Edge

    cs.DC 2025-11 unverdicted novelty 5.0

    DisCEdge manages LLM context in tokenized form replicated on edge nodes, delivering up to 14.46% faster median responses, 15% lower sync overhead, and 90% smaller client requests versus baselines while ensuring consistency.