Creates LoCoMo benchmark dataset for very long-term LLM conversational memory and shows current models struggle with lengthy dialogues and long-range temporal dynamics.
In-Context Retrieval-Augmented Language Models
11 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
representative citing papers
RISE is an inference-time semantic reranking framework that refines low-confidence predictions in rhetorical role labeling using contrastively learned label representations, delivering an average +9.15 macro-F1 gain on hard examples across eight datasets and seven models.
RISC reformulates self-consistency answer selection as a ranking task solved by a lightweight LambdaRank model with five hand-designed features, yielding better accuracy-efficiency trade-offs than majority voting on QA benchmarks.
CERA fine-tunes a dense retriever with triplet contrastive learning plus attention alignment to human rationales, claiming better retrieval effectiveness and faithfulness on clinical trial reports than Contriever and standard hard-negative baselines.
MemShot renders local dialogue spans as structured visual memory units to improve long-term dialogue modeling in LLMs, achieving competitive benchmark performance with 70x faster memory construction.
A context-aware Sentinel-Strategist system for RAG selectively applies defenses to block membership inference and data poisoning while recovering most retrieval utility compared to always-on defense stacks.
A single hub text can unreasonably match many images in CLIP-based similarity, exposing vulnerabilities in cross-modal encoders for caption evaluation and retrieval.
Multilingual RAG rerankers exhibit language bias that limits cross-lingual evidence use, and the proposed LAURA method aligns ranking with downstream generation utility to reduce the bias and improve performance.
LLM agent progress depends on externalizing cognitive functions into memory, skills, protocols, and harness engineering that coordinates them reliably.
G-Defense builds claim-centered graphs from sub-claims, applies RAG for evidence and competing explanations, then uses graph inference to detect fake news veracity and generate intuitive explanation graphs, claiming SOTA results.
In production moderation and contrast tasks, selective retrieval of case-dependent external experience outperforms unconditional global injection on quality-cost trade-offs, with retrieval quality and output regime mattering more than Top-K size.
citing papers explorer
-
Evaluating Very Long-Term Conversational Memory of LLM Agents
Creates LoCoMo benchmark dataset for very long-term LLM conversational memory and shows current models struggle with lengthy dialogues and long-range temporal dynamics.
-
Semantic Reranking at Inference Time for Hard Examples in Rhetorical Role Labeling
RISE is an inference-time semantic reranking framework that refines low-confidence predictions in rhetorical role labeling using contrastively learned label representations, delivering an average +9.15 macro-F1 gain on hard examples across eight datasets and seven models.
-
Boosting Self-Consistency with Ranking
RISC reformulates self-consistency answer selection as a ranking task solved by a lightweight LambdaRank model with five hand-designed features, yielding better accuracy-efficiency trade-offs than majority voting on QA benchmarks.
-
Beyond Topical Similarity: Contrastive Evidence Retrieval with Interpretable Attention Alignment in RAG
CERA fine-tunes a dense retriever with triplet contrastive learning plus attention alignment to human rationales, claiming better retrieval effectiveness and faithfulness on clinical trial reports than Contriever and standard hard-negative baselines.
-
Memory Shot for Long-Term Dialogue
MemShot renders local dialogue spans as structured visual memory units to improve long-term dialogue modeling in LLMs, achieving competitive benchmark performance with 70x faster memory construction.
-
Adaptive Defense Orchestration for RAG: A Sentinel-Strategist Architecture against Multi-Vector Attacks
A context-aware Sentinel-Strategist system for RAG selectively applies defenses to block membership inference and data poisoning while recovering most retrieval utility compared to always-on defense stacks.
-
One Single Hub Text Breaks CLIP: Identifying Vulnerabilities in Cross-Modal Encoders via Hubness
A single hub text can unreasonably match many images in CLIP-based similarity, exposing vulnerabilities in cross-modal encoders for caption evaluation and retrieval.
-
All Languages Matter: Understanding and Mitigating Language Bias in Multilingual RAG
Multilingual RAG rerankers exhibit language bias that limits cross-lingual evidence use, and the proposed LAURA method aligns ranking with downstream generation utility to reduce the bias and improve performance.
-
Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering
LLM agent progress depends on externalizing cognitive functions into memory, skills, protocols, and harness engineering that coordinates them reliably.
-
A Graph-Enhanced Defense Framework for Explainable Fake News Detection with LLM
G-Defense builds claim-centered graphs from sub-claims, applies RAG for evidence and competing explanations, then uses graph inference to detect fake news veracity and generate intuitive explanation graphs, claiming SOTA results.
-
External Experience Serving in Production LLM Systems: A Deployment-Oriented Study of Quality-Cost Trade-offs
In production moderation and contrast tasks, selective retrieval of case-dependent external experience outperforms unconditional global injection on quality-cost trade-offs, with retrieval quality and output regime mattering more than Top-K size.