CrossPool separates weights and KV cache into distinct GPU pools and adds a planner, virtualizer, and layer-wise scheduler, cutting P99 time-between-tokens by up to 10.4x versus prior kvcached multi-LLM systems.
in Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2511–2522 (Association for Computational Linguistics, Singapore, 2023)
Mixed citation behavior. Most common role is background (60%).
representative citing papers
QCFuse achieves full-prefill quality in RAG with 1.7x average prefill speedup over full prefill and 1.5x over ProphetKV via compressed query-aware cache fusion.
MemPoison enables stealthy memory poisoning in LLM agents via dialogue by using semantic relational bridges, entity masquerading, and joint embedding optimization to bypass selective extraction and rewriting, achieving up to 0.95 attack success rate.
SilentRetrieval is a data poisoning attack achieving 84.6% HR@10 and 57.5% ASR-LLM on Natural Questions via coordinated beam search and trigger fusion while preserving document fluency.
Layer-wise Token Compression applies adaptive token pooling at the middle transformer layers of cross-encoder rerankers, preserving MS MARCO ranking quality while raising QPS by up to 25% on passages and 116% on documents, with added gains on listwise LLM rerankers and a regularizing effect on long inputs.
DiffCodeGen clusters code candidates by behavioral similarity from fuzzing-synthesized inputs and selects the largest cluster's medoid, matching or exceeding prior test-time scaling methods with far less token and time cost.
IPQA is a new benchmark that measures how well models identify core user intents from history in personalized question answering, finding that performance is poor and declines with greater question complexity.
Crowdsourced metaphors show rising anthropomorphism and warmth toward AI that predict trust and adoption, with notable demographic differences.
ActPlane introduces an OS-kernel policy engine using an information-flow control DSL and eBPF to enforce agent harness policies, achieving better compliance on indirect paths with 1.9-8.4% overhead.
A multi-LLM council scores predictive processing papers on an expert ontology, maps results in 3D hypothesis space, and introduces a dispersion metric showing greater spread in global versus local oddball paradigms.
SPLADE models produce wacky expansion terms whose prevalence rises with larger vocabularies and falls with stricter sparsity; these terms primarily aid in-domain retrieval rather than out-of-domain generalization.
Introduces bounded relational presence as a designable, tunable, and withdrawable quality for conversational AI that supports engagement while avoiding claims of personhood or human equivalence.
A neural cellular automaton learns compositional rules from data alone to achieve structural generalization on the SLOG semantic parsing benchmark, reaching 67.3% accuracy and fully succeeding on 11 of 17 categories.
Symptom Induction compresses labeled data into interpretable guidelines that improve LLM classification of depression symptoms in text, outperforming zero-shot, in-context, and fine-tuning approaches with gains on rare symptoms and cross-disease generalization.
GRASP improves multimodal sarcasm target identification by anchoring visual regions in grounded chain-of-thought reasoning and using dual-stage optimization on a new balanced dataset.
A unified framework for LLM agent memory is benchmarked, with a new hybrid method outperforming state-of-the-art on standard tasks.
MirrorBench defines a reproducible benchmark combining lexical metrics (MATTR, Yule's K, HD-D) and LLM-judge metrics with calibration controls to measure human-likeness of user-proxy agents across four datasets.
MM-Telco creates multimodal benchmarks for telecom and demonstrates that fine-tuned LLMs and VLMs achieve significant performance gains on domain-specific tasks.
NeuS-E is a post-generation refinement method that uses neuro-symbolic analysis of a formal video representation to detect and correct semantic and temporal inconsistencies in text-to-video outputs, improving prompt alignment by nearly 40%.
CoCoMUT is a reusable pipeline that discovers project structure, constructs call graphs, extracts source, reconciles bytecode to source, and emits versioned JSON datasets of method contexts, demonstrated on 20 Java repositories with 97.8% reconciliation and 99% audit accuracy.
A dual-agent closed-loop system integrates Theory of Mind reasoning with multimodal video generation to create social avatars that outperform full-information baselines on dialogue quality under information asymmetry.
EgoCoT-Bench provides 3,172 verifiable QA pairs across perception, anticipation, and reasoning tasks on egocentric videos, revealing that many MLLMs give answer-correct but evidence-inconsistent explanations.
TextClusterLab introduces an LLM-driven generator for synthetic text clustering datasets with tunable attributes and a suitability benchmark for evaluation.
Retriever-side choices, particularly the retrieval algorithm, exert more influence on RAG performance than generator selection across code generation, summarization, and repair tasks.
citing papers explorer
- QCFuse: Query-Aware Cache Fusion via Compressed View for Efficient RAG Serving
- MirrorBench: A Benchmark to Evaluate Conversational User-Proxy Agents for Human-Likeness
- MM-Telco: Benchmarks and Multimodal Large Language Models for Telecom Applications