{"total":12,"items":[{"citing_arxiv_id":"2606.29844","ref_index":87,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"MATCH: Modulating Attention via In-Context Retrieval for Long-Context Transformers","primary_cat":"cs.CL","submitted_at":"2026-06-29T06:33:37+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"MATCH augments sparsified attention with an efficient in-context retrieval system to boost performance on long-range recall tasks in transformers.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.27237","ref_index":52,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"LMs as Task-Specific Knowledge Bases: An Interpretability Analysis","primary_cat":"cs.CL","submitted_at":"2026-06-25T16:22:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LMs store facts in task-specific parameter subsets, shown by inconsistent emergence across tasks during training and distinct localized parameters for the same fact.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.25198","ref_index":1,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Heuresis: Search Strategies for Autonomous AI Research Agents Across Quality, Diversity and Novelty","primary_cat":"cs.AI","submitted_at":"2026-06-23T21:44:08+00:00","verdict":"ACCEPT","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Heuresis evaluates six search strategies for LLM research agents and shows they steer ideas along quality-diversity-novelty axes but fail to generate novel ideas that match or exceed known high-performing 
recipes.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.20419","ref_index":10,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Spectral Query-Key Product Weight Steering for Training-Free VLM Hallucination Mitigation","primary_cat":"cs.CV","submitted_at":"2026-06-18T16:03:26+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"QK Product Steering suppresses dominant singular modes in the per-head QK product of selected middle layers via a closed-form query-only update, yielding 4.0% average relative CHAIR_s reduction on three GQA VLMs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.03014","ref_index":1,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"MOSAIC: Efficient Mixture-of-Agent Scheduling via Adaptive Aggregation and Inference Concurrency","primary_cat":"cs.LG","submitted_at":"2026-06-02T01:40:33+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"MOSAIC uses an Integer Linear Program scheduler for expert placement and prompt assignment plus adaptive aggregation to achieve 1.7-2.3x end-to-end speedup on 4-GPU MoA workloads while keeping accuracy within 0.1pp.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.20683","ref_index":1,"ref_count":2,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Layer-wise Token Compression for Efficient Document Reranking","primary_cat":"cs.IR","submitted_at":"2026-05-20T03:52:31+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Layer-wise Token Compression applies adaptive token pooling at middle transformer layers for cross-encoder rerankers, 
preserving MS MARCO ranking quality while raising QPS up to 25% on passages and 116% on documents, with added gains on listwise LLM rerankers and a regularizer effect for long inputs","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.08301","ref_index":1,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Priming: Hybrid State Space Models From Pre-trained Transformers","primary_cat":"cs.LG","submitted_at":"2026-05-08T11:43:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Priming transfers knowledge from pre-trained Transformers to hybrid SSM-attention models, recovering performance with minimal additional tokens and showing Gated KalmaNet outperforming Mamba-2 on long-context reasoning at 32B scale.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"recurrence next. 1D convolutions. In SSMs, 1D convolutions are applied along the sequence dimension independently across each channel. Without loss of generality, we illustrate with a scalar sequence. Consider a scalar sequence $U = \\{u_i\\}_{i=1}^{l}$ of length $l$ and a 1D convolution filter of size $d_{\\mathrm{conv}}$. The 1D convolution operation can be written as follows: for all $t \\in [1, l]$, $y_t = \\sum_{i=1}^{d_{\\mathrm{conv}}} w_i \\cdot u_{t - d_{\\mathrm{conv}} + i}$ (14), where $w \\in \\mathbb{R}^{d_{\\mathrm{conv}}}$ is the convolution kernel and $y_t$ is the output at time $t$, with the convention that $u_j = 0$ for all $j \\le 0$. Notice from Equation (14) that the 1D convolution for any token in the input sequence depends on itself and the previous $d_{\\mathrm{conv}} - 1$ tokens. 
Under the simple SP pattern, computing the output for the"},{"citing_arxiv_id":"2605.00528","ref_index":3,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"SAGA: Workflow-Atomic Scheduling for AI Agent Inference on GPU Clusters","primary_cat":"cs.DC","submitted_at":"2026-05-01T09:05:28+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"execute iterative Thought-Action-Observation loops [66] that may invoke 10-100 large language model (LLM) calls per task [31], interleaved with external tool invocations such as code execution, web browsing, or database queries. These compound AI systems [69] have become central to major deployments including GitHub Copilot Workspace [15], Amazon Q Developer [3], and enterprise automation platforms [10, 35], which now route millions of such agentic workloads through shared GPU clusters daily. 1.1 Motivation The shift from single-shot inference to multi-step agentic workloads creates a fundamental mismatch with existing GPU cluster scheduling systems [7, 27]. 
Current LLM serving frameworks [34, 68, 70]"},{"citing_arxiv_id":"2604.17366","ref_index":94,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"ArgBench: Benchmarking LLMs on Computational Argumentation Tasks","primary_cat":"cs.CL","submitted_at":"2026-04-19T10:23:41+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"ArgBench unifies 33 existing datasets into a standardized benchmark for testing LLMs across 46 argumentation tasks and analyzes the impact of prompting techniques and model factors on performance.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2512.16378","ref_index":7,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Hearing to Translate: The Effectiveness of Speech Modality Integration into LLMs","primary_cat":"cs.CL","submitted_at":"2025-12-18T10:21:14+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Cascaded systems remain the most reliable for speech translation overall, but recent SpeechLLMs match or outperform them in many conditions while standalone speech models lag.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2509.17396","ref_index":1,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"EpiCache: Episodic KV Cache Management for Long-Term Conversation on Resource-Constrained Environments","primary_cat":"cs.CL","submitted_at":"2025-09-22T06:56:35+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"EpiCache clusters long conversation history into coherent episodes for per-episode KV cache eviction, delivering up to 30% accuracy gains and 3.7x peak memory reduction on LongConvQA tasks under fixed 
budgets.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2402.19173","ref_index":150,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"StarCoder 2 and The Stack v2: The Next Generation","primary_cat":"cs.SE","submitted_at":"2024-02-29T13:53:35+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"StarCoder2-15B matches or beats CodeLlama-34B on code tasks despite being smaller, and StarCoder2-3B outperforms prior 15B models, with open weights and exact training data identifiers released.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}