DDIPE poisons LLM agent skills by embedding malicious logic in documentation examples, achieving 11.6-33.5% bypass rates across frameworks while explicit attacks are blocked, with 2.5% evading detection.
Title resolution pending
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 6roles
background 1polarities
background 1representative citing papers
PinpointQA is the first benchmark dataset for small object-centric spatial understanding in indoor videos, with four progressive tasks built from ScanNet data.
ComPASS creates tool-augmented LLM agents for substantive social support, releases the first personalized benchmark ComPASS-Bench, and fine-tunes ComPASS-Qwen to outperform its base model while matching larger LLMs.
Targeted, evidence-rich context partitions improve causal clarity and actionability of LLM failure explanations while large undifferentiated contexts produce vaguer outputs, with higher-quality explanations correlating to better downstream repair rates.
EvoRAG adds a feedback-driven backpropagation step that attributes response quality to individual knowledge-graph triplets and updates the graph to raise reasoning accuracy by 7.34 percent over prior KG-RAG methods.
GroupGPT decouples intervention timing from response generation via edge-cloud collaboration for multi-user chats, scoring 4.72/5 on the new MUIR benchmark of 2500 segments while cutting token use by up to 3x and adding privacy sanitization.
citing papers explorer
-
Supply-Chain Poisoning Attacks Against LLM Coding Agent Skill Ecosystems
DDIPE poisons LLM agent skills by embedding malicious logic in documentation examples, achieving 11.6-33.5% bypass rates across frameworks while explicit attacks are blocked, with 2.5% evading detection.
-
PinpointQA: A Dataset and Benchmark for Small Object-Centric Spatial Understanding in Indoor Videos
PinpointQA is the first benchmark dataset for small object-centric spatial understanding in indoor videos, with four progressive tasks built from ScanNet data.
-
ComPASS: Towards Personalized Agentic Social Support via Tool-Augmented Companionship
ComPASS creates tool-augmented LLM agents for substantive social support, releases the first personalized benchmark ComPASS-Bench, and fine-tunes ComPASS-Qwen to outperform its base model while matching larger LLMs.
-
From Program Slices to Causal Clarity: Evaluating Faithful, Actionable LLM-Generated Failure Explanations via Context Partitioning and LLM-as-a-Judge
Targeted, evidence-rich context partitions improve causal clarity and actionability of LLM failure explanations while large undifferentiated contexts produce vaguer outputs, with higher-quality explanations correlating to better downstream repair rates.
-
EvoRAG: Making Knowledge Graph-based RAG Automatically Evolve through Feedback-driven Backpropagation
EvoRAG adds a feedback-driven backpropagation step that attributes response quality to individual knowledge-graph triplets and updates the graph to raise reasoning accuracy by 7.34 percent over prior KG-RAG methods.
-
GroupGPT: A Token-efficient and Privacy-preserving Agentic Framework for Multi-User Chat Assistant
GroupGPT decouples intervention timing from response generation via edge-cloud collaboration for multi-user chats, scoring 4.72/5 on the new MUIR benchmark of 2500 segments while cutting token use by up to 3x and adding privacy sanitization.