The first SoK on LLM-as-a-Judge security organizes attacks targeting judges, attacks using judges, defenses leveraging judges, and security-domain applications while flagging vulnerabilities.
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
roles: background 1
citation-polarity summary
polarities: background 1
representative citing papers
RepoBench is a new benchmark with retrieval, completion, and pipeline tasks to evaluate code auto-completion systems on entire repositories instead of single files.
Knowledge Capsules provide structured nonparametric memory units that integrate external knowledge directly into LLMs' attention computation via key-value injection, outperforming RAG and GraphRAG on QA benchmarks without parameter updates.
StreamingLLM lets finite-window LLMs generalize to infinite-length sequences by retaining initial-token KV states as attention sinks, enabling stable streaming inference up to 4M tokens.
MaxShapley computes fair document attributions in generative QA by reducing Shapley value calculation to polynomial time via a max-sum utility, matching exact Shapley quality on HotPotQA, MuSiQUE, and MS MARCO while using up to 9x fewer resources.
citing papers explorer
- Security in LLM-as-a-Judge: A Comprehensive SoK
- RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems
- Knowledge Capsules: Structured Nonparametric Memory Units for LLMs
- Efficient Streaming Language Models with Attention Sinks
- MaxShapley: Towards Incentive-compatible Generative Search with Fair Context Attribution