Analysis of 67,057 servers across six registries reveals widespread conditions for server hijacking and metadata manipulation in MCP, with a new tool MCPInspect flagging 833 vulnerable servers and 18 with suspicious descriptions.
Adversarial demonstration attacks on large language models
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
polarities
background 4representative citing papers
A multi-agent AI system generates novel biomedical hypotheses that show promising experimental validation in drug repurposing for leukemia, new targets for liver fibrosis, and a bacterial gene transfer mechanism.
SmoothLLM mitigates jailbreaking attacks on LLMs by randomly perturbing multiple copies of a prompt at the character level and aggregating the outputs to detect adversarial inputs.
Compact open-source LLMs can produce syntactically valid, semantically complete, and inter-model consistent DSL models from text via few-shot prompting, with some 7B-12B models matching much larger ones in quality.
A survey that creates taxonomies for jailbreak attacks and defenses on LLMs, subdivides them into sub-classes, and compares evaluation approaches.
A comprehensive survey that taxonomizes safety threats to large models and agents, reviews defenses and benchmarks, and outlines open challenges.
citing papers explorer
-
A First Look at the Security Issues in the Model Context Protocol Ecosystem
Analysis of 67,057 servers across six registries reveals widespread conditions for server hijacking and metadata manipulation in MCP, with a new tool MCPInspect flagging 833 vulnerable servers and 18 with suspicious descriptions.
-
Towards an AI co-scientist
A multi-agent AI system generates novel biomedical hypotheses that show promising experimental validation in drug repurposing for leukemia, new targets for liver fibrosis, and a bacterial gene transfer mechanism.
-
SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks
SmoothLLM mitigates jailbreaking attacks on LLMs by randomly perturbing multiple copies of a prompt at the character level and aggregating the outputs to detect adversarial inputs.
-
From Text to DSL: Evaluating Grammar-Based Model Generation Using Open LLMs
Compact open-source LLMs can produce syntactically valid, semantically complete, and inter-model consistent DSL models from text via few-shot prompting, with some 7B-12B models matching much larger ones in quality.
-
Jailbreak Attacks and Defenses Against Large Language Models: A Survey
A survey that creates taxonomies for jailbreak attacks and defenses on LLMs, subdivides them into sub-classes, and compares evaluation approaches.
-
Safety at Scale: A Comprehensive Survey of Large Model and Agent Safety
A comprehensive survey that taxonomizes safety threats to large models and agents, reviews defenses and benchmarks, and outlines open challenges.