AgentSpec introduces a customizable DSL for runtime enforcement of safety constraints on LLM agents, achieving over 90% prevention of unsafe code actions, zero hazardous embodied actions, and 100% AV compliance in evaluations.
Safeguarding large language models: A survey
7 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 4representative citing papers
Guardian-as-an-Advisor prepends risk labels and explanations from a guardian model to queries, improving LLM safety compliance and reducing over-refusal while adding minimal compute overhead.
Bielik Guard delivers compact Polish safety classifiers with F1 scores near 0.79 and superior real-prompt precision over baselines.
The paper presents a layered method to translate governance objectives from standards such as ISO/IEC 42001 into four control layers for agentic AI, with runtime guardrails limited to observable, determinate, and time-sensitive controls, shown via a procurement-agent case study.
A survey categorizing LLM-powered agent systems into software-based, physical, and hybrid types, covering industrial applications and challenges such as latency and security.
Survey of harmful fine-tuning attacks on LLMs, their variants, defense strategies, mechanical analysis, and evaluation methodologies.
citing papers explorer
-
Harmful Fine-tuning Attacks and Defenses for Large Language Models: A Survey
Survey of harmful fine-tuning attacks on LLMs, their variants, defense strategies, mechanical analysis, and evaluation methodologies.