NeMo Guardrails: A Toolkit for Controllable and Safe LLM Applications with Programmable Rails

Rebedea, Traian; Dinu, Razvan; Sreedhar, Makesh Narsimhan; Parisien, Christopher; Cohen, Jonathan · 2023 · DOI 10.18653/v1/2023.emnlp-demo.40

12 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

MalSkillBench: A Runtime-Verified Benchmark of Malicious Agent Skills

cs.CR · 2026-06-05 · unverdicted · novelty 8.0

MalSkillBench supplies the first sandbox-verified dataset of malicious agent skills and shows that existing detectors achieve high recall on code injection but collapse on prompt injection and agent-control attacks.

MCPHunt: An Evaluation Framework for Cross-Boundary Data Propagation in Multi-Server MCP Agents

cs.AI · 2026-04-30 · unverdicted · novelty 8.0

The MCPHunt benchmark finds 11.5-41.3% policy-violating credential propagation in multi-server MCP agents across five models, reducible by up to 97% with prompt mitigations while retaining most utility.

Efficient and Sound Probabilistic Verification for AI Agents

cs.CR · 2026-06-18 · unverdicted · novelty 6.0

Presents a distributionally robust optimization method for sound probabilistic verification of Datalog policies in AI agents that bounds violation risk regardless of predicate correlations.

Cordon: Semantic Transactions for Tool-Using LLM Agents

cs.OS · 2026-06-16 · unverdicted · novelty 6.0

Cordon is a transactional runtime system that binds tool intents to reversible state, staged effects, and audit metadata to validate composed agent workflows before commit.

Deterministic Integrity Gates for LLM-Assisted Clinical Manuscript Preparation: An Auditable Biomedical Informatics Architecture

cs.AI · 2026-06-08 · unverdicted · novelty 6.0

Presents MedSci Skills, an open-source toolkit with deterministic integrity gates for verifying LLM-assisted clinical manuscripts against reporting guidelines like STARD, PRISMA, and STROBE.

Triaging Threats to Specialized Guardrails

cs.CR · 2026-05-29 · unverdicted · novelty 6.0

Introduces GuardZoo benchmark and RouteGuard router-expert system showing monolithic guardrails suffer task interference while specialized routing improves threat detection and generalization.

PRISM: Generation-Time Detection and Mitigation of Secret Leakage in Multi-Agent LLM Pipelines

cs.AI · 2026-05-11 · unverdicted · novelty 6.0

PRISM detects and stops credential leakage during LLM generation in multi-agent pipelines using per-token risk scores from lexical, structural, and behavioral signals, achieving zero observed leaks and F1 of 0.832 on a 2000-task benchmark.

From Agent Traces to Trust: A Survey of Evidence Tracing and Execution Provenance in LLM Agents

cs.CR · 2026-06-03 · unverdicted · novelty 5.0 · 2 refs

This survey defines execution provenance as a typed graph of agent execution and evidence tracing as its projection onto evidence-support relations, then reviews methods, taxonomy, benchmarks, and challenges for auditable LLM agents.

Neuro-Symbolic Verification of LLM Outputs for Data-Sensitive Domains (extended preprint)

cs.AI · 2026-05-26 · unverdicted · novelty 5.0

A neuro-symbolic pipeline using formal logic and semantic embeddings detects hallucinations in LLM medical reports at 83%+ for entities and 72% for fabrications while cutting creation time by 30%.

ADR: An Agentic Detection System for Enterprise Agentic AI Security

cs.AI · 2026-05-17 · unverdicted · novelty 5.0

ADR is a three-component detection system for AI agents that combines telemetry sensors, red teaming, and two-tier detection, achieving 97.2% precision in a ten-month Uber deployment and outperforming baselines on the new ADR-Bench.

Symbolic Guardrails for Domain-Specific Agents: Stronger Safety and Security Guarantees Without Sacrificing Utility

cs.SE · 2026-04-16 · unverdicted · novelty 5.0

Symbolic guardrails enforce 74% of specified safety policies in agent benchmarks and boost safety without hurting utility.

TWGuard: A Case Study of LLM Safety Guardrails for Localized Linguistic Contexts

cs.CR · 2026-04-17 · unverdicted · novelty 4.0

TWGuard achieves +0.289 F1 improvement and 94.9% false-positive reduction for LLM safety guardrails in the Taiwan linguistic context compared to foundation models and baselines.
