Governed Capability Evolution: Lifecycle-Time Compatibility Checking and Rollback for AI-Component-Based Systems, with Embodied Agents as Case Study
4 Pith papers cite this work. Polarity classification is still indexing.
abstract
Software systems built from versioned AI components increasingly need lifecycle-time governance: when a capability module evolves into a new version, the hosting system must decide whether the new version may be activated safely, under what deployment conditions, with what monitoring, and when it should be rolled back. Existing software-deployment patterns (canary, blue-green, feature flags, MLOps pipelines) address parts of this loop but were designed for stateless web services rather than stateful, policy-constrained runtimes that drive AI components in the field. We study this problem in the setting of embodied agents, where capabilities are packaged as installable modules under runtime policy and recovery constraints. We formulate governed capability evolution as a first-class software-lifecycle problem for AI-component-based systems and propose a staged upgrade framework that treats every new capability version as a governed deployment candidate rather than an immediate replacement. The framework introduces four compatibility checks (interface, policy, behavioral, recovery) and organizes them into a staged pipeline of candidate validation, sandbox evaluation, shadow deployment, gated activation, online monitoring, and rollback. A reference prototype on a PyBullet/ROS 2 testbed, evaluated over 6 upgrade rounds with 15 random seeds, shows that naive upgrade reaches 72.9% task success but drives unsafe activation to 60% by the final round, while governed upgrade retains comparable success (67.4%) with zero unsafe activations across all rounds (Wilcoxon p=0.003). Shadow deployment surfaces 40% of regressions invisible to sandbox alone, and rollback succeeds in 79.8% of post-activation drift scenarios. The work extends runtime governance from action execution to capability evolution.
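The staged pipeline described in the abstract can be illustrated with a minimal sketch. This is not the paper's prototype: the names `Candidate`, `Outcome`, `governed_upgrade`, and `passes_all_checks` are hypothetical, each stage is reduced to a boolean gate, and running all four compatibility checks during candidate validation is an assumption made only for illustration, since the abstract does not say which check runs at which stage.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List, Tuple

# The four compatibility checks named in the abstract.
COMPATIBILITY_CHECKS: Tuple[str, ...] = ("interface", "policy", "behavioral", "recovery")


class Stage(Enum):
    CANDIDATE_VALIDATION = 1
    SANDBOX_EVALUATION = 2
    SHADOW_DEPLOYMENT = 3
    GATED_ACTIVATION = 4
    ONLINE_MONITORING = 5


# Stages that must pass before the candidate replaces the current version.
PRE_ACTIVATION = (
    Stage.CANDIDATE_VALIDATION,
    Stage.SANDBOX_EVALUATION,
    Stage.SHADOW_DEPLOYMENT,
    Stage.GATED_ACTIVATION,
)


@dataclass
class Candidate:
    version: str
    # Hypothetical: precomputed result of each compatibility check.
    check_results: Dict[str, bool] = field(default_factory=dict)


@dataclass
class Outcome:
    active_version: str
    rolled_back: bool
    log: List[str] = field(default_factory=list)


Gate = Callable[[Candidate], bool]


def passes_all_checks(candidate: Candidate) -> bool:
    """Assumed validation gate: every check must pass; a missing result counts as failure."""
    return all(candidate.check_results.get(name, False) for name in COMPATIBILITY_CHECKS)


def governed_upgrade(current_version: str, candidate: Candidate,
                     gates: Dict[Stage, Gate]) -> Outcome:
    """Walk the candidate through the stages; reject early or roll back on drift."""
    log: List[str] = []
    for stage in PRE_ACTIVATION:
        if not gates[stage](candidate):
            # Rejected before activation: the current version stays active.
            log.append(f"{stage.name}: rejected")
            return Outcome(current_version, False, log)
        log.append(f"{stage.name}: passed")
    # The candidate is now active; monitoring decides whether it stays.
    if not gates[Stage.ONLINE_MONITORING](candidate):
        log.append("ONLINE_MONITORING: drift detected, rolled back")
        return Outcome(current_version, True, log)
    log.append("ONLINE_MONITORING: healthy")
    return Outcome(candidate.version, False, log)
```

The point of the sketch is the asymmetry between the two failure paths: a candidate that fails any pre-activation gate never becomes active, whereas a failure during online monitoring restores the previous version and is recorded as a rollback.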
citation-role summary
citation-polarity summary
years
2026: 4
verdicts
UNVERDICTED: 4
roles
extension: 1
polarities
extend: 1
representative citing papers
FSAR is a fleet coordination architecture that preserves each robot as a single-agent runtime and achieves multi-robot coordination via capability sharing, delegation, and layered recovery instead of internal agent fragmentation.
ECM Contracts define a six-dimensional contract model for embodied capability modules that enables static checks for safe composition, installation, and versioned upgrades in robotics systems.
The Alignment Flywheel is a governance-centric hybrid MAS architecture that decouples decision generation from safety governance using a Proposer, Safety Oracle, runtime enforcement, and auditing governance layer for architecture-agnostic safety.
citing papers explorer
- EmbodiedGovBench: A Benchmark for Governance, Recovery, and Upgrade Safety in Embodied Agent Systems
  EmbodiedGovBench is a new benchmark framework that measures embodied agent systems on seven governance dimensions including policy adherence, recovery success, and upgrade safety.
- Federated Single-Agent Robotics: Multi-Robot Coordination Without Intra-Robot Multi-Agent Fragmentation
  FSAR is a fleet coordination architecture that preserves each robot as a single-agent runtime and achieves multi-robot coordination via capability sharing, delegation, and layered recovery instead of internal agent fragmentation.
- ECM Contracts: Contract-Aware, Versioned, and Governable Capability Interfaces for Embodied Agents
  ECM Contracts define a six-dimensional contract model for embodied capability modules that enables static checks for safe composition, installation, and versioned upgrades in robotics systems.
- The Alignment Flywheel: A Governance-Centric Hybrid MAS for Architecture-Agnostic Safety
  The Alignment Flywheel is a governance-centric hybrid MAS architecture that decouples decision generation from safety governance using a Proposer, Safety Oracle, runtime enforcement, and auditing governance layer for architecture-agnostic safety.