{"paper":{"title":"The Informational Cost of Agency: A Bounded Measure of Interaction Efficiency for Deployed Reinforcement Learning","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Bipredictability measures closed-loop efficiency in deployed RL agents with a universal upper bound of 0.5 that drops when agency is present.","cross_cats":["cs.LG"],"primary_cat":"cs.AI","authors_text":"Amit Nazeri, Cameron Reid, Wael Hafez","submitted_at":"2026-03-01T21:38:39Z","abstract_excerpt":"Deployed reinforcement learning systems lack a principled runtime reliability theory. We close this gap by introducing Bipredictability, P, a closed form information theoretic metric that quantifies how efficiently a closed loop interaction between agent and environment converts uncertainty into shared predictability. P admits a provable classical bound P equal, smaller than 0.5, derived from Shannon entropy subadditivity, and responsive agency necessarily suppresses P below this ceiling, a structural prediction we term the informational cost of agency. Across 21 trained continuous control age"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"A theoretical property is a provable classical upper bound P is less than or equal to 0.5, independent of domain, task, or agent, a structural consequence of Shannon entropy rather than an empirical observation. When agency is present, a penalty suppresses P strictly below this ceiling, confirmed at P equals 0.33 across trained agents.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That uncertainty resolution across the observation-action-outcome loop is the fundamental quantity that must be maintained for reliable deployed performance, and that this can be computed from the observable stream alone without reference to internal model states.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Bipredictability P measures shared predictability in RL interaction loops with a universal upper bound of 0.5 that agency suppresses to 0.33, operationalized via an Information Digital Twin for faster detection of coupling degradation than reward metrics.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Bipredictability measures closed-loop efficiency in deployed RL agents with a universal upper bound of 0.5 that drops when agency is present.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"6765d366405ea0d9f41dc1aab20579bc23eee3dc494acc3de74b48a1a475e049"},"source":{"id":"2603.01283","kind":"arxiv","version":3},"verdict":{"id":"04e1912d-3986-41d0-809e-85c49b8e35b4","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T17:30:54.755508Z","strongest_claim":"A theoretical property is a provable classical upper bound P is less than or equal to 0.5, independent of domain, task, or agent, a structural consequence of Shannon entropy rather than an empirical observation. When agency is present, a penalty suppresses P strictly below this ceiling, confirmed at P equals 0.33 across trained agents.","one_line_summary":"Bipredictability P measures shared predictability in RL interaction loops with a universal upper bound of 0.5 that agency suppresses to 0.33, operationalized via an Information Digital Twin for faster detection of coupling degradation than reward metrics.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That uncertainty resolution across the observation-action-outcome loop is the fundamental quantity that must be maintained for reliable deployed performance, and that this can be computed from the observable stream alone without reference to internal model states.","pith_extraction_headline":"Bipredictability measures closed-loop efficiency in deployed RL agents with a universal upper bound of 0.5 that drops when agency is present."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2603.01283/integrity.json","findings":[],"available":true,"detectors_run":[],"snapshot_sha256":"c28c3603d3b5d939e8dc4c7e95fa8dfce3d595e45f758748cecf8e644a296938"},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":2,"snapshot_sha256":"fb2c9113642edf5e362b576117e2c0ee4e63e08498026764160b93d366f40a8b"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}