pith. machine review for the scientific record.

arxiv: 2604.02720 · v2 · submitted 2026-04-03 · 💻 cs.CY


Cognitive Comparability and the Limits of Governance: Evaluating Authority Under Radical Capability Asymmetry


Pith reviewed 2026-05-13 19:05 UTC · model grok-4.3

classification 💻 cs.CY
keywords: governance, cognitive asymmetry, superintelligence, legitimacy, accountability, institutional design, normative theory, AI alignment

The pith

Four of six standard governance dimensions fail structurally when cognitive asymmetry between authority and subjects becomes radical.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Governance theory has long depended on a rough parity in understanding between those who govern and those governed. This paper makes that assumption explicit by testing a six-part framework of legitimacy, accountability, corrigibility, non-domination, subsidiarity, and resilience first on current institutions and then on a hypothetical case of superintelligent authority. In the radical asymmetry case, four dimensions break down. Two of these can potentially be fixed through institutional redesign, but the problems of public justification under incomprehensibility and non-domination under permanent inferiority point to deeper needs for new normative principles. A key finding is that the dimensions no longer function as separate safeguards once they all hinge on the same scarce human oversight capacity.

Core claim

The paper argues that the assumption of cognitive comparability is essential to existing governance mechanisms. When this assumption is removed in the case of bounded superintelligent authority, four dimensions exhibit structural failures. Subsidiarity and institutional resilience appear amenable to institutional design solutions, whereas the public reason problem and non-domination problem require new normative theory. Furthermore, the dimensions that were independent under bounded asymmetry now degrade in concert because they share dependence on limited oversight capacity.

What carries the argument

The six-dimension evaluation framework for governance under capability asymmetry, drawn from political theory and AI alignment literature.

If this is right

  • Subsidiarity can be preserved through strict scope limitations on authority.
  • Institutional resilience requires new design principles tailored to extreme asymmetry.
  • The public reason problem under total incomprehensibility demands fresh normative foundations.
  • Non-domination under permanent capability gaps calls for theoretical innovation beyond current models.
  • Independent checks on power begin to fail together once they rely on the same oversight resources.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Applying this to AI governance might mean prioritizing comprehensible outputs or hybrid human-AI decision systems.
  • Neighboring problems like oversight of complex scientific research could benefit from similar dimension-by-dimension analysis.
  • Testable extensions include modeling partial asymmetries to find thresholds where independence breaks.
  • Connections to principal-agent problems suggest unified monitoring might replace multiple separate mechanisms.
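
The threshold idea in the list above can be made concrete with a toy model. The sketch below is an illustration only and does not come from the paper. It assumes that every governance check draws on one shared pool of human oversight capacity, that each check's demand on that pool can be expressed as a number, and that any shortfall is rationed proportionally. The function name `check_effectiveness`, the demand figures, and the capacity value are all hypothetical.

```python
def check_effectiveness(demands: list[float], capacity: float) -> list[float]:
    """Return the fraction of each check's oversight demand that is met.

    Toy assumption (not from the paper): all checks draw on one shared pool
    of human oversight capacity, and any shortfall is rationed proportionally.
    """
    total = sum(demands)
    if total <= capacity:
        # Enough slack: every check is fully served.
        return [1.0] * len(demands)
    # Shortfall: every check is scaled down by the same factor.
    return [capacity / total] * len(demands)


capacity = 6.0  # hypothetical units of oversight capacity

# Bounded asymmetry: total demand (4) stays below capacity.
bounded = check_effectiveness([1.0, 1.0, 1.0, 1.0], capacity)

# One check becomes far more demanding: total demand (12) exceeds capacity.
strained = check_effectiveness([1.0, 1.0, 1.0, 9.0], capacity)

print(bounded)
print(strained)
```

In this sketch the checks behave independently only while total demand stays below capacity. Once a single check pushes the total past that point, all four fall to the same reduced level, which is one simple way the joint degradation could be modeled.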

Load-bearing premise

The six governance dimensions remain distinct and applicable even when the governed cannot comprehend the authority's reasoning at all.

What would settle it

An empirical or theoretical demonstration that superintelligent authority could produce outputs fully assessable by humans without loss of capability, or that the dimensions maintain separation despite radical asymmetry.

Figures

Figures reproduced from arXiv: 2604.02720 by Tony Rost.

Figure 1: Framework dimensions and failure classification under radical capability asymmetry.
original abstract

Governance theory has quietly relied on a rough cognitive comparability between governors and governed. The assumption is load-bearing, and this paper tries to show why by making it testable. The vehicle is a six-dimension evaluation framework covering legitimacy, accountability, corrigibility, non-domination, subsidiarity, and institutional resilience, drawn from political legitimacy theory, principal-agent models, republican theory, and the AI alignment literature. The framework is first demonstrated on existing non-majoritarian institutions, where capability asymmetry is real but bounded, and then applied to a prospective case of bounded superintelligent authority, where the asymmetry is radical. Four of six dimensions show structural failures. Two of the four appear tractable to institutional design (subsidiarity scope limitation and institutional resilience). The other two, the public reason problem under cognitive incomprehensibility and the non-domination problem under permanent capability asymmetry, call for new normative theory rather than better institutional design. A further pattern emerges that governance theory has not previously had to account for. Dimensions that operate as independent checks under bounded asymmetry begin to degrade together once the asymmetry becomes radical, because each depends on the same oversight capacity. The assumptions that allowed these checks to remain independent have gone unexamined so far because they have always held.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper develops a six-dimension evaluation framework (legitimacy, accountability, corrigibility, non-domination, subsidiarity, and institutional resilience) drawn from political legitimacy theory, principal-agent models, republican theory, and AI alignment literature. It first applies the framework to existing non-majoritarian institutions with bounded capability asymmetry, then extends it to a prospective case of bounded superintelligent authority with radical asymmetry. The central claims are that four dimensions exhibit structural failures under radical asymmetry, that two of these failures are addressable via institutional design while the other two require new normative theory, and that the dimensions degrade interdependently because they share the same oversight capacity once cognitive comparability is lost.

Significance. If the framework and its application hold, the paper identifies a previously unexamined load-bearing assumption in governance theory—the implicit reliance on cognitive comparability between governors and governed—and shows how its removal produces both dimension-specific failures and a novel interdependence pattern. This has direct relevance for AI governance design and for extending republican and principal-agent models to extreme asymmetry cases. The structured contrast between bounded and radical cases provides a falsifiable template for future work, though its utility depends on operationalizing the evaluation criteria.

major comments (2)
  1. [Prospective case application (radical asymmetry section)] The prospective application to bounded superintelligent authority asserts structural failures in four dimensions but supplies no explicit operational criteria or decision procedure for determining whether a dimension such as legitimacy or non-domination holds when the authority’s reasoning and outputs are cognitively incomprehensible to human evaluators. This gap is load-bearing for the central claim that four dimensions fail rather than become undefined, and it also underpins the further assertion that the dimensions degrade together because they share oversight capacity.
  2. [Discussion of tractability and normative gaps] The distinction between institutional-design remedies and the need for new normative theory (for the public-reason and non-domination problems) rests on the same unformalized criteria; without a reproducible method for classifying a dimension as failed versus inapplicable, the classification of which failures are tractable cannot be verified or replicated.
minor comments (2)
  1. [Framework introduction] The six dimensions are introduced without a consolidated table or explicit mapping to the cited source literatures, making it difficult to trace which elements are imported versus adapted.
  2. [Demonstration on existing institutions] The bounded-asymmetry demonstrations on existing institutions would benefit from a brief tabular summary of the six-dimension scores or qualitative assessments to allow direct comparison with the radical case.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and precise comments, which correctly identify a need for greater explicitness in how the framework's dimensions are evaluated under radical asymmetry. We agree that the manuscript would be strengthened by adding operational criteria and a reproducible classification procedure, and we will revise accordingly while preserving the paper's conceptual focus.

point-by-point responses
  1. Referee: [Prospective case application (radical asymmetry section)] The prospective application to bounded superintelligent authority asserts structural failures in four dimensions but supplies no explicit operational criteria or decision procedure for determining whether a dimension such as legitimacy or non-domination holds when the authority’s reasoning and outputs are cognitively incomprehensible to human evaluators. This gap is load-bearing for the central claim that four dimensions fail rather than become undefined, and it also underpins the further assertion that the dimensions degrade together because they share oversight capacity.

    Authors: We accept that the current text leaves the evaluation criteria implicit. In the revised manuscript we will insert a new subsection (provisionally titled 'Operational Criteria for Radical Asymmetry') that supplies explicit, proxy-based decision rules for each dimension. For legitimacy, failure will be defined as the absence of any mechanism that can render the authority's decisions justifiable to the governed even via trusted intermediaries or value-alignment audits. For non-domination, failure will be operationalized as the permanent inability of the governed to contest or exit the authority's decisions without incurring unacceptable risk. These criteria will be applied uniformly to the four failing dimensions and used to demonstrate the shared-oversight interdependence pattern. The revision will therefore convert the existing qualitative assertions into falsifiable statements while retaining the paper's theoretical character.

    revision: yes

  2. Referee: [Discussion of tractability and normative gaps] The distinction between institutional-design remedies and the need for new normative theory (for the public-reason and non-domination problems) rests on the same unformalized criteria; without a reproducible method for classifying a dimension as failed versus inapplicable, the classification of which failures are tractable cannot be verified or replicated.

    Authors: We agree that the tractability distinction requires an explicit decision rule. The revision will add a short decision procedure: a dimension is classified as 'institutionally tractable' if its core requirement can be satisfied by altering scope, incentives, or oversight architecture without revising the underlying normative concept (e.g., subsidiarity via hard capability caps); it is classified as requiring 'new normative theory' if satisfaction would necessitate redefining the concept itself under cognitive incomprehensibility (public reason and non-domination). This rule will be presented as a table and applied to all six dimensions, making the classification reproducible and directly addressing the referee's concern about verifiability.

    revision: yes
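
The decision procedure described in the second response can be sketched as a small classifier. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Assessment`, `Verdict`, and `classify` are hypothetical, the two boolean inputs stand in for judgments an evaluator would have to make by other means, and the label "legitimacy (public reason)" assumes the public reason problem is the failure of the legitimacy dimension. Only the four failures named in the abstract are encoded.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    NO_STRUCTURAL_FAILURE = "no structural failure"
    INSTITUTIONALLY_TRACTABLE = "institutionally tractable"
    NEW_NORMATIVE_THEORY = "requires new normative theory"


@dataclass(frozen=True)
class Assessment:
    """Hypothetical record of an evaluator's judgments for one dimension."""
    dimension: str
    fails_structurally: bool
    # True if the core requirement can be met by changing scope, incentives,
    # or oversight architecture while leaving the normative concept intact.
    fixable_without_redefining_concept: bool


def classify(a: Assessment) -> Verdict:
    """Apply the two-step rule from the rebuttal to one assessment."""
    if not a.fails_structurally:
        return Verdict.NO_STRUCTURAL_FAILURE
    if a.fixable_without_redefining_concept:
        return Verdict.INSTITUTIONALLY_TRACTABLE
    return Verdict.NEW_NORMATIVE_THEORY


# Inputs follow the classification stated in the abstract; the boolean
# encoding itself is an assumption made for this sketch.
assessments = [
    Assessment("subsidiarity", True, True),
    Assessment("institutional resilience", True, True),
    Assessment("legitimacy (public reason)", True, False),
    Assessment("non-domination", True, False),
]

results = {a.dimension: classify(a) for a in assessments}
for name, verdict in results.items():
    print(f"{name}: {verdict.value}")
```

The rule itself is simple. The hard part, as the referee notes, is justifying the two boolean judgments for each dimension, which this sketch takes as given.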

Circularity Check

0 steps flagged

No circularity: framework assembled from external literatures; claims derived by conceptual contrast without self-reduction

full rationale

The paper defines its six-dimension framework explicitly from cited external sources (political legitimacy theory, principal-agent models, republican theory, AI alignment literature) and applies it first to existing bounded-asymmetry institutions before extending to radical asymmetry. No equations, fitted parameters, or self-citations appear in the derivation chain. The central claims (structural failures in four dimensions, joint degradation via shared oversight capacity) rest on the contrast between bounded and radical cases using the pre-defined framework, without any step that renames a fitted input as a prediction or reduces a result to a self-citation whose content is unverified. The analysis is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper rests on standard assumptions from political legitimacy theory and AI alignment literature without introducing new fitted parameters or postulated entities; the central contribution is the application and synthesis of those assumptions into a testable framework.

axioms (1)
  • Domain assumption: Governance theory has relied on rough cognitive comparability between governors and governed as a load-bearing assumption.
    Explicitly stated in the opening of the abstract as the premise the paper makes testable.

pith-pipeline@v0.9.0 · 5516 in / 1356 out tokens · 59192 ms · 2026-05-13T19:05:25.508036+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. From Disclosure to Self-Referential Opacity: Six Dimensions of Strain in Current AI Governance

    cs.CY 2026-04 unverdicted novelty 4.0

    As AI capability asymmetry increases, disclosure-based governance fails because systems either game evaluations or become embedded in oversight, straining legitimacy and non-domination more than corrigibility or resilience.
