
arxiv: 2508.00932 · v2 · submitted 2025-07-30 · 💻 cs.CY

How Sovereign Is Sovereign Compute? A Review of 775 Non-U.S. Data Centers

Pith reviewed 2026-05-19 01:39 UTC · model grok-4.3

classification 💻 cs.CY
keywords: data centers · digital sovereignty · AI governance · operator nationality · investment value · compute capacity · non-U.S. projects · foreign influence

The pith

U.S. companies operate 48 percent of non-U.S. data centers by investment value, meaning many could be subject to U.S. legal authority.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper reviews 775 data center projects located outside the United States and measures the share run by U.S. companies. When projects are weighted by their investment amounts as a stand-in for size, U.S. operators account for 48 percent. The authors suggest this gives the United States a way to govern AI computing resources that are already deployed abroad, in addition to export controls on hardware. For other nations, it means that local data center construction may not deliver full control over computing if the operators are foreign. The work includes a public dataset with details on each project to encourage more research.

Core claim

The central finding is that U.S. companies operate 48% of all non-U.S. data center projects in the dataset when weighted by investment value, used here as a proxy for compute capacity. This is presented as an initial estimate based on public data. The authors conclude that data center operators offer a lever for internationally governing AI that complements traditional export controls, since operators can be used to regulate computing resources already deployed in non-U.S. data centers. For other countries, the results show that building data centers locally does not guarantee digital sovereignty if those facilities are run by foreign entities.

What carries the argument

Analysis of operator national affiliation across a dataset of 775 non-U.S. data center projects, weighted by investment value to estimate control over compute capacity.
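
The weighting step described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the field names `operator_country` and `investment_usd` are hypothetical, and the sample rows are invented purely to show the arithmetic. Skipping projects with no announced value is also an assumption of this sketch.

```python
from collections import defaultdict

# Hypothetical rows; field names and values are illustrative only
# and are not taken from the released dataset.
projects = [
    {"operator_country": "US", "investment_usd": 600.0},
    {"operator_country": "CN", "investment_usd": 300.0},
    {"operator_country": "FR", "investment_usd": 100.0},
    {"operator_country": "US", "investment_usd": None},  # value not announced
]


def weighted_shares(rows, weight_key="investment_usd", group_key="operator_country"):
    """Return each group's share of the total weight, skipping rows with no weight."""
    totals = defaultdict(float)
    for row in rows:
        weight = row.get(weight_key)
        if weight is None:
            continue  # assumption: unweighted projects are left out
        totals[row[group_key]] += weight
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("no rows with a usable weight")
    return {group: value / grand_total for group, value in totals.items()}


shares = weighted_shares(projects)
print(shares["US"])  # share of total investment value held by US operators
```

In this toy sample U.S. operators run half of the projects by count but hold a larger share by value, which is why the choice of weight matters for the headline figure.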

If this is right

  • Data center operators provide a complementary tool to export controls for U.S. governance of AI.
  • Local data center construction does not ensure sovereignty when foreign operators are involved.
  • The dataset supports research on strategic motivations, operational challenges, and entity engagements.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Policymakers in non-U.S. countries may need to screen operators for nationality when approving new data centers.
  • This approach could reveal similar patterns of foreign control in other tech infrastructures.
  • Extending the analysis to actual usage data might refine the estimates of effective capacity.

Load-bearing premise

That investment value accurately proxies for compute capacity and that the nationality of the operator determines which legal authorities can regulate the data center.

What would settle it

Verification through on-site inspections or detailed capacity reports for a sample of the 775 projects that shows U.S. operators control a substantially different share than 48 percent when measured by actual hardware or power consumption.
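
One way to frame that check is to compute the U.S. share twice on the same sample, once weighted by investment value and once by a physical measure such as power capacity. The sketch below assumes a hypothetical `power_mw` field alongside `investment_usd`; both names and all values are invented for illustration, and the comparison is restricted to projects that report both measures.

```python
def us_share(rows, weight_key):
    """Share of total weight attributed to US-operated projects."""
    total = sum(row[weight_key] for row in rows)
    if total == 0:
        raise ValueError("total weight is zero")
    us_total = sum(row[weight_key] for row in rows if row["operator_country"] == "US")
    return us_total / total


# Hypothetical verification sample; values are invented for illustration.
sample = [
    {"operator_country": "US", "investment_usd": 500.0, "power_mw": 40.0},
    {"operator_country": "CN", "investment_usd": 300.0, "power_mw": 60.0},
    {"operator_country": "US", "investment_usd": 200.0, "power_mw": None},
    {"operator_country": "SG", "investment_usd": 200.0, "power_mw": 100.0},
]

# Compare like with like: keep only projects that report both measures.
both = [
    row for row in sample
    if row["investment_usd"] is not None and row["power_mw"] is not None
]

by_investment = us_share(both, "investment_usd")
by_power = us_share(both, "power_mw")
gap = by_investment - by_power  # a large gap would undercut the proxy

print(f"investment-weighted: {by_investment:.2f}")
print(f"power-weighted:      {by_power:.2f}")
print(f"gap:                 {gap:+.2f}")
```

How large a gap counts as "substantially different" is not specified in the source and would need to be set before running the comparison.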

Figures

Figures reproduced from arXiv: 2508.00932 by Aris Richardson, Casey Price, Haley Yi, Mauricio Baker, Michelle Nie, Ruben Weijers, Simon Wisdom, Steven Veld.

Figure 1: The number of data center projects in our data
Figure 2: Donut chart showing the distribution of operators of data centers by country.
Figure 4: Map showing data center investments (existing and planned) within our dataset by country, not including the U.S.
Figure 3: A comparison of the frequency that companies
Original abstract

Previous literature has proposed that the companies operating data centers enforce government regulations on AI companies. Using a new dataset of 775 non-U.S. data center projects, this paper estimates how often data centers could be subject to foreign legal authorities due to the nationality of the data center operators. We find that U.S. companies operate 48% of all non-U.S. data center projects in our dataset when weighted by investment value - a proxy for compute capacity. This is an approximation based on public data and should be interpreted as an initial estimate. For the United States, our findings suggest that data center operators offer a lever for internationally governing AI that complements traditional export controls, since operators can be used to regulate computing resources already deployed in non-U.S. data centers. For other countries, our results show that building data centers locally does not guarantee digital sovereignty if those facilities are run by foreign entities. To support future research, we release our dataset, which documents over 20 variables relating to each data center, including the year it was announced, the investment value, and its operator's national affiliation. The dataset also includes over 1,000 quotes describing these data centers' strategic motivations, operational challenges, and engagement with U.S. and Chinese entities.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript assembles a public-source dataset of 775 non-U.S. data center projects and reports that U.S. companies operate 48% of them when weighted by announced investment value, treated as a proxy for compute capacity. The authors interpret this as evidence that data centers located outside the United States may remain subject to U.S. legal authorities through operator nationality, thereby offering a regulatory lever complementary to export controls. They release the full dataset containing over 20 variables per project and more than 1,000 extracted quotes on motivations and challenges.

Significance. If the weighting and classification procedures prove robust, the 48% estimate would supply a concrete empirical anchor for ongoing debates on digital sovereignty and extraterritorial AI governance. The public release of the dataset, including detailed variables and primary-source quotes, constitutes a clear strength that supports reproducibility and extension by other researchers.

major comments (2)
  1. §3 (Dataset Construction and Weighting): The 48% U.S.-operator share is obtained by weighting projects by announced investment value as a proxy for compute capacity. No sensitivity analysis, correlation check against power capacity or GPU counts, or discussion of how land/construction costs vary by jurisdiction is provided; this assumption is load-bearing for the central claim.
  2. §4 (Results): Rules for assigning operator nationality (headquarters location, ultimate parent, or subsidiary status) and treatment of joint ventures or missing ownership data are not specified in sufficient detail to allow replication or assessment of classification error.
minor comments (2)
  1. Abstract: The statement that the 48% figure is 'an approximation' appears only at the end; moving an explicit qualifier earlier would better calibrate reader expectations.
  2. Dataset documentation: A summary table listing the 20+ variables and their coverage rates would improve usability of the released data.
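
The coverage table requested in the second minor comment could be produced along these lines. This is a sketch under stated assumptions: the column names `year_announced`, `investment_usd`, and `operator_country` are hypothetical stand-ins for the variables the abstract mentions, and the rows are invented. A value counts as missing here when it is `None` or an empty string.

```python
def coverage_rates(rows, variables):
    """Fraction of rows with a non-missing value for each variable."""
    if not rows:
        raise ValueError("no rows supplied")
    rates = {}
    for name in variables:
        present = sum(1 for row in rows if row.get(name) not in (None, ""))
        rates[name] = present / len(rows)
    return rates


# Hypothetical rows; column names and values are illustrative only.
rows = [
    {"year_announced": 2021, "investment_usd": 500.0, "operator_country": "US"},
    {"year_announced": 2022, "investment_usd": None, "operator_country": "CN"},
    {"year_announced": 2023, "investment_usd": 120.0, "operator_country": ""},
    {"year_announced": 2020, "investment_usd": None, "operator_country": "SG"},
]

rates = coverage_rates(rows, ["year_announced", "investment_usd", "operator_country"])
for name, rate in rates.items():
    print(f"{name:<20}{rate:>6.0%}")
```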

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which highlight important areas for improving the transparency and robustness of our analysis. We respond to each major comment below and indicate the revisions we will incorporate.

Point-by-point responses
  1. Referee: §3 (Dataset Construction and Weighting): The 48% U.S.-operator share is obtained by weighting projects by announced investment value as a proxy for compute capacity. No sensitivity analysis, correlation check against power capacity or GPU counts, or discussion of how land/construction costs vary by jurisdiction is provided; this assumption is load-bearing for the central claim.

    Authors: We agree that the choice of announced investment value as a proxy for compute capacity is a load-bearing assumption and that the manuscript would benefit from greater scrutiny of this choice. In the revised version we will add a dedicated sensitivity analysis subsection. This will test how the 48% figure changes when projects are re-weighted by available power-capacity data for the subset of facilities where such figures are reported. We will also add a short discussion of cross-jurisdictional differences in land and construction costs and their possible effect on the investment-value proxy. Comprehensive GPU-count data, however, are not publicly disclosed for the large majority of projects; we will therefore note this data limitation explicitly rather than claim a full correlation check is feasible. Revision: partial

  2. Referee: §4 (Results): Rules for assigning operator nationality (headquarters location, ultimate parent, or subsidiary status) and treatment of joint ventures or missing ownership data are not specified in sufficient detail to allow replication or assessment of classification error.

    Authors: We will expand the methods section with an explicit decision protocol for operator-nationality assignment. The protocol will state that nationality is determined by the headquarters country of the ultimate parent company, that subsidiaries are traced to that parent, and that joint ventures are coded according to the majority equity holder (or flagged as mixed when ownership is evenly split). Entries with insufficient ownership information will be flagged in the released dataset and excluded from the primary weighted calculation; a conservative “U.S. if any U.S. involvement” variant will be reported as a robustness check. An appendix will supply concrete examples of classification decisions to support replication and error assessment. revision: yes
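
The decision protocol outlined in the second response can be sketched as a small classifier. This is an illustrative reading of the simulated rebuttal, not a published procedure: the input format (a list of ultimate-parent headquarters countries with equity shares), the function names, and the labels `"mixed"` and `"insufficient"` are all assumptions. The sketch also takes the largest equity holder as the "majority" holder, which is a simplification when no party holds more than half.

```python
from typing import List, Tuple

Holder = Tuple[str, float]  # (ultimate-parent HQ country, equity share)


def classify_operator(holders: List[Holder]) -> str:
    """Primary rule: nationality of the largest holder's ultimate parent."""
    if not holders:
        return "insufficient"  # flagged and excluded from the primary calculation
    ranked = sorted(holders, key=lambda holder: holder[1], reverse=True)
    if len(ranked) > 1 and abs(ranked[0][1] - ranked[1][1]) < 1e-9:
        return "mixed"  # evenly split ownership at the top
    return ranked[0][0]


def classify_conservative(holders: List[Holder]) -> str:
    """Robustness variant: code as US if any US parent is involved."""
    if not holders:
        return "insufficient"
    if any(country == "US" for country, _ in holders):
        return "US"
    return classify_operator(holders)


joint_venture = [("SG", 0.6), ("US", 0.4)]
print(classify_operator(joint_venture))      # primary rule
print(classify_conservative(joint_venture))  # conservative variant
```

Running both rules over the same projects and comparing the resulting weighted shares would show how much the headline figure depends on the treatment of joint ventures.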

Circularity Check

0 steps flagged

No circularity: direct empirical aggregation from new dataset

Full rationale

The paper constructs a novel dataset of 775 non-U.S. data center projects from public sources and computes the 48% U.S. operator share as a straightforward weighted sum of investment values. No equations, fitted parameters, or prior results are invoked to derive this figure; it is presented explicitly as an initial descriptive estimate. The analysis contains no self-citations that bear load on the central claim, no ansatzes smuggled through citations, and no renaming of known results. The derivation chain is self-contained as data collection followed by aggregation.
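
Written out, the aggregation is a single ratio. The notation below is introduced here for illustration and does not appear in the source: $P$ is the set of projects with a reported investment value (restricting to such projects is an assumption), $v_i$ is the investment value of project $i$, and $o_i$ is the national affiliation of its operator.

```latex
s_{\mathrm{US}}
  = \frac{\sum_{i \in P} v_i \,\mathbf{1}[\,o_i = \mathrm{US}\,]}
         {\sum_{i \in P} v_i}
```

The paper reports $s_{\mathrm{US}} \approx 0.48$. Nothing on the right-hand side is fitted or derived from the result itself, which is the basis for the finding of no circularity.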

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The analysis rests on two domain assumptions about measurement and legal reach rather than on mathematical axioms or newly postulated entities.

axioms (2)
  • Domain assumption: Investment value is a valid proxy for compute capacity.
    Used to weight the share of U.S.-operated projects.
  • Domain assumption: Nationality of the data center operator determines potential subjection to foreign legal authorities.
    Underpins the claim that U.S. operators provide a regulatory lever.


