Reconstructing temporal multi-relational firm networks at scale using large language models. The case of the semiconductor industry
Pith reviewed 2026-05-19 19:32 UTC · model grok-4.3
The pith
Large language models can extract supply-chain, partnership and ownership links from public webpages to build a temporal network of over 1,300 semiconductor firms.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors claim that scanning 170 million semiconductor firm webpages with large language models identifies and classifies supply-chain, partnership, and ownership links, yielding a temporal network of over 1,300 linked firms. This network overlaps with and complements proprietary databases, remains consistent with aggregate economic data, and records a temporary 9% decline in edges during the 2022 chip shortage together with rising centrality for AI supply-chain firms such as NVIDIA and geographic realignment of relations amid geopolitical shifts.
What carries the argument
The LLM-based pipeline that reads firm webpages and classifies relational statements into supply-chain, partnership, and ownership categories to assemble the temporal multi-relational graph.
If this is right
- The network records a temporary 9% decline in edges during the 2022 chip shortage.
- Centrality rises rapidly for AI supply-chain bottleneck firms such as NVIDIA.
- Geographic patterns of interfirm relations shift in response to geopolitical turbulence.
- The framework supplies up-to-date maps for assessing resilience in the semiconductor sector.
Where Pith is reading between the lines
- The same webpage-scanning approach could be applied to map firm relations in other sectors where public data is plentiful but proprietary records lag.
- Combining the extracted temporal networks with economic flow models might permit simulations of how disruptions at individual firms spread through supply chains.
- Repeated updates from web sources could support continuous policy tracking of strategic industries beyond one-time studies.
Load-bearing premise
Publicly available firm webpages contain sufficiently complete and unbiased information on supply-chain, partnership, and ownership relations, and the LLM can classify those links without systematic errors that would distort network structure or temporal dynamics.
What would settle it
A side-by-side comparison showing that the extracted network's link counts, centrality rankings, or recorded 9% edge decline during 2022 diverge substantially from a comprehensive proprietary database or independent transaction-volume statistics for the same firms and period.
Original abstract
The semiconductor industry is foundational to modern technology, yet its complex global multi-relational firm network remains poorly understood, posing challenges to scientists, firms, and policymakers. Traditional analysis relies on proprietary databases that are often expensive, incomplete, and slowly updated, limiting their ability to capture rapidly evolving dependencies. Here, we demonstrate that a novel, generalizable methodology combining Large Language Models (LLMs) with open web data can reconstruct this network and its structural dynamics at scale. We identify and classify supply-chain, partnership, and ownership links from 170 million semiconductor firm webpages, yielding a temporal network of over 1,300 linked firms. We validate link-extraction quality (Precision: 0.884; F1-score: 0.784), network overlap and complementarity with a proprietary database, and consistency with aggregate economic data. Our network reveals a temporary 9% decline in edges during the 2022 chip shortage, rapid increases in the centrality of AI supply-chain bottleneck firms such as NVIDIA, and geographic realignment of interfirm relations amid geopolitical turbulence. This generalizable framework overcomes barriers to transparency and provides essential, up-to-date maps for assessing resilience and informing policy across strategically relevant sectors.
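The abstract reports precision and F1 but not recall. As a small worked check, recall can be backed out from the two reported numbers, assuming the F1-score is the plain harmonic mean of that precision and a single recall value (not, for example, a macro-average over relation types, which the abstract does not specify). The function name `implied_recall` is illustrative and not from the paper.

```python
def implied_recall(precision: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for R, given P and F1.

    Assumption: F1 is the harmonic mean of one precision and one recall value.
    """
    denominator = 2 * precision - f1
    if denominator <= 0:
        raise ValueError("F1 must be smaller than 2 * precision")
    return f1 * precision / denominator


# Values as reported in the abstract.
recall = implied_recall(precision=0.884, f1=0.784)
print(f"implied recall = {recall:.3f}")  # prints "implied recall = 0.704"
```

Under that assumption, the pipeline would miss roughly three in ten true links while keeping false positives low, which matters when reading the edge counts as lower bounds.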
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that a novel LLM-based methodology applied to 170 million open semiconductor firm webpages can reconstruct a temporal multi-relational network of over 1,300 firms, extracting supply-chain, partnership and ownership links at scale. It reports validation via aggregate link-extraction metrics (precision 0.884, F1 0.784), overlap/complementarity with a proprietary database, and consistency with aggregate economic statistics, then uses the resulting network to document a 9% edge decline during the 2022 chip shortage, rising centrality of AI-bottleneck firms such as NVIDIA, and geographic realignment amid geopolitical turbulence.
Significance. If the central extraction and temporal claims hold after bias checks, the work would provide a scalable, low-cost alternative to proprietary databases for mapping strategic industry networks. This could enable timely analysis of supply-chain resilience and structural change in critical sectors, with clear relevance to policy and economic network science.
major comments (1)
- [Validation and results sections] The reported aggregate precision (0.884) and F1 (0.784) are not shown to be uniform across firm size, geography, relation type, or time period. Without a stratified error analysis or temporal-slice overlap statistics against the proprietary database, it remains possible that differential web visibility or LLM classification errors (e.g., under-representation of smaller Asian suppliers) drive the observed 9% edge decline in 2022 and the reported rise in NVIDIA centrality rather than genuine network evolution.
minor comments (2)
- [Abstract] The abstract states 'network overlap and complementarity with a proprietary database' without quantitative figures; adding these numbers (or directing readers to the relevant table/figure) would improve immediate clarity.
- [Methods] Methods description of the LLM prompting and classification pipeline would benefit from an explicit statement of how temporal information is extracted and dated from webpages.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback, which helps strengthen the validation of our LLM-based network reconstruction methodology. We address the major comment on the uniformity of validation metrics below.
Point-by-point responses
- Referee: [Validation and results sections] The reported aggregate precision (0.884) and F1 (0.784) are not shown to be uniform across firm size, geography, relation type, or time period. Without a stratified error analysis or temporal-slice overlap statistics against the proprietary database, it remains possible that differential web visibility or LLM classification errors (e.g., under-representation of smaller Asian suppliers) drive the observed 9% edge decline in 2022 and the reported rise in NVIDIA centrality rather than genuine network evolution.
  Authors: We acknowledge that the reported validation metrics are aggregate and do not include explicit stratified breakdowns by firm size, geography, relation type, or time period. Our current validation relies on overall precision and F1, combined with overlap/complementarity checks against a proprietary database and consistency with aggregate economic statistics. While these provide support for the network's reliability, we agree that a stratified analysis would more rigorously rule out systematic biases such as differential web visibility for smaller Asian suppliers. In the revised version, we will add a stratified error analysis (including precision/F1 by region and firm size where metadata permits) and temporal-slice overlap statistics with the proprietary database. This will help confirm that the 9% edge decline during the 2022 chip shortage and the rise in NVIDIA centrality reflect genuine structural changes, consistent with independent industry reports on supply disruptions and AI bottlenecks.
  Revision: yes
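The stratified error analysis discussed in this exchange can be sketched as follows. This is a minimal illustration, not the authors' procedure: the record layout `(stratum, predicted, actual)`, the function name `stratified_scores`, and the toy labels are all assumptions made for the example.

```python
from collections import defaultdict


def stratified_scores(records):
    """Compute precision, recall, and F1 separately for each stratum.

    records: iterable of (stratum, predicted, actual) tuples, where
    `predicted` and `actual` are booleans saying whether a link was
    extracted and whether it exists in the annotated ground truth.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for stratum, predicted, actual in records:
        c = counts[stratum]
        if predicted and actual:
            c["tp"] += 1
        elif predicted:
            c["fp"] += 1
        elif actual:
            c["fn"] += 1
        # True negatives do not enter precision, recall, or F1.

    scores = {}
    for stratum, c in counts.items():
        predicted_total = c["tp"] + c["fp"]
        actual_total = c["tp"] + c["fn"]
        precision = c["tp"] / predicted_total if predicted_total else None
        recall = c["tp"] / actual_total if actual_total else None
        if precision is None or recall is None or precision + recall == 0:
            f1 = None  # undefined when a stratum has no usable counts
        else:
            f1 = 2 * precision * recall / (precision + recall)
        scores[stratum] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


# Hypothetical toy annotations, for illustration only.
toy = (
    [("EU", True, True)] * 3 + [("EU", True, False)] + [("EU", False, True)]
    + [("Asia", True, True)] + [("Asia", False, True)] * 3
)
for stratum, s in stratified_scores(toy).items():
    print(stratum, s)
```

In the toy data the "Asia" stratum has perfect precision but low recall, the pattern the referee worries about: an aggregate precision figure alone would not reveal it.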
Circularity Check
No circularity: empirical extraction validated against external data
The paper describes an LLM-based pipeline to extract multi-relational links from 170 million firm webpages, producing a temporal network of 1,300+ firms. Validation consists of precision/F1 metrics on sampled annotations, overlap checks with an independent proprietary database, and consistency tests against aggregate economic statistics. No equations, fitted parameters, or first-principles derivations are presented as predictions; the central claims rest on external benchmarks rather than internal re-use of the extracted network itself. No self-citation chains or ansatzes are invoked to justify the core reconstruction step.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: LLMs can accurately extract and classify supply-chain, partnership, and ownership relations from firm webpages
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel
  Tag: unclear (relation between the paper passage and the cited Recognition theorem)
  Paper passage: "We identify and classify supply-chain, partnership, and ownership links from 170 million semiconductor firm webpages, yielding a temporal network of over 1,300 linked firms. We validate link-extraction quality (Precision: 0.884; F1-score: 0.784)"
- IndisputableMonolith/Foundation/BranchSelection.lean, branch_selection
  Tag: unclear (relation between the paper passage and the cited Recognition theorem)
  Paper passage: "The observed overlap of 380 directed edges is strongly significant relative to the configuration model null (z=23.14, p<10^{-3})"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
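The overlap statistic quoted in the second paper passage above (380 directed edges, z=23.14 against a configuration model null) is a z-score of an observed edge overlap relative to degree-preserving random networks. The paper's exact null procedure is not given here, so the following is only a sketch of one common approach: randomize the extracted network by directed edge swaps, which keep every node's in-degree and out-degree fixed, and compare the observed overlap with the null distribution. The function names and the toy graphs are hypothetical.

```python
import random
import statistics


def rewire(edges, n_swaps, rng):
    """Degree-preserving randomization of a simple directed graph.

    Repeatedly picks two edges (a, b) and (c, d) and replaces them with
    (a, d) and (c, b), rejecting swaps that would create a self-loop or
    a duplicate edge. In- and out-degrees are unchanged by construction.
    """
    edges = list(edges)
    present = set(edges)
    for _ in range(n_swaps):
        i, j = rng.randrange(len(edges)), rng.randrange(len(edges))
        (a, b), (c, d) = edges[i], edges[j]
        new1, new2 = (a, d), (c, b)
        if a == d or c == b or new1 in present or new2 in present:
            continue
        present.difference_update([(a, b), (c, d)])
        present.update([new1, new2])
        edges[i], edges[j] = new1, new2
    return present


def overlap_z(extracted, reference, n_null=200, seed=0):
    """Return (observed overlap, null mean, null std, z-score)."""
    rng = random.Random(seed)
    reference = set(reference)
    extracted = list(set(extracted))
    observed = len(set(extracted) & reference)
    null = [
        len(rewire(extracted, 10 * len(extracted), rng) & reference)
        for _ in range(n_null)
    ]
    mu = statistics.mean(null)
    sigma = statistics.pstdev(null)
    z = (observed - mu) / sigma if sigma > 0 else float("nan")
    return observed, mu, sigma, z


# Hypothetical toy networks on 10 nodes, for illustration only.
toy_extracted = [(i, (i + 1) % 10) for i in range(10)] + [(i, (i + 3) % 10) for i in range(10)]
toy_reference = [(i, (i + 1) % 10) for i in range(10)]
print(overlap_z(toy_extracted, toy_reference))
```

Which network is randomized, how many swaps are applied, and how the p-value is estimated are design choices; the values used here are placeholders rather than the paper's settings.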
Supplementary Information
S1 Consistency Filtering and Robustness Analysis
To ensure the temporal reliability of domains used in our analysis, we c...
- The activity string ended with at least three consecutive 1's (111), ensuring retention of domains consistently active in the later years.
- The total number of 1's (active years) exceeded four.
- The activity string contained at least one sequence of three or more consecutive 1's.
- The activity string did not contain any sequence of three or more consecutive 0's, except in cases where the pattern ended in 111.
Applying these criteria yielded a core set of 2,015 temporally consistent domains. These domains were used in the main analysis.
S1.1 Robustness Check Using a Broader Domain Set
To assess the robustness of our results, we repeat...
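The four activity-string criteria listed in S1 can be expressed as simple predicates. This is a sketch under stated assumptions: each domain is represented by a string with one character per year, "1" for active and "0" for inactive, and the function names are illustrative. The excerpt is truncated and does not say how the criteria are combined into a final keep/drop decision, so the code only reports which criteria hold and leaves the combination to the caller.

```python
def ends_active(activity: str) -> bool:
    """Criterion: the string ends with at least three consecutive 1's."""
    return activity.endswith("111")


def enough_active_years(activity: str) -> bool:
    """Criterion: the total number of 1's (active years) exceeds four."""
    return activity.count("1") > 4


def has_active_run(activity: str) -> bool:
    """Criterion: at least one run of three or more consecutive 1's."""
    return "111" in activity


def no_long_gap(activity: str) -> bool:
    """Criterion: no run of three or more 0's, unless the string ends in 111."""
    return "000" not in activity or activity.endswith("111")


CRITERIA = {
    "ends_active": ends_active,
    "enough_active_years": enough_active_years,
    "has_active_run": has_active_run,
    "no_long_gap": no_long_gap,
}


def check_domain(activity: str) -> dict:
    """Report which criteria an activity string satisfies."""
    return {name: test(activity) for name, test in CRITERIA.items()}


# Hypothetical activity strings, for illustration only.
for example in ("0001111111", "1100011000"):
    print(example, check_domain(example))
```

Keeping the predicates separate makes it easy to test a stricter or looser rule, which is the kind of variation the robustness check in S1.1 appears to explore.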