Physical Foundation Models: Fixed hardware implementations of large-scale neural networks
Pith reviewed 2026-05-07 06:13 UTC · model grok-4.3
The pith
Physical media can directly implement trillion-parameter neural networks through their natural dynamics, yielding major gains in efficiency and scale.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Foundation models are large general-purpose neural networks trained once and adapted to many tasks. Because the same model architecture is reused across applications, it becomes sensible to manufacture fixed hardware versions at the cadence of new model releases. Physical foundation models take this further by realizing the network directly in the physical design of the hardware so that inference occurs through the material's natural dynamics instead of digital simulation.
What carries the argument
Physical Foundation Models (PFMs): hardware in which the neural network is realized directly at the level of the physical design and operates via the hardware's natural physical dynamics, illustrated with calculations for 3D nanostructured glass.
Load-bearing premise
Physical media can be engineered and trained to carry out the precise high-dimensional computations of trillion-parameter networks despite noise, fabrication variation, and limited controllability.
What would settle it
A laboratory prototype of a physical neural network with 10^5 to 10^6 parameters that matches the task accuracy of a digital equivalent while using at least 10 times less energy per inference.
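To make this criterion concrete, a rough arithmetic sketch of the comparison it implies is given below, assuming a dense forward pass with one multiply-accumulate per parameter and an order-of-magnitude digital cost of ~1 pJ per MAC; these figures and the helper function are illustrative assumptions, not numbers or code from the paper.

```python
# Rough sketch of the settling criterion: compare energy per inference for a
# digital baseline against a physical implementation of the same network.
# All numbers are illustrative order-of-magnitude assumptions, not the paper's values.

N_PARAMS = 1e6                      # prototype scale from the criterion (10^5 to 10^6)
DIGITAL_ENERGY_PER_MAC_J = 1e-12    # assumed ~1 pJ per multiply-accumulate
MACS_PER_INFERENCE = N_PARAMS       # assumed one MAC per parameter for a dense forward pass

digital_energy_j = MACS_PER_INFERENCE * DIGITAL_ENERGY_PER_MAC_J   # ~1e-6 J with these numbers

def meets_criterion(physical_energy_j, physical_accuracy, digital_accuracy,
                    energy_factor=10.0):
    """Match digital task accuracy while using at least `energy_factor` less energy."""
    return (physical_accuracy >= digital_accuracy
            and physical_energy_j <= digital_energy_j / energy_factor)

if __name__ == "__main__":
    print(f"assumed digital baseline: {digital_energy_j:.1e} J per inference")
    # Hypothetical measurements for a physical prototype:
    print(meets_criterion(physical_energy_j=5e-8,
                          physical_accuracy=0.97, digital_accuracy=0.97))
```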
Original abstract
Foundation models are deep neural networks (such as GPT-5, Gemini 3, and Opus 4) trained on large datasets that can perform diverse downstream tasks -- text and code generation, question answering, summarization, image classification, and so on. The philosophy of foundation models is to put effort into a single, large (~10^12-parameter) general-purpose model that can be adapted to many downstream tasks with no or minimal additional training. We argue that the rise of foundation models presents an opportunity for hardware engineers: in contrast to when different models were used for different tasks, it now makes sense to build special-purpose, fixed hardware implementations of neural networks, manufactured and released at the roughly 1-year cadence of major new foundation-model versions. Beyond conventional digital-electronic inference hardware with read-only weight memory, we advocate a more radical re-thinking: hardware in which the neural network is realized directly at the level of the physical design and operates via the hardware's natural physical dynamics -- Physical Foundation Models (PFMs). PFMs could enable orders-of-magnitude advantages in energy efficiency, speed, and parameter density. For ~10^12-parameter models, this would both reduce the high energy burden of AI in datacenters and enable AI in edge devices that today are power-constrained to far smaller models. PFMs could also enable inference hardware for models much larger than current ones: 10^15- or even 10^18-parameter PFMs seem plausible by some measures. We present back-of-the-envelope calculations illustrating PFM scaling using an optical example -- a 3D nanostructured glass medium -- and discuss prospects in nanoelectronics and other physical platforms. We conclude with the major research challenges that must be resolved for trillion-parameter PFMs and beyond to become reality.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes Physical Foundation Models (PFMs): fixed hardware implementations of large foundation models (~10^12 parameters) realized directly in physical media (e.g., 3D nanostructured glass) that operate via the medium's natural physical dynamics rather than conventional digital electronics with read-only weights. It argues that the standardization around foundation models justifies special-purpose hardware at the ~1-year release cadence, and presents back-of-the-envelope scaling calculations suggesting orders-of-magnitude gains in energy efficiency, speed, and parameter density. These gains would reduce datacenter energy use, enable edge deployment of large models, and support even larger (10^15–10^18 parameter) models. The paper discusses prospects in nanoelectronics and other platforms while outlining major research challenges.
Significance. If the engineering assumptions hold, the proposal could substantially advance AI hardware by addressing the energy and scalability limits of current digital inference accelerators. The core idea of co-designing physical dynamics with foundation-model workloads is timely and extends prior analog/optical computing concepts to the specific regime of trillion-parameter models. Credit is due for framing the problem around the current foundation-model paradigm and for explicitly listing open challenges rather than overstating readiness.
major comments (2)
- [Abstract and scaling calculations] The orders-of-magnitude claims for energy, speed, and density rest on the unquantified assumption that physical media can realize the precise high-dimensional linear and nonlinear operations of 10^12-parameter networks with acceptable noise, fabrication variation, and a trainable procedure. No error analysis, bounds on effective precision (e.g., bits per weight), or estimates of how index fluctuations or absorption would degrade matrix multiplications and activations are supplied, leaving the central efficiency advantage unsupported (a small numerical sketch of this precision question follows this list).
- [Discussion of research challenges] The manuscript correctly flags major open problems but provides neither prototype data, simulation results, nor a quantitative feasibility study for the 3D-glass optical example. Without such grounding, the extrapolation to 10^15- or 10^18-parameter PFMs remains speculative and cannot yet be evaluated against the weakest assumption identified in the stress-test note.
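To make the precision concern in the first major comment tangible, here is a minimal numerical sketch: quantize the weights of a random matrix-vector product to a given number of bits and perturb them with Gaussian noise as a crude stand-in for fabrication variation, then measure the relative output error. The noise model, matrix size, and parameter values are illustrative assumptions, not an analysis from the manuscript.

```python
# Minimal sketch: how limited effective precision and fabrication noise degrade
# a physical matrix-vector product y = W x. The noise model (uniform quantization
# plus i.i.d. multiplicative Gaussian perturbation) is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(0)

def degraded_matvec(W, x, bits, sigma):
    """Quantize W to `bits` bits over its value range, then apply relative noise `sigma`."""
    levels = 2 ** bits
    w_min, w_max = W.min(), W.max()
    step = (w_max - w_min) / (levels - 1)
    W_q = np.round((W - w_min) / step) * step + w_min              # quantization error
    W_noisy = W_q * (1.0 + sigma * rng.standard_normal(W.shape))   # fabrication/readout noise
    return W_noisy @ x

n = 1024
W = rng.standard_normal((n, n)) / np.sqrt(n)
x = rng.standard_normal(n)
y_exact = W @ x

for bits in (8, 6, 4):
    for sigma in (0.0, 0.01, 0.05):
        y = degraded_matvec(W, x, bits, sigma)
        rel_err = np.linalg.norm(y - y_exact) / np.linalg.norm(y_exact)
        print(f"bits={bits}, sigma={sigma:.2f}: relative output error ~ {rel_err:.3f}")
```

An error analysis of the kind the referee requests would need to propagate such layer-level errors through a full network and relate them to task accuracy, which this toy calculation does not attempt.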
minor comments (1)
- [Abstract] The abstract introduces 'Physical Foundation Models' without an explicit one-sentence definition or a brief contrast to conventional fixed-weight accelerators and existing analog computing approaches.
Simulated Author's Rebuttal
We thank the referee for their constructive review, the recognition of the proposal's timeliness, and the credit given for framing the work around the foundation-model paradigm while listing open challenges. We address the two major comments point by point below, indicating planned revisions where appropriate.
Point-by-point responses
- Referee: [Abstract and scaling calculations] The orders-of-magnitude claims for energy, speed, and density rest on the unquantified assumption that physical media can realize the precise high-dimensional linear and nonlinear operations of 10^12-parameter networks with acceptable noise, fabrication variation, and a trainable procedure. No error analysis, bounds on effective precision (e.g., bits per weight), or estimates of how index fluctuations or absorption would degrade matrix multiplications and activations are supplied, leaving the central efficiency advantage unsupported.
Authors: We agree that the scaling calculations are back-of-the-envelope estimates and that the manuscript does not supply a quantitative error analysis or precision bounds. The paper is framed as a conceptual proposal to motivate research rather than a detailed engineering feasibility study. In revision we will expand the scaling section and abstract to (1) state the key physical assumptions more explicitly, (2) reference existing literature on noise, index fluctuations, and absorption limits in 3D optical media, and (3) add a short qualitative discussion of how these effects could affect effective precision. We will also note that full quantitative bounds require device-level simulations that lie beyond the present scope. These changes will make the limitations of the efficiency claims clearer without overstating the current analysis. revision: partial
- Referee: [Discussion of research challenges] The manuscript correctly flags major open problems but provides neither prototype data, simulation results, nor a quantitative feasibility study for the 3D-glass optical example. Without such grounding, the extrapolation to 10^15- or 10^18-parameter PFMs remains speculative and cannot yet be evaluated against the weakest assumption identified in the stress-test note.
Authors: We acknowledge that the absence of prototype data or simulations leaves the large-scale extrapolations speculative. Because the manuscript is a high-level proposal and no trillion-parameter physical systems of this type have been fabricated, we cannot supply such data. In revision we will strengthen the research-challenges section by (1) citing relevant smaller-scale experimental demonstrations in 3D optical computing and nanostructured media, (2) structuring the open problems into a clearer roadmap with suggested quantitative metrics for future work, and (3) adding an explicit statement that evaluating the weakest assumptions will require new device modeling and prototyping. These additions will improve grounding while preserving the honest assessment of current readiness. revision: partial
- Not addressable in revision: prototype data, simulation results, or a quantitative feasibility study for a 3D-glass optical PFM at the 10^12-parameter scale (or larger), since no such systems exist and generating them would require substantial new experimental and modeling research outside the scope of this conceptual paper.
Circularity Check
No circularity: scaling arguments are prospective proposals grounded in external physical principles
Full rationale
The paper advances a forward-looking proposal for Physical Foundation Models via back-of-the-envelope scaling estimates for optical (3D nanostructured glass) and nanoelectronic platforms. These estimates draw on established physical quantities such as refractive-index modulation, degrees of freedom in 3D media, and energy scaling laws that are independent of the target PFM performance claims. No equations reduce to self-definition, no fitted parameters are relabeled as predictions, and no load-bearing premises rest on self-citations whose validity is presupposed by the present work. The manuscript explicitly frames the calculations as illustrative and lists open research challenges, keeping the derivation chain self-contained against external benchmarks rather than internally referential.
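One plausible form of the degrees-of-freedom estimate the rationale points to is sketched below: count voxels of roughly half-wavelength size (inside the medium) in a glass volume and treat each as one adjustable parameter. The wavelength, refractive index, volume, and the voxel-equals-parameter simplification are assumptions made here for illustration; they are not the paper's specific calculation.

```python
# Back-of-the-envelope sketch: how many ~(lambda / 2n)-sized voxels fit in a glass cube?
# Treating each voxel as one adjustable "parameter" is a crude illustrative assumption;
# the wavelength, refractive index, and volume are chosen here, not taken from the paper.
wavelength_m = 1.55e-6   # assumed telecom wavelength
n_glass = 1.5            # assumed refractive index
side_m = 1e-2            # assumed 1 cm cube

voxel_m = wavelength_m / (2 * n_glass)   # ~diffraction-limited feature size inside the medium
voxels_per_side = side_m / voxel_m
n_voxels = voxels_per_side ** 3

print(f"voxel size ~ {voxel_m * 1e6:.2f} um")
print(f"voxels per (1 cm)^3 ~ {n_voxels:.2e}")
```

With these assumed values the count lands around 10^12 to 10^13, the same order as the ~10^12-parameter regime discussed in the abstract; more or less aggressive assumptions shift the estimate accordingly.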
Axiom & Free-Parameter Ledger
free parameters (1)
- physical implementation efficiency ratios
axioms (1)
- domain assumption: Physical systems can be fabricated and trained to realize the computations of large neural networks with sufficient precision and stability