Recognition: 2 theorem links
· Lean Theorems
Neural Computers
Pith reviewed 2026-05-10 19:57 UTC · model grok-4.3
The pith
Neural computers can learn basic interface alignment and short-horizon control directly from input-output traces
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Neural Computers, instantiated as video models that roll out screen frames from instructions, pixels, and actions, can acquire I/O alignment and short-horizon control when trained solely on collected interaction traces, although routine reuse, controlled updates, and symbolic stability remain out of reach.
What carries the argument
A video model that generates sequences of screen frames, conditioned on instructions and user actions, whose rollout serves as the runtime state of a Neural Computer (a minimal code sketch follows this list).
Load-bearing premise
Elementary Neural Computer primitives such as I/O alignment can be learned solely from collected input-output traces without any instrumented access to internal program states.
What would settle it
Train a video-based Neural Computer on traces from one set of interfaces and test whether it produces correct screen outputs and alignments for instructions on a previously unseen program or interface without further training.
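To make the machinery concrete, here is a minimal sketch of the rollout loop the paper describes: a learned runtime state is updated from the previous state, the current screen frame, and the user action, and the next frame is decoded from that state ($h_t = F_\theta(h_{t-1}, x_t, u_t)$, $x_{t+1} \sim G_\theta(h_t)$). The step and decode functions below are random-projection placeholders for the trained video model, and all dimensions are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not from the paper).
H, X, U = 64, 32, 8  # runtime state, frame embedding, action encoding
W = rng.normal(size=(H, H + X + U)) * 0.05  # placeholder weights

def f_theta(h_prev, x_t, u_t):
    """Update the learned runtime state: h_t = F_theta(h_{t-1}, x_t, u_t).
    A trained video model would sit here; this is a stand-in."""
    return np.tanh(W @ np.concatenate([h_prev, x_t, u_t]))

def g_theta(h_t):
    """Decode the next frame embedding x_{t+1} from the runtime state.
    The paper samples x_{t+1} ~ G_theta(h_t); this stand-in is deterministic."""
    return h_t[:X]

def rollout(first_frame, actions, horizon):
    """Roll the Neural Computer forward, feeding predictions back in:
    the only 'memory' is the state h, which is exactly the property under test."""
    h, x, preds = np.zeros(H), first_frame, []
    for t in range(horizon):
        u = actions[t] if t < len(actions) else np.zeros(U)
        h = f_theta(h, x, u)
        x = g_theta(h)
        preds.append(x)
    return preds

preds = rollout(rng.normal(size=X), [rng.normal(size=U)] * 4, horizon=4)
print(len(preds), preds[0].shape)  # 4 predicted frame embeddings of dim 32
```

The generalization test sketched in "What would settle it" then amounts to running this loop on traces from an interface never seen in training and scoring the predicted frames against the real ones.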
read the original abstract
We propose a new frontier: Neural Computers (NCs) that unify computation, memory, and I/O of traditional computers in a learned runtime state. Our long-term goal is the Completely Neural Computer (CNC): the mature, general-purpose realization of this emerging machine form, with stable execution, explicit reprogramming, and durable capability reuse. As an initial step, we study whether elementary NC primitives can be learned solely from collected I/O traces, without instrumented program state. Concretely, we instantiate NCs as video models that roll out screen frames from instructions, pixels, and user actions (when available) in CLI and GUI settings. We show that NCs can acquire elementary interface primitives, especially I/O alignment and short-horizon control, while routine reuse, controlled updates, and symbolic stability remain challenging. We outline a roadmap toward CNCs, to establish a new computing paradigm beyond today's agents and conventional computers.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes Neural Computers (NCs) as a new paradigm that unifies computation, memory, and I/O within a learned runtime state, with the long-term aim of Completely Neural Computers (CNCs) featuring stable execution and reusable capabilities. As an initial step, it examines whether elementary NC primitives can be acquired solely from I/O traces by instantiating NCs as video models that predict screen frames from instructions, pixels, and actions in CLI/GUI settings. The authors claim to demonstrate acquisition of I/O alignment and short-horizon control while identifying ongoing challenges in reuse, updates, and stability, and outline a roadmap toward CNCs.
Significance. If the empirical claims were substantiated with verifiable evidence of learned stateful computation, this could introduce a distinct approach to acquiring interface primitives through end-to-end video prediction, potentially informing future work on neural systems that emulate traditional computing abstractions. At present, however, the absence of methods, metrics, or internal-state verification limits any assessment of whether the results advance beyond standard visual prediction or support the unification thesis.
major comments (2)
- Abstract: The central claim that 'NCs can acquire elementary interface primitives, especially I/O alignment and short-horizon control' is asserted without any description of experimental methods, training data scale, evaluation metrics, baselines, or controls. This renders the empirical result unevaluable, even though it is load-bearing for the paper's contribution.
- Abstract: The equivalence between accurate screen-frame rollout and acquisition of computational primitives (unification of computation/memory/I/O with internal state management) is not demonstrated. Frame prediction can succeed via pixel-level pattern matching on instructions and visuals without encoding explicit alignment, control, or memory operations, and no instrumented verification of learned runtime state is provided.
minor comments (1)
- Abstract: The distinction between Neural Computers (NCs) and Completely Neural Computers (CNCs) is introduced but remains high-level; a more precise definition of the 'mature' capabilities (e.g., explicit reprogramming) would clarify the roadmap.
Simulated Author's Rebuttal
We thank the referee for the careful and constructive review of our manuscript on Neural Computers. The comments highlight important issues regarding the evaluability of our claims and the interpretation of the video-model experiments. We address each major comment point by point below, providing clarifications based on the full manuscript and indicating where revisions will be made to strengthen the presentation.
read point-by-point responses
- Referee: Abstract: The central claim that 'NCs can acquire elementary interface primitives, especially I/O alignment and short-horizon control' is asserted without any description of experimental methods, training data scale, evaluation metrics, baselines, or controls. This renders the empirical result unevaluable, even though it is load-bearing for the paper's contribution.
Authors: We agree that the abstract as written is too high-level and does not allow readers to assess the empirical support for the claim. The full manuscript instantiates NCs as video prediction models trained on I/O traces collected from CLI and GUI environments (e.g., terminal sessions and desktop interactions), where the model receives instructions, current screen pixels, and optional actions to predict future frames. Evaluation focuses on rollout accuracy (pixel-level and semantic consistency over short horizons) and qualitative analysis of whether the generated sequences demonstrate instruction-following and basic control. No large-scale quantitative baselines or controls are reported in the initial experiments, as the work is positioned as a feasibility study rather than a comparative benchmark. We will revise the abstract to include a concise description of the video-model instantiation, the trace-based training data, and the primary evaluation criteria (rollout fidelity and observed primitive behaviors).
Revision: yes
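The "rollout accuracy" criterion named in this response is not defined in the provided text; the sketch below is one plausible pixel-level instantiation (per-step MSE and PSNR of predicted frames against the ground-truth trace over a short horizon), offered as an illustration rather than the authors' actual protocol. The semantic-consistency half of the criterion would need a separate, task-specific check.

```python
import numpy as np

def rollout_fidelity(pred_frames, true_frames, max_val=255.0):
    """Per-step pixel-level fidelity over a short horizon: MSE and PSNR
    of each predicted frame against the corresponding trace frame."""
    report = []
    for t, (p, g) in enumerate(zip(pred_frames, true_frames)):
        mse = float(np.mean((p.astype(np.float64) - g.astype(np.float64)) ** 2))
        psnr = float("inf") if mse == 0 else 10 * np.log10(max_val ** 2 / mse)
        report.append({"step": t, "mse": mse, "psnr_db": psnr})
    return report

# Toy usage: two 4x4 grayscale "frames" with small prediction error.
rng = np.random.default_rng(1)
truth = [rng.integers(0, 256, size=(4, 4)) for _ in range(2)]
preds = [np.clip(f + rng.integers(-3, 4, size=f.shape), 0, 255) for f in truth]
for row in rollout_fidelity(preds, truth):
    print(row)
```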
- Referee: Abstract: The equivalence between accurate screen-frame rollout and acquisition of computational primitives (unification of computation/memory/I/O with internal state management) is not demonstrated. Frame prediction can succeed via pixel-level pattern matching on instructions and visuals without encoding explicit alignment, control, or memory operations, and no instrumented verification of learned runtime state is provided.
Authors: This observation is correct and points to a genuine interpretive gap. Pixel-level frame prediction can indeed be achieved through statistical pattern matching without necessarily learning explicit computational abstractions or maintaining verifiable internal state. Our experiments show that the trained models produce coherent multi-step rollouts that align instructions with resulting screen changes and exhibit short-horizon action selection, which we interpret as evidence of basic I/O alignment and control primitives. However, the manuscript does not include instrumented probes of the model's internal activations to confirm explicit state management, memory operations, or unification of computation/memory/I/O; this is consistent with our stated methodology of learning solely from external I/O traces without access to program internals. The paper already notes that routine reuse, controlled updates, and symbolic stability remain open challenges. We will add a dedicated limitations paragraph clarifying that predictive accuracy does not equate to verified stateful computation and explicitly state the absence of internal-state instrumentation as a current limitation of the study.
Revision: partial
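For concreteness, this is the shape of the internal-state instrumentation the referee asks for and the rebuttal concedes is absent: a probe fit from the model's hidden activations to a logged program-state variable (a register value, a cursor position). Everything here is a hypothetical sketch assuming access to hidden states and ground-truth program state; a high probe score would be evidence that the variable is explicitly encoded, though still short of proof of stateful computation.

```python
import numpy as np

def fit_linear_probe(hidden_states, state_labels, ridge=1e-3):
    """Ridge-regression probe: can a program-state variable be read
    linearly from the runtime state h_t? Returns weights and R^2."""
    H = np.asarray(hidden_states, dtype=float)   # (n_steps, hidden_dim)
    y = np.asarray(state_labels, dtype=float)    # (n_steps,)
    d = H.shape[1]
    w = np.linalg.solve(H.T @ H + ridge * np.eye(d), H.T @ y)
    resid = y - H @ w
    r2 = 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()) + 1e-12)
    return w, r2

# Toy check: a hidden state that really encodes a counter probes near R^2 = 1.
rng = np.random.default_rng(2)
counter = np.arange(200, dtype=float)                      # ground-truth program state
h = np.column_stack([counter + rng.normal(0, 0.1, 200),    # encoded variable
                     rng.normal(size=(200, 15))])          # distractor dimensions
_, r2 = fit_linear_probe(h, counter)
print(f"probe R^2 = {r2:.3f}")  # high => the variable is linearly decodable
```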
Circularity Check
No circularity: conceptual proposal with no derivations or self-referential fits
full rationale
The paper advances a conceptual framework for Neural Computers by proposing unification of computation/memory/I/O in learned states, instantiated as video models trained on I/O traces. No equations, parameter estimations, or derivation chains appear that could reduce predictions to inputs by construction. Claims about acquiring primitives (I/O alignment, short-horizon control) are presented as empirical observations rather than closed-form results derived from prior fitted values or self-citations. The roadmap toward CNCs is forward-looking and does not rely on load-bearing self-citations or ansatzes smuggled in from prior work. The provided text contains no self-referential reductions, so the argument stands or falls against external evidence rather than its own constructions.
Axiom & Free-Parameter Ledger
invented entities (2)
- Neural Computer (NC): no independent evidence
- Completely Neural Computer (CNC): no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction: reality_from_one_distinction (unclear)
Unclear: the relation between the paper passage and the cited Recognition theorem.
Passage: "We instantiate NCs as video models that roll out screen frames from instructions, pixels, and user actions... $h_t = F_\theta(h_{t-1}, x_t, u_t)$, $x_{t+1} \sim G_\theta(h_t)$"
- IndisputableMonolith/Cost/FunctionalEquation: washburn_uniqueness_aczel (unclear)
Unclear: the relation between the paper passage and the cited Recognition theorem.
Passage: "NCs can acquire elementary interface primitives, especially I/O alignment and short-horizon control"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
- Toward a Science of Intent: Closure Gaps and Delegation Envelopes for Open-World AI Agents
Intent compilation turns vague human goals into verifiable artifacts, using closure-gap vectors and delegation envelopes to separate open-world agent challenges from closed-world solvers and to benchmark closure fixes...
Reference graph
Works this paper leans on
- [1] Adam: A Method for Stochastic Optimization
Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
- [2] Computer architecture and instruction set design
Paul Constantine Anagnostopoulos, M. J. Michel, Gary H. Sockut, George M. Stabler, and Andries van Dam. Computer architecture and instruction set design. In Proceedings of the June 4–8, 1973, National Computer Conference and Exposition, pages 519–527, 1973.
- [3] Computer use tool
Anthropic. Computer use tool. https://platform.claude.com/docs/en/agents-and-tools/tool-use/computer-use-tool. Accessed 02-02-2026. Anthropic. Introducing Claude Sonnet 4.5. https://www.anthropic.com/news/claude-sonnet-4-5, September 2025.
- [4] Program Synthesis with Large Language Models
Jacob Austin, Augustus Odena, Maxwell Nye, Maarten Bosma, Henryk Michalewski, David Dohan, Ellen Jiang, Carrie Cai, Michael Terry, Quoc Le, et al. Program synthesis with large language models. arXiv preprint arXiv:2108.07732, 2021.
- [5] The FORTRAN automatic coding system
John W. Backus, Robert J. Beeber, Sheldon Best, Richard Goldberg, Lois M. Haibt, Harlan L. Herrick, Robert A. Nelson, David Sayre, Peter B. Sheridan, Harold Stern, et al. The FORTRAN automatic coding system. In Papers Presented at the February 26–28, 1957, Western Joint Computer Conference: Techniques for Reliability, pages 188–198, 1957.
- [6] On the Opportunities and Risks of Foundation Models
Rishi Bommasani et al. On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258, 2021.
- [7] Language Models are Few-Shot Learners
Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D. Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in neural information process...
- [8] Estimating risk and uncertainty in deep reinforcement learning
William R. Clements, Bastien Van Delft, Benoît-Marie Robaglia, Reda Bahi Slaoui, and Sébastien Toth. Estimating risk and uncertainty in deep reinforcement learning. arXiv preprint arXiv:1905.09638, 2019.
- [9] The Llama 3 Herd of Models
Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, et al. The Llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024.
- [10] Finetuning offline world models in the real world
Yunhai Feng, Nicklas Hansen, Ziyan Xiong, Chandramouli Rajagopalan, and Xiaolong Wang. Finetuning offline world models in the real world. arXiv preprint arXiv:2310.16029, 2023.
- [11] Neural Turing Machines
Alex Graves, Greg Wayne, and Ivo Danihelka. Neural Turing machines. arXiv preprint arXiv:1410.5401, 2014.
- [12] World Models
David Ha and Jürgen Schmidhuber. World models. arXiv preprint arXiv:1803.10122, 2018.
- [13] Hypernetworks
David Ha, Andrew Dai, and Quoc V. Le. Hypernetworks. arXiv preprint arXiv:1609.09106, 2016.
- [14] Dream to Control: Learning Behaviors by Latent Imagination
Danijar Hafner, Timothy Lillicrap, Jimmy Ba, and Mohammad Norouzi. Dream to control: Learning behaviors by latent imagination. arXiv preprint arXiv:1912.01603, 2019. Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, and James Davidson. Learning latent dynamics for planning from pixels. In International conference on mac...
- [15] Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model
Xianglong He, Chunli Peng, Zexiang Liu, Boyang Wang, Yifan Zhang, Qi Cui, Fei Kang, Biao Jiang, Mengyin An, Yangyang Ren, et al. Matrix-Game 2.0: An open-source, real-time, and streaming interactive world model. arXiv preprint arXiv:2508.13009, 2025.
- [16] A Differentiable Programming System to Bridge Machine Learning and Scientific Computing
Mike Innes, Alan Edelman, Keno Fischer, Chris Rackauckas, Elliot Saba, Viral B. Shah, and Will Tebbutt. A differentiable programming system to bridge machine learning and scientific computing. arXiv preprint arXiv:1907.07587, 2019.
- [17] Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, with Language Models
Daniel Jurafsky and James H. Martin. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, with Language Models. 3rd edition, 2026. https://web.stanford.edu/~jurafsky/slp3/. Online manuscript released January 6, 2026.
- [18] Auto-Encoding Variational Bayes
Diederik P. Kingma and Max Welling. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013.
- [19] Deep Learning: A Critical Appraisal
Gary Marcus. Deep learning: A critical appraisal. arXiv preprint arXiv:1801.00631, 2018.
- [20] The truck backer-upper: An example of self-learning in neural networks
Derrick Nguyen and Bernard Widrow. The truck backer-upper: An example of self-learning in neural networks. In Advanced Neural Computers, pages 11–19. Elsevier, 1990.
- [21] Sora by OpenAI
OpenAI. Sora by OpenAI. https://openai.com/sora/. Accessed 07-02-2026.
- [22] Sora 2 is here
OpenAI. Sora 2 is here. https://openai.com/index/sora-2/, September 2025. Accessed 2025-07-14.
- [23] Movie Gen: A Cast of Media Foundation Models
Adam Polyak, Amit Zohar, Andrew Brown, Andros Tjandra, Animesh Sinha, Ann Lee, Apoorv Vyas, Bowen Shi, Chih-Yao Ma, Ching-Yao Chuang, et al. Movie Gen: A cast of media foundation models. arXiv preprint arXiv:2410.13720, 2024.
- [24] Neural Programmer-Interpreters
Scott Reed and Nando De Freitas. Neural programmer-interpreters. arXiv preprint arXiv:1511.06279, 2015.
- [25] A Generalist Agent
Scott Reed, Konrad Zolna, Emilio Parisotto, Sergio Gomez Colmenarejo, Alexander Novikov, Gabriel Barth-Maron, Mai Gimenez, Yury Sulsky, Jackie Kay, Jost Tobias Springenberg, et al. A generalist agent. arXiv preprint arXiv:2205.06175, 2022.
- [26] Prompt programming for large language models: Beyond the few-shot paradigm
Laria Reynolds and Kyle McDonell. Prompt programming for large language models: Beyond the few-shot paradigm. In Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems, CHI EA '21, New York, NY, USA, 2021. Association for Computing Machinery. ISBN 9781450380959. doi: 10.1145/3411763.3451760.
- [27] Playing for data: Ground truth from computer games
Stephan R. Richter, Vibhav Vineet, Stefan Roth, and Vladlen Koltun. Playing for data: Ground truth from computer games. In European Conference on Computer Vision, pages 102–118. Springer, 2016.
- [28] NeuralOS: Towards Simulating Operating Systems via Neural Generative Models
Luke Rivard, Sun Sun, Hongyu Guo, Wenhu Chen, and Yuntian Deng. NeuralOS: Towards simulating operating systems via neural generative models. arXiv preprint arXiv:2507.08800, 2025.
- [29] Progressive Neural Networks
Andrei A. Rusu, Neil C. Rabinowitz, Guillaume Desjardins, Hubert Soyer, James Kirkpatrick, Koray Kavukcuoglu, Razvan Pascanu, and Raia Hadsell. Progressive neural networks. arXiv preprint arXiv:1606.04671, 2016.
- [30] A Comprehensive Survey of Agents for Computer Use: Foundations, Challenges, and Future Directions
Pascal J. Sager, Benjamin Meyer, Peng Yan, Rebekka von Wartburg-Kottler, Layan Etaiwi, Aref Enayati, Gabriel Nobel, Ahmed Abdulkadir, Benjamin F. Grewe, and Thilo Stadelmann. A comprehensive survey of agents for computer use: Foundations, challenges, and future directions, 2025. https://arxiv.org/abs/2501.16150. J. Schmidhuber. Learning to control fast-wei...
- [31] On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models
Jürgen Schmidhuber. On learning to think: Algorithmic information theory for novel combinations of reinforcement learning controllers and recurrent neural world models. arXiv preprint arXiv:1511.09249, 2015.
- [32] One Big Net for Everything
Jürgen Schmidhuber. One big net for everything. arXiv preprint arXiv:1802.08864, 2018.
- [33] Efficient processing of deep neural networks: A tutorial and survey
Vivienne Sze, Yu-Hsin Chen, Tien-Ju Yang, and Joel S. Emer. Efficient processing of deep neural networks: A tutorial and survey. Proceedings of the IEEE, 105(12):2295–2329, 2017.
- [34] Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions
Nicolas Vasilache, Oleksandr Zinenko, Theodoros Theodoridis, Priya Goyal, Zachary DeVito, William S. Moses, Sven Verdoolaege, Andrew Adams, and Albert Cohen. Tensor comprehensions: Framework-agnostic high-performance machine learning abstractions. arXiv preprint arXiv:1802.04730, 2018.
- [35] Wan: Open and Advanced Large-Scale Video Generative Models
Team Wan, Ang Wang, Baole Ai, Bin Wen, Chaojie Mao, Chen-Wei Xie, Di Chen, Feiwu Yu, Haiming Zhao, Jianxiao Yang, et al. Wan: Open and advanced large-scale video generative models. arXiv preprint arXiv:2503.20314, 2025.
- [36] Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, and Denny Zhou. Chain-of-thought prompting elicits reasoning in large language models, 2023. https://arxiv.org/abs/2201.11903. Paul J. Werbos. Learning how the world works: Specifications for predictive networks in robots and brains. In Proceedings of IEEE Intern...
- [37] Video models are zero-shot learners and reasoners
Thaddäus Wiedemer, Yuxuan Li, Paul Vicol, Shixiang Shane Gu, Nick Matarese, Kevin Swersky, Been Kim, Priyank Jaini, and Robert Geirhos. Video models are zero-shot learners and reasoners. arXiv preprint arXiv:2509.20328, 2025.
- [38] GPTSwarm: Language Agents as Optimizable Graphs
Mingchen Zhuge, Wenyi Wang, Louis Kirsch, Francesco Faccio, Dmitrii Khizbullin, and Jürgen Schmidhuber. GPTSwarm: Language agents as optimizable graphs. In Forty-first International Conference on Machine Learning, 2024. Mingchen Zhuge, Changsheng Zhao, Dylan Ashley, Wenyi Wang, Dmitrii Khizbullin, Yunyang Xiong, Zechun Liu, Ernie Chang, Raghuraman Krishna...