A Differentiable Atari VCS: A Complex, Fully Known Ground Truth for Explainable AI
Pith reviewed 2026-06-26 11:06 UTC · model grok-4.3
The pith
The Atari 2600 VCS hardware can be reformulated so its execution is fully differentiable while remaining bit-identical to the original at any finite temperature.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Treating the cartridge ROM as a weight tensor, RAM as a soft tape, and control flow as gates, the differentiable (soft) execution equals the original (hard) one bit-for-bit in the forward pass at any finite temperature, while exposing surrogate gradients where the bit logic has none. Both the Julia and JAX ports match the xitari reference emulator on all 64 supported ALE games, and the JAX port opens a GPU path for batched rollouts at millions of environment steps per second.
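The forward-equivalence claim can be illustrated with a single gate. Below is a minimal JAX sketch, assuming a straight-through-style construction with a sigmoid surrogate; the names `hard_gate` and `ste_gate` are hypothetical and not taken from jaxtari. The forward value equals the hard bit at every temperature tried, while `jax.grad` sees only the sigmoid term.

```python
import jax
import jax.numpy as jnp


def hard_gate(x):
    """Discrete bit logic: 1.0 where x > 0, else 0.0 (no useful gradient)."""
    return jnp.where(x > 0, 1.0, 0.0)


def ste_gate(x, temperature):
    """Hypothetical straight-through gate at a finite temperature.

    Forward: soft - stop_gradient(soft) is exactly 0.0 for finite soft, so
    adding the hard bit afterwards returns hard_gate(x) bit-for-bit.
    Backward: only `soft` is differentiated, giving the sigmoid surrogate.
    """
    soft = jax.nn.sigmoid(x / temperature)
    hard = jax.lax.stop_gradient(hard_gate(x))
    return (soft - jax.lax.stop_gradient(soft)) + hard


xs = jnp.array([-3.0, -0.3, 0.0, 0.3, 2.0])
for t in (0.01, 0.1, 1.0, 10.0, 1000.0):
    # Same forward values as the hard gate at every temperature tried.
    assert jnp.array_equal(ste_gate(xs, t), hard_gate(xs))

# Surrogate gradient at x = 0.3, T = 1: sigmoid'(0.3), where the hard gate has none.
surrogate = jax.grad(lambda x: ste_gate(x, 1.0))(0.3)
```

Forming the zero term before adding the hard bit is a deliberate choice in this sketch: in IEEE arithmetic a - a is exactly zero for finite a, whereas rounding `hard + soft` first can leave a one-ulp error.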
What carries the argument
The soft-logic reformulation that converts VCS ROM into weights, RAM into a continuous tape, and control flow into gates, thereby preserving exact forward equivalence while adding gradients.
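To make "ROM as weights" concrete: if an address is encoded as a one-hot vector, a byte fetch becomes a dot product with the ROM tensor, which is exact for byte values and turns every ROM byte into a differentiable parameter. This is an illustrative sketch only; the eight-byte program (LDA #$05, STA $80, JMP $F000, NOP) and the name `rom_read` are hypothetical and do not reflect how jutari or jaxtari lay out cartridge memory.

```python
import jax
import jax.numpy as jnp

# Hypothetical 8-byte cartridge: LDA #$05; STA $80; JMP $F000; NOP.
rom = jnp.array([0xA9, 0x05, 0x85, 0x80, 0x4C, 0x00, 0xF0, 0xEA],
                dtype=jnp.float32)


def rom_read(rom, address_onehot):
    """Fetch one byte as a dot product with the ROM "weight" vector.

    For a one-hot address all other products are exactly 0.0, and byte
    values (0..255) are exactly representable in float32, so the result
    equals the stored byte bit-for-bit.
    """
    return address_onehot @ rom


addr = jax.nn.one_hot(0, rom.shape[0])      # program counter at offset 0
opcode = rom_read(rom, addr)                # 169.0, i.e. 0xA9 (LDA immediate)

# The gradient with respect to the ROM is the one-hot address itself: the
# fetched byte is attributed entirely to the cell that was read.
rom_grad = jax.grad(rom_read)(rom, addr)
```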
If this is right
- Gradient-based XAI methods become applicable to a complex, fully known system whose every state can be inspected.
- Batched differentiable rollouts of the JAX port reach millions of environment steps per second on a single commodity GPU.
- Explanations produced by any method can be checked directly against the actual cartridge and RAM states.
- The open-source ports supply a reproducible testbed for any future gradient or explanation technique.
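The batching claim rests on JAX's ability to vectorize and compile a pure step function. The sketch below shows the pattern with `jax.vmap`, `jax.lax.scan`, and `jax.jit`; `toy_step` is a hypothetical stand-in that only touches two RAM bytes, so it says nothing about the throughput of the real emulator. The 128-byte RAM and the 18-action set match the VCS and the full ALE action set.

```python
import jax
import jax.numpy as jnp

RAM_SIZE = 128  # the VCS has 128 bytes of RAM


def toy_step(ram, action):
    """Hypothetical stand-in for one emulator step of a single environment.

    Increments a frame counter in byte 0 (wrapping at 256) and stores the
    action in byte 1. A real step would run the 6507 CPU and TIA for a frame.
    """
    ram = ram.at[0].set(jnp.mod(ram[0] + 1.0, 256.0))
    return ram.at[1].set(action)


@jax.jit
def batched_rollout(rams, actions):
    """Apply toy_step along each row of `actions`, for every env in the batch."""
    def one_env(ram, acts):
        def body(r, a):
            return toy_step(r, a), None
        final, _ = jax.lax.scan(body, ram, acts)
        return final
    return jax.vmap(one_env)(rams, actions)


batch, horizon = 1024, 300
rams = jnp.zeros((batch, RAM_SIZE), dtype=jnp.float32)
actions = jnp.tile(jnp.arange(horizon, dtype=jnp.float32) % 18, (batch, 1))
final = batched_rollout(rams, actions)
```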
Where Pith is reading between the lines
- The same ROM-as-weights and gate-as-control reformulation could be tried on other fixed-hardware emulators to create additional known-complex test objects.
- Direct gradient descent through the emulator itself becomes possible for tasks that previously required black-box reinforcement learning.
- Qualitative studies of gradient flow through the exposed surrogate paths could show how explanations scale from simple logic to full game cartridges.
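Control flow as gates can be sketched the same way. Here a BEQ-style branch selects the next program counter with a gate whose forward value is exactly 0 or 1, so the selected address is exact, while the gradient with respect to a flag logit is nonzero: the kind of surrogate path a gradient-flow study would trace. `branch_pc`, `flag_logit`, and the addresses are hypothetical, and the straight-through gate is an assumed construction.

```python
import jax
import jax.numpy as jnp


def ste_gate(logit, temperature):
    """Hard 0/1 in the forward pass, sigmoid surrogate in the backward pass."""
    soft = jax.nn.sigmoid(logit / temperature)
    hard = jax.lax.stop_gradient(jnp.where(logit > 0, 1.0, 0.0))
    return (soft - jax.lax.stop_gradient(soft)) + hard


def branch_pc(flag_logit, taken_pc, fallthrough_pc, temperature=1.0):
    """Gated select between two program-counter values (BEQ-like branch)."""
    g = ste_gate(flag_logit, temperature)
    return g * taken_pc + (1.0 - g) * fallthrough_pc


taken, fallthrough = float(0xF010), float(0xF004)
assert branch_pc(2.0, taken, fallthrough) == taken        # flag set: branch
assert branch_pc(-2.0, taken, fallthrough) == fallthrough # flag clear: fall through

# d(next PC)/d(flag_logit) = (taken - fallthrough) * sigmoid'(2.0):
# a surrogate path where the discrete branch has no gradient at all.
d_pc = jax.grad(branch_pc)(2.0, taken, fallthrough)
```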
Load-bearing premise
The soft-logic reformulation preserves exact forward-pass equivalence to the discrete hardware at finite temperature.
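One way to state the premise precisely, assuming a straight-through construction with a sigmoid surrogate (an illustrative assumption, not necessarily the paper's definitions): sg is stop-gradient, σ the logistic function, H the hard step, and ⊕, ⊖ float32 addition and subtraction with round-to-nearest-even.

```latex
\[
  g_T(x) = \bigl(\sigma(x/T) \ominus \operatorname{sg}[\sigma(x/T)]\bigr) \oplus H(x),
  \qquad
  H(x) = \begin{cases} 1, & x > 0,\\ 0, & x \le 0. \end{cases}
\]
% Forward: sg is the identity on values and a \ominus a = 0 for finite a,
% so g_T(x) = 0 \oplus H(x) = H(x) exactly, for every T > 0.
\[
  \frac{\partial g_T}{\partial x} := \frac{1}{T}\,
  \sigma\!\left(\tfrac{x}{T}\right)\Bigl(1 - \sigma\!\left(\tfrac{x}{T}\right)\Bigr)
\]
% Order matters: with s = 1/2 + 2^{-24} in float32,
% (1 \oplus s) \ominus s = 1 - 2^{-24} \ne 1.
% A fully relaxed gate \sigma(x/T) equals H(x) only in the limit T \to 0 (x \ne 0).
```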
What would settle it
Any run of the differentiable emulator on one of the 64 games that produces even a single differing RAM byte or screen pixel compared with the xitari reference.
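The falsification criterion is mechanical enough to write down. A minimal sketch of a trace comparator, assuming both emulators were run with the same ROM, seed, and action sequence and their per-step RAM (128 bytes) and screens (210 × 160 in ALE) were dumped to NumPy arrays; `first_divergence` is a hypothetical helper, not part of xitari, jutari, or jaxtari.

```python
import numpy as np


def first_divergence(ram_a, ram_b, screen_a, screen_b):
    """Return the first (step, "ram" | "screen", index) mismatch, or None.

    ram_*: uint8 arrays of shape (steps, 128); screen_*: arrays of shape
    (steps, 210, 160). RAM is checked before the screen at each step.
    """
    for step in range(ram_a.shape[0]):
        diff = np.flatnonzero(ram_a[step] != ram_b[step])
        if diff.size:
            return step, "ram", int(diff[0])
        diff = np.argwhere(screen_a[step] != screen_b[step])
        if diff.size:
            return step, "screen", tuple(int(i) for i in diff[0])
    return None


steps = 4
ram = (np.arange(steps * 128) % 256).astype(np.uint8).reshape(steps, 128)
screen = np.zeros((steps, 210, 160), dtype=np.uint8)
assert first_divergence(ram, ram.copy(), screen, screen.copy()) is None

other = screen.copy()
other[2, 100, 37] = 1   # a single differing pixel settles the question
assert first_divergence(ram, ram.copy(), screen, other) == (2, "screen", (100, 37))
```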
Original abstract
Explanation requires ground truth: to verify an account of a system we must know its inner functioning, which is just what is missing where explainable AI (XAI) is most needed. Systems we can study fall into two camps. Simple, procedural ones (decision trees, rule lists, sparse linear models) have a known but trivial mechanism, so explaining them tests nothing; genuinely complex ones (deep networks, real-world tasks) need XAI but have no ground-truth inner functioning, so an explanation can be plausible, confident, and wrong with no way to tell. We remove this dichotomy with a study object that is both genuinely complex and fully specified (inspectable by construction) and, so that gradient methods apply, fully differentiable. We reimplement the Atari 2600 Video Computer System (VCS), a real computer architecture and the cradle of deep reinforcement learning, as two independent end-to-end differentiable emulators in Julia (jutari) and JAX (jaxtari), each validated bit-for-bit against xitari. Both reproduce xitari on all 64 supported Arcade Learning Environment (ALE) games: 64/64 byte-identical RAM and 64/64 pixel-identical screens. Treating the cartridge ROM as a weight tensor, RAM as a soft tape, and control flow as gates, we prove that the differentiable (soft) execution equals the original (hard) one bit-for-bit in the forward pass at any finite temperature, while exposing surrogate gradients where the bit logic has none. The JAX port also opens a GPU path: batched differentiable rollouts reach millions of environment-steps/s on one commodity GPU. The system was built in roughly 137 active hours over 29 calendar days, much of it written autonomously by coding agents. This paper builds and validates the foundation, showing, theoretically and in a qualitative gradient study, that gradient-based XAI on it is feasible. Both ports' full code is available under the MIT license at https://github.com/akmaier/UnderstandingVCS.
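The "RAM as a soft tape" idea can be sketched with Neural-Turing-Machine-style erase-then-add writes. Assuming one-hot address weights in the forward pass (an assumption of this sketch), both the write and the read reproduce the hard memory operation exactly, and the read stays differentiable with respect to the written value. The function names and the example address are hypothetical; on the VCS, RAM occupies $80-$FF, so address $90 maps to tape index 0x10.

```python
import jax
import jax.numpy as jnp


def tape_write(ram, address_weights, value):
    """Erase-then-add write; exact when address_weights is one-hot.

    Cells with weight 0 are multiplied by 1.0 and added to 0.0 (unchanged);
    the cell with weight 1 is multiplied by 0.0 and then receives `value`.
    """
    return ram * (1.0 - address_weights) + address_weights * value


def tape_read(ram, address_weights):
    """Weighted sum over cells; returns the addressed byte for one-hot weights."""
    return address_weights @ ram


ram = jnp.arange(128, dtype=jnp.float32)   # hypothetical RAM contents 0..127
w = jax.nn.one_hot(0x90 - 0x80, 128)       # address $90 -> tape index 0x10
ram2 = tape_write(ram, w, 200.0)

# d(read-after-write)/d(value) is w . w = 1 for a one-hot address.
dvalue = jax.grad(lambda v: tape_read(tape_write(ram, w, v), w))(200.0)
```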
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to have created two end-to-end differentiable emulators of the Atari 2600 VCS (jutari in Julia and jaxtari in JAX) that achieve byte-identical RAM and pixel-identical screen matches with the xitari reference on all 64 supported ALE games. Treating cartridge ROM as a weight tensor, RAM as a soft tape, and control flow as gates, it asserts a proof that the differentiable (soft) execution equals the original (hard) execution bit-for-bit in the forward pass at any finite temperature while exposing surrogate gradients; the JAX version enables GPU batched rollouts at high throughput, and the system is positioned as a fully known ground-truth benchmark for gradient-based XAI methods.
Significance. If the claimed forward-pass equivalence holds, the work supplies a genuinely complex yet fully inspectable and differentiable system for XAI evaluation, directly addressing the dichotomy between trivial known mechanisms and opaque complex ones. The exhaustive 64/64 empirical validation, open MIT-licensed code, and GPU acceleration path constitute concrete strengths that would make the artifact useful for the community.
major comments (1)
- [Abstract and theoretical argument] The central claim of exact bit-for-bit forward-pass identity between the soft-logic reformulation and the discrete hardware at any finite temperature rests on the specific gate and tape constructions. The supplied evidence is the xitari match, which empirically confirms reproduction of hard behavior but does not independently establish that the soft construction itself is mathematically identical at arbitrary finite temperature: an unstated assumption about how temperature enters the control-flow gates could break the identity while still passing the xitari test.
minor comments (1)
- The abstract refers to 'a qualitative gradient study' demonstrating feasibility of gradient-based XAI; a short description or pointer to the relevant section/figure would improve clarity.
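The referee's major comment can be made concrete with three gates that agree in exact arithmetic but differ in how temperature and rounding reach the forward value. In this hypothetical sweep (none of the variants is claimed to be the paper's construction), a straight-through gate that forms its zero term first matches the hard bit at every temperature; the same formula with the addition done first can be off by one float32 ulp for some inputs; and a fully relaxed sigmoid gate matches only as T approaches 0. This is why a check at a single temperature cannot by itself establish the identity for all T.

```python
import jax
import jax.numpy as jnp


def hard(x):
    """Reference bit: 1.0 where x > 0, else 0.0."""
    return jnp.where(x > 0, 1.0, 0.0)


def ste_zero_first(x, t):
    # (s - s) is exactly 0.0 for finite s, so the sum is exactly hard(x).
    s = jax.nn.sigmoid(x / t)
    return (s - jax.lax.stop_gradient(s)) + jax.lax.stop_gradient(hard(x))


def ste_add_first(x, t):
    # Equal in exact arithmetic, but hard(x) + s is rounded before s is
    # subtracted, which can leave a one-ulp error in float32.
    s = jax.nn.sigmoid(x / t)
    return (jax.lax.stop_gradient(hard(x)) + s) - jax.lax.stop_gradient(s)


def relaxed(x, t):
    # Fully relaxed gate: the forward value itself depends on t.
    return jax.nn.sigmoid(x / t)


GATES = {"zero_first": ste_zero_first, "add_first": ste_add_first,
         "relaxed": relaxed}


def mismatch_counts(xs, t):
    """Number of inputs where each gate's forward value differs from hard(x)."""
    ref = hard(xs)
    return {name: int(jnp.sum(g(xs, t) != ref)) for name, g in GATES.items()}


xs = jnp.linspace(-4.0, 4.0, 20001, dtype=jnp.float32)
for t in (0.05, 0.5, 1.0, 5.0):
    print(t, mismatch_counts(xs, t))
```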
Simulated Author's Rebuttal
We thank the referee for the detailed review and the opportunity to clarify the theoretical claims. We respond to the major comment below.
Referee: Abstract and theoretical argument: the central claim of exact bit-for-bit forward-pass identity between the soft-logic reformulation and discrete hardware at any finite temperature rests on the specific gate and tape constructions. The supplied evidence is the xitari match; this empirically confirms reproduction of hard behavior but does not independently establish that the soft construction itself is mathematically identical for arbitrary finite temperature, because any unstated assumption about how temperature enters the control-flow gates could break the identity while still passing the xitari test.
Authors: We thank the referee for highlighting the need for a clearer separation between the mathematical construction and its empirical validation. The manuscript derives the bit-for-bit forward-pass identity directly from the definitions of the soft gates (which implement exact logical operations via temperature-independent selection) and the soft tape (which preserves exact addressing and state updates). Temperature enters exclusively in the backward pass to supply surrogate gradients; the forward computation is constructed to be identical to the discrete case for any finite temperature by design. The xitari match validates that the implementation faithfully realizes these constructions across all 64 games. We agree that the current exposition could make the independence from temperature more explicit and will revise the theoretical section to include a step-by-step derivation of the identity, stating all assumptions regarding gate and tape behavior. revision: yes
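The authors' description, with temperature entering only the backward pass, maps naturally onto a custom derivative rule. A minimal sketch using `jax.custom_vjp`, assuming a sigmoid surrogate (the name `gate` and the surrogate choice are illustrative): the forward function never reads `temperature`, so forward identity at any finite temperature holds by construction, and `temperature` only shapes the cotangent.

```python
import functools

import jax
import jax.numpy as jnp


@functools.partial(jax.custom_vjp, nondiff_argnums=(1,))
def gate(x, temperature):
    """Forward pass: the hard bit; `temperature` is deliberately unused here."""
    return jnp.where(x > 0, 1.0, 0.0)


def gate_fwd(x, temperature):
    # Same forward value; keep x as the residual for the backward pass.
    return jnp.where(x > 0, 1.0, 0.0), x


def gate_bwd(temperature, x, g):
    # Non-differentiable args come first; return one cotangent per diff arg.
    s = jax.nn.sigmoid(x / temperature)
    return (g * s * (1.0 - s) / temperature,)


gate.defvjp(gate_fwd, gate_bwd)

xs = jnp.array([-1.0, 0.0, 0.5])
for t in (0.1, 1.0, 10.0):
    assert jnp.array_equal(gate(xs, t), jnp.array([0.0, 0.0, 1.0]))

dgate = jax.grad(gate)(jnp.float32(0.5), 1.0)   # sigmoid'(0.5) at T = 1
```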
Circularity Check
No significant circularity; derivation self-contained via explicit proof and external validation
full rationale
The paper asserts a mathematical proof that the soft execution equals the hard one bit-for-bit at finite temperature, supported by bit-identical validation against the independent xitari emulator on 64 games. No equations define a quantity in terms of itself, no parameters are fitted and then called predictions, and no load-bearing claims rest on self-citations. The central equivalence is presented as proven independently of the empirical match, so the xitari comparison acts as an external check rather than as an input to the derivation.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: The xitari reference emulator correctly implements the Atari 2600 hardware specification.
- Standard math: Finite-temperature soft logic gates admit surrogate gradients that do not alter the exact forward-pass identity.