Weights to Code: Extracting Interpretable Algorithms from the Discrete Transformer

Dongming Jin; Jie Fu; Kechi Zhang; Wei Bi; Yifan Zhang; Zhi Jin

arxiv: 2601.05770 · v3 · pith:4JGIN6A2new · submitted 2026-01-09 · 💻 cs.LG · cs.CL

Yifan Zhang, Wei Bi, Kechi Zhang, Dongming Jin, Jie Fu, Zhi Jin

keywords: discrete · transformer · programs · extraction · symbolic · tasks · algorithm · executable

Algorithm extraction aims to synthesize executable programs directly from models trained on algorithmic tasks, enabling de novo recovery of executable mechanisms from weights without relying on human-written target programs. However, applying this paradigm to Transformers is complicated by representation entanglement (e.g., superposition), where features encoded in overlapping directions substantially hinder the recovery of symbolic expressions. We propose the Discrete Transformer, an architecture explicitly designed to bridge the gap between continuous representations and discrete symbolic logic. By injecting discreteness through temperature-annealed sampling, our framework effectively leverages hypothesis testing and symbolic regression to extract human-readable programs. Empirically, the Discrete Transformer achieves performance comparable to the RNN-based MIPS baseline on shared discrete tasks, while broadening extraction to tasks with continuous-valued intermediate computations. Finally, we show that architectural inductive biases provide fine-grained control over synthesized programs, establishing the Discrete Transformer as a controllable testbed for algorithm extraction and Transformer interpretability.
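
The abstract does not specify how the temperature-annealed sampling is implemented. As a rough illustration only, the sketch below assumes a Gumbel-softmax relaxation with an exponential temperature schedule, which is one common way to push a continuous distribution toward a discrete choice during training. The function names, the schedule, and the default temperatures are hypothetical and are not taken from the paper.

```python
import math
import random


def annealed_temperature(step, total_steps, t_start=5.0, t_end=0.1):
    """Exponentially interpolate the temperature from t_start down to t_end.

    The schedule and its default endpoints are assumptions for illustration.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return t_start * (t_end / t_start) ** frac


def gumbel_softmax(logits, temperature, rng):
    """Draw a relaxed one-hot sample: softmax((logits + Gumbel noise) / T)."""
    noisy = []
    for logit in logits:
        # Clamp the uniform draw away from 0 and 1 so both logs are finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def demo(seed=0):
    """Sample at the start, middle, and end of a hypothetical schedule."""
    rng = random.Random(seed)
    logits = [2.0, 0.5, -1.0]
    total_steps = 100
    results = []
    for step in (0, 50, 100):
        temperature = annealed_temperature(step, total_steps)
        results.append((temperature, gumbel_softmax(logits, temperature, rng)))
    return results


if __name__ == "__main__":
    for temperature, sample in demo():
        print(f"T={temperature:.3f}", [round(p, 3) for p in sample])
```

At high temperature the samples stay spread across the options; as the temperature falls they tend to concentrate on a single entry, which is the sense in which annealing injects discreteness under this assumed relaxation.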

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Loom: A Scalable Analytical Neural Computer Architecture
cs.LG · 2026-04 · unverdicted · novelty 7.0

Loom implements a 22-opcode C-compatible computer inside an 8-layer transformer with analytically derived, program-independent weights, executing instructions iteratively on a fixed-size state tensor at constant per-s...