arxiv: 1410.4615 · v3 · pith:EH3GOJHKnew · submitted 2014-10-17 · cs.NE · cs.AI · cs.LG

Learning to Execute

classification cs.NE · cs.AI · cs.LG
keywords curriculum · learning · networks · programs · improved · lstm · lstms · memory

Recurrent Neural Networks (RNNs) with Long Short-Term Memory units (LSTM) are widely used because they are expressive and are easy to train. Our interest lies in empirically evaluating the expressiveness and the learnability of LSTMs in the sequence-to-sequence regime by training them to evaluate short computer programs, a domain that has traditionally been seen as too complex for neural networks. We consider a simple class of programs that can be evaluated with a single left-to-right pass using constant memory. Our main result is that LSTMs can learn to map the character-level representations of such programs to their correct outputs. Notably, it was necessary to use curriculum learning, and while conventional curriculum learning proved ineffective, we developed a new variant of curriculum learning that improved our networks' performance in all experimental conditions. The improved curriculum had a dramatic impact on an addition problem, making it possible to train an LSTM to add two 9-digit numbers with 99% accuracy.
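To make the setup in the abstract concrete, here is a minimal sketch of how character-level addition examples and a mixed-difficulty curriculum sampler might look. The abstract does not specify the input format or the curriculum variant, so everything here is an assumption for illustration: the `print(a+b)` source format, the vocabulary, the function names, and the `mix` parameter are all hypothetical and not taken from the paper.

```python
import random

# Hypothetical character vocabulary for programs of the form "print(a+b)".
VOCAB = {ch: i for i, ch in enumerate("0123456789+()print")}


def make_addition_example(num_digits, rng):
    """Return (source, target) strings for adding two num_digits-digit numbers."""
    low = 10 ** (num_digits - 1)
    high = 10 ** num_digits - 1
    a = rng.randint(low, high)
    b = rng.randint(low, high)
    # Assumed source format; the target is the program's printed output.
    return f"print({a}+{b})", str(a + b)


def sample_difficulty(current, max_digits, rng, mix=0.5):
    """Pick a difficulty (number of digits) for the next training example.

    Illustrative assumption: with probability `mix` use the current
    curriculum level, otherwise draw a level uniformly from 1..max_digits,
    so harder examples are seen before the curriculum reaches them.
    """
    if rng.random() < mix:
        return current
    return rng.randint(1, max_digits)


def encode(text, vocab=VOCAB):
    """Map a string to a list of integer character ids."""
    return [vocab[ch] for ch in text]


if __name__ == "__main__":
    rng = random.Random(0)
    digits = sample_difficulty(current=3, max_digits=9, rng=rng)
    source, target = make_addition_example(digits, rng)
    print(source, "->", target)
    print(encode(source), encode(target))
```

A sequence-to-sequence model would consume the encoded source one character at a time and be trained to emit the encoded target. Setting `mix=1.0` reduces the sampler to a plain curriculum that only ever uses the current level.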

Forward citations

Cited by 11 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

    cs.CL 2022-01 accept novelty 9.0

    Chain-of-thought prompting, by including intermediate reasoning steps in few-shot examples, elicits strong reasoning abilities in large language models on arithmetic, commonsense, and symbolic tasks.

  2. Show Your Work: Scratchpads for Intermediate Computation with Language Models

    cs.LG 2021-11 unverdicted novelty 8.0

    Training language models to generate intermediate computation steps on a scratchpad enables them to perform multi-step tasks such as long addition and arbitrary program execution that they otherwise fail at.

  3. The Benefits of Temporal Correlations: SGD Learns k-Juntas from Random Walks Efficiently

    cs.LG 2026-05 unverdicted novelty 7.0

    Temporal correlations from lazy random walks enable efficient SGD learning of k-juntas via temporal-difference loss on ReLU networks, achieving linear sample complexity in d.

  4. Training Transformers as a Universal Computer

    cs.AI 2026-04 unverdicted novelty 7.0

    A transformer trained on random meaningless MicroPy programs generalizes to execute diverse human-written programs, providing empirical evidence it can act as a universal computer.

  5. SPaCe: Unlocking Sample-Efficient Large Language Models Training With Self-Pace Curriculum Learning

    cs.LG 2025-08 unverdicted novelty 6.0

    SPaCe uses semantic clustering to shrink training sets and a multi-armed bandit to adaptively select samples, matching or beating baselines on reasoning benchmarks with up to 100x fewer examples.

  6. Program Synthesis with Large Language Models

    cs.PL 2021-08 unverdicted novelty 6.0

    Large language models synthesize Python code from descriptions with log-linear scaling in performance, reaching 59.6% on MBPP via few-shot prompting and 83.8% on MathQA-Python after fine-tuning, while human feedback h...

  7. Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges

    cs.LG 2021-04 accept novelty 6.0

    Geometric deep learning provides a unified mathematical framework based on grids, groups, graphs, geodesics, and gauges to explain and extend neural network architectures by incorporating physical regularities.

  8. Universal Transformers

    cs.CL 2018-07 unverdicted novelty 6.0

    Universal Transformers combine Transformer parallelism with recurrent updates and dynamic halting to achieve Turing-completeness under assumptions and outperform standard Transformers on algorithmic and language tasks.

  9. DeepFWI: Identifying Bug-Sensitive Warnings with Multi-Modal Code-Warning Semantics

    cs.SE 2024-03 conditional novelty 5.0

    DeepFWI is a multi-modal LSTM model with cross-attention that identifies bug-sensitive warnings at warning granularity, reaching 67.06% F1 on a 280k-warning dataset and surfacing 25 confirmed bugs in four open-source ...

  10. PaLM 2 Technical Report

    cs.CL 2023-05 unverdicted novelty 5.0

    PaLM 2 reports state-of-the-art results on language, reasoning, and multilingual tasks with improved efficiency over PaLM.

  11. Growing Action Spaces

    cs.LG 2019-06 unverdicted novelty 5.0

    A curriculum of growing action spaces combined with simultaneous off-policy value estimation accelerates learning in large multi-agent action spaces.