
arxiv: 1705.04146 · v3 · pith:QCU7OKIQnew · submitted 2017-05-11 · 💻 cs.AI · cs.CL · cs.LG

Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems

classification 💻 cs.AI cs.CL cs.LG
keywords answer, rationales, problems, programs, algebraic, arithmetic, final, inducing

Solving algebraic word problems requires executing a series of arithmetic operations---a program---to obtain a final answer. However, since programs can be arbitrarily complicated, inducing them directly from question-answer pairs is a formidable challenge. To make this task more feasible, we solve these problems by generating answer rationales, sequences of natural language and human-readable mathematical expressions that derive the final answer through a series of small steps. Although rationales do not explicitly specify programs, they provide a scaffolding for their structure via intermediate milestones. To evaluate our approach, we have created a new 100,000-sample dataset of questions, answers and rationales. Experimental results show that indirect supervision of program learning via answer rationales is a promising strategy for inducing arithmetic programs.
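As a rough illustration of the idea in the abstract, the sketch below treats a program as a short list of arithmetic operations, each paired with a rationale sentence that records an intermediate milestone. The word problem, the variable names, the `OPS` table, and the `run` helper are all hypothetical and chosen for illustration; they are not taken from the paper or its dataset, and the paper's model learns such programs rather than having them written by hand.

```python
import operator

# Hypothetical word problem (not from the paper's dataset):
# "A pen costs 3 dollars. How much do 4 pens and a 5 dollar notebook cost?"

# Assumed operation set for this sketch only.
OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "div": operator.truediv,
}

# Each step: (operation, left operand, right operand, output name, rationale text).
program = [
    ("mul", "pen_price", "pen_count", "pens_total",
     "4 pens cost 3 * 4 = 12 dollars."),
    ("add", "pens_total", "notebook_price", "answer",
     "Adding the notebook gives 12 + 5 = 17 dollars."),
]


def run(program, values):
    """Execute the steps in order and collect (rationale, milestone) pairs."""
    memory = dict(values)  # copy so the caller's inputs are not modified
    trace = []
    for op, left, right, out, text in program:
        memory[out] = OPS[op](memory[left], memory[right])
        trace.append((text, memory[out]))
    return memory["answer"], trace


answer, trace = run(
    program,
    {"pen_price": 3, "pen_count": 4, "notebook_price": 5},
)
for text, value in trace:
    print(f"{text}  [milestone: {value}]")
print("final answer:", answer)
```

The rationale sentences do not specify the operations themselves, but the numbers they mention (12, then 17) constrain which sequences of operations are plausible, which is the sense in which a rationale can act as scaffolding for a program.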



Forward citations

Cited by 14 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. PAL: Program-aided Language Models

    cs.CL 2022-11 conditional novelty 8.0

    PAL improves few-shot reasoning accuracy by having LLMs generate executable programs rather than text-based chains of thought, outperforming much larger models on math and logic benchmarks.

  2. Generative Language Modeling for Automated Theorem Proving

    cs.LG 2020-09 unverdicted novelty 8.0

    GPT-f, a transformer-based prover for Metamath, generated new short proofs that were accepted into the main library—the first such contribution from a deep-learning system.

  3. Preserving Long-Tailed Expert Information in Mixture-of-Experts Tuning

    cs.LG 2026-04 unverdicted novelty 7.0

    A new SFT framework for MoE models combines bias-driven sparsification with gated condenser experts to retain long-tailed expert information, outperforming DenseMixer and ESFT by over 2.5% on math reasoning and common...

  4. Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation

    cs.AI 2025-03 conditional novelty 7.0

    Chain-of-thought monitoring detects reward hacking in frontier reasoning models, but strong optimization against the monitor produces obfuscated misbehavior that remains hard to detect.

  5. Stay Focused: Problem Drift in Multi-Agent Debate

    cs.CL 2025-02 unverdicted novelty 7.0

    The paper defines and measures 'problem drift' in multi-agent LLM debates across tasks and proposes DRIFTJudge and DRIFTPolicy as baselines to detect and reduce it.

  6. GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models

    cs.LG 2024-10 accept novelty 7.0

    LLMs display high variance and major accuracy drops on GSM-Symbolic variants of grade-school math problems, indicating they replicate training patterns rather than execute logical reasoning.

  7. Large Language Models as Optimizers

    cs.LG 2023-09 unverdicted novelty 7.0

    Large language models can optimize by being prompted with histories of past solutions and scores to propose better ones, producing prompts that raise accuracy up to 8% on GSM8K and 50% on Big-Bench Hard over human-des...

  8. Deeper Thought, Weaker Aim: Understanding and Mitigating Perceptual Impairment during Reasoning in Multimodal Large Language Models

    cs.CV 2026-03 unverdicted novelty 6.0

    Attention dispersion during extended reasoning impairs MLLM perception on images, and a training-free VRGA framework mitigates it by selecting and reweighting visual attention heads using an entropy-focus criterion.

  9. HyperAdapt: Simple High-Rank Adaptation

    cs.LG 2025-09 unverdicted novelty 6.0

    HyperAdapt performs parameter-efficient fine-tuning by row- and column-wise diagonal scaling to induce high-rank updates with only n+m trainable parameters.

  10. Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs

    cs.LG 2024-06 conditional novelty 6.0

    Step-DPO performs preference optimization on individual reasoning steps rather than complete answers, producing nearly 3% accuracy gains on MATH for 70B+ parameter models with 10K preference pairs.

  11. Towards Understanding Sycophancy in Language Models

    cs.CL 2023-10 conditional novelty 6.0

    Sycophancy is prevalent in state-of-the-art AI assistants and is likely driven in part by human preferences that favor agreement over truthfulness.

  12. Inclusion-of-Thoughts: Mitigating Preference Instability via Purifying the Decision Space

    cs.CL 2026-03 unverdicted novelty 5.0

    Inclusion-of-Thoughts purifies multiple-choice questions by keeping only plausible options, stabilizing LLM preferences and improving chain-of-thought results on reasoning benchmarks.

  13. NVIDIA Nemotron 3: Efficient and Open Intelligence

    cs.CL 2025-12 unverdicted novelty 5.0

    NVIDIA releases the Nemotron 3 model family with hybrid Mamba-Transformer architecture, LatentMoE, NVFP4 training, MTP layers, and multi-environment RL post-training for reasoning and agentic tasks.

  14. Training and Evaluating Language Models with Template-based Data Generation

    cs.CL 2024-11 unverdicted novelty 5.0

    TDG uses GPT-4 to generate meta-templates that synthesize over 7 million verifiable grade school math problems for training and aligning LLMs on reasoning tasks.