How well do large language models perform in arithmetic tasks?

Zheng Yuan, Hongyi Yuan, Chuanqi Tan, Wei Wang, Songfang Huang · 2023 · arXiv 2304.02015

4 Pith papers cite this work.

representative citing papers

DEL: Digit Entropy Loss for Numerical Learning of Large Language Models

cs.CL · 2026-05-19 · conditional · novelty 6.0

DEL is a new loss for LLM numerical learning that applies supervised digit entropy optimization and extends to floating-point numbers, showing improved accuracy and distance metrics over prior methods on math benchmarks.

Multiplication in Multimodal LLMs: Computation with Text, Image, and Audio Inputs

cs.CL · 2026-04-20 · unverdicted · novelty 6.0

Multimodal LLMs perceive numbers accurately across modalities but fail at multi-digit multiplication, with performance predicted by an arithmetic load metric C and degradation confirmed as computational rather than perceptual.

Scaling Relationship on Learning Mathematical Reasoning with Large Language Models

cs.CL · 2023-08-03 · unverdicted · novelty 6.0

Pre-training loss predicts LLM math reasoning better than parameter count; rejection sampling fine-tuning with diverse paths raises LLaMA-7B accuracy on GSM8K from 35.9% with SFT to 49.3%.

Evolvable Embodied Agent for Robotic Manipulation via Long Short-Term Reflection and Optimization

cs.RO · 2026-04-15 · unverdicted · novelty 5.0

EEAgent with LSTRO sets new state-of-the-art results on six VIMA-Bench robotic manipulation tasks by dynamically refining prompts through reflection on successes and failures.

citing papers explorer

DEL: Digit Entropy Loss for Numerical Learning of Large Language Models cs.CL · 2026-05-19 · conditional · none · ref 32
Multiplication in Multimodal LLMs: Computation with Text, Image, and Audio Inputs cs.CL · 2026-04-20 · unverdicted · none · ref 33
Scaling Relationship on Learning Mathematical Reasoning with Large Language Models cs.CL · 2023-08-03 · unverdicted · none · ref 103
Evolvable Embodied Agent for Robotic Manipulation via Long Short-Term Reflection and Optimization cs.RO · 2026-04-15 · unverdicted · none · ref 7
