DEL is a new loss for LLM numerical learning that applies supervised digit entropy optimization and extends to floating-point numbers, showing improved accuracy and distance metrics over prior methods on math benchmarks.
How well do large language models perform in arithmetic tasks?
4 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
Multimodal LLMs perceive numbers accurately across modalities but fail at multi-digit multiplication, with performance predicted by an arithmetic load metric C and degradation confirmed as computational rather than perceptual.
Pre-training loss predicts LLM math reasoning better than parameter count; rejection sampling fine-tuning with diverse paths raises LLaMA-7B accuracy on GSM8K from 35.9% with SFT to 49.3%.
EEAgent with LSTRO sets new state-of-the-art results on six VIMA-Bench robotic manipulation tasks by dynamically refining prompts through reflection on successes and failures.
citing papers explorer
-
DEL: Digit Entropy Loss for Numerical Learning of Large Language Models
DEL is a new loss for LLM numerical learning that applies supervised digit entropy optimization and extends to floating-point numbers, showing improved accuracy and distance metrics over prior methods on math benchmarks.
-
Multiplication in Multimodal LLMs: Computation with Text, Image, and Audio Inputs
Multimodal LLMs perceive numbers accurately across modalities but fail at multi-digit multiplication, with performance predicted by an arithmetic load metric C and degradation confirmed as computational rather than perceptual.
-
Scaling Relationship on Learning Mathematical Reasoning with Large Language Models
Pre-training loss predicts LLM math reasoning better than parameter count; rejection sampling fine-tuning with diverse paths raises LLaMA-7B accuracy on GSM8K from 35.9% with SFT to 49.3%.
-
Evolvable Embodied Agent for Robotic Manipulation via Long Short-Term Reflection and Optimization
EEAgent with LSTRO sets new state-of-the-art results on six VIMA-Bench robotic manipulation tasks by dynamically refining prompts through reflection on successes and failures.