arxiv: 2005.00730 · v2 · pith:QNBUIQ5Lnew · submitted 2020-05-02 · 💻 cs.CL · cs.LG

ESPRIT: Explaining Solutions to Physical Reasoning Tasks

keywords: physical, esprit, descriptions, events, human, approach, framework, interpretable

Neural networks lack the ability to reason about qualitative physics and so cannot generalize to scenarios and tasks unseen during training. We propose ESPRIT, a framework for commonsense reasoning about qualitative physics in natural language that generates interpretable descriptions of physical events. We use a two-step approach of first identifying the pivotal physical events in an environment and then generating natural language descriptions of those events using a data-to-text approach. Our framework learns to generate explanations of how the physical simulation will causally evolve so that an agent or a human can easily reason about a solution using those interpretable descriptions. Human evaluations indicate that ESPRIT produces crucial fine-grained details and has high coverage of physical concepts compared to even human annotations. Dataset, code and documentation are available at https://github.com/salesforce/esprit.
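
The two-step approach described in the abstract (identify pivotal physical events, then turn them into text) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the ESPRIT code base: the `Collision` record, the "earliest collision per object pair" rule used as a stand-in for pivotal-event selection, and the fixed sentence template used in place of a learned data-to-text model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Collision:
    """Hypothetical record of one collision in a simulation trace."""
    step: int
    obj_a: str
    obj_b: str


def pivotal_events(collisions):
    """Step 1 (assumed heuristic): keep the earliest collision per object pair."""
    seen = set()
    kept = []
    for c in sorted(collisions, key=lambda c: c.step):
        pair = frozenset((c.obj_a, c.obj_b))  # unordered pair of objects
        if pair not in seen:
            seen.add(pair)
            kept.append(c)
    return kept


def describe(event):
    """Step 2 (template stand-in for a learned data-to-text model)."""
    return f"At step {event.step}, the {event.obj_a} collides with the {event.obj_b}."


def explain(collisions):
    """Chain both steps into a single natural language explanation."""
    return " ".join(describe(e) for e in pivotal_events(collisions))


trace = [
    Collision(12, "red ball", "ramp"),
    Collision(5, "red ball", "green ball"),
    Collision(20, "ramp", "red ball"),  # repeat of an earlier pair, dropped
]
explanation = explain(trace)
print(explanation)
```

Keeping event selection and text generation separate means either stage could be swapped out independently, for example replacing the template with a trained generator, without touching the other.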



Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Do generative video models understand physical principles?

    cs.CV 2025-01 unverdicted novelty 8.0

    Physics-IQ benchmark reveals that generative video models exhibit limited physical understanding unrelated to their visual quality.

  2. $\Delta$ynamics: Language-Based Representation for Inferring Rigid-Body Dynamics From Videos

    cs.CV 2026-05 unverdicted novelty 6.0

    A vision-language framework generates text-based rigid-body scene configurations from videos using motion reasoning and optical flow, reporting 0.30 IoU on CLEVRER (7x over baselines) and transfer to 235 real videos.

  3. VisPhyWorld: Probing Physical Reasoning via Code-Driven Video Reconstruction

    cs.CV 2026-02 unverdicted novelty 6.0

    VisPhyWorld evaluates MLLMs' physical reasoning via executable code generation for video reconstruction, with VisPhyBench showing strong semantics but weak parameter inference and dynamics simulation.