arxiv: 2606.09368 · v1 · pith:7EOY7PMRnew · submitted 2026-06-08 · 💻 cs.CV · cs.AI

PhysScene: A Scene Graph Dataset for Scientific Visual Reasoning in Physics Experiments

Pith reviewed 2026-06-27 17:29 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords scene graphs · physics experiments · visual reasoning · dataset · relational reasoning · functional relations · experimental setups · scientific scenes

The pith

PhysScene is the first scene graph dataset built for physics experiments rather than everyday scenes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to fill a gap: current scene graph datasets mainly cover generic natural images, so they cannot test reasoning about the specialized instruments and logical dependencies found in physics experiments. It does this by releasing PhysScene, a collection that prioritizes high relation density and strong semantic constraints over sheer size. The dataset matters because it lets algorithms be evaluated on functional relations inside lab setups rather than only on spatial co-occurrence. According to the paper, its analyses and experiments show that this domain-specific testbed poses new challenges for existing scene parsing methods while opening a route to better scientific visual reasoning tools.

Core claim

PhysScene is the first scene graph dataset tailored to physics experiments. It encompasses specialized instruments, structured experimental setups, and functional relations intrinsic to experimental environments, enabling reasoning that extends beyond spatial co-occurrence to logical dependencies. Rather than pursuing large data scale, PhysScene focuses on strong semantic constraints and high relation density in experimental scenes, posing new challenges for existing scene parsing algorithms while offering opportunities for further improvements.

What carries the argument

Scene graphs that record objects together with their pairwise functional relations inside structured physics experiment setups.
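
As a rough illustration of this representation, a scene graph can be held as a list of objects plus (subject, predicate, object) triplets. This is a minimal sketch and not PhysScene's actual annotation format: the class names, the object labels, the predicates, and the `kind` tag separating spatial from functional relations are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relation:
    subject: int    # index into SceneGraph.objects
    predicate: str  # hypothetical label, e.g. "connected to"
    obj: int        # index into SceneGraph.objects
    kind: str       # hypothetical tag: "spatial" or "functional"


@dataclass
class SceneGraph:
    objects: list[str] = field(default_factory=list)
    relations: list[Relation] = field(default_factory=list)

    def triplets(self, kind: str | None = None) -> list[tuple[str, str, str]]:
        """Return (subject, predicate, object) label triplets, optionally filtered by kind."""
        return [
            (self.objects[r.subject], r.predicate, self.objects[r.obj])
            for r in self.relations
            if kind is None or r.kind == kind
        ]


# Hypothetical lab scene, invented for illustration only.
graph = SceneGraph(
    objects=["battery", "wire", "ammeter", "table"],
    relations=[
        Relation(0, "connected to", 1, "functional"),
        Relation(1, "connected to", 2, "functional"),
        Relation(2, "on", 3, "spatial"),
    ],
)

# Keep only the relations that go beyond spatial layout.
functional = graph.triplets(kind="functional")
print(functional)
```

Tagging each relation with a kind is one possible design choice; it makes it easy to evaluate a model separately on spatial and functional relations.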

If this is right

  • Scene parsing algorithms must now handle logical dependencies among lab instruments in addition to spatial layout.
  • The dataset supplies a concrete testbed for measuring progress on scientific visual reasoning tasks.
  • Development of monitoring and analysis systems for physics experiments gains a dedicated evaluation resource.
  • High relation density and semantic constraints in the data expose where current methods fall short.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same construction approach could be repeated for other laboratory domains such as chemistry or biology to create comparable testbeds.
  • Trained models could eventually support automated logging or safety checks that track whether an experiment is following its intended functional sequence.
  • The emphasis on functional over purely spatial relations suggests future scene graph work may need explicit modules for causal or procedural links.

Load-bearing premise

Existing scene graph datasets focus mainly on generic natural scenes and therefore leave domain-specific experimental scenes underexplored.

What would settle it

If models trained only on generic scene graph datasets achieve accuracy and relation prediction scores on PhysScene images comparable to those they achieve on natural-image benchmarks, the claimed need for a physics-specific dataset would be weakened.
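
One way to run such a comparison is to score a model's ranked triplet predictions on each benchmark with a recall-at-K style measure, which is commonly used for scene graph generation. The sketch below is a simplified, label-only version: it ignores bounding-box matching, and the function name, example triplets, and scores are hypothetical rather than taken from the paper.

```python
def triplet_recall_at_k(predicted, ground_truth, k):
    """Fraction of ground-truth triplets found among the top-k predictions.

    predicted: list of (score, (subject, predicate, object)) tuples.
    ground_truth: iterable of (subject, predicate, object) tuples.
    """
    gt = set(ground_truth)
    if not gt:
        return 0.0
    # Rank predictions by confidence and keep the k highest.
    ranked = sorted(predicted, key=lambda p: p[0], reverse=True)[:k]
    # Use a set so a repeated prediction is not counted twice.
    hits = {triplet for _, triplet in ranked if triplet in gt}
    return len(hits) / len(gt)


# Hypothetical annotations and model outputs for a single image.
gt = [("battery", "connected to", "wire"), ("ammeter", "on", "table")]
preds = [
    (0.9, ("ammeter", "on", "table")),
    (0.6, ("battery", "near", "wire")),
    (0.4, ("battery", "connected to", "wire")),
]

recall_top2 = triplet_recall_at_k(preds, gt, 2)
recall_top3 = triplet_recall_at_k(preds, gt, 3)
print(recall_top2, recall_top3)
```

Averaging this score over the images of a natural-image benchmark and over PhysScene, for the same model, would give the two numbers the test above asks to compare.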

Figures

Figures reproduced from arXiv: 2606.09368 by Abdulmotaleb El Saddik, Baoquan Zhao, Guanghui Yue, Minghao Zou, Qingtian Zeng, Shangkun Liu, Wei Zhou, Yanda Meng.

Figure 1: An overview of the PhysScene dataset, including the dataset construction pipeline, representative image samples, and …
Figure 2: Representative examples illustrating collection vari…
Figure 3: Distribution of object categories in PhysScene. The dataset covers 34 object categories spanning experimental …
Figure 4: Distribution of relation annotations in PhysScene, including human actions, object attributes, and spatial relations.
Original abstract

Scene Graphs (SGs) provide structured representations of visual scenes by modeling objects and their pairwise relationships. Despite recent progress, existing datasets primarily focus on generic natural contexts, leaving domain-specific and function-oriented scenes largely underexplored. This limitation restricts the evaluation of relational reasoning in scientific experimental scenes, thereby hindering the development of intelligent monitoring, analysis, and related applications in such scenes. To address this gap, we introduce PhysScene, the first SG dataset tailored to physics experiments. PhysScene encompasses specialized instruments, structured experimental setups, and functional relations intrinsic to experimental environments, enabling reasoning that extends beyond spatial co-occurrence to logical dependencies. Rather than pursuing large data scale, PhysScene focuses on strong semantic constraints and high relation density in experimental scenes, posing new challenges for existing scene parsing algorithms while offering opportunities for further improvements. Extensive analyses and experiments show that PhysScene complements existing benchmarks and establishes a valuable testbed for advancing scientific visual reasoning. The dataset is publicly available at https://github.com/ZMH-SDUST/PhysScene.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper introduces PhysScene, the first scene graph dataset tailored to physics experiments. It claims that existing SG datasets focus on generic natural contexts, while PhysScene includes specialized instruments, structured experimental setups, and functional relations intrinsic to experimental environments. The dataset emphasizes strong semantic constraints and high relation density rather than large scale, posing new challenges for scene parsing algorithms. Extensive analyses and experiments are presented to show that PhysScene complements existing benchmarks and establishes a valuable testbed for scientific visual reasoning. The dataset is released publicly via GitHub.

Significance. If the dataset indeed supplies functional relations and logical dependencies in physics lab scenes that go beyond spatial co-occurrence in generic datasets, it would fill a documented niche and support progress in domain-specific relational reasoning. The standard dataset-contribution structure (gap identification, targeted collection with high relation density, complementarity demonstration) is internally consistent, and the public release aids reproducibility.

minor comments (3)
  1. Abstract and introduction: the assertion that PhysScene is 'the first' SG dataset for physics experiments requires an explicit comparison table or paragraph citing the closest prior datasets (e.g., Visual Genome, CLEVR, or any lab-specific efforts) to substantiate the novelty claim.
  2. Dataset description section: statistics on number of scenes, objects per scene, relation types, and relation density should be presented in a table early in the paper so readers can evaluate the 'high relation density' claim without needing to inspect the GitHub repository.
  3. Experiments section: the claim that PhysScene 'poses new challenges for existing scene parsing algorithms' should be supported by at least one quantitative baseline result (e.g., scene graph generation Recall@K or mean Recall@K, or relation prediction accuracy) on PhysScene versus a generic dataset, even if only as a preliminary result.
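
The statistics requested in comment 2 could be computed along the following lines. This is a sketch under stated assumptions: the input layout (a list of dicts with `objects` and `relations`), the function name `summarize`, and the example scenes are hypothetical, and "pair density" here means relations per ordered object pair, which is only one possible definition and may differ from the one the paper uses.

```python
from collections import Counter


def summarize(scenes):
    """Compute simple dataset-level statistics.

    scenes: list of dicts with "objects" (list of labels) and
    "relations" (list of (subject_index, predicate, object_index)).
    """
    if not scenes:
        raise ValueError("need at least one scene")
    num_scenes = len(scenes)
    num_objects = sum(len(s["objects"]) for s in scenes)
    num_relations = sum(len(s["relations"]) for s in scenes)
    # Ordered object pairs that could carry a relation: n * (n - 1) per scene.
    num_pairs = sum(len(s["objects"]) * (len(s["objects"]) - 1) for s in scenes)
    predicates = Counter(p for s in scenes for _, p, _ in s["relations"])
    return {
        "objects_per_scene": num_objects / num_scenes,
        "relations_per_scene": num_relations / num_scenes,
        "pair_density": num_relations / num_pairs if num_pairs else 0.0,
        "relation_types": len(predicates),
    }


# Two hypothetical scenes, invented for illustration only.
scenes = [
    {
        "objects": ["battery", "wire", "ammeter"],
        "relations": [(0, "connected to", 1), (1, "connected to", 2)],
    },
    {"objects": ["beaker", "table"], "relations": [(0, "on", 1)]},
]

stats = summarize(scenes)
print(stats)
```

Reporting the same quantities for a generic benchmark next to PhysScene would let readers judge the 'high relation density' claim directly.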

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. The report raises no major comments, so there is nothing to address point by point.

Circularity Check

0 steps flagged

No significant circularity; dataset contribution is self-contained

full rationale

The paper introduces PhysScene as a new scene-graph dataset for physics experiments. It contains no equations, derivations, fitted parameters, or predictions that could reduce to inputs by construction. The central claim (first dataset tailored to this domain with functional relations) is a standard novelty argument supported by comparison to existing benchmarks; it does not rely on self-citation chains, uniqueness theorems from the same authors, or any self-definitional loop. The contribution stands on the released data and analyses rather than any internal reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a dataset introduction paper. No free parameters, mathematical axioms, or invented entities are described or required by the abstract.

pith-pipeline@v0.9.1-grok · 5732 in / 1000 out tokens · 20403 ms · 2026-06-27T17:29:19.230769+00:00 · methodology

