PhysScene: A Scene Graph Dataset for Scientific Visual Reasoning in Physics Experiments
Pith reviewed 2026-06-27 17:29 UTC · model grok-4.3
The pith
PhysScene is the first scene graph dataset built for physics experiments rather than everyday scenes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
PhysScene is the first scene graph dataset tailored to physics experiments. It encompasses specialized instruments, structured experimental setups, and functional relations intrinsic to experimental environments, enabling reasoning that extends beyond spatial co-occurrence to logical dependencies. Rather than pursuing large data scale, PhysScene focuses on strong semantic constraints and high relation density in experimental scenes, posing new challenges for existing scene parsing algorithms while offering opportunities for further improvements.
What carries the argument
Scene graphs that record objects together with their pairwise functional relations inside structured physics experiment setups.
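A minimal sketch of how such a graph might be represented, assuming a simple subject-predicate-object triple format. The class names, the `kind` field, and the example instruments and predicates are hypothetical illustrations, not PhysScene's actual annotation schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relation:
    subject: str    # id of the subject object
    predicate: str  # relation label, e.g. "connected_to"
    obj: str        # id of the object on the receiving end
    kind: str       # "spatial" or "functional" (hypothetical split)


@dataclass
class SceneGraph:
    objects: dict[str, str] = field(default_factory=dict)  # id -> category
    relations: list[Relation] = field(default_factory=list)

    def add_object(self, obj_id: str, category: str) -> None:
        self.objects[obj_id] = category

    def add_relation(self, subject: str, predicate: str, obj: str, kind: str) -> None:
        # Both endpoints must already exist in the graph.
        if subject not in self.objects or obj not in self.objects:
            raise KeyError("relation refers to an unknown object")
        self.relations.append(Relation(subject, predicate, obj, kind))

    def functional(self) -> list[Relation]:
        # Keep only relations tagged as functional.
        return [r for r in self.relations if r.kind == "functional"]


# Invented example scene: a simple circuit on a bench.
g = SceneGraph()
g.add_object("o1", "battery")
g.add_object("o2", "ammeter")
g.add_object("o3", "table")
g.add_relation("o1", "connected_to", "o2", "functional")
g.add_relation("o2", "on", "o3", "spatial")
```

Tagging each relation with a kind is one way to separate the functional links the paper emphasizes from purely spatial ones; the released annotations may encode this differently.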
If this is right
- Scene parsing algorithms must now handle logical dependencies among lab instruments in addition to spatial layout.
- The dataset supplies a concrete testbed for measuring progress on scientific visual reasoning tasks.
- Development of monitoring and analysis systems for physics experiments gains a dedicated evaluation resource.
- High relation density and semantic constraints in the data expose where current methods fall short.
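The page does not say how relation density is defined. As a rough sketch, assuming it means either annotated relations per object or the fraction of ordered object pairs that carry a relation, it could be computed as follows. The function names and the toy numbers are hypothetical.

```python
def relations_per_object(num_objects: int, num_relations: int) -> float:
    """Average number of annotated relations per object (assumed definition)."""
    if num_objects == 0:
        return 0.0
    return num_relations / num_objects


def pair_coverage(num_objects: int, num_relations: int) -> float:
    """Fraction of ordered object pairs that carry a relation (alternative)."""
    possible = num_objects * (num_objects - 1)
    if possible == 0:
        return 0.0
    return num_relations / possible


# Toy scene with 5 objects and 8 annotated relations (invented numbers).
print(relations_per_object(5, 8))  # 1.6
print(pair_coverage(5, 8))         # 0.4
```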
Where Pith is reading between the lines
- The same construction approach could be repeated for other laboratory domains such as chemistry or biology to create comparable testbeds.
- Trained models could eventually support automated logging or safety checks that track whether an experiment is following its intended functional sequence.
- The emphasis on functional over purely spatial relations suggests future scene graph work may need explicit modules for causal or procedural links.
Load-bearing premise
Existing scene graph datasets focus mainly on generic natural scenes and therefore leave domain-specific experimental scenes underexplored.
What would settle it
If models trained only on generic scene graph datasets reach accuracy and relation prediction scores on PhysScene images comparable to those they reach on natural-image benchmarks, the claimed need for a physics-specific dataset would be weakened.
Original abstract
Scene Graphs (SGs) provide structured representations of visual scenes by modeling objects and their pairwise relationships. Despite recent progress, existing datasets primarily focus on generic natural contexts, leaving domain-specific and function-oriented scenes largely underexplored. This limitation restricts the evaluation of relational reasoning in scientific experimental scenes, thereby hindering the development of intelligent monitoring, analysis, and related applications in such scenes. To address this gap, we introduce PhysScene, the first SG dataset tailored to physics experiments. PhysScene encompasses specialized instruments, structured experimental setups, and functional relations intrinsic to experimental environments, enabling reasoning that extends beyond spatial co-occurrence to logical dependencies. Rather than pursuing large data scale, PhysScene focuses on strong semantic constraints and high relation density in experimental scenes, posing new challenges for existing scene parsing algorithms while offering opportunities for further improvements. Extensive analyses and experiments show that PhysScene complements existing benchmarks and establishes a valuable testbed for advancing scientific visual reasoning. The dataset is publicly available at https://github.com/ZMH-SDUST/PhysScene.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces PhysScene, the first scene graph dataset tailored to physics experiments. It claims that existing SG datasets focus on generic natural contexts, while PhysScene includes specialized instruments, structured experimental setups, and functional relations intrinsic to experimental environments. The dataset emphasizes strong semantic constraints and high relation density rather than large scale, posing new challenges for scene parsing algorithms. Extensive analyses and experiments are presented to show that PhysScene complements existing benchmarks and establishes a valuable testbed for scientific visual reasoning. The dataset is released publicly via GitHub.
Significance. If the dataset indeed supplies functional relations and logical dependencies in physics lab scenes that go beyond spatial co-occurrence in generic datasets, it would fill a documented niche and support progress in domain-specific relational reasoning. The standard dataset-contribution structure (gap identification, targeted collection with high relation density, complementarity demonstration) is internally consistent, and the public release aids reproducibility.
minor comments (3)
- Abstract and introduction: the assertion that PhysScene is 'the first' SG dataset for physics experiments requires an explicit comparison table or paragraph citing the closest prior datasets (e.g., Visual Genome, CLEVR, or any lab-specific efforts) to substantiate the novelty claim.
- Dataset description section: statistics on number of scenes, objects per scene, relation types, and relation density should be presented in a table early in the paper so readers can evaluate the 'high relation density' claim without needing to inspect the GitHub repository.
- Experiments section: the claim that PhysScene 'poses new challenges for existing scene parsing algorithms' should be supported by at least one quantitative baseline result (e.g., SG generation mAP or relation prediction accuracy) on PhysScene versus a generic dataset, even if only as a preliminary result.
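One way the requested quantitative baseline could be scored is Recall@K, a metric commonly reported in scene graph generation work. The sketch below is a simplified version that matches triples by label only and ignores bounding-box overlap. The function name, data layout, and example triples are assumptions for illustration, not the paper's evaluation protocol.

```python
def recall_at_k(predictions, ground_truth, k):
    """Share of ground-truth triples found among the top-k scored predictions.

    predictions: list of (score, (subject, predicate, object)) tuples
    ground_truth: set of (subject, predicate, object) triples
    """
    if not ground_truth:
        return 0.0
    # Rank predictions by confidence and keep the k highest.
    top_k = sorted(predictions, key=lambda p: p[0], reverse=True)[:k]
    # Use a set so a repeated prediction is not counted twice.
    hits = {triple for _, triple in top_k if triple in ground_truth}
    return len(hits) / len(ground_truth)


# Invented example for one image.
gt = {("battery", "connected_to", "ammeter"), ("ammeter", "on", "table")}
preds = [
    (0.9, ("battery", "connected_to", "ammeter")),
    (0.8, ("battery", "on", "table")),
    (0.3, ("ammeter", "on", "table")),
]
print(recall_at_k(preds, gt, 2))  # 0.5
print(recall_at_k(preds, gt, 3))  # 1.0
```

Running the same model and metric on PhysScene and on a generic benchmark would give the side-by-side numbers the comment asks for.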
Simulated Author's Rebuttal
We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. The report contains no major comments, so there is nothing to address point by point.
Circularity Check
No significant circularity; dataset contribution is self-contained
The paper introduces PhysScene as a new scene-graph dataset for physics experiments. It contains no equations, derivations, fitted parameters, or predictions that could reduce to inputs by construction. The central claim (first dataset tailored to this domain with functional relations) is a standard novelty argument supported by comparison to existing benchmarks; it does not rely on self-citation chains, uniqueness theorems from the same authors, or any self-definitional loop. The contribution stands on the released data and analyses rather than any internal reduction.