Bridging Dual Knowledge Graphs for Multi-Hop Question Answering in Construction Safety
Pith reviewed 2026-05-19 05:02 UTC · model grok-4.3
The pith
Dual knowledge graphs enable high-accuracy multi-hop reasoning over complex safety regulations for automated compliance checking.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
BifrostRAG models both linguistic relationships and document structure using dual knowledge graphs. It employs a hybrid retrieval mechanism that combines graph traversal with vector-based semantic search. On a multi-hop question dataset, this yields 92.8% precision, 85.5% recall, and an F1 score of 87.3%, significantly outperforming vector-only and graph-only RAG baselines and serving as a robust knowledge engine for LLM-driven compliance checking.
What carries the argument
The dual-graph hybrid retrieval mechanism, which integrates linguistic relationship graphs with structural document graphs to support multi-hop synthesis across regulatory clauses.
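To make the mechanism concrete, here is a minimal sketch of a dual-graph hybrid retriever. It is not BifrostRAG's implementation: the clause texts, the contents of both graphs, the bag-of-words similarity, and every function name are hypothetical stand-ins chosen to keep the example self-contained. The sketch seeds retrieval with a vector search and an entity lookup, then follows cross-references in the structural graph for a bounded number of hops.

```python
from collections import Counter, deque
import math

# Hypothetical toy corpus: clause id -> text (not taken from the paper).
CLAUSES = {
    "1.1": "guardrails are required on scaffolds above ten feet",
    "1.2": "guardrail systems must meet the criteria in clause 2.1",
    "2.1": "top rails must withstand a force of 200 pounds",
    "3.1": "ladders must be inspected before each shift",
}

# Hypothetical structural graph: clause -> clauses it cross-references.
DOC_GRAPH = {"1.1": ["1.2"], "1.2": ["2.1"], "2.1": [], "3.1": []}

# Hypothetical linguistic graph: entity -> clauses that mention it.
ENTITY_GRAPH = {"guardrail": ["1.1", "1.2"], "ladder": ["3.1"]}


def embed(text):
    """Bag-of-words stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def vector_search(query, k=1):
    """Return the k clause ids most similar to the query."""
    q = embed(query)
    ranked = sorted(CLAUSES, key=lambda c: cosine(q, embed(CLAUSES[c])), reverse=True)
    return ranked[:k]


def expand(seeds, max_hops=2):
    """Breadth-first traversal of cross-references, up to max_hops from any seed."""
    seen = set(seeds)
    queue = deque((s, 0) for s in seeds)
    while queue:
        clause, hops = queue.popleft()
        if hops == max_hops:
            continue
        for nxt in DOC_GRAPH.get(clause, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return seen


def hybrid_retrieve(query, k=1, max_hops=2):
    """Combine vector seeds and entity seeds, then traverse the structural graph."""
    seeds = set(vector_search(query, k))
    for entity, clauses in ENTITY_GRAPH.items():
        if entity in query.lower():
            seeds.update(clauses)
    return sorted(expand(seeds, max_hops))
```

On this toy data, `vector_search("guardrail requirements on scaffolds")` returns only clause 1.1, while `hybrid_retrieve` with the same query also reaches clauses 1.2 and 2.1 by following cross-references. That is the kind of multi-hop synthesis the dual-graph design is meant to support.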
If this is right
- Supports accurate synthesis of information across interlinked clauses in regulatory texts.
- Outperforms traditional single-graph or vector-based retrieval methods in precision and recall for compliance queries.
- Offers a blueprint for applying similar dual-graph approaches to complex technical documents in other engineering fields.
- Enhances the reliability of LLM-based systems for automated construction compliance checking.
Where Pith is reading between the lines
- Applying this to other regulatory areas like building codes or environmental standards could yield similar gains in query handling.
- Future work might test integration with real-time project data to flag compliance issues proactively during construction planning.
- Scalability to larger regulatory corpora without performance loss would need verification in expanded datasets.
Load-bearing premise
The linguistic and structural relationships captured by the dual knowledge graphs are sufficient to support accurate multi-hop reasoning over the full complexity of regulatory text without additional domain-specific tuning or post-processing.
What would settle it
Testing the system on a new multi-hop dataset derived from a different set of construction regulations where the F1 score falls below that of vector-only or graph-only baselines would indicate the dual-graph approach does not generalize as claimed.
Original abstract
Information retrieval and question answering from safety regulations are essential for automated construction compliance checking but are hindered by the linguistic and structural complexity of regulatory text. Many queries are multi-hop, requiring synthesis across interlinked clauses. To address the challenge, this paper introduces BifrostRAG, a dual-graph retrieval-augmented generation (RAG) system that models both linguistic relationships and document structure. The proposed architecture supports a hybrid retrieval mechanism that combines graph traversal with vector-based semantic search, enabling large language models to reason over both the content and the structure of the text. On a multi-hop question dataset, BifrostRAG achieves 92.8% precision, 85.5% recall, and an F1 score of 87.3%. These results significantly outperform vector-only and graph-only RAG baselines, establishing BifrostRAG as a robust knowledge engine for LLM-driven compliance checking. The dual-graph, hybrid retrieval mechanism presented in this paper offers a transferable blueprint for navigating complex technical documents across knowledge-intensive engineering domains.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces BifrostRAG, a dual knowledge graph RAG system that models both linguistic relationships and document structure for multi-hop question answering over construction safety regulations. It uses hybrid retrieval combining graph traversal and vector-based search to support LLM reasoning, reporting 92.8% precision, 85.5% recall, and 87.3% F1 on a multi-hop question dataset while outperforming vector-only and graph-only baselines.
Significance. If the empirical results hold under proper validation, the dual-graph hybrid approach offers a practical blueprint for navigating complex regulatory texts in engineering domains. The work directly targets a real-world need in automated compliance checking and could transfer to other knowledge-intensive technical documents.
Major comments (2)
- Abstract: The central claim of robustness for multi-hop reasoning rests on the reported metrics (92.8% precision, 85.5% recall, 87.3% F1) significantly outperforming baselines, yet no details are supplied on dataset size, construction method (expert annotation vs. synthetic/LLM-generated), clause-type diversity, or whether the test questions were held out from dual-graph construction. This omission is load-bearing because it prevents distinguishing genuine generalization from possible dataset artifacts that exploit the exact linguistic/structural links modeled by the graphs.
- Evaluation (implied by abstract claims): Without reported statistical significance tests, error analysis, baseline implementation details, or dataset statistics, it is impossible to assess whether the outperformance demonstrates sufficiency for full regulatory complexity or merely reflects a small or specially crafted test set.
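One check a reader can run on the reported figures: F1 is the harmonic mean of precision and recall when all three are computed over the same counts. The sketch below applies that formula to the reported 92.8% precision and 85.5% recall. The result is about 89.0%, not the reported 87.3%. A gap like this would be consistent with metrics averaged per question, but that is an assumption, since the aggregation method is not stated on this page. The two per-question pairs in the second half are hypothetical and only illustrate how the two averaging orders can diverge.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# F1 implied by the reported aggregate precision and recall.
f1_from_reported = f1_score(0.928, 0.855)

# Hypothetical per-question (precision, recall) pairs, not from the paper.
per_question = [(1.0, 0.5), (0.8, 1.0)]

mean_p = sum(p for p, _ in per_question) / len(per_question)
mean_r = sum(r for _, r in per_question) / len(per_question)

# Averaging first and then taking F1, versus taking F1 per question and then averaging.
f1_of_means = f1_score(mean_p, mean_r)
mean_of_f1s = sum(f1_score(p, r) for p, r in per_question) / len(per_question)

print(f"F1 from reported P and R: {f1_from_reported:.3f}")
print(f"F1 of mean P and mean R:  {f1_of_means:.3f}")
print(f"Mean of per-question F1:  {mean_of_f1s:.3f}")
```

Stating which aggregation was used, alongside the dataset statistics requested above, would let readers reconcile the three headline numbers.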
Simulated Author's Rebuttal
Thank you for reviewing our manuscript and providing valuable feedback. We have carefully considered the major comments regarding the transparency of our dataset and evaluation details. We address each point below and have revised the manuscript to include the requested information.
- Referee: Abstract: The central claim of robustness for multi-hop reasoning rests on the reported metrics (92.8% precision, 85.5% recall, 87.3% F1) significantly outperforming baselines, yet no details are supplied on dataset size, construction method (expert annotation vs. synthetic/LLM-generated), clause-type diversity, or whether the test questions were held out from dual-graph construction. This omission is load-bearing because it prevents distinguishing genuine generalization from possible dataset artifacts that exploit the exact linguistic/structural links modeled by the graphs.
  Authors: We agree with the referee that these details are crucial for evaluating the generalizability of our results. Although the full manuscript describes the dataset construction in Section 4.1, we recognize that the abstract lacked this information. We have revised the abstract to briefly note the dataset size and expert annotation process. Furthermore, we have added explicit statements confirming that the test questions were held out from the dual-graph construction and included statistics on clause-type diversity in the revised Experiments section.
  Revision: yes
- Referee: Evaluation (implied by abstract claims): Without reported statistical significance tests, error analysis, baseline implementation details, or dataset statistics, it is impossible to assess whether the outperformance demonstrates sufficiency for full regulatory complexity or merely reflects a small or specially crafted test set.
  Authors: We acknowledge that the original submission did not include statistical significance tests or a dedicated error analysis. In the revised version, we have added these elements: we now report p-values from appropriate statistical tests showing the significance of the performance improvements. A new error analysis subsection discusses the remaining failure cases and their implications for regulatory complexity. We have also expanded the baseline descriptions with implementation details and added a table with comprehensive dataset statistics to better characterize the test set.
  Revision: yes
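The rebuttal does not name the statistical test used. As one illustration of what such a test could look like, the sketch below runs a paired bootstrap over per-question scores for two systems. The choice of test, the function name `paired_bootstrap`, and the score lists are all assumptions made for this example. None of the numbers are results from the paper.

```python
import random


def paired_bootstrap(scores_a, scores_b, n_resamples=2000, seed=0):
    """Return (mean difference, 95% percentile interval, share of resamples with A <= B)."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must be scored on the same questions")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(scores_a)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = sum(diffs) / n

    resampled = []
    for _ in range(n_resamples):
        # Resample questions with replacement, keeping each A/B pair together.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        resampled.append(sum(sample) / n)
    resampled.sort()

    lower = resampled[int(0.025 * n_resamples)]
    upper = resampled[int(0.975 * n_resamples) - 1]
    not_better = sum(d <= 0 for d in resampled) / n_resamples
    return observed, (lower, upper), not_better


# Hypothetical per-question F1 scores for two systems on the same eight questions.
hybrid_scores = [1.0, 0.8, 0.67, 1.0, 0.5, 0.9, 1.0, 0.75]
baseline_scores = [0.8, 0.8, 0.5, 0.67, 0.5, 0.6, 1.0, 0.4]

observed, interval, not_better = paired_bootstrap(hybrid_scores, baseline_scores)
print(f"mean difference: {observed:.3f}")
print(f"95% interval: ({interval[0]:.3f}, {interval[1]:.3f})")
print(f"share of resamples where hybrid is not better: {not_better:.3f}")
```

Pairing by question matters here because both systems answer the same questions, so resampling the per-question differences keeps that dependence intact. With a small test set the interval can be wide, which is why the referee's request for dataset size and a significance test go together.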
Circularity Check
No circularity detected; results rest on external dataset and baselines
The paper presents BifrostRAG as an architectural system combining dual knowledge graphs with hybrid retrieval for multi-hop QA over regulatory text. Performance is reported as empirical metrics (92.8% precision, 85.5% recall, 87.3% F1) on a multi-hop question dataset, with direct comparison to vector-only and graph-only RAG baselines. No equations, derivations, fitted parameters, or self-citations appear in the provided text that would reduce these outcomes to inputs by construction. The evaluation chain relies on an external test set and standard retrieval baselines, remaining self-contained.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: BifrostRAG: dual knowledge graph architecture... Entity Network Graph (linguistic relationships) and Document Navigator Graph (hierarchical/cross-reference structure)... hybrid retrieval combining graph traversal with vector-based semantic search
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: Evaluation on 93 multi-hop questions... precision 92.8%, recall 85.5%, F1 87.3%
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.