arxiv: 2507.15736 · v2 · submitted 2025-07-21 · 💻 cs.CL

IDRBench: Understanding the Capability of Large Language Models on Interdisciplinary Research

Pith reviewed 2026-05-19 03:44 UTC · model grok-4.3

classification 💻 cs.CL
keywords: interdisciplinary research, large language models, benchmark, knowledge integration, idea recommendation, evaluation framework, cross-disciplinary tasks

The pith

IDRBench offers the first comprehensive benchmark for large language models in interdisciplinary research.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces IDRBench to evaluate how well large language models can perform tasks that require combining knowledge from multiple disciplines. This matters because significant innovation often emerges from bridging separate fields, and LLMs may help if their current abilities are properly understood and measured. The framework defines three tasks—IDR Paper Identification, IDR Idea Integration, and IDR Idea Recommendation—along with datasets to create concrete evaluations. Analysis of ten mainstream LLMs then provides behavioral insights and establishes initial benchmarks and baselines.

Core claim

IDRBench is the first framework to comprehensively investigate LLMs' interdisciplinary research capability through datasets and three tasks: IDR Paper Identification, IDR Idea Integration, and IDR Idea Recommendation, with evaluations on ten mainstream LLMs providing analysis and setting benchmarks for future work.

What carries the argument

The IDRBench framework: datasets plus three evaluation tasks designed to measure LLMs' ability to integrate knowledge across disciplines.

If this is right

  • Establishes standardized benchmarks and baselines that future work on LLMs for cross-disciplinary tasks can track and improve upon.
  • Reveals specific patterns in how current LLMs handle identification, integration, and recommendation of ideas from different fields.
  • Creates a foundation for AI systems intended to assist researchers in locating and combining knowledge across disciplinary boundaries.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Models that improve on the recommendation task might be deployed to automatically surface potential new research directions that combine insights from distant fields.
  • The benchmark approach could be adapted to measure AI performance on other forms of creative knowledge synthesis beyond research papers.
  • Scores on IDRBench could serve as a selection criterion when choosing or adapting LLMs for real collaborative projects involving multiple disciplines.

Load-bearing premise

The three defined tasks serve as valid and sufficient proxies for measuring genuine interdisciplinary research capability in LLMs.

What would settle it

A controlled study in which human domain experts judge that high-scoring LLM outputs on the three IDRBench tasks do not produce or validate genuinely novel interdisciplinary research contributions.

Figures

Figures reproduced from arXiv: 2507.15736 by Daniel Xavier de Sousa, Hongyu Guo, Ricardo Marçal, Xiaodan Zhu, Yuanhao Shen.

Figure 1: Triplet data format in IDRBench, showing that Papers PB and PC are integrated (more than merely referenced) to generate the IDR Paper PA. To obtain a positive triplet, the annotators need to identify PA from the candidate pool, then figure out the key cited papers PB and PC that play central roles in deriving the IDR idea. To contribute to understanding LLMs’ abilities in IDR, we took a small step to int…

Figure 2: Visualization of tasks IPI, I3, and I2R within IDRBench. Orange and green arrows stand …

Figure 3: Distribution of binary discipline combinations for ArXiv data and positive samples in …

Figure 7: Finally, they are asked to annotate the specific sentence(s) in this IDR paper that specifically …

Figure 4: List of papers available for the annotator to choose from.

Figure 5: Task 1, displayed after the annotator selects a paper.

Figure 6: Task 2, displayed if the annotator answers "Yes" in Task 1.

Figure 7: Task 3, displayed after completing Task 2.

Figure 8: Task 4, displayed after completing Task 3.

Figure 9: Review page, where the annotator can review and optionally go back and edit their answers.

Figure 10: Qualitative comparison sample on both idea integration reasoning and full abstract …
Original abstract

Innovation is a key driving force of human civilization. As the body of knowledge has grown considerably, bridging knowledge across different disciplines, where significant innovation often emerges, has become increasingly challenging. The recent advancements in machine learning models, particularly Large Language Models (LLMs), have provided effective access to extensive knowledge sources and shown impressive abilities in reasoning, rendering significant opportunities for interdisciplinary discovery. Our research aims to understand the capabilities of state-of-the-art LLMs in integrating knowledge from different fields for interdisciplinary research (IDR). To address this fundamental problem, we introduce IDRBench, a pioneering framework that includes both datasets and evaluation tasks: (1) IDR Paper Identification, (2) IDR Idea Integration, and (3) IDR Idea Recommendation. Our study on ten mainstream LLMs provides a comprehensive analysis of their behavior and establishes benchmarks and baselines for future research. To the best of our knowledge, IDRBench is the first to provide a comprehensive investigation of LLMs' IDR capability.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces IDRBench, a benchmark framework with three tasks (IDR Paper Identification, IDR Idea Integration, and IDR Idea Recommendation) and associated datasets to evaluate how well large language models can perform interdisciplinary research. It reports results on ten mainstream LLMs and claims this is the first comprehensive investigation of LLMs' IDR capabilities.

Significance. If the tasks are shown to be valid proxies, the benchmark would provide useful baselines and a starting point for measuring and improving LLMs' ability to integrate knowledge across fields, which is relevant given the role of interdisciplinarity in innovation.

major comments (2)
  1. [Task definitions and evaluation setup] The central claim that the three tasks measure LLMs' IDR capability rests on their validity as proxies, yet the manuscript reports no human expert validation, inter-annotator agreement scores, or correlation analysis showing that model performance tracks actual interdisciplinary synthesis (as opposed to retrieval or pattern matching on the curated data).
  2. [Evaluation and results] No human baselines or expert ratings of task realism are provided for any of the three tasks, which is required to interpret the LLM performance numbers as evidence of genuine IDR capability rather than benchmark-specific artifacts.
minor comments (2)
  1. [Abstract and introduction] The abstract's novelty claim ('to the best of our knowledge, IDRBench is the first') would be strengthened by a dedicated related-work subsection that explicitly contrasts the new tasks against prior LLM benchmarks on cross-domain reasoning or knowledge integration.
  2. [Dataset construction] Clarify the dataset construction process, including source selection criteria and any steps taken to mitigate domain-specific biases, to improve reproducibility.
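
One way to report the inter-annotator agreement requested in major comment 1 is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below is illustrative only: the paper does not say which statistic it would use, the function name `cohens_kappa` is hypothetical, and the labels are made-up placeholders (1 = "this is an IDR paper", 0 = "it is not") rather than data from IDRBench.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)

    # Observed agreement: share of items where both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each annotator's marginal label
    # frequencies, summed over all labels either annotator used.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels from two annotators on four candidate papers.
annotator_1 = [1, 1, 0, 0]
annotator_2 = [1, 0, 0, 0]
print(cohens_kappa(annotator_1, annotator_2))  # 0.5
```

Here the annotators agree on three of four items (0.75), chance agreement is 0.5, and kappa is 0.5. With more than two annotators, a statistic such as Fleiss' kappa or Krippendorff's alpha would be the usual choice.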

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive comments on our manuscript. We address each major comment below and outline the revisions we will make to strengthen the presentation of IDRBench as a valid benchmark for LLM interdisciplinary research capabilities.

Point-by-point responses
  1. Referee: [Task definitions and evaluation setup] The central claim that the three tasks measure LLMs' IDR capability rests on their validity as proxies, yet the manuscript reports no human expert validation, inter-annotator agreement scores, or correlation analysis showing that model performance tracks actual interdisciplinary synthesis (as opposed to retrieval or pattern matching on the curated data).

    Authors: We agree that explicit validation of the tasks as proxies for genuine IDR is important for interpreting the results. The three tasks were constructed by drawing on established definitions and examples of interdisciplinary research from the scholarly literature, with datasets drawn from real cross-field publications. Nevertheless, we acknowledge that the current manuscript does not include human expert validation or inter-annotator agreement statistics. In the revised version we will add a dedicated validation subsection that reports expert review of task realism on a sampled subset of instances together with inter-annotator agreement scores. Direct correlation analysis with downstream research impact would require longitudinal outcome data that is not yet available for this initial benchmark; we will note this limitation and identify it as an important avenue for follow-up studies.

    Revision: partial

  2. Referee: [Evaluation and results] No human baselines or expert ratings of task realism are provided for any of the three tasks, which is required to interpret the LLM performance numbers as evidence of genuine IDR capability rather than benchmark-specific artifacts.

    Authors: We appreciate the referee's emphasis on the need for human baselines to contextualize the reported LLM scores. In the revised manuscript we will include human performance baselines collected from domain experts on a representative sample of each task, along with expert ratings of task realism. These additions will allow readers to better distinguish benchmark-specific effects from broader IDR capabilities and will be presented alongside the existing LLM results.

    Revision: yes
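
If expert ratings are collected as promised, the correlation analysis raised in the first exchange could start with a rank correlation between per-model benchmark scores and mean expert ratings of the same models' outputs. The sketch below is an assumption about how that might be done, not a procedure described in the paper: the helper names are hypothetical, and the numbers are invented placeholders, not results from IDRBench.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Invented placeholder values for four hypothetical models.
benchmark_scores = [0.62, 0.48, 0.71, 0.55]
expert_ratings = [3.9, 3.1, 4.4, 3.0]
print(spearman(benchmark_scores, expert_ratings))
```

A coefficient near 1 would suggest the benchmark ranks models the way experts do. A value near 0 would support the referee's concern that scores reflect benchmark-specific artifacts. With only ten models, any such estimate would carry wide uncertainty.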

Circularity Check

0 steps flagged

No circularity: benchmark creation is a self-contained empirical contribution

Full rationale

The paper introduces IDRBench as a new framework consisting of three explicitly defined tasks (IDR Paper Identification, IDR Idea Integration, IDR Idea Recommendation) along with associated datasets and an evaluation of ten LLMs. No equations, fitted parameters, or derivations are present that reduce to prior inputs by construction. The central claim of providing the first comprehensive investigation rests on the novelty of the benchmark itself rather than any self-citation chain or self-definitional loop. The tasks are presented as proxies by design choice, not derived from fitted results or prior author work invoked as uniqueness theorems. This is a standard empirical benchmark paper with independent content.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central contribution rests on the domain assumption that the three constructed tasks adequately represent interdisciplinary research capability; no free parameters or invented entities are introduced.

axioms (1)
  • Domain assumption: Interdisciplinary research capability in LLMs can be meaningfully decomposed into the tasks of paper identification, idea integration, and idea recommendation.
    The benchmark framework is built directly on these three tasks as the core evaluation components.

pith-pipeline@v0.9.0 · 5714 in / 1219 out tokens · 29497 ms · 2026-05-19T03:44:41.735424+00:00 · methodology

