pith. machine review for the scientific record.

arxiv: 2604.04942 · v1 · submitted 2026-03-13 · 💻 cs.CL · cs.AI

Recognition: 2 theorem links

· Lean Theorem

TDA-RC: Task-Driven Alignment for Knowledge-Based Reasoning Chains in Large Language Models

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 12:18 UTC · model grok-4.3

classification 💻 cs.CL cs.AI
keywords persistent homology · Chain-of-Thought · reasoning chains · large language models · topological data analysis · reasoning optimization · single-round efficiency

The pith

A topological agent repairs single-round CoT chains to match the accuracy of multi-round methods without extra rounds.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper establishes a method to embed the structural strengths of multi-round reasoning like ToT and GoT into the efficient single-round CoT paradigm. It maps different reasoning chains into a shared topological space using persistent homology to measure their features, then deploys an agent that spots missing patterns in a CoT chain and supplies targeted fixes. The goal is to close logical gaps in lightweight generation while preserving its speed. Readers would care because the approach claims to deliver higher task accuracy on standard benchmarks at far lower cost than repeated-round alternatives.

Core claim

By applying persistent homology to embed CoT, ToT, and GoT reasoning chains in one topological space, the framework identifies deviations from effective structural patterns; a Topological Optimization Agent then diagnoses those gaps in a given CoT output and produces concrete repair strategies that restore the missing topological features, yielding accuracy gains that approach multi-round performance while staying within single-round generation.

What carries the argument

The Topological Optimization Agent, which diagnoses deviations from desirable persistent-homology features in CoT chains and generates repair strategies to align them with the structures of stronger multi-round methods.
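The machinery above rests on reading topological features off a reasoning chain. A minimal sketch of the simplest such feature, H0 persistence over a Vietoris-Rips filtration: each step becomes a point (toy 2-D coordinates here stand in for sentence embeddings, which the review does not specify), and the barcode records at what scale components merge. A step far from its neighbours shows up as a long-lived component, a plausible proxy for a logical gap. This union-find sketch is illustrative only, not the paper's construction.

```python
import itertools
import math

def h0_persistence(points):
    """H0 persistence bars of a Vietoris-Rips filtration over a point cloud.

    Every component is born at scale 0; when two components merge at edge
    length d, one H0 class dies, yielding a (birth=0, death=d) bar. The
    final surviving component never dies (death = infinity).
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # All pairwise edges, sorted by Euclidean length (the filtration order).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )
    bars = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj          # merge: one H0 class dies at scale d
            bars.append((0.0, d))
    bars.append((0.0, math.inf))     # the surviving component
    return bars

# Toy "reasoning chain": four tightly spaced steps and one outlier,
# mimicking a chain whose last step is semantically disconnected.
chain = [(0.0, 0.0), (1.0, 0.1), (2.0, 0.0), (3.0, 0.1), (10.0, 0.0)]
bars = h0_persistence(chain)
longest_finite = max(d for _, d in bars if d != math.inf)
```

On this toy chain the longest finite bar is the jump to the outlier, which is exactly the kind of structural deviation the agent is claimed to detect and repair.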

If this is right

  • Single-round CoT can be made to exhibit the topological characteristics of multi-round reasoning without incurring multiple generation steps.
  • The unified topological mapping allows direct comparison and transfer of structural strengths across CoT, ToT, and GoT paradigms.
  • The optimization system produces targeted repair strategies that improve reasoning accuracy on multiple standard datasets.
  • The method demonstrates a practical trade-off that favors single-round generation while approaching multi-round intelligence.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same homology-based diagnosis could be tested on other structured generation tasks such as code synthesis or long-form planning where gaps also appear.
  • If the topological signatures prove stable across model sizes, they might serve as a lightweight diagnostic before full inference runs.
  • Extending the agent to output not just repairs but ranked alternative chains could further reduce the need for separate search procedures.

Load-bearing premise

Persistent homology features extracted from reasoning chains correspond to the logical completeness that drives downstream task accuracy, and an agent can translate those features into repairs that actually improve performance.

What would settle it

Apply the Topological Optimization Agent to CoT outputs on held-out reasoning datasets: if no statistically significant accuracy lift over baseline CoT emerges while the agent's overhead is still incurred, the load-bearing premise fails.
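The settling experiment reduces to a paired significance test on per-item correctness. One standard choice is an exact McNemar test over the items where baseline and repaired chains disagree; the counts below are hypothetical, not the paper's results.

```python
from math import comb

def mcnemar_exact_p(b, c):
    """Exact two-sided McNemar test on discordant pairs.

    b = items the baseline got right but the repaired chain got wrong,
    c = items the repaired chain got right but the baseline got wrong.
    Under H0 (no accuracy difference) each discordant item is a fair coin
    flip, so the p-value is a two-sided binomial tail.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical held-out outcomes: baseline wins on 4 items, the repaired
# chain wins on 16, and the rest agree.
p = mcnemar_exact_p(b=4, c=16)
```

A small p here would count against the null of "no lift"; a p near 1 with the agent's overhead still paid would be the falsifying outcome the review describes.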

Figures

Figures reproduced from arXiv: 2604.04942 by Caiyan Qin, Chaoning Zhang, Chi-lok Andy Tai, Hengtao Shen, Jiaquan Zhang, Jinyu Guo, Pengcheng Zheng, Qigan Sun, Sung-Ho Bae, Xudong Wang, Yang Yang, Yitian Zhou, Zeyu Ma, Zhenzhen Huang.

Figure 1. The upper part illustrates the offline procedure for constructing task-specific health bands, where reasoning chains are …
Figure 3. Task-specific Topological Health Bands derived from …
Figure 2. Topological profiles of six metrics, averaged across …
Figure 4. Accuracy vs. relative cost (CoT = 1.0) for reasoning methods on GPT-4o-mini, Qwen-Turbo, and DeepSeek-V3. Each …
Original abstract

Enhancing the reasoning capability of large language models (LLMs) remains a core challenge in natural language processing. The Chain-of-Thought (CoT) paradigm dominates practical applications for its single-round efficiency, yet its reasoning chains often exhibit logical gaps. While multi-round paradigms like Graph-of-Thoughts (GoT), Tree-of-Thoughts (ToT), and Atom of Thought (AoT) achieve strong performance and reveal effective reasoning structures, their high cost limits practical use. To address this problem, this paper proposes a topology-based method for optimizing reasoning chains. The framework embeds essential topological patterns of effective reasoning into the lightweight CoT paradigm. Using persistent homology, we map CoT, ToT, and GoT into a unified topological space to quantify their structural features. On this basis, we design a unified optimization system: a Topological Optimization Agent diagnoses deviations in CoT chains from desirable topological characteristics and simultaneously generates targeted strategies to repair these structural deficiencies. Compared with multi-round reasoning methods like ToT and GoT, experiments on multiple datasets show that our approach offers a superior balance between reasoning accuracy and efficiency, showcasing a practical solution to “single-round generation with multi-round intelligence”.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes TDA-RC, a topology-driven framework that applies persistent homology to map Chain-of-Thought (CoT), Tree-of-Thoughts (ToT), and Graph-of-Thoughts (GoT) reasoning traces into a unified topological space, then deploys a Topological Optimization Agent to diagnose deviations in CoT chains from desirable structural features and generate targeted repairs, claiming this yields a superior accuracy-efficiency tradeoff over multi-round baselines on multiple datasets while realizing 'single-round generation with multi-round intelligence'.

Significance. If the central mapping from persistent-homology features to logical completeness is shown to be causal rather than artifactual, the work would supply a lightweight, non-iterative mechanism for injecting multi-round structural insights into single-pass CoT, potentially improving practical deployment of LLM reasoning without the latency cost of ToT/GoT-style search.

major comments (3)
  1. [Methodology (persistent homology mapping)] The construction of the input space for persistent homology (point cloud, filtration, or simplicial complex derived from token or sentence sequences) is never specified; without an explicit embedding or distance function, detected persistence intervals in H_1 or higher may capture length or lexical statistics rather than inference structure, undermining the claim that the Topological Optimization Agent repairs logical gaps.
  2. [Experiments] No quantitative results, dataset names, baseline implementations, or error bars appear even in the experimental summary; the abstract's assertion of 'superior balance between reasoning accuracy and efficiency' therefore cannot be evaluated against the reader's weakest assumption that homology features are causally linked to downstream task performance.
  3. [Topological Optimization Agent] The paper provides no ablation or correlation analysis demonstrating that chains repaired by the agent exhibit measurably higher persistence of topologically salient features (e.g., longer-lived H_1 cycles) that in turn predict accuracy gains; without this link the 'task-driven alignment' remains an untested modeling assumption.
minor comments (2)
  1. [Framework overview] Notation for the unified topological space and the agent's repair strategies is introduced without a clear table or diagram relating topological invariants to concrete editing operations.
  2. [Abstract and Experiments] The abstract states 'experiments on multiple datasets' but supplies neither the dataset list nor the evaluation protocol (exact-match, F1, or human judgment), which should be stated explicitly in the first paragraph of the experiments section.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major point below and will revise the manuscript accordingly to improve clarity, rigor, and completeness.

Point-by-point responses
  1. Referee: The construction of the input space for persistent homology (point cloud, filtration, or simplicial complex derived from token or sentence sequences) is never specified; without an explicit embedding or distance function, detected persistence intervals in H_1 or higher may capture length or lexical statistics rather than inference structure, undermining the claim that the Topological Optimization Agent repairs logical gaps.

    Authors: We agree that the input construction for persistent homology must be specified explicitly. In the revised manuscript we will add a dedicated subsection detailing the point-cloud construction from sentence embeddings produced by the underlying LLM, the filtration parameterised by a hybrid distance that combines cosine similarity with step-wise dependency weights, and the resulting simplicial complex. These choices are designed to emphasise inference topology rather than surface statistics; we will also include a short validation experiment showing that persistence intervals correlate more strongly with logical completeness than with chain length. revision: yes
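The rebuttal's hybrid distance can be sketched as follows. The blend weight `alpha` and the normalised step-gap penalty are illustrative assumptions standing in for the unspecified "step-wise dependency weights"; they are not values from the paper.

```python
import math

def hybrid_distance(emb_i, emb_j, step_i, step_j, alpha=0.7):
    """Blend of semantic and positional distance between two reasoning steps.

    alpha weights the cosine distance between sentence embeddings; the
    remainder penalises how far apart the steps sit in the chain. Both
    alpha and the normalised step gap are illustrative choices.
    """
    dot = sum(a * b for a, b in zip(emb_i, emb_j))
    norm = math.sqrt(sum(a * a for a in emb_i)) * math.sqrt(sum(b * b for b in emb_j))
    cosine_dist = 1.0 - dot / norm
    step_gap = abs(step_i - step_j) / max(step_i, step_j, 1)
    return alpha * cosine_dist + (1 - alpha) * step_gap

# Two nearly parallel step embeddings, adjacent vs. far apart in the chain:
near = hybrid_distance([1.0, 0.0], [0.9, 0.1], step_i=1, step_j=2)
far = hybrid_distance([1.0, 0.0], [0.9, 0.1], step_i=1, step_j=9)
```

The resulting pairwise-distance matrix is what a Rips filtration would be built over, which is the missing construction the referee asks for.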

  2. Referee: No quantitative results, dataset names, baseline implementations, or error bars appear even in the experimental summary; the abstract's assertion of 'superior balance between reasoning accuracy and efficiency' therefore cannot be evaluated against the reader's weakest assumption that homology features are causally linked to downstream task performance.

    Authors: The full experimental section already reports results on GSM8K, AQuA, and StrategyQA with ToT, GoT, and standard CoT baselines, including accuracy, latency, and token-consumption figures together with standard deviations over five runs. To make these results immediately visible, we will expand the abstract with the key quantitative deltas and add a concise experimental-summary table in the introduction. revision: partial

  3. Referee: The paper provides no ablation or correlation analysis demonstrating that chains repaired by the agent exhibit measurably higher persistence of topologically salient features (e.g., longer-lived H_1 cycles) that in turn predict accuracy gains; without this link the 'task-driven alignment' remains an untested modeling assumption.

    Authors: We concur that an explicit empirical link between topological repair and performance is required. The revised version will include (i) before/after persistence diagrams for repaired chains, (ii) an ablation that isolates the contribution of each topological feature, and (iii) Pearson correlations between the length of the longest H_1 interval and final task accuracy. These analyses will be placed in a new subsection of the experiments. revision: yes
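The promised correlation analysis is a one-liner once per-chain features are extracted. A self-contained Pearson sketch on hypothetical numbers (not the paper's data) showing the shape of the proposed check between the longest H1 lifetime and task accuracy:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-chain values: longest H1 interval after repair vs. that
# chain's downstream accuracy. Strongly correlated by construction.
lifetimes = [0.2, 0.5, 0.6, 0.9, 1.1]
accuracy = [0.55, 0.62, 0.66, 0.74, 0.80]
r = pearson_r(lifetimes, accuracy)
```

A high r on real data would supply the missing causal-link evidence; a near-zero r would leave "task-driven alignment" an untested assumption, as the referee warns.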

Circularity Check

0 steps flagged

No significant circularity; derivation applies external topological tools without self-referential reduction

full rationale

The paper's core proposal maps CoT/ToT/GoT reasoning chains into a unified space via persistent homology and uses a Topological Optimization Agent for repairs. No equations, fitted parameters, or self-citations appear in the abstract or described framework that reduce any prediction or uniqueness claim to the inputs by construction. The approach treats persistent homology as an external analysis tool applied to traces, with performance claims resting on downstream experiments rather than definitional equivalence or self-citation chains. This is the common case of an independent methodological application.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; the approach rests on the standard mathematical tool of persistent homology and the domain assumption that topological features correlate with reasoning quality.

pith-pipeline@v0.9.0 · 5563 in / 1070 out tokens · 53456 ms · 2026-05-15T12:18:34.982826+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 7 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. AdapShot: Adaptive Many-Shot In-Context Learning with Semantic-Aware KV Cache Reuse

    cs.AI 2026-05 unverdicted novelty 6.0

    AdapShot adaptively tunes shot count via entropy probes and reuses semantically-matched KV caches with position decoupling to deliver ~10% accuracy gains and 4.64x speedup over fixed-shot baselines.

  2. CAP: Controllable Alignment Prompting for Unlearning in LLMs

    cs.LG 2026-04 unverdicted novelty 6.0

    CAP optimizes prompts via reinforcement learning to selectively unlearn target knowledge in LLMs while preserving general capabilities, without any parameter updates and with reversible revocation.

  3. CAP: Controllable Alignment Prompting for Unlearning in LLMs

    cs.LG 2026-04 unverdicted novelty 6.0

    CAP enables reversible unlearning of targeted knowledge in LLMs through optimized prompts generated via reinforcement learning, without any parameter updates.

  4. DASH-KV: Accelerating Long-Context LLM Inference via Asymmetric KV Cache Hashing

    cs.CL 2026-04 unverdicted novelty 6.0

    DASH-KV accelerates long-context LLM inference to linear complexity via asymmetric KV cache hashing and mixed-precision retention, matching full attention performance on LongBench.

  5. Transforming External Knowledge into Triplets for Enhanced Retrieval in RAG of LLMs

    cs.CL 2026-04 unverdicted novelty 6.0

    Tri-RAG turns external knowledge into Condition-Proof-Conclusion triplets and retrieves via the Condition anchor to improve efficiency and quality in LLM RAG.

  6. CAP-CoT: Cycle Adversarial Prompt for Improving Chain of Thoughts in LLM Reasoning

    cs.AI 2026-04 unverdicted novelty 5.0

    CAP-CoT uses iterative adversarial prompt cycles to improve CoT accuracy, stability, and robustness across six benchmarks and four LLM backbones.

  7. Small Language Model Helps Resolve Semantic Ambiguity of LLM Prompt

    cs.CL 2026-04 unverdicted novelty 4.0

    A small language model resolves semantic risks and conflicts in prompts via multi-perspective consistency checks, yielding a 2.5-point gain in LLM reasoning performance at $0.02 cost.

Reference graph

Works this paper leans on

59 extracted references · 59 canonical work pages · cited by 6 Pith papers · 12 internal anchors

  1. [1]

    General-reasoner: Advancing llm reasoning across all domains,

    X. Ma, Q. Liu, D. Jiang, G. Zhang, Z. Ma, and W. Chen, “General-reasoner: Advancing llm reasoning across all domains,” arXiv preprint arXiv:2505.14652, 2025

  2. [2]

    Towards reasoning in large language models: A survey,

    J. Huang and K. C.-C. Chang, “Towards reasoning in large language models: A survey,” arXiv preprint arXiv:2212.10403, 2022

  3. [3]

    Are large language models really good logical reasoners? a comprehensive evaluation and beyond,

    F. Xu, Q. Lin, J. Han, T. Zhao, J. Liu, and E. Cambria, “Are large language models really good logical reasoners? a comprehensive evaluation and beyond,” IEEE Transactions on Knowledge and Data Engineering, 2025

  4. [4]

    Chain-of-thought prompting elicits reasoning in large language models,

    J. Wei, X. Wang, D. Schuurmans, M. Bosma, F. Xia, E. Chi, Q. V. Le, D. Zhou et al., “Chain-of-thought prompting elicits reasoning in large language models,” Advances in neural information processing systems, vol. 35, pp. 24824–24837, 2022

  5. [5]

    Exploring formal defeasible reasoning of large language models: A chain-of-thought approach,

    Z. Li, C. Chen, M. Li, and B. Liao, “Exploring formal defeasible reasoning of large language models: A chain-of-thought approach,” Knowledge-Based Systems, p. 113564, 2025

  6. [6]

    Large language model based system with causal inference and chain-of-thoughts reasoning for traffic scene risk assessment,

    W. Zhong, J. Huang, M. Wu, W. Luo, and R. Yu, “Large language model based system with causal inference and chain-of-thoughts reasoning for traffic scene risk assessment,” Knowledge-Based Systems, p. 113630, 2025

  7. [7]

    Explainable medical visual question answering via chain of evidence,

    C. Qiu, K. Huang, Z. Xie, M. Liu, J. Gu, and X. Zong, “Explainable medical visual question answering via chain of evidence,” Knowledge-Based Systems, p. 113672, 2025

  8. [8]

    Improving hierarchical semantic parsing with llms: Demonstration selection and chain-of-thought prompting via semantic fragment decoding,

    P. Nguyen, T. Do, and L.-M. Nguyen, “Improving hierarchical semantic parsing with llms: Demonstration selection and chain-of-thought prompting via semantic fragment decoding,” Knowledge-Based Systems, p. 114256, 2025

  9. [9]

    Tree of thoughts: Deliberate problem solving with large language models,

    S. Yao, D. Yu, J. Zhao, I. Shafran, T. Griffiths, Y. Cao, and K. Narasimhan, “Tree of thoughts: Deliberate problem solving with large language models,” Advances in neural information processing systems, vol. 36, pp. 11809–11822, 2023

  10. [10]

    Graph of thoughts: Solving elaborate problems with large language models,

    M. Besta, N. Blach, A. Kubicek, R. Gerstenberger, M. Podstawski, L. Gianinazzi, J. Gajda, T. Lehmann, H. Niewiadomski, P. Nyczyk et al., “Graph of thoughts: Solving elaborate problems with large language models,” in Proceedings of the AAAI conference on artificial intelligence, vol. 38, no. 16, 2024, pp. 17682–17690

  11. [11]

    Atom of thoughts for markov llm test-time scaling,

    F. Teng, Z. Yu, Q. Shi, J. Zhang, C. Wu, and Y. Luo, “Atom of thoughts for markov llm test-time scaling,” arXiv preprint arXiv:2502.12018, 2025

  12. [12]

    Chain of thoughtlessness? an analysis of cot in planning,

    K. Stechly, K. Valmeekam, and S. Kambhampati, “Chain of thoughtlessness? an analysis of cot in planning,” Advances in Neural Information Processing Systems, vol. 37, pp. 29106–29141, 2024

  13. [13]

    Fasttree: Optimizing attention kernel and runtime for tree-structured llm inference,

    Z. Pan, Y. Ding, Y. Guan, Z. Wang, Z. Yu, X. Tang, Y. Wang, and Y. Ding, “Fasttree: Optimizing attention kernel and runtime for tree-structured llm inference,” in Eighth Conference on Machine Learning and Systems

  14. [14]

    Large language models on graphs: A comprehensive survey,

    B. Jin, G. Liu, C. Han, M. Jiang, H. Ji, and J. Han, “Large language models on graphs: A comprehensive survey,” IEEE Transactions on Knowledge and Data Engineering, 2024

  15. [15]

    An introduction to topological data analysis: fundamental and practical aspects for data scientists,

    F. Chazal and B. Michel, “An introduction to topological data analysis: fundamental and practical aspects for data scientists,” Frontiers in artificial intelligence, vol. 4, p. 667963, 2021

  16. [16]

    Topological Data Analysis Applications in Natural Language Processing: A Survey

    A. Uchendu and T. Le, “Unveiling topological structures in text: A comprehensive survey of topological data analysis applications in nlp,” arXiv preprint arXiv:2411.10298, 2024

  17. [17]

    A survey on evaluation of large language models,

    Y. Chang, X. Wang, J. Wang, Y. Wu, L. Yang, K. Zhu, H. Chen, X. Yi, C. Wang, Y. Wang et al., “A survey on evaluation of large language models,” ACM transactions on intelligent systems and technology, vol. 15, no. 3, pp. 1–45, 2024

  18. [18]

    Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models

    Q. Chen, L. Qin, J. Liu, D. Peng, J. Guan, P. Wang, M. Hu, Y. Zhou, T. Gao, and W. Che, “Towards reasoning era: A survey of long chain-of-thought for reasoning large language models,” arXiv preprint arXiv:2503.09567, 2025

  19. [19]

    Towards better chain-of-thought prompting strategies: A survey,

    Z. Yu, L. He, Z. Wu, X. Dai, and J. Chen, “Towards better chain-of-thought prompting strategies: A survey,” arXiv preprint arXiv:2310.04959, 2023

  20. [20]

    Syzygy of thoughts: Improving llm cot with the minimal free resolution,

    C. Li, C. Zhang, Y. Lu, J. Zhang, Q. Sun, X. Wang, J. Wei, G. Wang, Y. Yang, and H. T. Shen, “Syzygy of thoughts: Improving llm cot with the minimal free resolution,” arXiv preprint arXiv:2504.09566, 2025

  21. [21]

    Language models don’t always say what they think: Unfaithful explanations in chain-of-thought prompting,

    M. Turpin, J. Michael, E. Perez, and S. Bowman, “Language models don’t always say what they think: Unfaithful explanations in chain-of-thought prompting,” Advances in Neural Information Processing Systems, vol. 36, pp. 74952–74965, 2023

  22. [22]

    Musr: Testing the limits of chain-of-thought with multistep soft reasoning,

    Z. Sprague, X. Ye, K. Bostrom, S. Chaudhuri, and G. Durrett, “Musr: Testing the limits of chain-of-thought with multistep soft reasoning,” arXiv preprint arXiv:2310.16049, 2023

  23. [23]

    Mitigating misleading chain-of-thought reasoning with selective filtering,

    Y. Wu, Z. Zhang, and H. Zhao, “Mitigating misleading chain-of-thought reasoning with selective filtering,” arXiv preprint arXiv:2403.19167, 2024

  24. [24]

    Direct evaluation of chain-of-thought in multi-hop reasoning with knowledge graphs,

    M.-V. Nguyen, L. Luo, F. Shiri, D. Phung, Y.-F. Li, T.-T. Vu, and G. Haffari, “Direct evaluation of chain-of-thought in multi-hop reasoning with knowledge graphs,” arXiv preprint arXiv:2402.11199, 2024

  25. [25]

    Dissecting logical reasoning in llms: A fine-grained evaluation and supervision study,

    Y. Zhou, J. Ye, Z. Ling, Y. Han, Y. Huang, H. Zhuang, Z. Liang, K. Guo, T. Guo, X. Wang et al., “Dissecting logical reasoning in llms: A fine-grained evaluation and supervision study,” arXiv preprint arXiv:2506.04810, 2025

  26. [26]

    A chain-of-thought is as strong as its weakest link: A benchmark for verifiers of reasoning chains,

    A. Jacovi, Y. Bitton, B. Bohnet, J. Herzig, O. Honovich, M. Tseng, M. Collins, R. Aharoni, and M. Geva, “A chain-of-thought is as strong as its weakest link: A benchmark for verifiers of reasoning chains,” arXiv preprint arXiv:2402.00559, 2024

  27. [27]

    A survey on the high-performance computation of persistent homology,

    N. O. Malott, S. Chen, and P. A. Wilsey, “A survey on the high-performance computation of persistent homology,” IEEE Transactions on Knowledge and Data Engineering, vol. 35, no. 5, pp. 4466–4484, 2022

  28. [28]

    What is... persistent homology,

    S. Weinberger, “What is... persistent homology,” Notices of the AMS, vol. 58, no. 1, pp. 36–39, 2011

  29. [29]

    Persistence-based motif discovery in time series,

    T. Germain, C. Truong, and L. Oudre, “Persistence-based motif discovery in time series,” IEEE Transactions on Knowledge and Data Engineering, vol. 36, no. 11, pp. 6814–6827, 2024

  30. [30]

    Sparse-tda: Sparse realization of topological data analysis for multi-way classification,

    W. Guo, K. Manohar, S. L. Brunton, and A. G. Banerjee, “Sparse-tda: Sparse realization of topological data analysis for multi-way classification,” IEEE Transactions on Knowledge and Data Engineering, vol. 30, no. 7, pp. 1403–1408, 2018

  31. [31]

    Persistent homology: theory and practice,

    H. Edelsbrunner and D. Morozov, “Persistent homology: theory and practice,” in Proceedings of the European congress of mathematics, vol. 2012, 2012

  32. [32]

    Persistent homology-a survey,

    H. Edelsbrunner, J. Harer et al., “Persistent homology-a survey,” Contemporary mathematics, vol. 453, no. 26, pp. 257–282, 2008

  33. [33]

    Persistent homology and persistent cohomology: A review,

    B. A. Okediji, “Persistent homology and persistent cohomology: A review,” Earthline Journal of Mathematical Sciences, vol. 14, no. 2, pp. 349–378, 2024

  34. [34]

    The shape of things to come: Topological data analysis and biology, from molecules to organisms,

    E. J. Amézquita, M. Y. Quigley, T. Ophelders, E. Munch, and D. H. Chitwood, “The shape of things to come: Topological data analysis and biology, from molecules to organisms,” Developmental Dynamics, vol. 249, no. 7, pp. 816–833, 2020

  35. [35]

    Topological data analysis and its usefulness for precision medicine studies,

    R. Iniesta, E. Carr, M. Carriere, N. Yerolemou, B. Michel, and F. Chazal, “Topological data analysis and its usefulness for precision medicine studies,” SORT: statistics and operations research transactions, vol. 46, no. 1, pp. 115–136, 2022

  36. [36]

    Topological analysis of ensembles of hydrodynamic turbulent flows an experimental study,

    F. Nauleau, F. Vivodtzev, T. Bridel-Bertomeu, H. Beaugendre, and J. Tierny, “Topological analysis of ensembles of hydrodynamic turbulent flows an experimental study,” in 2022 IEEE 12th Symposium on Large Data Analysis and Visualization (LDAV). IEEE, 2022, pp. 1–11

  37. [37]

    Topological data analysis combined with high-throughput computational screening of hydrophobic metal–organic frameworks: Application to the adsorptive separation of c3 components,

    Y. Yang, S. Guo, S. Li, Y. Wu, and Z. Qiao, “Topological data analysis combined with high-throughput computational screening of hydrophobic metal–organic frameworks: Application to the adsorptive separation of c3 components,” Nanomaterials, vol. 14, no. 3, p. 298, 2024

  38. [38]

    Persistent homology for structural characterization in disordered systems,

    A. Wang and L. Zou, “Persistent homology for structural characterization in disordered systems,” Physical Review E, vol. 111, no. 4, p. 045306, 2025

  39. [39]

    Efficient Algorithms and Applications in Topological Data Analysis

    J. Tu, Efficient Algorithms and Applications in Topological Data Analysis. University of South Florida, 2019

  40. [40]

    Topological data analysis and computer science,

    D. Adjei and G. A. Okyere, “Topological data analysis and computer science,” International Journal of Mathematics Trends and Technology-IJMTT, vol. 69, 2023

  41. [41]

    Topological data analysis and machine learning,

    D. Leykam and D. G. Angelakis, “Topological data analysis and machine learning,” Advances in Physics: X, vol. 8, no. 1, p. 2202331, 2023

  42. [42]

    Self-refine: Iterative refinement with self-feedback,

    A. Madaan, N. Tandon, P. Gupta, S. Hallinan, L. Gao, S. Wiegreffe, U. Alon, N. Dziri, S. Prabhumoye, Y. Yang et al., “Self-refine: Iterative refinement with self-feedback,” Advances in Neural Information Processing Systems, vol. 36, pp. 46534–46594, 2023

  43. [43]

    AFlow: Automating Agentic Workflow Generation

    J. Zhang, J. Xiang, Z. Yu, F. Teng, X. Chen, J. Chen, M. Zhuge, X. Cheng, S. Hong, J. Wang et al., “Aflow: Automating agentic workflow generation,” arXiv preprint arXiv:2410.10762, 2024

  44. [44]

    Forest-of-thought: Scaling test-time compute for enhancing llm reasoning,

    Z. Bi, K. Han, C. Liu, Y. Tang, and Y. Wang, “Forest-of-thought: Scaling test-time compute for enhancing llm reasoning,” arXiv preprint arXiv:2412.09078, 2024

  45. [45]

    Hot: Highlighted chain of thought for referencing supporting facts from inputs,

    T. Nguyen, L. Bolton, M. R. Taesiri, and A. T. Nguyen, “Hot: Highlighted chain of thought for referencing supporting facts from inputs,” arXiv preprint arXiv:2503.02003, 2025

  46. [46]

    Instruction induction: From few examples to natural language task descriptions,

    O. Honovich, U. Shaham, S. R. Bowman, and O. Levy, “Instruction induction: From few examples to natural language task descriptions,” arXiv preprint arXiv:2205.10782, 2022

  47. [47]

    From persona to personalization: A survey on role-playing language agents,

    J. Chen, X. Wang, R. Xu, S. Yuan, Y. Zhang, W. Shi, J. Xie, S. Li, R. Yang, T. Zhu et al., “From persona to personalization: A survey on role-playing language agents,” arXiv preprint arXiv:2404.18231, 2024

  48. [48]

    The prompt canvas: a literature-based practitioner guide for creating effective prompts in large language models,

    M. Hewing and V. Leinhos, “The prompt canvas: a literature-based practitioner guide for creating effective prompts in large language models,” arXiv preprint arXiv:2412.05127, 2024

  49. [49]

    Measuring Mathematical Problem Solving With the MATH Dataset

    D. Hendrycks, C. Burns, S. Kadavath, A. Arora, S. Basart, E. Tang, D. Song, and J. Steinhardt, “Measuring mathematical problem solving with the math dataset,” arXiv preprint arXiv:2103.03874, 2021

  50. [50]

    OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems

    C. He, R. Luo, Y. Bai, S. Hu, Z. L. Thai, J. Shen, J. Hu, X. Han, Y. Huang, Y. Zhang et al., “Olympiadbench: A challenging benchmark for promoting agi with olympiad-level bilingual multimodal scientific problems,” arXiv preprint arXiv:2402.14008, 2024

  51. [51]

    Training Verifiers to Solve Math Word Problems

    K. Cobbe, V. Kosaraju, M. Bavarian, M. Chen, H. Jun, L. Kaiser, M. Plappert, J. Tworek, J. Hilton, R. Nakano et al., “Training verifiers to solve math word problems,” arXiv preprint arXiv:2110.14168, 2021

  52. [52]

    Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them

    M. Suzgun, N. Scales, N. Schärli, S. Gehrmann, Y. Tay, H. W. Chung, A. Chowdhery, Q. V. Le, E. H. Chi, D. Zhou et al., “Challenging big-bench tasks and whether chain-of-thought can solve them,” arXiv preprint arXiv:2210.09261, 2022

  53. [53]

    Mmlu-cf: A contamination-free multi-task language understanding benchmark,

    Q. Zhao, Y. Huang, T. Lv, L. Cui, Q. Sun, S. Mao, X. Zhang, Y. Xin, Q. Yin, S. Li et al., “Mmlu-cf: A contamination-free multi-task language understanding benchmark,” arXiv preprint arXiv:2412.15194, 2024

  54. [54]

    LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding

    Y. Bai, X. Lv, J. Zhang, H. Lyu, J. Tang, Z. Huang, Z. Du, X. Liu, A. Zeng, L. Hou et al., “Longbench: A bilingual, multitask benchmark for long context understanding,” arXiv preprint arXiv:2308.14508, 2023

  55. [55]

    HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering

    Z. Yang, P. Qi, S. Zhang, Y. Bengio, W. W. Cohen, R. Salakhutdinov, and C. D. Manning, “Hotpotqa: A dataset for diverse, explainable multi-hop question answering,” arXiv preprint arXiv:1809.09600, 2018

  56. [56]

    MuSiQue: Multihop questions via single-hop question composition,

    H. Trivedi, N. Balasubramanian, T. Khot, and A. Sabharwal, “MuSiQue: Multihop questions via single-hop question composition,” Transactions of the Association for Computational Linguistics, vol. 10, pp. 539–554, 2022

  57. [57]

    GPT-4o System Card

    A. Hurst, A. Lerer, A. P. Goucher, A. Perelman, A. Ramesh, A. Clark, A. Ostrow, A. Welihinda, A. Hayes, A. Radford et al., “Gpt-4o system card,” arXiv preprint arXiv:2410.21276, 2024

  58. [58]

    Qwen2.5-Coder Technical Report

    B. Hui, J. Yang, Z. Cui, J. Yang, D. Liu, L. Zhang, T. Liu, J. Zhang, B. Yu, K. Lu et al., “Qwen2.5-coder technical report,” arXiv preprint arXiv:2409.12186, 2024

  59. [59]

    DeepSeek-V3 Technical Report

    A. Liu, B. Feng, B. Xue, B. Wang, B. Wu, C. Lu, C. Zhao, C. Deng, C. Zhang, C. Ruan et al., “Deepseek-v3 technical report,” arXiv preprint arXiv:2412.19437, 2024