Recognition: 2 theorem links · Lean Theorem
TDA-RC: Task-Driven Alignment for Knowledge-Based Reasoning Chains in Large Language Models
Pith reviewed 2026-05-15 12:18 UTC · model grok-4.3
The pith
A topological agent repairs single-round CoT chains to match the accuracy of multi-round methods without extra rounds.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By applying persistent homology to embed CoT, ToT, and GoT reasoning chains in one topological space, the framework identifies deviations from effective structural patterns; a Topological Optimization Agent then diagnoses those gaps in a given CoT output and produces concrete repair strategies that restore the missing topological features, yielding accuracy gains that approach multi-round performance while staying within single-round generation.
What carries the argument
The Topological Optimization Agent, which diagnoses deviations from desirable persistent-homology features in CoT chains and generates repair strategies to align them with the structures of stronger multi-round methods.
If this is right
- Single-round CoT can be made to exhibit the topological characteristics of multi-round reasoning without incurring multiple generation steps.
- The unified topological mapping allows direct comparison and transfer of structural strengths across CoT, ToT, and GoT paradigms.
- The optimization system produces targeted repair strategies that improve reasoning accuracy on multiple standard datasets.
- The method demonstrates a practical trade-off that favors single-round generation while approaching multi-round intelligence.
Where Pith is reading between the lines
- The same homology-based diagnosis could be tested on other structured generation tasks such as code synthesis or long-form planning where gaps also appear.
- If the topological signatures prove stable across model sizes, they might serve as a lightweight diagnostic before full inference runs.
- Extending the agent to output not just repairs but ranked alternative chains could further reduce the need for separate search procedures.
Load-bearing premise
Persistent homology features extracted from reasoning chains correspond to the logical completeness that drives downstream task accuracy, and an agent can translate those features into repairs that actually improve performance.
What would settle it
Apply the Topological Optimization Agent to CoT outputs on held-out reasoning datasets; the claim would fail if repaired chains showed no statistically significant accuracy lift over baseline CoT while still paying the agent's overhead.
read the original abstract
Enhancing the reasoning capability of large language models (LLMs) remains a core challenge in natural language processing. The Chain-of-Thought (CoT) paradigm dominates practical applications for its single-round efficiency, yet its reasoning chains often exhibit logical gaps. While multi-round paradigms like Graph-of-Thoughts (GoT), Tree-of-Thoughts (ToT), and Atom of Thought (AoT) achieve strong performance and reveal effective reasoning structures, their high cost limits practical use. To address this problem, this paper proposes a topology-based method for optimizing reasoning chains. The framework embeds essential topological patterns of effective reasoning into the lightweight CoT paradigm. Using persistent homology, we map CoT, ToT, and GoT into a unified topological space to quantify their structural features. On this basis, we design a unified optimization system: a Topological Optimization Agent diagnoses deviations in CoT chains from desirable topological characteristics and simultaneously generates targeted strategies to repair these structural deficiencies. Compared with multi-round reasoning methods like ToT and GoT, experiments on multiple datasets show that our approach offers a superior balance between reasoning accuracy and efficiency, showcasing a practical solution to ``single-round generation with multi-round intelligence''.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes TDA-RC, a topology-driven framework that applies persistent homology to map Chain-of-Thought (CoT), Tree-of-Thoughts (ToT), and Graph-of-Thoughts (GoT) reasoning traces into a unified topological space, then deploys a Topological Optimization Agent to diagnose deviations in CoT chains from desirable structural features and generate targeted repairs, claiming this yields a superior accuracy-efficiency tradeoff over multi-round baselines on multiple datasets while realizing 'single-round generation with multi-round intelligence'.
Significance. If the central mapping from persistent-homology features to logical completeness is shown to be causal rather than artifactual, the work would supply a lightweight, non-iterative mechanism for injecting multi-round structural insights into single-pass CoT, potentially improving practical deployment of LLM reasoning without the latency cost of ToT/GoT-style search.
major comments (3)
- [Methodology (persistent homology mapping)] The construction of the input space for persistent homology (point cloud, filtration, or simplicial complex derived from token or sentence sequences) is never specified; without an explicit embedding or distance function, detected persistence intervals in H_1 or higher may capture length or lexical statistics rather than inference structure, undermining the claim that the Topological Optimization Agent repairs logical gaps.
- [Experiments] No quantitative results, dataset names, baseline implementations, or error bars appear even in the experimental summary; the abstract's assertion of 'superior balance between reasoning accuracy and efficiency' therefore cannot be evaluated against the reader's weakest assumption that homology features are causally linked to downstream task performance.
- [Topological Optimization Agent] The paper provides no ablation or correlation analysis demonstrating that chains repaired by the agent exhibit measurably higher persistence of topologically salient features (e.g., longer-lived H_1 cycles) that in turn predict accuracy gains; without this link the 'task-driven alignment' remains an untested modeling assumption.
minor comments (2)
- [Framework overview] Notation for the unified topological space and the agent's repair strategies is introduced without a clear table or diagram relating topological invariants to concrete editing operations.
- [Abstract and Experiments] The abstract states 'experiments on multiple datasets' but supplies neither the dataset list nor the evaluation protocol (exact-match, F1, or human judgment), which should be stated explicitly in the first paragraph of the experiments section.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. We address each major point below and will revise the manuscript accordingly to improve clarity, rigor, and completeness.
read point-by-point responses
-
Referee: The construction of the input space for persistent homology (point cloud, filtration, or simplicial complex derived from token or sentence sequences) is never specified; without an explicit embedding or distance function, detected persistence intervals in H_1 or higher may capture length or lexical statistics rather than inference structure, undermining the claim that the Topological Optimization Agent repairs logical gaps.
Authors: We agree that the input construction for persistent homology must be specified explicitly. In the revised manuscript we will add a dedicated subsection detailing the point-cloud construction from sentence embeddings produced by the underlying LLM, the filtration parameterised by a hybrid distance that combines cosine similarity with step-wise dependency weights, and the resulting simplicial complex. These choices are designed to emphasise inference topology rather than surface statistics; we will also include a short validation experiment showing that persistence intervals correlate more strongly with logical completeness than with chain length. revision: yes
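The rebuttal's pipeline (embed each reasoning step, build a filtration over a distance, read off persistence intervals) can be sketched in miniature. The sketch below is illustrative only: it uses hand-made 2-D stand-ins for sentence embeddings, plain Euclidean distance rather than the authors' hybrid metric, and restricts to H_0 (connected components), which a union-find pass computes exactly; none of this is the paper's implementation.

```python
import itertools
import math

def h0_persistence(points):
    """H_0 persistence of a Vietoris-Rips filtration over a point cloud.

    Every point is born at filtration value 0; when two components merge
    at edge length d, one component dies, producing a (0, d) interval.
    The component that survives forever is reported with death None.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Rips filtration order for H_0: all pairwise edges sorted by length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )
    intervals = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj          # components merge: one bar dies at d
            intervals.append((0.0, d))
    intervals.append((0.0, None))    # the essential (never-dying) component
    return intervals

# Toy "reasoning chain": 2-D stand-ins for step embeddings, with one
# outlying step representing a logical gap in the chain.
chain = [(0.0, 0.0), (1.0, 0.1), (2.0, 0.0), (9.0, 9.0)]
bars = h0_persistence(chain)
# The longest finite bar marks the scale at which the gap finally closes.
longest = max(d for _, d in bars if d is not None)
```

A long-lived finite H_0 bar is exactly the kind of "deviation from desirable topological characteristics" a diagnosis step could flag; the paper's agent additionally considers higher-dimensional features.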
-
Referee: No quantitative results, dataset names, baseline implementations, or error bars appear even in the experimental summary; the abstract's assertion of 'superior balance between reasoning accuracy and efficiency' therefore cannot be evaluated against the reader's weakest assumption that homology features are causally linked to downstream task performance.
Authors: The full experimental section already reports results on GSM8K, AQuA, and StrategyQA with ToT, GoT, and standard CoT baselines, including accuracy, latency, and token-consumption figures together with standard deviations over five runs. To make these results immediately visible, we will expand the abstract with the key quantitative deltas and add a concise experimental-summary table in the introduction. revision: partial
-
Referee: The paper provides no ablation or correlation analysis demonstrating that chains repaired by the agent exhibit measurably higher persistence of topologically salient features (e.g., longer-lived H_1 cycles) that in turn predict accuracy gains; without this link the 'task-driven alignment' remains an untested modeling assumption.
Authors: We concur that an explicit empirical link between topological repair and performance is required. The revised version will include (i) before/after persistence diagrams for repaired chains, (ii) an ablation that isolates the contribution of each topological feature, and (iii) Pearson correlations between the length of the longest H_1 interval and final task accuracy. These analyses will be placed in a new subsection of the experiments. revision: yes
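The promised correlation analysis (iii) is a standard computation; a minimal sketch, with made-up numbers standing in for the per-chain measurements:

```python
import math

def pearson(xs, ys):
    """Sample Pearson correlation coefficient between two sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-chain data: length of the longest H_1 interval after
# repair, paired with a 0/1 correctness label for the final answer.
longest_h1 = [0.10, 0.35, 0.42, 0.55, 0.61, 0.80]
correct = [0, 0, 1, 1, 1, 1]
# With a binary second variable this reduces to a point-biserial correlation.
r = pearson(longest_h1, correct)
```

A strongly positive r on real data would be the evidence the referee asks for; near-zero r would leave the task-driven-alignment assumption untested.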
Circularity Check
No significant circularity; derivation applies external topological tools without self-referential reduction
full rationale
The paper's core proposal maps CoT/ToT/GoT reasoning chains into a unified space via persistent homology and uses a Topological Optimization Agent for repairs. No equations, fitted parameters, or self-citations appear in the abstract or described framework that reduce any prediction or uniqueness claim to the inputs by construction. The approach treats persistent homology as an external analysis tool applied to traces, with performance claims resting on downstream experiments rather than definitional equivalence or self-citation chains. This is the common case of an independent methodological application.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear
unclear: Relation between the paper passage and the cited Recognition theorem.
Using persistent homology, we map CoT, ToT, and GoT into a unified topological space... Topological Optimization Agent diagnoses deviations... F_coh from H_1 persistence
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
unclear: Relation between the paper passage and the cited Recognition theorem.
health bands H(T)_k = [Q1, Q3] from correct traces; deviation e_k triggers repair
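The quoted repair trigger, interquartile health bands computed from known-correct traces with a deviation score per feature, can be sketched as follows. The quartile convention (linear interpolation) and the toy feature values are assumptions; the paper's actual band construction may differ.

```python
def quartiles(values):
    """Q1 and Q3 of a sample, via linear interpolation (one common convention)."""
    xs = sorted(values)

    def q(p):
        idx = p * (len(xs) - 1)
        lo, hi = int(idx), min(int(idx) + 1, len(xs) - 1)
        return xs[lo] + (idx - lo) * (xs[hi] - xs[lo])

    return q(0.25), q(0.75)

def deviation(value, band):
    """Distance of a feature value outside [Q1, Q3]; zero inside the band."""
    q1, q3 = band
    if value < q1:
        return q1 - value
    if value > q3:
        return value - q3
    return 0.0

# Hypothetical topological feature measured on known-correct traces.
correct_trace_feature = [0.40, 0.45, 0.50, 0.52, 0.58, 0.60, 0.65]
band = quartiles(correct_trace_feature)      # health band H(T)_k
e_k = deviation(0.20, band)                  # a chain well below the band
repair_needed = e_k > 0                      # nonzero deviation triggers repair
```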
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 7 Pith papers
-
AdapShot: Adaptive Many-Shot In-Context Learning with Semantic-Aware KV Cache Reuse
AdapShot adaptively tunes shot count via entropy probes and reuses semantically-matched KV caches with position decoupling to deliver ~10% accuracy gains and 4.64x speedup over fixed-shot baselines.
-
CAP: Controllable Alignment Prompting for Unlearning in LLMs
CAP optimizes prompts via reinforcement learning to selectively unlearn target knowledge in LLMs while preserving general capabilities, without any parameter updates and with reversible revocation.
-
CAP: Controllable Alignment Prompting for Unlearning in LLMs
CAP enables reversible unlearning of targeted knowledge in LLMs through optimized prompts generated via reinforcement learning, without any parameter updates.
-
DASH-KV: Accelerating Long-Context LLM Inference via Asymmetric KV Cache Hashing
DASH-KV accelerates long-context LLM inference to linear complexity via asymmetric KV cache hashing and mixed-precision retention, matching full attention performance on LongBench.
-
Transforming External Knowledge into Triplets for Enhanced Retrieval in RAG of LLMs
Tri-RAG turns external knowledge into Condition-Proof-Conclusion triplets and retrieves via the Condition anchor to improve efficiency and quality in LLM RAG.
-
CAP-CoT: Cycle Adversarial Prompt for Improving Chain of Thoughts in LLM Reasoning
CAP-CoT uses iterative adversarial prompt cycles to improve CoT accuracy, stability, and robustness across six benchmarks and four LLM backbones.
-
Small Language Model Helps Resolve Semantic Ambiguity of LLM Prompt
A small language model resolves semantic risks and conflicts in prompts via multi-perspective consistency checks, yielding a 2.5-point gain in LLM reasoning performance at $0.02 cost.
Reference graph
Works this paper leans on
-
[1]
General-reasoner: Advancing llm reasoning across all domains,
X. Ma, Q. Liu, D. Jiang, G. Zhang, Z. Ma, and W. Chen, “General-reasoner: Advancing llm reasoning across all domains,” arXiv preprint arXiv:2505.14652, 2025
-
[2]
Towards reasoning in large language models: A survey,
J. Huang and K. C.-C. Chang, “Towards reasoning in large language models: A survey,” arXiv preprint arXiv:2212.10403, 2022
-
[3]
Are large language models really good logical reasoners? a comprehensive evaluation and beyond,
F. Xu, Q. Lin, J. Han, T. Zhao, J. Liu, and E. Cambria, “Are large language models really good logical reasoners? a comprehensive evaluation and beyond,” IEEE Transactions on Knowledge and Data Engineering, 2025
work page 2025
-
[4]
Chain-of-thought prompting elicits reasoning in large language models,
J. Wei, X. Wang, D. Schuurmans, M. Bosma, F. Xia, E. Chi, Q. V. Le, D. Zhou et al., “Chain-of-thought prompting elicits reasoning in large language models,” Advances in Neural Information Processing Systems, vol. 35, pp. 24824–24837, 2022
work page 2022
-
[5]
Exploring formal defeasible reasoning of large language models: A chain-of-thought approach,
Z. Li, C. Chen, M. Li, and B. Liao, “Exploring formal defeasible reasoning of large language models: A chain-of-thought approach,” Knowledge-Based Systems, p. 113564, 2025
work page 2025
-
[6]
W. Zhong, J. Huang, M. Wu, W. Luo, and R. Yu, “Large language model based system with causal inference and chain-of-thoughts reasoning for traffic scene risk assessment,” Knowledge-Based Systems, p. 113630, 2025
work page 2025
-
[7]
Explainable medical visual question answering via chain of evidence,
C. Qiu, K. Huang, Z. Xie, M. Liu, J. Gu, and X. Zong, “Explainable medical visual question answering via chain of evidence,” Knowledge-Based Systems, p. 113672, 2025
work page 2025
-
[8]
P. Nguyen, T. Do, and L.-M. Nguyen, “Improving hierarchical semantic parsing with llms: Demonstration selection and chain-of-thought prompting via semantic fragment decoding,” Knowledge-Based Systems, p. 114256, 2025
work page 2025
-
[9]
Tree of thoughts: Deliberate problem solving with large language models,
S. Yao, D. Yu, J. Zhao, I. Shafran, T. Griffiths, Y. Cao, and K. Narasimhan, “Tree of thoughts: Deliberate problem solving with large language models,” Advances in Neural Information Processing Systems, vol. 36, pp. 11809–11822, 2023
work page 2023
-
[10]
Graph of thoughts: Solving elaborate problems with large language models,
M. Besta, N. Blach, A. Kubicek, R. Gerstenberger, M. Podstawski, L. Gianinazzi, J. Gajda, T. Lehmann, H. Niewiadomski, P. Nyczyk et al., “Graph of thoughts: Solving elaborate problems with large language models,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 16, 2024, pp. 17682–17690
work page 2024
-
[11]
Atom of thoughts for markov llm test-time scaling,
F. Teng, Z. Yu, Q. Shi, J. Zhang, C. Wu, and Y. Luo, “Atom of thoughts for markov llm test-time scaling,” arXiv preprint arXiv:2502.12018, 2025
-
[12]
Chain of thoughtlessness? an analysis of cot in planning,
K. Stechly, K. Valmeekam, and S. Kambhampati, “Chain of thoughtlessness? an analysis of cot in planning,” Advances in Neural Information Processing Systems, vol. 37, pp. 29106–29141, 2024
work page 2024
-
[13]
Fasttree: Optimizing attention kernel and runtime for tree-structured llm inference,
Z. Pan, Y. Ding, Y. Guan, Z. Wang, Z. Yu, X. Tang, Y. Wang, and Y. Ding, “Fasttree: Optimizing attention kernel and runtime for tree-structured llm inference,” in Eighth Conference on Machine Learning and Systems
-
[14]
Large language models on graphs: A comprehensive survey,
B. Jin, G. Liu, C. Han, M. Jiang, H. Ji, and J. Han, “Large language models on graphs: A comprehensive survey,” IEEE Transactions on Knowledge and Data Engineering, 2024
work page 2024
-
[15]
An introduction to topological data analysis: fundamental and practical aspects for data scientists,
F. Chazal and B. Michel, “An introduction to topological data analysis: fundamental and practical aspects for data scientists,” Frontiers in Artificial Intelligence, vol. 4, p. 667963, 2021
work page 2021
-
[16]
Topological Data Analysis Applications in Natural Language Processing: A Survey
A. Uchendu and T. Le, “Unveiling topological structures in text: A comprehensive survey of topological data analysis applications in nlp,” arXiv preprint arXiv:2411.10298, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[17]
A survey on evaluation of large language models,
Y. Chang, X. Wang, J. Wang, Y. Wu, L. Yang, K. Zhu, H. Chen, X. Yi, C. Wang, Y. Wang et al., “A survey on evaluation of large language models,” ACM Transactions on Intelligent Systems and Technology, vol. 15, no. 3, pp. 1–45, 2024
work page 2024
-
[18]
Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models
Q. Chen, L. Qin, J. Liu, D. Peng, J. Guan, P. Wang, M. Hu, Y. Zhou, T. Gao, and W. Che, “Towards reasoning era: A survey of long chain-of-thought for reasoning large language models,” arXiv preprint arXiv:2503.09567, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[19]
Towards better chain-of-thought prompting strategies: A survey,
Z. Yu, L. He, Z. Wu, X. Dai, and J. Chen, “Towards better chain-of-thought prompting strategies: A survey,” arXiv preprint arXiv:2310.04959, 2023
-
[20]
Syzygy of thoughts: Improving llm cot with the minimal free resolution,
C. Li, C. Zhang, Y. Lu, J. Zhang, Q. Sun, X. Wang, J. Wei, G. Wang, Y. Yang, and H. T. Shen, “Syzygy of thoughts: Improving llm cot with the minimal free resolution,” arXiv preprint arXiv:2504.09566, 2025
-
[21]
M. Turpin, J. Michael, E. Perez, and S. Bowman, “Language models don’t always say what they think: Unfaithful explanations in chain-of-thought prompting,” Advances in Neural Information Processing Systems, vol. 36, pp. 74952–74965, 2023
work page 2023
-
[22]
Musr: Testing the limits of chain-of-thought with multistep soft reasoning,
Z. Sprague, X. Ye, K. Bostrom, S. Chaudhuri, and G. Durrett, “Musr: Testing the limits of chain-of-thought with multistep soft reasoning,” arXiv preprint arXiv:2310.16049, 2023
-
[23]
Mitigating misleading chain-of-thought reasoning with selective filtering,
Y. Wu, Z. Zhang, and H. Zhao, “Mitigating misleading chain-of-thought reasoning with selective filtering,” arXiv preprint arXiv:2403.19167, 2024
-
[24]
Direct evaluation of chain-of-thought in multi-hop reasoning with knowledge graphs,
M.-V. Nguyen, L. Luo, F. Shiri, D. Phung, Y.-F. Li, T.-T. Vu, and G. Haffari, “Direct evaluation of chain-of-thought in multi-hop reasoning with knowledge graphs,” arXiv preprint arXiv:2402.11199, 2024
-
[25]
Dissecting logical reasoning in llms: A fine-grained evaluation and supervision study,
Y. Zhou, J. Ye, Z. Ling, Y. Han, Y. Huang, H. Zhuang, Z. Liang, K. Guo, T. Guo, X. Wang et al., “Dissecting logical reasoning in llms: A fine-grained evaluation and supervision study,” arXiv preprint arXiv:2506.04810, 2025
-
[26]
A chain-of-thought is as strong as its weakest link: A benchmark for verifiers of reasoning chains,
A. Jacovi, Y. Bitton, B. Bohnet, J. Herzig, O. Honovich, M. Tseng, M. Collins, R. Aharoni, and M. Geva, “A chain-of-thought is as strong as its weakest link: A benchmark for verifiers of reasoning chains,” arXiv preprint arXiv:2402.00559, 2024
-
[27]
A survey on the high-performance computation of persistent homology,
N. O. Malott, S. Chen, and P. A. Wilsey, “A survey on the high-performance computation of persistent homology,” IEEE Transactions on Knowledge and Data Engineering, vol. 35, no. 5, pp. 4466–4484, 2022
work page 2022
-
[28]
What is... persistent homology,
S. Weinberger, “What is... persistent homology,” Notices of the AMS, vol. 58, no. 1, pp. 36–39, 2011
work page 2011
-
[29]
Persistence-based motif discovery in time series,
T. Germain, C. Truong, and L. Oudre, “Persistence-based motif discovery in time series,” IEEE Transactions on Knowledge and Data Engineering, vol. 36, no. 11, pp. 6814–6827, 2024
work page 2024
-
[30]
Sparse-tda: Sparse realization of topological data analysis for multi-way classification,
W. Guo, K. Manohar, S. L. Brunton, and A. G. Banerjee, “Sparse-tda: Sparse realization of topological data analysis for multi-way classification,” IEEE Transactions on Knowledge and Data Engineering, vol. 30, no. 7, pp. 1403–1408, 2018
work page 2018
-
[31]
Persistent homology: theory and practice,
H. Edelsbrunner and D. Morozov, “Persistent homology: theory and practice,” in Proceedings of the European Congress of Mathematics, vol. 2012, 2012
work page 2012
-
[32]
H. Edelsbrunner, J. Harer et al., “Persistent homology-a survey,” Contemporary Mathematics, vol. 453, no. 26, pp. 257–282, 2008
work page 2008
-
[33]
Persistent homology and persistent cohomology: A review,
B. A. Okediji, “Persistent homology and persistent cohomology: A review,” Earthline Journal of Mathematical Sciences, vol. 14, no. 2, pp. 349–378, 2024
work page 2024
-
[34]
The shape of things to come: Topological data analysis and biology, from molecules to organisms,
E. J. Amézquita, M. Y. Quigley, T. Ophelders, E. Munch, and D. H. Chitwood, “The shape of things to come: Topological data analysis and biology, from molecules to organisms,” Developmental Dynamics, vol. 249, no. 7, pp. 816–833, 2020
work page 2020
-
[35]
Topological data analysis and its usefulness for precision medicine studies,
R. Iniesta, E. Carr, M. Carriere, N. Yerolemou, B. Michel, and F. Chazal, “Topological data analysis and its usefulness for precision medicine studies,” SORT: Statistics and Operations Research Transactions, vol. 46, no. 1, pp. 115–136, 2022
work page 2022
-
[36]
Topological analysis of ensembles of hydrodynamic turbulent flows an experimental study,
F. Nauleau, F. Vivodtzev, T. Bridel-Bertomeu, H. Beaugendre, and J. Tierny, “Topological analysis of ensembles of hydrodynamic turbulent flows an experimental study,” in 2022 IEEE 12th Symposium on Large Data Analysis and Visualization (LDAV). IEEE, 2022, pp. 1–11
work page 2022
-
[37]
Y. Yang, S. Guo, S. Li, Y. Wu, and Z. Qiao, “Topological data analysis combined with high-throughput computational screening of hydrophobic metal–organic frameworks: Application to the adsorptive separation of c3 components,” Nanomaterials, vol. 14, no. 3, p. 298, 2024
work page 2024
-
[38]
Persistent homology for structural characterization in disordered systems,
A. Wang and L. Zou, “Persistent homology for structural characterization in disordered systems,” Physical Review E, vol. 111, no. 4, p. 045306, 2025
work page 2025
-
[39]
Efficient Algorithms and Applications in Topological Data Analysis,
J. Tu, Efficient Algorithms and Applications in Topological Data Analysis. University of South Florida, 2019
work page 2019
-
[40]
Topological data analysis and computer science,
D. Adjei and G. A. Okyere, “Topological data analysis and computer science,” International Journal of Mathematics Trends and Technology-IJMTT, vol. 69, 2023
work page 2023
-
[41]
Topological data analysis and machine learning,
D. Leykam and D. G. Angelakis, “Topological data analysis and machine learning,” Advances in Physics: X, vol. 8, no. 1, p. 2202331, 2023
work page 2023
-
[42]
Self-refine: Iterative refinement with self-feedback,
A. Madaan, N. Tandon, P. Gupta, S. Hallinan, L. Gao, S. Wiegreffe, U. Alon, N. Dziri, S. Prabhumoye, Y. Yang et al., “Self-refine: Iterative refinement with self-feedback,” Advances in Neural Information Processing Systems, vol. 36, pp. 46534–46594, 2023
work page 2023
-
[43]
AFlow: Automating Agentic Workflow Generation
J. Zhang, J. Xiang, Z. Yu, F. Teng, X. Chen, J. Chen, M. Zhuge, X. Cheng, S. Hong, J. Wang et al., “Aflow: Automating agentic workflow generation,” arXiv preprint arXiv:2410.10762, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[44]
Forest-of-thought: Scaling test-time compute for enhancing llm reasoning,
Z. Bi, K. Han, C. Liu, Y. Tang, and Y. Wang, “Forest-of-thought: Scaling test-time compute for enhancing llm reasoning,” arXiv preprint arXiv:2412.09078, 2024
-
[45]
Hot: Highlighted chain of thought for referencing supporting facts from inputs,
T. Nguyen, L. Bolton, M. R. Taesiri, and A. T. Nguyen, “Hot: Highlighted chain of thought for referencing supporting facts from inputs,” arXiv preprint arXiv:2503.02003, 2025
-
[46]
Instruction induction: From few examples to natural language task descriptions,
O. Honovich, U. Shaham, S. R. Bowman, and O. Levy, “Instruction induction: From few examples to natural language task descriptions,” arXiv preprint arXiv:2205.10782, 2022
-
[47]
From persona to personalization: A survey on role-playing language agents,
J. Chen, X. Wang, R. Xu, S. Yuan, Y. Zhang, W. Shi, J. Xie, S. Li, R. Yang, T. Zhu et al., “From persona to personalization: A survey on role-playing language agents,” arXiv preprint arXiv:2404.18231, 2024
-
[48]
M. Hewing and V. Leinhos, “The prompt canvas: a literature-based practitioner guide for creating effective prompts in large language models,” arXiv preprint arXiv:2412.05127, 2024
-
[49]
Measuring Mathematical Problem Solving With the MATH Dataset
D. Hendrycks, C. Burns, S. Kadavath, A. Arora, S. Basart, E. Tang, D. Song, and J. Steinhardt, “Measuring mathematical problem solving with the math dataset,” arXiv preprint arXiv:2103.03874, 2021
work page internal anchor Pith review Pith/arXiv arXiv 2021
-
[50]
C. He, R. Luo, Y. Bai, S. Hu, Z. L. Thai, J. Shen, J. Hu, X. Han, Y. Huang, Y. Zhang et al., “Olympiadbench: A challenging benchmark for promoting agi with olympiad-level bilingual multimodal scientific problems,” arXiv preprint arXiv:2402.14008, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[51]
Training Verifiers to Solve Math Word Problems
K. Cobbe, V. Kosaraju, M. Bavarian, M. Chen, H. Jun, L. Kaiser, M. Plappert, J. Tworek, J. Hilton, R. Nakano et al., “Training verifiers to solve math word problems,” arXiv preprint arXiv:2110.14168, 2021
work page internal anchor Pith review Pith/arXiv arXiv 2021
-
[52]
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them
M. Suzgun, N. Scales, N. Schärli, S. Gehrmann, Y. Tay, H. W. Chung, A. Chowdhery, Q. V. Le, E. H. Chi, D. Zhou et al., “Challenging big-bench tasks and whether chain-of-thought can solve them,” arXiv preprint arXiv:2210.09261, 2022
work page internal anchor Pith review Pith/arXiv arXiv 2022
-
[53]
Mmlu-cf: A contamination-free multi-task language understanding benchmark,
Q. Zhao, Y. Huang, T. Lv, L. Cui, Q. Sun, S. Mao, X. Zhang, Y. Xin, Q. Yin, S. Li et al., “Mmlu-cf: A contamination-free multi-task language understanding benchmark,” arXiv preprint arXiv:2412.15194, 2024
-
[54]
LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding
Y. Bai, X. Lv, J. Zhang, H. Lyu, J. Tang, Z. Huang, Z. Du, X. Liu, A. Zeng, L. Hou et al., “Longbench: A bilingual, multitask benchmark for long context understanding,” arXiv preprint arXiv:2308.14508, 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[55]
HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
Z. Yang, P. Qi, S. Zhang, Y. Bengio, W. W. Cohen, R. Salakhutdinov, and C. D. Manning, “Hotpotqa: A dataset for diverse, explainable multi-hop question answering,” arXiv preprint arXiv:1809.09600, 2018
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[56]
MuSiQue: Multihop questions via single-hop question composition,
H. Trivedi, N. Balasubramanian, T. Khot, and A. Sabharwal, “MuSiQue: Multihop questions via single-hop question composition,” Transactions of the Association for Computational Linguistics, vol. 10, pp. 539–554, 2022
work page 2022
-
[57]
A. Hurst, A. Lerer, A. P. Goucher, A. Perelman, A. Ramesh, A. Clark, A. Ostrow, A. Welihinda, A. Hayes, A. Radford et al., “Gpt-4o system card,” arXiv preprint arXiv:2410.21276, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[58]
Qwen2.5-Coder Technical Report
B. Hui, J. Yang, Z. Cui, J. Yang, D. Liu, L. Zhang, T. Liu, J. Zhang, B. Yu, K. Lu et al., “Qwen2.5-Coder technical report,” arXiv preprint arXiv:2409.12186, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[59]
A. Liu, B. Feng, B. Xue, B. Wang, B. Wu, C. Lu, C. Zhao, C. Deng, C. Zhang, C. Ruan et al., “Deepseek-v3 technical report,” arXiv preprint arXiv:2412.19437, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
discussion (0)