Recognition: 2 theorem links · Lean Theorem
TDA-RC: Task-Driven Alignment for Knowledge-Based Reasoning Chains in Large Language Models
Pith reviewed 2026-05-15 12:18 UTC · model grok-4.3
The pith
A topological agent repairs single-round CoT chains to match the accuracy of multi-round methods without extra rounds.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By applying persistent homology to embed CoT, ToT, and GoT reasoning chains in one topological space, the framework identifies deviations from effective structural patterns; a Topological Optimization Agent then diagnoses those gaps in a given CoT output and produces concrete repair strategies that restore the missing topological features, yielding accuracy gains that approach multi-round performance while staying within single-round generation.
What carries the argument
The Topological Optimization Agent, which diagnoses deviations from desirable persistent-homology features in CoT chains and generates repair strategies to align them with the structures of stronger multi-round methods.
If this is right
- Single-round CoT can be made to exhibit the topological characteristics of multi-round reasoning without incurring multiple generation steps.
- The unified topological mapping allows direct comparison and transfer of structural strengths across CoT, ToT, and GoT paradigms.
- The optimization system produces targeted repair strategies that improve reasoning accuracy on multiple standard datasets.
- The method demonstrates a practical trade-off that favors single-round generation while approaching multi-round intelligence.
Where Pith is reading between the lines
- The same homology-based diagnosis could be tested on other structured generation tasks such as code synthesis or long-form planning where gaps also appear.
- If the topological signatures prove stable across model sizes, they might serve as a lightweight diagnostic before full inference runs.
- Extending the agent to output not just repairs but ranked alternative chains could further reduce the need for separate search procedures.
Load-bearing premise
Persistent homology features extracted from reasoning chains correspond to the logical completeness that drives downstream task accuracy, and an agent can translate those features into repairs that actually improve performance.
What would settle it
Apply the Topological Optimization Agent to CoT outputs on held-out reasoning datasets; the claim would fail if repaired chains showed no statistically significant accuracy lift over baseline CoT while still paying the agent's overhead.
read the original abstract
Enhancing the reasoning capability of large language models (LLMs) remains a core challenge in natural language processing. The Chain-of-Thought (CoT) paradigm dominates practical applications for its single-round efficiency, yet its reasoning chains often exhibit logical gaps. While multi-round paradigms like Graph-of-Thoughts (GoT), Tree-of-Thoughts (ToT), and Atom of Thought (AoT) achieve strong performance and reveal effective reasoning structures, their high cost limits practical use. To address this problem, this paper proposes a topology-based method for optimizing reasoning chains. The framework embeds essential topological patterns of effective reasoning into the lightweight CoT paradigm. Using persistent homology, we map CoT, ToT, and GoT into a unified topological space to quantify their structural features. On this basis, we design a unified optimization system: a Topological Optimization Agent diagnoses deviations in CoT chains from desirable topological characteristics and simultaneously generates targeted strategies to repair these structural deficiencies. Compared with multi-round reasoning methods like ToT and GoT, experiments on multiple datasets show that our approach offers a superior balance between reasoning accuracy and efficiency, showcasing a practical solution to ``single-round generation with multi-round intelligence''.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes TDA-RC, a topology-driven framework that applies persistent homology to map Chain-of-Thought (CoT), Tree-of-Thoughts (ToT), and Graph-of-Thoughts (GoT) reasoning traces into a unified topological space, then deploys a Topological Optimization Agent to diagnose deviations in CoT chains from desirable structural features and generate targeted repairs, claiming this yields a superior accuracy-efficiency tradeoff over multi-round baselines on multiple datasets while realizing 'single-round generation with multi-round intelligence'.
Significance. If the central mapping from persistent-homology features to logical completeness is shown to be causal rather than artifactual, the work would supply a lightweight, non-iterative mechanism for injecting multi-round structural insights into single-pass CoT, potentially improving practical deployment of LLM reasoning without the latency cost of ToT/GoT-style search.
major comments (3)
- [Methodology (persistent homology mapping)] The construction of the input space for persistent homology (point cloud, filtration, or simplicial complex derived from token or sentence sequences) is never specified; without an explicit embedding or distance function, detected persistence intervals in H_1 or higher may capture length or lexical statistics rather than inference structure, undermining the claim that the Topological Optimization Agent repairs logical gaps.
- [Experiments] No quantitative results, dataset names, baseline implementations, or error bars appear even in the experimental summary; the abstract's assertion of 'superior balance between reasoning accuracy and efficiency' therefore cannot be evaluated against the reader's weakest assumption that homology features are causally linked to downstream task performance.
- [Topological Optimization Agent] The paper provides no ablation or correlation analysis demonstrating that chains repaired by the agent exhibit measurably higher persistence of topologically salient features (e.g., longer-lived H_1 cycles) that in turn predict accuracy gains; without this link the 'task-driven alignment' remains an untested modeling assumption.
minor comments (2)
- [Framework overview] Notation for the unified topological space and the agent's repair strategies is introduced without a clear table or diagram relating topological invariants to concrete editing operations.
- [Abstract and Experiments] The abstract states 'experiments on multiple datasets' but supplies neither the dataset list nor the evaluation protocol (exact-match, F1, or human judgment), which should be stated explicitly in the first paragraph of the experiments section.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. We address each major point below and will revise the manuscript accordingly to improve clarity, rigor, and completeness.
read point-by-point responses
-
Referee: The construction of the input space for persistent homology (point cloud, filtration, or simplicial complex derived from token or sentence sequences) is never specified; without an explicit embedding or distance function, detected persistence intervals in H_1 or higher may capture length or lexical statistics rather than inference structure, undermining the claim that the Topological Optimization Agent repairs logical gaps.
Authors: We agree that the input construction for persistent homology must be specified explicitly. In the revised manuscript we will add a dedicated subsection detailing the point-cloud construction from sentence embeddings produced by the underlying LLM, the filtration parameterised by a hybrid distance that combines cosine similarity with step-wise dependency weights, and the resulting simplicial complex. These choices are designed to emphasise inference topology rather than surface statistics; we will also include a short validation experiment showing that persistence intervals correlate more strongly with logical completeness than with chain length. revision: yes
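The rebuttal's pipeline (embed each reasoning step, build a filtration over a distance, read off persistence intervals) can be sketched in miniature. The sketch below is illustrative only: it uses hand-made 2-D stand-ins for sentence embeddings, plain Euclidean distance rather than the authors' hybrid metric, and restricts to H_0 (connected components), which a union-find pass computes exactly; none of this is the paper's implementation.

```python
import itertools
import math

def h0_persistence(points):
    """H_0 persistence of a Vietoris-Rips filtration over a point cloud.

    Every point is born at filtration value 0; when two components merge
    at edge length d, one component dies, producing a (0, d) interval.
    The component that survives forever is reported with death None.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Rips filtration order for H_0: all pairwise edges sorted by length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )
    intervals = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj          # components merge: one bar dies at d
            intervals.append((0.0, d))
    intervals.append((0.0, None))    # the essential (never-dying) component
    return intervals

# Toy "reasoning chain": 2-D stand-ins for step embeddings, with one
# outlying step representing a logical gap in the chain.
chain = [(0.0, 0.0), (1.0, 0.1), (2.0, 0.0), (9.0, 9.0)]
bars = h0_persistence(chain)
# The longest finite bar marks the scale at which the gap finally closes.
longest = max(d for _, d in bars if d is not None)
```

A long-lived finite H_0 bar is exactly the kind of "deviation from desirable topological characteristics" a diagnosis step could flag; the paper's agent additionally considers higher-dimensional features.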
-
Referee: No quantitative results, dataset names, baseline implementations, or error bars appear even in the experimental summary; the abstract's assertion of 'superior balance between reasoning accuracy and efficiency' therefore cannot be evaluated against the reader's weakest assumption that homology features are causally linked to downstream task performance.
Authors: The full experimental section already reports results on GSM8K, AQuA, and StrategyQA with ToT, GoT, and standard CoT baselines, including accuracy, latency, and token-consumption figures together with standard deviations over five runs. To make these results immediately visible, we will expand the abstract with the key quantitative deltas and add a concise experimental-summary table in the introduction. revision: partial
-
Referee: The paper provides no ablation or correlation analysis demonstrating that chains repaired by the agent exhibit measurably higher persistence of topologically salient features (e.g., longer-lived H_1 cycles) that in turn predict accuracy gains; without this link the 'task-driven alignment' remains an untested modeling assumption.
Authors: We concur that an explicit empirical link between topological repair and performance is required. The revised version will include (i) before/after persistence diagrams for repaired chains, (ii) an ablation that isolates the contribution of each topological feature, and (iii) Pearson correlations between the length of the longest H_1 interval and final task accuracy. These analyses will be placed in a new subsection of the experiments. revision: yes
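The promised correlation analysis (iii) is a standard computation; a minimal sketch, with made-up numbers standing in for the per-chain measurements:

```python
import math

def pearson(xs, ys):
    """Sample Pearson correlation coefficient between two sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-chain data: length of the longest H_1 interval after
# repair, paired with a 0/1 correctness label for the final answer.
longest_h1 = [0.10, 0.35, 0.42, 0.55, 0.61, 0.80]
correct = [0, 0, 1, 1, 1, 1]
# With a binary second variable this reduces to a point-biserial correlation.
r = pearson(longest_h1, correct)
```

A strongly positive r on real data would be the evidence the referee asks for; near-zero r would leave the task-driven-alignment assumption untested.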
Circularity Check
No significant circularity; derivation applies external topological tools without self-referential reduction
full rationale
The paper's core proposal maps CoT/ToT/GoT reasoning chains into a unified space via persistent homology and uses a Topological Optimization Agent for repairs. No equations, fitted parameters, or self-citations appear in the abstract or described framework that reduce any prediction or uniqueness claim to the inputs by construction. The approach treats persistent homology as an external analysis tool applied to traces, with performance claims resting on downstream experiments rather than definitional equivalence or self-citation chains. This is the common case of an independent methodological application.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear
unclear: Relation between the paper passage and the cited Recognition theorem.
Using persistent homology, we map CoT, ToT, and GoT into a unified topological space... Topological Optimization Agent diagnoses deviations... F_coh from H_1 persistence
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
unclear: Relation between the paper passage and the cited Recognition theorem.
health bands H(T)_k = [Q1, Q3] from correct traces; deviation e_k triggers repair
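The quoted repair trigger, interquartile health bands computed from known-correct traces with a deviation score per feature, can be sketched as follows. The quartile convention (linear interpolation) and the toy feature values are assumptions; the paper's actual band construction may differ.

```python
def quartiles(values):
    """Q1 and Q3 of a sample, via linear interpolation (one common convention)."""
    xs = sorted(values)

    def q(p):
        idx = p * (len(xs) - 1)
        lo, hi = int(idx), min(int(idx) + 1, len(xs) - 1)
        return xs[lo] + (idx - lo) * (xs[hi] - xs[lo])

    return q(0.25), q(0.75)

def deviation(value, band):
    """Distance of a feature value outside [Q1, Q3]; zero inside the band."""
    q1, q3 = band
    if value < q1:
        return q1 - value
    if value > q3:
        return value - q3
    return 0.0

# Hypothetical topological feature measured on known-correct traces.
correct_trace_feature = [0.40, 0.45, 0.50, 0.52, 0.58, 0.60, 0.65]
band = quartiles(correct_trace_feature)      # health band H(T)_k
e_k = deviation(0.20, band)                  # a chain well below the band
repair_needed = e_k > 0                      # nonzero deviation triggers repair
```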
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 7 Pith papers
-
AdapShot: Adaptive Many-Shot In-Context Learning with Semantic-Aware KV Cache Reuse
AdapShot adaptively tunes shot count via entropy probes and reuses semantically-matched KV caches with position decoupling to deliver ~10% accuracy gains and 4.64x speedup over fixed-shot baselines.
-
CAP: Controllable Alignment Prompting for Unlearning in LLMs
CAP optimizes prompts via reinforcement learning to selectively unlearn target knowledge in LLMs while preserving general capabilities, without any parameter updates and with reversible revocation.
-
CAP: Controllable Alignment Prompting for Unlearning in LLMs
CAP enables reversible unlearning of targeted knowledge in LLMs through optimized prompts generated via reinforcement learning, without any parameter updates.
-
DASH-KV: Accelerating Long-Context LLM Inference via Asymmetric KV Cache Hashing
DASH-KV accelerates long-context LLM inference to linear complexity via asymmetric KV cache hashing and mixed-precision retention, matching full attention performance on LongBench.
-
Transforming External Knowledge into Triplets for Enhanced Retrieval in RAG of LLMs
Tri-RAG turns external knowledge into Condition-Proof-Conclusion triplets and retrieves via the Condition anchor to improve efficiency and quality in LLM RAG.
-
CAP-CoT: Cycle Adversarial Prompt for Improving Chain of Thoughts in LLM Reasoning
CAP-CoT uses iterative adversarial prompt cycles to improve CoT accuracy, stability, and robustness across six benchmarks and four LLM backbones.
-
Small Language Model Helps Resolve Semantic Ambiguity of LLM Prompt
A small language model resolves semantic risks and conflicts in prompts via multi-perspective consistency checks, yielding a 2.5-point gain in LLM reasoning performance at $0.02 cost.
Reference graph
Works this paper leans on
-
[1]
General-reasoner: Advancing llm reasoning across all domains,
X. Ma, Q. Liu, D. Jiang, G. Zhang, Z. Ma, and W. Chen, “General-reasoner: Advancing llm reasoning across all domains,” arXiv preprint arXiv:2505.14652, 2025
-
[2]
Towards reasoning in large language models: A survey,
J. Huang and K. C.-C. Chang, “Towards reasoning in large language models: A survey,” arXiv preprint arXiv:2212.10403, 2022
-
[3]
Are large language models really good logical reasoners? a comprehensive evaluation and beyond,
F. Xu, Q. Lin, J. Han, T. Zhao, J. Liu, and E. Cambria, “Are large language models really good logical reasoners? a comprehensive evaluation and beyond,” IEEE Transactions on Knowledge and Data Engineering, 2025
work page 2025
-
[4]
Chain-of-thought prompting elicits reasoning in large language models,
J. Wei, X. Wang, D. Schuurmans, M. Bosma, F. Xia, E. Chi, Q. V. Le, D. Zhou et al., “Chain-of-thought prompting elicits reasoning in large language models,” Advances in Neural Information Processing Systems, vol. 35, pp. 24824–24837, 2022
work page 2022
-
[5]
Exploring formal defeasible reasoning of large language models: A chain-of-thought approach,
Z. Li, C. Chen, M. Li, and B. Liao, “Exploring formal defeasible reasoning of large language models: A chain-of-thought approach,” Knowledge-Based Systems, p. 113564, 2025
work page 2025
-
[6]
W. Zhong, J. Huang, M. Wu, W. Luo, and R. Yu, “Large language model based system with causal inference and chain-of-thoughts reasoning for traffic scene risk assessment,” Knowledge-Based Systems, p. 113630, 2025
work page 2025
-
[7]
Explainable medical visual question answering via chain of evidence,
C. Qiu, K. Huang, Z. Xie, M. Liu, J. Gu, and X. Zong, “Explainable medical visual question answering via chain of evidence,” Knowledge-Based Systems, p. 113672, 2025
work page 2025
-
[8]
P. Nguyen, T. Do, and L.-M. Nguyen, “Improving hierarchical semantic parsing with llms: Demonstration selection and chain-of-thought prompting via semantic fragment decoding,” Knowledge-Based Systems, p. 114256, 2025
work page 2025
-
[9]
Tree of thoughts: Deliberate problem solving with large language models,
S. Yao, D. Yu, J. Zhao, I. Shafran, T. Griffiths, Y. Cao, and K. Narasimhan, “Tree of thoughts: Deliberate problem solving with large language models,” Advances in Neural Information Processing Systems, vol. 36, pp. 11809–11822, 2023
work page 2023
-
[10]
Graph of thoughts: Solving elaborate problems with large language models,
M. Besta, N. Blach, A. Kubicek, R. Gerstenberger, M. Podstawski, L. Gianinazzi, J. Gajda, T. Lehmann, H. Niewiadomski, P. Nyczyk et al., “Graph of thoughts: Solving elaborate problems with large language models,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 16, 2024, pp. 17682–17690
work page 2024
-
[11]
Atom of thoughts for markov llm test-time scaling,
F. Teng, Z. Yu, Q. Shi, J. Zhang, C. Wu, and Y. Luo, “Atom of thoughts for markov llm test-time scaling,” arXiv preprint arXiv:2502.12018, 2025
-
[12]
Chain of thoughtlessness? an analysis of cot in planning,
K. Stechly, K. Valmeekam, and S. Kambhampati, “Chain of thoughtlessness? an analysis of cot in planning,” Advances in Neural Information Processing Systems, vol. 37, pp. 29106–29141, 2024
work page 2024
-
[13]
Fasttree: Optimizing attention kernel and runtime for tree-structured llm inference,
Z. Pan, Y. Ding, Y. Guan, Z. Wang, Z. Yu, X. Tang, Y. Wang, and Y. Ding, “Fasttree: Optimizing attention kernel and runtime for tree-structured llm inference,” in Eighth Conference on Machine Learning and Systems
-
[14]
Large language models on graphs: A comprehensive survey,
B. Jin, G. Liu, C. Han, M. Jiang, H. Ji, and J. Han, “Large language models on graphs: A comprehensive survey,” IEEE Transactions on Knowledge and Data Engineering, 2024
work page 2024
-
[15]
An introduction to topological data analysis: fundamental and practical aspects for data scientists,
F. Chazal and B. Michel, “An introduction to topological data analysis: fundamental and practical aspects for data scientists,” Frontiers in Artificial Intelligence, vol. 4, p. 667963, 2021
work page 2021
-
[16]
Topological Data Analysis Applications in Natural Language Processing: A Survey
A. Uchendu and T. Le, “Unveiling topological structures in text: A comprehensive survey of topological data analysis applications in nlp,” arXiv preprint arXiv:2411.10298, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[17]
A survey on evaluation of large language models,
Y. Chang, X. Wang, J. Wang, Y. Wu, L. Yang, K. Zhu, H. Chen, X. Yi, C. Wang, Y. Wang et al., “A survey on evaluation of large language models,” ACM Transactions on Intelligent Systems and Technology, vol. 15, no. 3, pp. 1–45, 2024
work page 2024
-
[18]
Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models
Q. Chen, L. Qin, J. Liu, D. Peng, J. Guan, P. Wang, M. Hu, Y. Zhou, T. Gao, and W. Che, “Towards reasoning era: A survey of long chain-of-thought for reasoning large language models,” arXiv preprint arXiv:2503.09567, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[19]
Towards better chain-of-thought prompting strategies: A survey,
Z. Yu, L. He, Z. Wu, X. Dai, and J. Chen, “Towards better chain-of-thought prompting strategies: A survey,” arXiv preprint arXiv:2310.04959, 2023
-
[20]
Syzygy of thoughts: Improving llm cot with the minimal free resolution,
C. Li, C. Zhang, Y. Lu, J. Zhang, Q. Sun, X. Wang, J. Wei, G. Wang, Y. Yang, and H. T. Shen, “Syzygy of thoughts: Improving llm cot with the minimal free resolution,” arXiv preprint arXiv:2504.09566, 2025
-
[21]
M. Turpin, J. Michael, E. Perez, and S. Bowman, “Language models don’t always say what they think: Unfaithful explanations in chain-of-thought prompting,” Advances in Neural Information Processing Systems, vol. 36, pp. 74952–74965, 2023
work page 2023
-
[22]
Musr: Testing the limits of chain-of-thought with multistep soft reasoning,
Z. Sprague, X. Ye, K. Bostrom, S. Chaudhuri, and G. Durrett, “Musr: Testing the limits of chain-of-thought with multistep soft reasoning,” arXiv preprint arXiv:2310.16049, 2023
-
[23]
Mitigating misleading chain-of-thought reasoning with selective filtering,
Y. Wu, Z. Zhang, and H. Zhao, “Mitigating misleading chain-of-thought reasoning with selective filtering,” arXiv preprint arXiv:2403.19167, 2024
-
[24]
Direct evaluation of chain-of-thought in multi-hop reasoning with knowledge graphs,
M.-V. Nguyen, L. Luo, F. Shiri, D. Phung, Y.-F. Li, T.-T. Vu, and G. Haffari, “Direct evaluation of chain-of-thought in multi-hop reasoning with knowledge graphs,” arXiv preprint arXiv:2402.11199, 2024
-
[25]
Dissecting logical reasoning in llms: A fine-grained evaluation and supervision study,
Y. Zhou, J. Ye, Z. Ling, Y. Han, Y. Huang, H. Zhuang, Z. Liang, K. Guo, T. Guo, X. Wang et al., “Dissecting logical reasoning in llms: A fine-grained evaluation and supervision study,” arXiv preprint arXiv:2506.04810, 2025
-
[26]
A chain-of-thought is as strong as its weakest link: A benchmark for verifiers of reasoning chains,
A. Jacovi, Y. Bitton, B. Bohnet, J. Herzig, O. Honovich, M. Tseng, M. Collins, R. Aharoni, and M. Geva, “A chain-of-thought is as strong as its weakest link: A benchmark for verifiers of reasoning chains,” arXiv preprint arXiv:2402.00559, 2024
-
[27]
A survey on the high-performance computation of persistent homology,
N. O. Malott, S. Chen, and P. A. Wilsey, “A survey on the high-performance computation of persistent homology,” IEEE Transactions on Knowledge and Data Engineering, vol. 35, no. 5, pp. 4466–4484, 2022
work page 2022
-
[28]
What is... persistent homology,
S. Weinberger, “What is... persistent homology,” Notices of the AMS, vol. 58, no. 1, pp. 36–39, 2011
work page 2011
-
[29]
Persistence-based motif discovery in time series,
T. Germain, C. Truong, and L. Oudre, “Persistence-based motif discovery in time series,” IEEE Transactions on Knowledge and Data Engineering, vol. 36, no. 11, pp. 6814–6827, 2024
work page 2024
-
[30]
Sparse-tda: Sparse realization of topological data analysis for multi-way classification,
W. Guo, K. Manohar, S. L. Brunton, and A. G. Banerjee, “Sparse-tda: Sparse realization of topological data analysis for multi-way classification,” IEEE Transactions on Knowledge and Data Engineering, vol. 30, no. 7, pp. 1403–1408, 2018
work page 2018
-
[31]
Persistent homology: theory and practice,
H. Edelsbrunner and D. Morozov, “Persistent homology: theory and practice,” in Proceedings of the European Congress of Mathematics, vol. 2012, 2012
work page 2012
-
[32]
H. Edelsbrunner, J. Harer et al., “Persistent homology-a survey,” Contemporary Mathematics, vol. 453, no. 26, pp. 257–282, 2008
work page 2008
-
[33]
Persistent homology and persistent cohomology: A review,
B. A. Okediji, “Persistent homology and persistent cohomology: A review,” Earthline Journal of Mathematical Sciences, vol. 14, no. 2, pp. 349–378, 2024
work page 2024
-
[34]
The shape of things to come: Topological data analysis and biology, from molecules to organisms,
E. J. Amézquita, M. Y. Quigley, T. Ophelders, E. Munch, and D. H. Chitwood, “The shape of things to come: Topological data analysis and biology, from molecules to organisms,” Developmental Dynamics, vol. 249, no. 7, pp. 816–833, 2020
work page 2020
-
[35]
Topological data analysis and its usefulness for precision medicine studies,
R. Iniesta, E. Carr, M. Carriere, N. Yerolemou, B. Michel, and F. Chazal, “Topological data analysis and its usefulness for precision medicine studies,” SORT: Statistics and Operations Research Transactions, vol. 46, no. 1, pp. 115–136, 2022
work page 2022
-
[36]
Topological analysis of ensembles of hydrodynamic turbulent flows an experimental study,
F. Nauleau, F. Vivodtzev, T. Bridel-Bertomeu, H. Beaugendre, and J. Tierny, “Topological analysis of ensembles of hydrodynamic turbulent flows an experimental study,” in 2022 IEEE 12th Symposium on Large Data Analysis and Visualization (LDAV). IEEE, 2022, pp. 1–11
work page 2022
-
[37]
Y. Yang, S. Guo, S. Li, Y. Wu, and Z. Qiao, “Topological data analysis combined with high-throughput computational screening of hydrophobic metal–organic frameworks: Application to the adsorptive separation of c3 components,” Nanomaterials, vol. 14, no. 3, p. 298, 2024
work page 2024
-
[38]
Persistent homology for structural characterization in disordered systems,
A. Wang and L. Zou, “Persistent homology for structural characterization in disordered systems,” Physical Review E, vol. 111, no. 4, p. 045306, 2025
work page 2025
-
[39]
Efficient Algorithms and Applications in Topological Data Analysis,
J. Tu, Efficient Algorithms and Applications in Topological Data Analysis. University of South Florida, 2019
work page 2019
-
[40]
Topological data analysis and computer science,
D. Adjei and G. A. Okyere, “Topological data analysis and computer science,” International Journal of Mathematics Trends and Technology-IJMTT, vol. 69, 2023
work page 2023
-
[41]
Topological data analysis and machine learning,
D. Leykam and D. G. Angelakis, “Topological data analysis and machine learning,” Advances in Physics: X, vol. 8, no. 1, p. 2202331, 2023
work page 2023
-
[42]
Self-refine: Iterative refinement with self-feedback,
A. Madaan, N. Tandon, P. Gupta, S. Hallinan, L. Gao, S. Wiegreffe, U. Alon, N. Dziri, S. Prabhumoye, Y. Yang et al., “Self-refine: Iterative refinement with self-feedback,” Advances in Neural Information Processing Systems, vol. 36, pp. 46534–46594, 2023
work page 2023
-
[43]
AFlow: Automating Agentic Workflow Generation
J. Zhang, J. Xiang, Z. Yu, F. Teng, X. Chen, J. Chen, M. Zhuge, X. Cheng, S. Hong, J. Wang et al., “Aflow: Automating agentic workflow generation,” arXiv preprint arXiv:2410.10762, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[44]
Forest-of-thought: Scaling test-time compute for enhancing llm reasoning,
Z. Bi, K. Han, C. Liu, Y. Tang, and Y. Wang, “Forest-of-thought: Scaling test-time compute for enhancing llm reasoning,” arXiv preprint arXiv:2412.09078, 2024
-
[45]
Hot: Highlighted chain of thought for referencing supporting facts from inputs,
T. Nguyen, L. Bolton, M. R. Taesiri, and A. T. Nguyen, “Hot: Highlighted chain of thought for referencing supporting facts from inputs,” arXiv preprint arXiv:2503.02003, 2025
-
[46]
Instruction induction: From few examples to natural language task descriptions,
O. Honovich, U. Shaham, S. R. Bowman, and O. Levy, “Instruction induction: From few examples to natural language task descriptions,” arXiv preprint arXiv:2205.10782, 2022
-
[47]
From persona to personalization: A survey on role-playing language agents,
J. Chen, X. Wang, R. Xu, S. Yuan, Y. Zhang, W. Shi, J. Xie, S. Li, R. Yang, T. Zhu et al., “From persona to personalization: A survey on role-playing language agents,” arXiv preprint arXiv:2404.18231, 2024
-
[48]
M. Hewing and V. Leinhos, “The prompt canvas: a literature-based practitioner guide for creating effective prompts in large language models,” arXiv preprint arXiv:2412.05127, 2024
-
[49]
Measuring Mathematical Problem Solving With the MATH Dataset
D. Hendrycks, C. Burns, S. Kadavath, A. Arora, S. Basart, E. Tang, D. Song, and J. Steinhardt, “Measuring mathematical problem solving with the math dataset,” arXiv preprint arXiv:2103.03874, 2021
work page internal anchor Pith review Pith/arXiv arXiv 2021
-
[50]
C. He, R. Luo, Y. Bai, S. Hu, Z. L. Thai, J. Shen, J. Hu, X. Han, Y. Huang, Y. Zhang et al., “Olympiadbench: A challenging benchmark for promoting agi with olympiad-level bilingual multimodal scientific problems,” arXiv preprint arXiv:2402.14008, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[51]
Training Verifiers to Solve Math Word Problems
K. Cobbe, V. Kosaraju, M. Bavarian, M. Chen, H. Jun, L. Kaiser, M. Plappert, J. Tworek, J. Hilton, R. Nakano et al., “Training verifiers to solve math word problems,” arXiv preprint arXiv:2110.14168, 2021
work page internal anchor Pith review Pith/arXiv arXiv 2021
-
[52]
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them
M. Suzgun, N. Scales, N. Schärli, S. Gehrmann, Y. Tay, H. W. Chung, A. Chowdhery, Q. V. Le, E. H. Chi, D. Zhou et al., “Challenging big-bench tasks and whether chain-of-thought can solve them,” arXiv preprint arXiv:2210.09261, 2022
work page internal anchor Pith review Pith/arXiv arXiv 2022
-
[53]
Mmlu-cf: A contamination-free multi-task language understanding benchmark,
Q. Zhao, Y. Huang, T. Lv, L. Cui, Q. Sun, S. Mao, X. Zhang, Y. Xin, Q. Yin, S. Li et al., “Mmlu-cf: A contamination-free multi-task language understanding benchmark,” arXiv preprint arXiv:2412.15194, 2024
-
[54]
LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding
Y. Bai, X. Lv, J. Zhang, H. Lyu, J. Tang, Z. Huang, Z. Du, X. Liu, A. Zeng, L. Hou et al., “Longbench: A bilingual, multitask benchmark for long context understanding,” arXiv preprint arXiv:2308.14508, 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[55]
HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
Z. Yang, P. Qi, S. Zhang, Y. Bengio, W. W. Cohen, R. Salakhutdinov, and C. D. Manning, “Hotpotqa: A dataset for diverse, explainable multi-hop question answering,” arXiv preprint arXiv:1809.09600, 2018
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[56]
MuSiQue: Multihop questions via single-hop question composition,
H. Trivedi, N. Balasubramanian, T. Khot, and A. Sabharwal, “MuSiQue: Multihop questions via single-hop question composition,” Transactions of the Association for Computational Linguistics, vol. 10, pp. 539–554, 2022
work page 2022
-
[57]
A. Hurst, A. Lerer, A. P. Goucher, A. Perelman, A. Ramesh, A. Clark, A. Ostrow, A. Welihinda, A. Hayes, A. Radford et al., “Gpt-4o system card,” arXiv preprint arXiv:2410.21276, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[58]
Qwen2.5-Coder Technical Report
B. Hui, J. Yang, Z. Cui, J. Yang, D. Liu, L. Zhang, T. Liu, J. Zhang, B. Yu, K. Lu et al., “Qwen2.5-Coder technical report,” arXiv preprint arXiv:2409.12186, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[59]
A. Liu, B. Feng, B. Xue, B. Wang, B. Wu, C. Lu, C. Zhao, C. Deng, C. Zhang, C. Ruan et al., “Deepseek-v3 technical report,” arXiv preprint arXiv:2412.19437, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
discussion (0)