QM-ToT: A Medical Tree of Thoughts Reasoning Framework for Quantized Model
Pith reviewed 2026-05-22 19:38 UTC · model grok-4.3
The pith
A tree-structured reasoning approach improves the performance of quantized models on medical question answering tasks by breaking problems into evaluated steps.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The QM-ToT framework leverages a Tree of Thought reasoning approach to decompose complex medical problems into manageable subtasks, coupled with evaluator assessment layers. This facilitates substantial performance improvements in INT4-quantized models on the MedQAUSMLE dataset, specifically increasing accuracy from 34% to 50% for the LLaMA2-70b model and from 58.77% to 69.49% for LLaMA-3.1-8b. An effective data distillation method based on ToT is also proposed, achieving an 86.27% improvement while using only 3.9% of the data.
What carries the argument
Tree of Thoughts path decomposition combined with evaluator assessment layers within the QM-ToT framework for guiding quantized model reasoning.
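The mechanism can be sketched as a small beam search over reasoning paths. This is a minimal illustration, not the authors' implementation: the function names, the beam width, the depth, and the toy `propose` and `evaluate` stand-ins are all hypothetical. In the paper's setting both callbacks would be calls to the quantized model.

```python
from typing import Callable, List, Tuple

Path = Tuple[str, ...]  # one reasoning path = an ordered tuple of thought steps


def tree_of_thoughts(
    question: str,
    propose: Callable[[str, Path], List[str]],
    evaluate: Callable[[str, Path], float],
    depth: int = 3,
    beam_width: int = 2,
) -> Path:
    """Breadth-first search that keeps the `beam_width` best-scored paths per level."""
    frontier: List[Path] = [()]
    for _ in range(depth):
        # Expand every surviving path by each proposed next step.
        candidates = [
            path + (step,) for path in frontier for step in propose(question, path)
        ]
        if not candidates:
            break
        # The evaluator scores whole paths; higher is better.
        candidates.sort(key=lambda p: evaluate(question, p), reverse=True)
        frontier = candidates[:beam_width]
    return frontier[0]


# Hypothetical stand-ins for the generator and evaluator (assumptions, not model calls).
def toy_propose(question: str, path: Path) -> List[str]:
    level = len(path) + 1
    return [f"step{level}-a", f"step{level}-b"]


def toy_evaluate(question: str, path: Path) -> float:
    # Pretend the evaluator prefers steps ending in "-a".
    return sum(1.0 for step in path if step.endswith("-a"))


best = tree_of_thoughts("toy question", toy_propose, toy_evaluate)
print(best)
```

The sketch makes the dependence on the evaluator explicit: the search can only be as good as the ranking that `evaluate` produces.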
Load-bearing premise
The evaluators in the tree structure must select better reasoning paths without introducing their own errors or biases after the model has been quantized.
What would settle it
If a quantized model using standard chain-of-thought prompting achieves accuracy equal to or higher than the QM-ToT version on the same MedQAUSMLE questions, the benefit of the tree decomposition and evaluators would be called into question.
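A paired comparison of this kind could be sketched as follows. The exact McNemar test is chosen here for illustration and is not something the paper or this review specifies; the per-question correctness lists are hypothetical placeholders, not results.

```python
from math import comb
from typing import Sequence


def accuracy(correct: Sequence[bool]) -> float:
    """Fraction of questions answered correctly."""
    return sum(correct) / len(correct)


def mcnemar_exact(cot: Sequence[bool], tot: Sequence[bool]) -> float:
    """Two-sided exact McNemar p-value on paired per-question correctness."""
    if len(cot) != len(tot):
        raise ValueError("both systems must be scored on the same questions")
    # Only discordant pairs carry information about a difference.
    cot_only = sum(1 for c, t in zip(cot, tot) if c and not t)
    tot_only = sum(1 for c, t in zip(cot, tot) if t and not c)
    n = cot_only + tot_only
    if n == 0:
        return 1.0
    k = min(cot_only, tot_only)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical outcomes on the same 10 questions (illustrative only).
cot_correct = [True, False, False, True, False, False, True, False, False, False]
tot_correct = [True, True, True, True, False, True, True, False, True, False]

p_value = mcnemar_exact(cot_correct, tot_correct)
print(accuracy(cot_correct), accuracy(tot_correct), p_value)
```

Scoring both prompting strategies on the same questions is what makes the comparison paired, which is the condition the settling test above requires.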
Original abstract
Large language models (LLMs) face significant challenges in specialized biomedical tasks due to the inherent complexity of medical reasoning and the sensitive nature of clinical data. Existing LLMs often struggle with intricate medical terminology and the need for accurate clinical insights, leading to performance reduction when quantized for resource-constrained deployment. To address these issues, we propose Quantized Medical Tree of Thought (QM-ToT), a path-based reasoning framework. QM-ToT leverages a Tree of Thought (ToT) reasoning approach to decompose complex medical problems into manageable subtasks, coupled with evaluator assessment layers. This framework facilitates substantial performance improvements in INT4-quantized models on the challenging MedQAUSMLE dataset. Specifically, we demonstrate a remarkable accuracy increase from 34% to 50% for the LLaMA2-70b model and from 58.77% to 69.49% for LLaMA-3.1-8b. Besides, we also proposed an effect data distillation method based on ToT. Compared to the traditional distillation method, we achieved an improvement of 86.27% while using only 3.9% of the data. This work, for the first time, showcases the potential of ToT to significantly enhance performance on complex biomedical tasks, establishing a crucial foundation for future advances in deploying high-performing quantized LLM in resource-limited medical settings.
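For readers unfamiliar with INT4, the following toy sketch shows what mapping weights to 4-bit integers involves. It uses plain symmetric round-to-nearest quantization with a single scale, which is a deliberate simplification and an assumption of this note; the abstract does not state which quantization scheme the authors used, and the weight values are made up.

```python
from typing import List, Tuple


def quantize_int4(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric round-to-nearest quantization into the signed 4-bit range [-8, 7]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    # Clamp after rounding so every code fits in 4 bits.
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Map 4-bit codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


# Made-up weights, chosen only to show the rounding error.
weights = [0.7, -0.28, 0.11, 0.0]
codes, scale = quantize_int4(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

Each weight moves by at most half a quantization step, and it is the accumulation of such small errors across a model that the abstract refers to as performance reduction.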
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes QM-ToT, a Tree-of-Thoughts reasoning framework for INT4-quantized LLMs that decomposes medical questions into subtasks and applies evaluator assessment layers. It claims large accuracy gains on MedQAUSMLE (LLaMA2-70b: 34% → 50%; LLaMA-3.1-8b: 58.77% → 69.49%) and an effective ToT-based data distillation method that yields 86.27% improvement using only 3.9% of the data.
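The two accuracy claims can be restated as absolute gains (percentage points) and relative gains (percent of the baseline). The short calculation below uses only the four accuracy figures quoted above; the helper name `gains` is introduced here for illustration. The 86.27% distillation figure is left out because the text does not say what baseline it is measured against.

```python
from typing import Tuple


def gains(before: float, after: float) -> Tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = after - before
    relative = 100.0 * absolute / before
    return round(absolute, 2), round(relative, 2)


# Accuracy figures (%) as reported in the abstract.
reported = {
    "LLaMA2-70b": (34.0, 50.0),
    "LLaMA-3.1-8b": (58.77, 69.49),
}

for model, (before, after) in reported.items():
    abs_pp, rel_pct = gains(before, after)
    print(f"{model}: +{abs_pp} points, +{rel_pct}% relative")
```

The larger model starts from a much lower baseline, so its relative gain is far larger even though both absolute gains are of a similar order.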
Significance. If the gains can be shown to arise from improved reasoning inside the INT4 model rather than from unquantized auxiliary components, and if the experiments include proper controls, the result would be relevant for resource-constrained medical LLM deployment. The work correctly identifies the tension between quantization and complex reasoning but currently provides insufficient methodological transparency to evaluate whether that tension has been resolved.
major comments (2)
- [Framework description (abstract and §3)] The QM-ToT architecture invokes evaluator assessment layers without stating their quantization status or whether they share the same INT4 weights as the generator. If evaluators run in FP16 or use a separate full-precision model, the reported jumps (34%→50%, 58.77%→69.49%) could be produced by hybrid correction rather than by any improvement in the quantized model’s own medical reasoning. This distinction is load-bearing for the central claim.
- [Experimental section (§4 or §5)] The abstract and results present accuracy figures without reporting baseline systems, number of evaluation runs, statistical significance tests, error bars, quantization calibration details, or the precise MedQAUSMLE split used. These omissions prevent verification that the claimed improvements are robust and attributable to QM-ToT rather than to implementation choices.
minor comments (2)
- [Abstract] Abstract contains a typographical error: “effect data distillation” should read “effective data distillation.”
- [Abstract and §4] Dataset naming is inconsistent (“MedQAUSMLE” in the abstract versus the conventional “MedQA-USMLE” elsewhere); standardize throughout.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. The comments highlight important areas for improving methodological transparency and experimental rigor, which we address point by point below. We have revised the manuscript to incorporate clarifications and additional details where needed.
- Referee: Framework description (abstract and §3): the QM-ToT architecture invokes evaluator assessment layers without stating their quantization status or whether they share the same INT4 weights as the generator. If evaluators run in FP16 or use a separate full-precision model, the reported jumps (34%→50%, 58.77%→69.49%) could be produced by hybrid correction rather than by any improvement in the quantized model’s own medical reasoning. This distinction is load-bearing for the central claim.
Authors: We agree that explicit specification of the quantization status for all components is essential to support the central claim. In the QM-ToT framework, the evaluator assessment layers operate on the same INT4-quantized model weights as the generator, with no hybrid full-precision components involved. This ensures that reasoning improvements occur within the quantized model. We have revised Section 3 to include a detailed description of the shared INT4 quantization across generator and evaluator layers, and updated the abstract to explicitly state that the entire framework runs in INT4 without external full-precision assistance.
Revision: yes
- Referee: Experimental section (§4 or §5): the abstract and results present accuracy figures without reporting baseline systems, number of evaluation runs, statistical significance tests, error bars, quantization calibration details, or the precise MedQAUSMLE split used. These omissions prevent verification that the claimed improvements are robust and attributable to QM-ToT rather than to implementation choices.
Authors: We acknowledge that the original submission omitted several key experimental details required for full reproducibility and verification. The revised manuscript now includes: (i) explicit baseline systems (standard CoT prompting and direct inference on the quantized models), (ii) results averaged over 5 independent evaluation runs with standard error bars and statistical significance tests (paired t-tests, p < 0.05), (iii) quantization calibration details using a held-out calibration subset of MedQAUSMLE, and (iv) confirmation that the standard MedQAUSMLE test split was used. These additions appear in the updated Section 4.
Revision: yes
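The statistical check described in the rebuttal (5 runs, paired t-test at p < 0.05) could look roughly like the sketch below. The per-run accuracies are hypothetical placeholders, not values from the paper, and pairing by run is an assumption since the rebuttal does not say what the pairing unit is. The critical value 2.776 is the standard two-sided 5% threshold for 4 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple

T_CRIT_DF4 = 2.776  # two-sided critical t value for alpha = 0.05, df = 4


def paired_t(baseline: Sequence[float], treated: Sequence[float]) -> Tuple[float, float, int]:
    """Return (mean difference, t statistic, degrees of freedom) for paired samples."""
    if len(baseline) != len(treated) or len(baseline) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [t - b for b, t in zip(baseline, treated)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)  # sample std dev of the differences
    return mean(diffs), mean(diffs) / standard_error, n - 1


# Hypothetical per-run accuracies (%) for illustration only.
cot_runs = [58.0, 59.0, 58.5, 59.5, 58.0]
tot_runs = [69.0, 69.5, 70.0, 69.5, 69.0]

mean_diff, t_stat, df = paired_t(cot_runs, tot_runs)
significant = df == 4 and abs(t_stat) > T_CRIT_DF4
print(mean_diff, t_stat, df, significant)
```

With only five runs the test has few degrees of freedom, so reporting the per-run values alongside the p-value would let readers judge the spread directly.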
Circularity Check
Empirical framework evaluation on public dataset exhibits no circular derivation
The paper proposes the QM-ToT framework and reports measured accuracy gains (34% to 50% on LLaMA2-70b; 58.77% to 69.49% on LLaMA-3.1-8b) plus a data-distillation improvement on the public MedQAUSMLE dataset. These are presented as experimental outcomes of applying Tree-of-Thoughts path decomposition and evaluator layers to INT4-quantized models, not as mathematical derivations or predictions that reduce to fitted parameters by construction. No self-definitional equations, load-bearing self-citations, or uniqueness theorems are invoked. The central claims remain externally falsifiable through replication on the stated dataset, so the argument is self-contained rather than circular.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Tree of Thoughts reasoning can be adapted to decompose complex medical problems into manageable subtasks that evaluators can reliably score
invented entities (1)
- QM-ToT framework with evaluator assessment layers: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
QM-ToT leverages a Tree of Thought (ToT) reasoning approach to decompose complex medical problems into manageable subtasks, coupled with evaluator assessment layers... fs = α · exp(r) + (1 − α) · exp(c)
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
We quantize medical problem-solving into discrete paths, forming a ToT structure where each node represents a path in the reasoning process.
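The scoring expression quoted in the first passage above, fs = α · exp(r) + (1 − α) · exp(c), can be evaluated directly. The excerpt does not define its symbols, so the sketch below treats r and c as two scalar evaluator scores and α as a mixing weight in [0, 1]; those readings, the function name, and the example inputs are assumptions made for illustration.

```python
from math import exp


def path_score(r: float, c: float, alpha: float = 0.5) -> float:
    """Evaluate fs = alpha * exp(r) + (1 - alpha) * exp(c) for one reasoning path."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha is assumed to be a mixing weight in [0, 1]")
    return alpha * exp(r) + (1.0 - alpha) * exp(c)


# Hypothetical inputs: a path that scores high on r but low on c, and the reverse.
print(path_score(r=1.0, c=0.0, alpha=0.5))
print(path_score(r=0.0, c=1.0, alpha=0.9))
```

Because both terms pass through exp, a high value of either input dominates the sum, and α decides which of the two is favored when they disagree.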
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.