arxiv: 2605.20194 · v1 · pith:QYFKU64Anew · submitted 2026-04-04 · 💻 cs.CL · cs.AI · cs.LG

Parallel LLM Reasoning for Bias-Resilient, Robust Conceptual Abstraction

Pith reviewed 2026-05-21 09:25 UTC · model grok-4.3

classification 💻 cs.CL · cs.AI · cs.LG
keywords LLM reasoning, parallel processing, long document analysis, bias reduction, evidence anchoring, omission error, conceptual abstraction, text chunking

The pith

Parallel chunk processing lets LLMs analyze long documents with far less bias from early concepts and fewer unsupported claims.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that sequential reading of long texts lets dominant early ideas crowd out other interpretations, producing omission errors, over-generalization, and weak evidence links. It proposes splitting the text into semantically coherent chunks, running independent parallel inferences on each, then merging the results with explicit evidence anchors and prioritization rules. Experiments across model sizes show this approach cuts omission error by roughly 84 percent, raises traceable evidence by up to 130 percent, and lowers unsupported claims by up to 91 percent, with smaller models gaining the largest relative improvement. A sympathetic reader would care because reliable abstraction from lengthy sources matters for applications such as policy review, scientific literature synthesis, and legal document analysis. The central mechanism is the removal of sequential context carry-over combined with forced grounding at the consolidation step.

Core claim

Texts are divided into semantically coherent chunks and processed independently in parallel to eliminate influence from earlier processing; the resulting interpretations are then consolidated through explicit evidence anchoring and prioritization that reduces dominance, over-generalization, redundancy, conceptual drift, and unsupported claims while improving traceability. Experiments with multiple model types and sizes indicate that this parallel processing significantly reduces omission error by approximately 84%, increases evidence traceability by up to 130%, and reduces unsupported claims by up to 91%.

What carries the argument

Parallel chunk-level processing followed by evidence-anchored consolidation, which isolates each segment to prevent sequential bias and then enforces explicit grounding during merging.
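
The two stages can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `analyze_chunk` is a hypothetical keyword matcher standing in for an independent LLM call, and the `Claim` record, the keyword list, and the example chunks are all invented. The one rule it shows is the one described above: each chunk is analyzed in isolation, and a claim survives consolidation only if its quoted evidence can be found in the chunk it cites.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    concept: str   # label of the extracted concept
    chunk_id: int  # index of the chunk the claim came from
    evidence: str  # verbatim span offered as support


def analyze_chunk(chunk_id: int, text: str) -> list[Claim]:
    """Hypothetical stand-in for one independent LLM call on a single chunk.

    It sees only its own chunk and flags a few hard-coded keywords.
    """
    keywords = ("privacy", "cost", "trust")
    return [Claim(k, chunk_id, text) for k in keywords if k in text.lower()]


def consolidate(claims: list[Claim], chunks: list[str]) -> dict[str, list[Claim]]:
    """Merge per-chunk claims, keeping only those with traceable evidence."""
    merged: dict[str, list[Claim]] = {}
    for claim in claims:
        # Evidence anchoring: the quoted span must occur in the cited chunk.
        if claim.evidence and claim.evidence in chunks[claim.chunk_id]:
            merged.setdefault(claim.concept, []).append(claim)
    return merged


def run_pipeline(chunks: list[str]) -> dict[str, list[Claim]]:
    # Chunks are processed in isolation, so no output can condition another.
    with ThreadPoolExecutor() as pool:
        per_chunk = list(pool.map(analyze_chunk, range(len(chunks)), chunks))
    flat = [claim for claims in per_chunk for claim in claims]
    return consolidate(flat, chunks)


# Invented example chunks; the second carries a less prominent concept.
chunks = [
    "Most participants raised cost as the main barrier.",
    "Two participants mentioned privacy concerns late in the interview.",
]
result = run_pipeline(chunks)
```

Keeping `consolidate` as a separate function means the grounding rule can be inspected and tested without running any model.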

If this is right

  • Omission of less prominent but relevant concepts drops sharply because no single prefix can dominate the entire analysis.
  • Evidence traceability rises because each consolidated claim must link back to a specific chunk rather than an opaque merged output.
  • Unsupported or drifted claims decline when consolidation is forced to prioritize explicit evidence over synthesized generality.
  • Smaller models achieve reliability closer to larger ones, suggesting the method reduces the performance gap caused by limited context windows.
  • Scalable textual analysis becomes feasible for documents too long for single-pass processing without cumulative bias.
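
To make the percentages concrete, the sketch below computes an omission error and a relative change in Python. The metric definition is an assumption made for illustration (the share of reference concepts missing from the output), since this page does not give the paper's exact formulas, and every number is invented. It also shows why a traceability increase above 100% is possible: relative change is measured against the baseline, not against a ceiling of 1.

```python
def omission_error(reference: set[str], found: set[str]) -> float:
    """Assumed definition: share of reference concepts the output missed."""
    return len(reference - found) / len(reference)


def relative_change(before: float, after: float) -> float:
    """Signed percentage change from `before` to `after`."""
    return (after - before) / before * 100.0


# Invented reference themes and model outputs, for illustration only.
reference = {"cost", "privacy", "trust", "training", "workload"}
sequential = {"cost", "trust"}
parallel = {"cost", "trust", "privacy", "training"}

seq_err = omission_error(reference, sequential)  # 3 of 5 themes missed
par_err = omission_error(reference, parallel)    # 1 of 5 themes missed
change = relative_change(seq_err, par_err)       # negative means a reduction

# A hypothetical traceability score rising from 0.30 to 0.69 is a 130% increase.
traceability_gain = relative_change(0.30, 0.69)
```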

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same chunk-and-anchor pattern could be tested on multi-turn dialogue or streaming text to see whether it prevents drift across conversation turns.
  • If chunk boundaries are chosen by topic segmentation models rather than fixed length, the reduction in omission error might increase further on highly structured documents such as scientific papers.
  • The framework implies that future LLM pipelines could treat consolidation as a separate, auditable reasoning stage rather than an implicit final generation step.
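
One way to picture topic-driven chunk boundaries is to start a new chunk wherever adjacent sentences stop resembling each other. The Python sketch below is an assumption-heavy stand-in: it uses bag-of-words cosine similarity rather than learned embeddings or a topic segmentation model, and the `threshold` value and example sentences are invented. It illustrates the idea only and is not the paper's chunking procedure.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence: str) -> Counter:
    """Lowercased word counts; a crude substitute for a sentence embedding."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def chunk_by_similarity(sentences: list[str], threshold: float = 0.2) -> list[list[str]]:
    """Start a new chunk when adjacent sentences fall below `threshold`."""
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine(bag_of_words(prev), bag_of_words(cur)) < threshold:
            chunks.append([cur])    # similarity dropped: open a new chunk
        else:
            chunks[-1].append(cur)  # still on topic: extend the current chunk
    return chunks


# Invented sentences covering two topics.
sentences = [
    "The new system raised the cost of training.",
    "Training cost was the main worry for managers.",
    "Privacy rules were unclear to most staff.",
    "Staff wanted clear privacy rules before adoption.",
]
chunks = chunk_by_similarity(sentences)
```

A fixed-length splitter would ignore the topic shift between the second and third sentences; the similarity rule places the boundary there.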

Load-bearing premise

Splitting a document into semantically coherent chunks and processing them independently removes earlier influence without creating new biases, context loss, or boundary artifacts.

What would settle it

Run the same long-document abstraction task on a fresh corpus using both the parallel method and standard sequential prompting, then measure whether omission error, unsupported claims, and traceability scores remain statistically unchanged or worsen under the parallel approach.
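
A paired comparison of this kind could be checked with a sign-flip permutation test, sketched below in Python. The choice of test, the per-document scores, and the function name are assumptions made for illustration; the page does not say which statistical test the paper uses. Under the null hypothesis that the two methods are interchangeable, each document's difference is equally likely to carry either sign, so the test counts how often random sign flips produce a mean difference at least as large as the observed one.

```python
import random


def paired_permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided sign-flip test on the mean paired difference a[i] - b[i]."""
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs) / len(diffs))
    hits = 0
    for _ in range(n_resamples):
        # Randomly swap the two method labels within each document.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(sum(flipped) / len(flipped)) >= observed:
            hits += 1
    # The +1 correction keeps the estimated p-value strictly above zero.
    return (hits + 1) / (n_resamples + 1)


# Invented per-document omission errors for the same eight documents.
sequential = [0.60, 0.55, 0.70, 0.50, 0.65, 0.58, 0.62, 0.57]
parallel = [0.12, 0.10, 0.15, 0.08, 0.11, 0.09, 0.14, 0.10]
p_value = paired_permutation_test(sequential, parallel)
```

A large p-value on a fresh corpus would mean the two methods are statistically indistinguishable, which is the outcome that would undercut the paper's claim.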

Figures

Figures reproduced from arXiv: 2605.20194 by Adeyemi Adeseye, Aisvarya Adeseye, Jouni Isoaho.

Figure 1. Parallel Evidence-Constrained Independent Inference (PECII).
3.1 Layer 0: Trace-Preserving Long-Form Textual Document Ingestion and Normalization. This layer converts each long-form textual document into traceable textual objects. The key output is not only the extracted text, but also the mapping to the source document. This mapping is required for evidence anchoring in later layers. It is also required f…

Figure 2. Experiment Setting of the Study.
Human Ground Truth Construction: Two independent qualitative researchers conducted thematic analysis using NVivo-assisted coding. Each researcher independently coded the full interview transcripts and generated an initial set of themes, definitions, and supporting excerpts. The researchers were blinded to each other’s coding decisions during the initial phase to avoid mutua…

Figure 3. Scaling of PECII Performance Gains Across Model Sizes (Log-Scale Analysis of Improvement Trends).
Parallel chunk-level execution resolves this limitation by ensuring independence across analytical units. Omission error decreases by approximately 80–84% across all models, and early-chunk dominance is reduced by over 80%, indicating that bias mitigation is primarily execution-driven rather than scale-drive…

Figure 4. Global Improvement Heatmap: Percentage Change of Models and Evaluation Metrics.
Original abstract

Large language models (LLMs) have been increasingly used to analyze text. However, they are often plagued with contextual reasoning limitations when analyzing long documents. When long documents are processed sequentially, early or dominant concepts can overshadow less visible but meaningful interpretations, leading to cumulative analytical bias, omission error, and over-generalization. Additionally, independently generated outputs are often merged without systematic grounding, introducing redundancy, conceptual drift, and unsupported claims. This study proposes a structured framework combining parallel chunk-level processing with evidence-anchored consolidation. Texts are first divided into semantically coherent chunks and processed independently in parallel to remove influence from earlier processing. The independently generated interpretations are then consolidated using explicit evidence anchoring and prioritization that reduces dominance and over-generalization while improving traceability. Experiments with multiple model types and sizes indicate that parallel processing significantly reduces omission error by approximately 84%, increases evidence traceability by up to 130%, and reduces unsupported claims by up to 91%. Smaller models benefited most, suggesting that efficient parallel chunking and consolidation play a critical role in achieving reliable and scalable textual analysis.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes a framework for analyzing long documents with LLMs that divides texts into semantically coherent chunks for independent parallel processing, followed by evidence-anchored consolidation to reduce sequential bias, omission errors, and unsupported claims. Experiments across multiple model types and sizes are reported to yield an approximately 84% reduction in omission error, up to 130% increase in evidence traceability, and up to 91% reduction in unsupported claims, with smaller models showing the largest gains.

Significance. If the quantitative results prove reproducible under controlled conditions that isolate chunk-boundary effects, the approach could meaningfully advance reliable LLM use on extended texts by mitigating cumulative bias and improving output grounding. The emphasis on smaller models benefiting most also suggests potential efficiency gains, though this hinges on the untested assumption that consolidation fully recovers cross-chunk relations without introducing new artifacts.

major comments (2)
  1. [Abstract] The abstract asserts specific quantitative gains (84% omission-error reduction, 130% traceability increase, 91% unsupported-claim reduction) yet supplies no experimental design, metric definitions, baseline comparisons, statistical tests, dataset details, or model specifications. This absence directly undermines verification of the central empirical claims.
  2. [Methods] Chunking and consolidation description: The claim that independent parallel processing of semantically coherent chunks removes sequential bias rests on the unexamined assumption that chunk boundaries do not sever critical cross-chunk dependencies and that the subsequent evidence-anchored consolidation reliably reconstructs them. No procedure for boundary handling, dependency preservation, or consolidation queries is provided, leaving open the possibility that measured improvements partly reflect reduced interference rather than genuine bias resilience.
minor comments (1)
  1. [Abstract] Consider adding one sentence on the datasets or document domains used in the experiments to allow readers to gauge generalizability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments. We address each major comment below and indicate revisions to be incorporated in the next version of the manuscript.

Point-by-point responses
  1. Referee: [Abstract] The abstract asserts specific quantitative gains (84% omission-error reduction, 130% traceability increase, 91% unsupported-claim reduction) yet supplies no experimental design, metric definitions, baseline comparisons, statistical tests, dataset details, or model specifications. This absence directly undermines verification of the central empirical claims.

    Authors: The abstract provides a concise summary of the primary findings as is conventional. Full experimental details (including model specifications across multiple LLM types and sizes, dataset descriptions for long-document analysis, precise metric definitions for omission error, evidence traceability, and unsupported claims, baseline comparisons between sequential and parallel processing, and statistical tests) are presented in the Methods and Results sections. To improve immediate verifiability, we will revise the abstract to include a brief clause referencing the experimental framework (e.g., 'via controlled experiments on diverse LLMs and corpora with defined metrics and baselines').
    Revision: yes

  2. Referee: [Methods] Chunking and consolidation description: The claim that independent parallel processing of semantically coherent chunks removes sequential bias rests on the unexamined assumption that chunk boundaries do not sever critical cross-chunk dependencies and that the subsequent evidence-anchored consolidation reliably reconstructs them. No procedure for boundary handling, dependency preservation, or consolidation queries is provided, leaving open the possibility that measured improvements partly reflect reduced interference rather than genuine bias resilience.

    Authors: We agree that greater explicitness on these procedural elements would strengthen the manuscript. The Methods section describes semantic chunking and evidence-anchored consolidation, but we will expand it with a dedicated subsection detailing the chunk-boundary algorithm (embedding-based semantic coherence detection), explicit mechanisms for preserving and reconstructing cross-chunk dependencies during consolidation, and the precise query templates used for evidence anchoring. The reported gains in traceability and unsupported-claim reduction, which exceed what would be expected from interference reduction alone, provide empirical support that the improvements reflect genuine bias resilience; we will add a short discussion clarifying this distinction.
    Revision: yes

Circularity Check

0 steps flagged

No circularity: empirical results independent of derivation chain

full rationale

The paper describes a procedural framework for parallel chunk-level LLM processing followed by evidence-anchored consolidation and reports experimental outcomes on omission error, traceability, and unsupported claims. No equations, fitted parameters, or mathematical derivations appear in the provided text. Central claims rest on measured performance deltas across model sizes rather than any self-definitional reduction, fitted-input prediction, or self-citation load-bearing step. The absence of a derivation chain that reduces to its own inputs by construction makes the circularity score 0.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract describes an empirical prompting and post-processing framework with no mathematical axioms, free parameters, or newly postulated entities.

pith-pipeline@v0.9.0 · 5724 in / 1080 out tokens · 56362 ms · 2026-05-21T09:25:33.328397+00:00 · methodology


