pith.

arxiv: 2605.29430 · v1 · pith:LKCNEAE5 · new · submitted 2026-05-28 · 💻 cs.AI · cs.CL

Towards Human-Like Interactive Speech Recognition With Agentic Correction and Semantic Evaluation

Pith reviewed 2026-06-29 07:28 UTC · model grok-4.3

classification 💻 cs.AI cs.CL
keywords: interactive ASR, semantic error rate, agentic correction, multi-turn refinement, speech recognition, LLM-based evaluation, semantic evaluation metric

The pith

An agentic closed-loop ASR system uses multi-turn semantic correction to reduce meaning errors beyond what a single-pass recognizer achieves, and token-level metrics understate the gain.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Current ASR systems process speech in one pass and rely on word-level error rates, yet human conversation fixes misunderstandings through back-and-forth clarification. The paper reframes ASR as an interactive multi-turn task and builds Agentic ASR, which adds semantic correction, intent routing, and reasoning-based editing after an initial recognition pass. It also defines S²ER, an LLM-driven metric that scores sentence-level meaning errors instead of token matches, plus a simulation system for testing. Across multilingual, named-entity, and code-switching data, the iterative loop lowers semantic mistakes, with the reductions appearing larger under S²ER than under WER or CER. Human–AI alignment checks support the reliability of the semantic judge, and ablation studies support the robustness of the overall framework.

Core claim

Formulating speech recognition as a multi-turn refinement task and equipping it with a closed-loop Agentic ASR framework that combines a single-pass front-end with semantic correction, intent routing, and reasoning-based editing produces consistent reductions in semantic errors; these gains are substantially larger when measured by the new LLM-based Sentence-level Semantic Error Rate than by conventional token-level metrics, and the semantic judge aligns with human judgments.

What carries the argument

Agentic ASR, a closed-loop framework that adds semantic correction, intent routing, and reasoning-based editing to a single-pass ASR front-end.
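To make the loop concrete, here is a minimal Python sketch of one reading of the turn structure in the Figure 2 caption: a front-end turns user input Iₜ into a hypothesis Hₜ, a router sends it to confirmation, new input, or correction, and corrections go through a Locate–Reason–Modify edit. It is not the authors' implementation. Every function name is hypothetical, text stands in for audio, the LLM's semantic-correction pass is a no-op, and routing and editing are toy regex rules in place of LLM calls. The Megan/Morgan confusion is taken from Figure 1.

```python
import re
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class Intent(Enum):
    CONFIRMATION = "confirmation"
    NEW_INPUT = "new_input"
    CORRECTION = "correction"


@dataclass
class Session:
    transcript: str = ""
    history: List[Tuple[Intent, str]] = field(default_factory=list)


# Toy grammar for instructions such as "No, it's Megan, not Morgan".
CORRECTION_RE = re.compile(r"^(?:no,? )?(?:it's|it is) (\S+?),? not (\S+)$",
                           re.IGNORECASE)


def asr_front_end(utterance: str) -> str:
    """Hypothetical front-end: a real system would decode audio I_t into H_t."""
    return utterance


def semantic_correction(hypothesis: str) -> str:
    """Placeholder for the LLM semantic-correction pass (no-op here)."""
    return hypothesis


def route_intent(hypothesis: str) -> Intent:
    """Rule-based stand-in for LLM intent routing."""
    if hypothesis.lower() in {"yes", "correct", "that's right"}:
        return Intent.CONFIRMATION
    if CORRECTION_RE.match(hypothesis):
        return Intent.CORRECTION
    return Intent.NEW_INPUT


def locate_reason_modify(transcript: str, instruction: str) -> str:
    """Locate the rejected word, read off the replacement, apply one edit."""
    m = CORRECTION_RE.match(instruction)
    if m is None:
        return transcript
    new_word, old_word = m.group(1), m.group(2)
    return re.sub(rf"\b{re.escape(old_word)}\b", lambda _: new_word,
                  transcript, count=1)


def agentic_turn(session: Session, utterance: str) -> Session:
    hypothesis = semantic_correction(asr_front_end(utterance))  # H_t
    intent = route_intent(hypothesis)
    session.history.append((intent, hypothesis))
    if intent is Intent.NEW_INPUT:
        session.transcript = hypothesis
    elif intent is Intent.CORRECTION:
        session.transcript = locate_reason_modify(session.transcript, hypothesis)
    # CONFIRMATION leaves the transcript unchanged and ends the refinement.
    return session


session = Session()
# Turn 1 already contains the misrecognition ("Megan" heard as "Morgan").
for utterance in ["Please call Morgan tomorrow",
                  "No, it's Megan, not Morgan",
                  "yes"]:
    session = agentic_turn(session, utterance)
print(session.transcript)  # Please call Megan tomorrow
```

Free-form instructions such as "the second name is wrong" defeat patterns like these; covering that space is presumably what the LLM-based router and editor are for.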

If this is right

  • Iterative interaction reduces semantic errors on multilingual, named-entity-intensive, and code-switching benchmarks.
  • Improvements appear larger under S²ER than under token-level metrics such as WER or CER.
  • Ablation studies confirm that each component of the agentic loop contributes to the observed error reduction.
  • The Interactive Simulation System enables reproducible benchmarking without repeated human annotation.
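The gap between S²ER gains and WER/CER gains is easiest to see sentence by sentence. The sketch below computes corpus WER with a word-level Levenshtein distance. Assuming S²ER is the fraction of sentences a judge marks as meaning-changing (the name suggests this, but the exact definition is not stated here), it pairs that rate with a deliberately crude, hypothetical judge that only ignores a few filler words. The two sentence pairs are invented in the spirit of Figure 3's Case A and Case B.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]  # (reference, hypothesis)

# Hypothetical filler list used only by the toy judge below.
FILLERS = {"um", "uh", "er", "well"}


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance over words (substitutions, deletions, insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # delete r
                         cur[j - 1] + 1,          # insert h
                         prev[j - 1] + (r != h))  # substitute or match
        prev = cur
    return prev[-1]


def wer(pairs: List[Pair]) -> float:
    """Corpus WER: total word edits divided by total reference words."""
    edits = sum(word_edit_distance(r.split(), h.split()) for r, h in pairs)
    return edits / sum(len(r.split()) for r, _ in pairs)


def toy_judge(ref: str, hyp: str) -> bool:
    """Hypothetical stand-in for an LLM judge; True means 'meaning changed'."""
    def content(s: str) -> List[str]:
        return [w for w in s.lower().split() if w not in FILLERS]
    return content(ref) != content(hyp)


def s2er(pairs: List[Pair], judge: Callable[[str, str], bool]) -> float:
    """Assumed S²ER form: fraction of sentences the judge flags as wrong."""
    return sum(judge(r, h) for r, h in pairs) / len(pairs)


# Illustrative sentences, not taken from the paper.
case_a = ("um well I want to book the flight uh tomorrow",
          "I want to book the flight tomorrow")
case_b = ("please call Megan at noon", "please call Morgan at noon")

for name, pair in [("A", case_a), ("B", case_b)]:
    print(f"Case {name}: WER={wer([pair]):.2f}  S2ER={s2er([pair], toy_judge):.2f}")
# Case A: WER=0.30  S2ER=0.00
# Case B: WER=0.20  S2ER=1.00
```

An LLM judge would accept paraphrases that this exact-match judge rejects; the toy only shows how the two metrics can rank the same outputs in opposite order.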

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same correction loop could be attached to other front-end recognizers such as handwriting or video captioning systems.
  • Tighter coupling between the semantic judge and downstream LLM agents might allow the system to request clarification only on intent-critical errors.
  • The simulation environment could be used to train lightweight correction policies that run without calling a full LLM at inference time.

Load-bearing premise

An LLM can judge whether two sentences convey the same intended meaning without introducing systematic bias or hallucination.

What would settle it

A controlled study in which human listeners rate pairs of transcripts for semantic fidelity and the rankings produced by S²ER disagree with the human rankings on a statistically significant fraction of cases.
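One way to run that test, sketched under assumptions: for each pair of transcripts, record which one human listeners judge more faithful and which one the S²ER judge prefers, then apply a one-sided exact binomial test to the number of disagreements. The `tolerated_rate` threshold (10% here) is a hypothetical choice, not something the paper specifies, and the preference lists are synthetic.

```python
from math import comb
from typing import Sequence, Tuple


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def disagreement_test(human_prefs: Sequence[int], judge_prefs: Sequence[int],
                      tolerated_rate: float = 0.1) -> Tuple[float, float]:
    """Each entry says which transcript of a pair (0 or 1) is more faithful.

    Returns the observed disagreement rate and a one-sided p-value for the
    null hypothesis that the judge disagrees with humans at most
    `tolerated_rate` of the time.
    """
    if len(human_prefs) != len(judge_prefs) or not human_prefs:
        raise ValueError("need two equal-length, non-empty preference lists")
    n = len(human_prefs)
    k = sum(h != j for h, j in zip(human_prefs, judge_prefs))
    return k / n, binomial_upper_tail(k, n, tolerated_rate)


# Synthetic preferences for 20 transcript pairs (illustrative only).
human = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0]
judge = [0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0]
rate, p_value = disagreement_test(human, judge)
print(f"disagreement={rate:.2f}, one-sided p={p_value:.3f}")
```

A p-value below the chosen significance level would mean the judge disagrees with humans more often than the tolerated rate allows, which is the outcome that would undercut the load-bearing premise. With graded ratings from several listeners, a rank correlation would be a natural alternative.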

Figures

Figures reproduced from arXiv: 2605.29430 by Kai Yu, Peng Wang, Qinyuan Chen, Wupeng Wang, Xiangang Li, Xie Chen, Xinjian Zhao, Xipeng Qiu, Yanqiao Zhu, Zhifu Gao, Zixuan Jiang.

Figure 1: Comparison between daily human communication, the traditional ASR paradigm, and the proposed Agentic ASR paradigm. In natural conversations, misunderstandings can be progressively corrected through multi-turn interactions. In contrast, conventional ASR systems operate in a one-shot, open-loop manner, where recognition errors (e.g., confusing “Megan” with “Morgan”) cannot be effectively corrected once produ…
Figure 2: Agentic ASR framework. At turn t, an ASR front-end first produces a hypothesis Hₜ from user speech input Iₜ. An LLM module then performs semantic correction and intent routing into three intent types: confirmation, new input, and correction. For correction intents, a structured Locate–Reason–Modify pipeline identifies the editable span, infers the intended edit from instruction and history, and applies th…
Figure 3: Two illustrative cases comparing S²ER with token-level metrics. In Case A, several mismatches involve only filler or discourse words, leading to high WER but preserved meaning. In Case B, a single local substitution corrupts a key entity, yielding lower WER but a semantic failure.
B. S²ER versus token-level metrics. Token-level metrics such as WER and CER measure surface-form mismatch, but they do not dis…
Figure 4: Interactive Simulation System (ISS) for automatic multi-round…
Figure 5: Performance trends of the proposed Agentic ASR framework from…
Figure 6: Ablation on different base ASR models under the same proposed Agentic ASR framework. Three representative shared benchmarks are shown:…
Figure 8: Mean Pearson correlation with human reference scores under different…
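Figure 8 reports a mean Pearson correlation between judge scores and human reference scores; its caption is truncated, so what the mean is taken over is not visible here. A minimal sketch, assuming the average runs over several sets of human reference scores for the same items: compute Pearson's r against each set and average the coefficients. Function names and scores are hypothetical.

```python
from math import sqrt
from statistics import fmean
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's r; raises ZeroDivisionError if either series is constant."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mean_pearson(judge_scores: Sequence[float],
                 human_score_sets: Sequence[Sequence[float]]) -> float:
    """Average r between one judge's scores and each human reference set."""
    return fmean([pearson(judge_scores, human) for human in human_score_sets])


# Synthetic scores for five transcripts (illustrative only).
judge_scores = [0.9, 0.2, 0.7, 0.4, 0.1]
human_sets = [[1.0, 0.3, 0.8, 0.5, 0.0],
              [0.8, 0.1, 0.9, 0.3, 0.2]]
print(f"mean r = {mean_pearson(judge_scores, human_sets):.3f}")
```

Averaging coefficients is the simplest reading of "mean Pearson correlation"; averaging the human scores first and correlating once would give a different, usually higher, number.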
Original abstract

Automatic speech recognition (ASR) is a core component of human–computer interaction and an increasingly important front-end for LLM-based assistants and agents. However, most current ASR systems still follow a single-pass paradigm, which is poorly aligned with human communication, where misunderstandings are resolved through iterative clarification and refinement. This mismatch makes it difficult to correct meaning-critical errors once they occur. Meanwhile, token-level metrics such as WER or CER cannot adequately reflect such a problem. To address these limitations, we formulate Interactive ASR as a multi-turn refinement task and propose Agentic ASR, a closed-loop framework that combines a single-pass ASR front-end with semantic correction, intent routing, and reasoning-based editing. We further introduce the Sentence-level Semantic Error Rate (S²ER), an LLM-based semantic evaluation metric, together with an Interactive Simulation System for scalable and reproducible benchmarking. Experiments on multilingual, named-entity-intensive, and code-switching benchmarks show that iterative interaction consistently reduces semantic errors, with much larger gains in S²ER than in conventional token-level metrics. Human–AI alignment and ablation studies further validate the reliability of the semantic judge and the robustness of the proposed framework. The code is available at: https://interactiveasr.github.io/ and the live demo is available at https://i-asr.sjtuxlance.com/

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims to introduce an Agentic ASR framework for interactive, multi-turn speech recognition that uses LLM-based semantic correction, intent routing, and reasoning-based editing to correct meaning-critical errors. It proposes the S²ER metric, an LLM-based sentence-level semantic error rate, and an Interactive Simulation System for benchmarking. Experiments on multilingual, named-entity-intensive, and code-switching benchmarks show that iterative interaction reduces semantic errors, with larger gains in S²ER than in WER or CER. Human-AI alignment and ablation studies are provided to validate the approach, and code is made available.

Significance. If the results hold, this approach could significantly advance ASR systems towards more human-like interactive paradigms, improving semantic accuracy in downstream LLM applications. The open-sourcing of code and the live demo are positive for reproducibility and further research. The introduction of a semantic metric addresses a known limitation of token-level metrics.

major comments (2)
  1. [S²ER Metric and Human-AI Alignment Studies] S²ER definition: the metric is defined via an LLM judge for semantic fidelity, yet the correction stage also relies on LLM semantic correction, intent routing, and reasoning-based editing. The manuscript must explicitly report whether the same model family and prompt style are used for both, along with judge-correction divergence rates and the precise setup of the human-AI alignment study (including model identity). This directly affects whether the larger S²ER gains reflect genuine semantic recovery or shared inductive bias.
  2. [Experiments] Experimental results: the abstract states 'consistent reductions' and 'much larger gains in S²ER' but supplies no numerical values, error bars, confidence intervals, or statistical tests. The full paper must include these quantities (with per-benchmark breakdowns) for the central claim that iterative interaction yields substantially larger semantic improvements than token-level metrics to be verifiable.
minor comments (2)
  1. Provide the exact prompts and model versions used for the LLM judge and correction agent in an appendix to support reproducibility.
  2. Clarify how the Interactive Simulation System generates multi-turn interactions and whether it introduces any distributional shift relative to real user corrections.
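As an illustration of what the second minor comment asks about, here is a hypothetical, template-based simulated user: given the reference and the current hypothesis, it confirms an exact match, names the first wrong word when the word counts agree, and otherwise re-dictates the whole sentence. This is a guess at the general shape, not the paper's ISS, whose Figure 4 caption is truncated; a full simulator would presumably also render each reply as speech before feeding it back to the recognizer.

```python
def simulated_user_turn(reference: str, hypothesis: str) -> str:
    """Hypothetical template-based simulated user reply for one turn.

    Confirms an exact match, names the first mismatched word when the word
    counts agree, and otherwise falls back to re-dictating the reference.
    """
    ref_words, hyp_words = reference.split(), hypothesis.split()
    if ref_words == hyp_words:
        return "yes"
    if len(ref_words) == len(hyp_words):
        # Equal length but not equal, so at least one position differs.
        for r, h in zip(ref_words, hyp_words):
            if r != h:
                return f"No, it's {r}, not {h}"
    return f"No, I said: {reference}"


print(simulated_user_turn("call Megan now", "call Morgan now"))
# No, it's Megan, not Morgan
```

Templated replies like these are far more regular than real users' corrections, which is exactly the distributional shift the referee wants quantified.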

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below with clarifications and commitments to revisions that strengthen the presentation without altering the core contributions.

Point-by-point responses
  1. Referee: [S²ER Metric and Human-AI Alignment Studies] S²ER definition: the metric is defined via an LLM judge for semantic fidelity, yet the correction stage also relies on LLM semantic correction, intent routing, and reasoning-based editing. The manuscript must explicitly report whether the same model family and prompt style are used for both, along with judge-correction divergence rates and the precise setup of the human-AI alignment study (including model identity). This directly affects whether the larger S²ER gains reflect genuine semantic recovery or shared inductive bias.

    Authors: We agree that explicit reporting on model usage for the S²ER judge versus the correction components is necessary for transparency. The revised manuscript will add a dedicated subsection detailing the exact model families, prompt styles, and configurations employed in each stage. We will also report judge-correction divergence rates computed on a held-out validation set and expand the description of the human-AI alignment study to include all relevant setup details and model identities. These additions will enable readers to evaluate potential inductive bias concerns directly. Revision: yes.

  2. Referee: [Experiments] Experimental results: the abstract states 'consistent reductions' and 'much larger gains in S²ER' but supplies no numerical values, error bars, confidence intervals, or statistical tests. The full paper must include these quantities (with per-benchmark breakdowns) for the central claim that iterative interaction yields substantially larger semantic improvements than token-level metrics to be verifiable.

    Authors: The full manuscript already presents numerical results and per-benchmark breakdowns in the experiments section. To further improve verifiability as requested, the revised version will augment the results with error bars, confidence intervals, and statistical tests (such as paired significance tests) across all benchmarks. This will directly support the claim of larger semantic gains relative to token-level metrics. Revision: yes.
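For the paired tests the rebuttal commits to, a percentile bootstrap over per-sentence error flags is one standard option. The sketch assumes each sentence gets a 0/1 semantic-error flag from the judge before and after interaction (for instance, turn 1 versus the final turn), resamples sentences with replacement, and reports the mean reduction with a 95% interval. The function name and the synthetic flags are illustrative; nothing here reproduces the paper's numbers.

```python
import random
from statistics import fmean
from typing import Sequence, Tuple


def paired_bootstrap_ci(before: Sequence[int], after: Sequence[int],
                        n_resamples: int = 2000, alpha: float = 0.05,
                        seed: int = 0) -> Tuple[float, Tuple[float, float]]:
    """Mean per-sentence error reduction with a percentile bootstrap interval.

    `before` and `after` hold 0/1 semantic-error flags for the same sentences,
    so the mean of (before - after) equals the drop in sentence error rate.
    """
    if len(before) != len(after) or not before:
        raise ValueError("need two equal-length, non-empty flag lists")
    rng = random.Random(seed)
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    # Resample whole sentences so each before/after pair stays together.
    means = sorted(fmean(rng.choices(diffs, k=n)) for _ in range(n_resamples))
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return fmean(diffs), (lo, hi)


# Synthetic flags for 10 sentences (1 = semantic error), illustrative only.
before = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
after = [0, 1, 0, 0, 0, 1, 0, 0, 0, 1]
reduction, (lo, hi) = paired_bootstrap_ci(before, after)
print(f"S2ER reduction = {reduction:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

If the interval excludes zero, the reduction is significant at roughly the 5% level under the bootstrap's assumptions. Running the same procedure on per-sentence WER or CER differences would put the token-level and semantic gains on a comparable footing.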

Circularity Check

0 steps flagged

No significant circularity detected

Full rationale

The paper defines S²ER as an LLM-based metric and reports experimental reductions in it versus token-level metrics, with explicit mention of separate human-AI alignment and ablation studies to validate the judge. No equations, self-citations, or derivations reduce the central claims to fitted inputs, self-definitions, or author-prior ansatzes by construction. The framework's use of LLMs for both correction and evaluation is acknowledged but does not create a load-bearing circular step under the enumerated patterns, as the validation steps are presented as external checks. The derivation remains self-contained against the reported benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 2 invented entities

Abstract-only review; ledger populated from stated components. The framework introduces two new entities whose independent evidence rests on the paper's own experiments.

axioms (1)
  • domain assumption: an LLM can serve as a reliable proxy for human semantic judgment in error evaluation
    Invoked to define S²ER and to claim validation via human-AI alignment studies
invented entities (2)
  • Agentic ASR closed-loop framework (no independent evidence)
    purpose: Combine single-pass ASR with semantic correction, intent routing, and reasoning-based editing
    New proposed system architecture
  • S²ER metric (no independent evidence)
    purpose: LLM-based sentence-level semantic error rate replacing token metrics
    New evaluation method

pith-pipeline@v0.9.1-grok · 5818 in / 1181 out tokens · 22824 ms · 2026-06-29T07:28:03.827937+00:00 · methodology

