ASTRA: A Scalable Next-Generation ATCO Training Simulator with Autonomous Simpilots
Pith reviewed 2026-06-27 01:02 UTC · model grok-4.3
The pith
ASTRA adapts speech recognition to cut word error rates on Singaporean aviation speech to 23.45 percent while adding AI scoring of trainee communications.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ASTRA is an end-to-end training simulator that automates simpilot roles through a pipeline that transcribes ATCO speech, interprets instructions, and generates appropriate pilot and ATCO responses using locally adapted voice models. The fine-tuned ASR pipeline reduces WER to 23.45 percent. Beyond traffic simulation, ASTRA incorporates an AI-assisted performance evaluation framework that assesses trainee radiotelephony communications across accuracy, brevity, and completeness, achieving post-optimization scores of 91.7 percent, 88.2 percent, and 86.9 percent.
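The three stages named in the core claim (transcribe, interpret, respond) can be pictured with a small sketch. It is a toy illustration under stated assumptions, not the authors' implementation: the `Instruction` type, the one-line phraseology grammar, and the `transcribe` callable standing in for the ASR model are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Instruction:
    callsign: str
    action: str
    value: str


# Hypothetical toy grammar: "<callsign> <climb|descend> flight level <digits>"
_PATTERN = re.compile(
    r"^(?P<callsign>\w+) (?P<action>climb|descend) flight level (?P<value>\d+)$"
)


def interpret(transcript: str) -> Optional[Instruction]:
    """Parse a transcript into a structured instruction, or None if it does not fit."""
    m = _PATTERN.match(transcript.strip().lower())
    if m is None:
        return None
    return Instruction(m["callsign"].upper(), m["action"], m["value"])


def pilot_readback(instr: Optional[Instruction]) -> str:
    """Generate the simulated pilot's reply; ask for a repeat when parsing failed."""
    if instr is None:
        return "say again"
    return f"{instr.action} flight level {instr.value}, {instr.callsign}"


def simpilot_turn(audio: bytes, transcribe: Callable[[bytes], str]) -> str:
    """Run one turn: audio -> transcript -> instruction -> readback text."""
    return pilot_readback(interpret(transcribe(audio)))
```

Passing the ASR stage in as a callable keeps the sketch testable without a speech model; in a real system the readback text would then go to a text-to-speech stage.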
What carries the argument
The fine-tuned Automatic Speech Recognition pipeline combined with an AI-assisted performance evaluation framework that scores radiotelephony communications on accuracy, brevity, and completeness.
Load-bearing premise
The locally adapted ASR and voice models, together with the post-optimization evaluation framework, will hold their reported performance in actual Singaporean operational contexts, not only on the development data.
What would settle it
Deploy the full ASTRA system on new live recordings from Singapore ATCO training sessions and measure whether WER remains near 23.45 percent and evaluation scores stay above 86 percent across the three metrics.
Original abstract
Air Traffic Control Operators (ATCOs) are vital in ensuring the safe, orderly, and efficient flow of air traffic, yet training capacity is constrained by reliance on specialized human trainers known as simpilots, who must role-play both pilots and ATCOs in a simulated airspace. Existing automated solutions rely on Western-centric speech models that perform poorly in Singaporean operational contexts, with off-the-shelf systems exhibiting Word Error Rates (WER) of up to 107.80% on Singaporean-accented aviation speech. We introduce ASTRA, an end-to-end training simulator that automates these simpilot roles through a pipeline that transcribes ATCO speech, interprets instructions, and generates appropriate pilot and ATCO responses using locally adapted voice models. Our fine-tuned Automatic Speech Recognition (ASR) pipeline reduces WER to 23.45%, substantially outperforming existing approaches in this domain. Beyond traffic simulation, ASTRA incorporates an AI-assisted performance evaluation framework that assesses trainee radiotelephony communications across accuracy, brevity, and completeness, achieving post-optimization scores of 91.7%, 88.2%, and 86.9%, respectively. Built on open-source foundations such as DSPy and Unsloth, this approach enables scalable, standardized ATCO assessment while reducing instructor workload.
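A WER above 100%, such as the 107.80% quoted above, is possible because insertions count as errors while the denominator is the reference length only. The sketch below uses the standard word-level edit-distance definition of WER; the function name and the lowercase, whitespace-split normalization are assumptions, and the paper may normalize transcripts differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] is the edit distance between the reference prefix processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, a two-word reference "turn left" transcribed as the five unrelated words "tern lift heading now please" needs two substitutions and three insertions, giving a WER of 5 / 2 = 250%.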
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces ASTRA, an end-to-end ATCO training simulator that automates simpilot roles via an ASR pipeline for transcribing Singaporean-accented radiotelephony, instruction interpretation, and response generation with locally adapted voice models. It claims the fine-tuned ASR achieves 23.45% WER (versus up to 107.80% for off-the-shelf systems) and that an AI-assisted evaluation framework scores trainee communications at 91.7% accuracy, 88.2% brevity, and 86.9% completeness post-optimization, all built on open-source tools like DSPy and Unsloth.
Significance. If the performance claims are substantiated with proper held-out validation, the work would offer a practical advance in scalable, standardized ATCO training that addresses accent-specific challenges in aviation speech and reduces instructor workload.
Major comments (2)
- [Abstract] The central performance claims (WER reduced to 23.45%, evaluation scores of 91.7/88.2/86.9) are presented without any information on dataset size, train/test splits, whether the test set is held out from the adaptation data, baseline system details beyond the single off-the-shelf WER figure, or statistical significance. Because these claims carry the paper's main contribution, the omissions make the outperformance claim impossible to assess.
- [Abstract] The phrase 'post-optimization' for the evaluation framework is undefined. No description is given of the optimization procedure, the data on which scores were computed, or any external validation set from live Singaporean operations, leaving open the possibility that the reported numbers reflect in-sample performance rather than generalization.
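The request for statistical significance could be met in several ways, and the source does not say which test the authors use. One common option is a paired bootstrap over test utterances, sketched below on the assumption that per-utterance error counts are available for both systems on the same held-out set. All names and the default of 2000 resamples are illustrative.

```python
import random


def corpus_wer(errors, ref_lens):
    """Corpus-level WER: total word errors divided by total reference words."""
    return sum(errors) / sum(ref_lens)


def paired_bootstrap_wer(errors_a, errors_b, ref_lens, n_resamples=2000, seed=0):
    """Return (observed WER difference A - B, 95% CI lower bound, 95% CI upper bound).

    errors_a / errors_b: per-utterance word-error counts for systems A and B.
    ref_lens: per-utterance reference word counts (same utterances, same order).
    """
    if not ref_lens or not (len(errors_a) == len(errors_b) == len(ref_lens)):
        raise ValueError("inputs must be non-empty and of equal length")
    if min(ref_lens) <= 0:
        raise ValueError("every reference must contain at least one word")
    rng = random.Random(seed)
    n = len(ref_lens)
    diffs = []
    for _ in range(n_resamples):
        # Resample utterances with replacement, using the same indices for both systems.
        idx = [rng.randrange(n) for _ in range(n)]
        total = sum(ref_lens[i] for i in idx)
        wer_a = sum(errors_a[i] for i in idx) / total
        wer_b = sum(errors_b[i] for i in idx) / total
        diffs.append(wer_a - wer_b)
    diffs.sort()
    low = diffs[int(0.025 * n_resamples)]
    high = diffs[int(0.975 * n_resamples) - 1]
    observed = corpus_wer(errors_a, ref_lens) - corpus_wer(errors_b, ref_lens)
    return observed, low, high
```

If the resulting interval excludes zero, the WER gap between the baseline and the fine-tuned system is unlikely to be an artifact of which utterances happened to land in the test set.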
Simulated Author's Rebuttal
We thank the referee for the constructive comments on the abstract. We agree that the performance claims require additional context to be properly assessed and that the term 'post-optimization' needs definition. We will revise the abstract and supporting sections to address these points directly.
Point-by-point responses
- Referee: [Abstract] The central performance claims (WER reduced to 23.45%, evaluation scores of 91.7/88.2/86.9) are presented without any information on dataset size, train/test splits, whether the test set is held out from the adaptation data, baseline system details beyond the single off-the-shelf WER figure, or statistical significance. Because these claims carry the paper's main contribution, the omissions make the outperformance claim impossible to assess.
  Authors: We agree that the abstract as currently written does not provide enough information for readers to evaluate the claims. The full manuscript reports dataset details, train/test splits, held-out status, additional baselines, and significance testing in the Experiments section. In the revised version we will add concise statements to the abstract covering dataset size, confirmation that the test set is held out, reference to multiple baselines, and mention of statistical significance so that the outperformance claim can be assessed from the abstract alone.
  Revision: yes
- Referee: [Abstract] The phrase 'post-optimization' for the evaluation framework is undefined. No description is given of the optimization procedure, the data on which scores were computed, or any external validation set from live Singaporean operations, leaving open the possibility that the reported numbers reflect in-sample performance rather than generalization.
  Authors: We will define 'post-optimization' explicitly in the revised abstract as the result of DSPy prompt optimization applied to the evaluator. We will state that the reported scores were obtained on the held-out test portion of our Singaporean aviation speech corpus and briefly describe the optimization procedure. We note that live operational recordings from Singapore ATC are not available to us for external validation due to access and privacy constraints; the held-out test set therefore serves as the primary evidence of generalization within the collected domain.
  Revision: yes
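What "held-out" means matters for accent adaptation: if the same speakers appear in both the adaptation data and the test set, scores can overstate generalization. The source does not say how the authors built their split. The sketch below shows one possible approach, a deterministic speaker-disjoint split; the function names, the `(speaker_id, utterance_id)` record shape, and the 20% test fraction are all assumptions.

```python
import hashlib


def assign_split(speaker_id: str, test_fraction: float = 0.2) -> str:
    """Deterministically map a speaker to 'train' or 'test' by hashing the ID."""
    digest = hashlib.sha256(speaker_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    return "test" if bucket < test_fraction else "train"


def split_by_speaker(utterances, test_fraction: float = 0.2):
    """Split (speaker_id, utterance_id) pairs so no speaker spans both splits."""
    splits = {"train": [], "test": []}
    for speaker_id, utt_id in utterances:
        split = assign_split(speaker_id, test_fraction)
        splits[split].append((speaker_id, utt_id))
    return splits
```

Hashing the speaker ID rather than shuffling keeps the assignment stable when new recordings are added, so an utterance never migrates from the test set into later fine-tuning or prompt-optimization data.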
Circularity Check
No circularity; empirical results reported directly from fine-tuning
Full rationale
The manuscript contains no equations, derivations, or load-bearing self-citations. Performance numbers (WER 23.45%, evaluation scores 91.7/88.2/86.9) are presented as measured outcomes of the described fine-tuning and post-optimization steps on the authors' data. No step reduces a claimed prediction or uniqueness result to a fitted parameter or prior self-citation by construction. The paper is self-contained against external benchmarks in the sense that its claims rest on reported empirical measurements rather than internal redefinitions.