WhisperRT -- Turning Whisper into a Causal Streaming Model
Pith reviewed 2026-05-18 22:13 UTC · model grok-4.3
The pith
Whisper can be turned into a causal streaming ASR model by making its encoder process audio chunks incrementally and fine-tuning the encoder-decoder alignment so that tokens are emitted in step with the audio.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that a transformer encoder-decoder like Whisper can be converted into a low-latency streaming model. The encoder is made causal so that it processes audio incrementally. The decoder conditions on partial encoder states and generates only tokens aligned with the temporal context available so far, which requires explicit synchronization between encoded frames and token emissions. Because tokens are produced only after sufficient acoustic evidence has been observed, an inherent latency arises, and the encoder-decoder alignment is fine-tuned to handle it. An updated inference procedure then supports greedy and beam-search decoding, which the paper shows to be locally optimal. In experiments with chunk sizes under 300 milliseconds, the fine-tuned model outperforms existing non-fine-tuned streaming methods in most cases.
What carries the argument
Causal encoder combined with decoder conditioning on partial states and explicit frame-token synchronization, refined by alignment fine-tuning.
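One common way to make a transformer encoder causal at chunk granularity is a block-wise attention mask, in which each frame may attend to every frame in its own chunk and in earlier chunks but to none in later chunks. The sketch below illustrates that idea in plain Python. The function name `blockwise_causal_mask` and its parameters are hypothetical, and the paper's actual masking scheme may differ in detail.

```python
from typing import List


def blockwise_causal_mask(num_frames: int, chunk_frames: int) -> List[List[bool]]:
    """Return mask[q][k], True when query frame q may attend to key frame k.

    Assumption (not taken from the paper): a frame sees its own chunk and all
    earlier chunks, and nothing from later chunks.
    """
    if num_frames < 0:
        raise ValueError("num_frames must be non-negative")
    if chunk_frames <= 0:
        raise ValueError("chunk_frames must be positive")

    mask = []
    for q in range(num_frames):
        # Exclusive end index of the chunk that contains frame q.
        visible_end = (q // chunk_frames + 1) * chunk_frames
        mask.append([k < visible_end for k in range(num_frames)])
    return mask


if __name__ == "__main__":
    # Print a small mask: "x" means attention is allowed, "." means it is blocked.
    for row in blockwise_causal_mask(num_frames=6, chunk_frames=2):
        print("".join("x" if allowed else "." for allowed in row))
```

A useful property of this mask is that the rows for already-processed chunks do not change when more audio arrives, so encoder states computed for earlier chunks can be reused rather than recomputed.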
If this is right
- At low-latency chunk sizes under 300 milliseconds, the fine-tuned model outperforms non-fine-tuned streaming approaches in most cases.
- The method operates at lower complexity than the compared streaming baselines.
- The updated inference procedure supports both greedy and beam-search decoding, and the paper shows it to be locally optimal.
- Released training code, inference code, and fine-tuned models allow direct reuse and extension.
Where Pith is reading between the lines
- The same causal-encoder and synchronization steps could be applied to other large offline encoder-decoder ASR models beyond Whisper.
- Live applications such as real-time captioning or voice interfaces could adopt the approach to reduce end-to-end delay.
- Further tests on multilingual or noisy data would clarify whether the reported gains require additional per-domain fine-tuning.
Load-bearing premise
Fine-tuning the encoder-decoder alignment will create a stable low-latency system whose gains persist across different acoustic conditions and languages without new errors that cancel the benefits.
What would settle it
A direct comparison in which the fine-tuned model shows higher word error rates or greater instability than non-fine-tuned baselines when evaluated on acoustic conditions or languages outside the fine-tuning data.
Original abstract
Automatic Speech Recognition (ASR) has seen remarkable progress, with models like OpenAI Whisper and NVIDIA Canary achieving state-of-the-art (SOTA) performance in offline transcription. However, these models are not designed for streaming (online or real-time) transcription, due to limitations in their architecture and training methodology. We propose a method to turn the transformer encoder-decoder model into a low-latency streaming model. The encoder is made causal to process audio incrementally, while the decoder conditions on partial encoder states to generate tokens aligned with the available temporal context. This requires explicit synchronization between encoded input frames and token emissions. Since tokens are produced only after sufficient acoustic evidence is observed, an inherent latency arises, necessitating fine-tuning of the encoder-decoder alignment mechanism. We propose an updated inference mechanism that utilizes the fine-tuned causal encoder and decoder to yield greedy and beam-search decoding, and is shown to be locally optimal. Experiments on low-latency chunk sizes (less than 300 msec) show that our fine-tuned model outperforms existing non-fine-tuned streaming approaches in most cases, while using a lower complexity. We release our training and inference code, along with the fine-tuned models, to support further research and development in streaming ASR.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a method to convert the non-causal Whisper encoder-decoder ASR model into a low-latency causal streaming system. The encoder is made causal for incremental chunk processing, explicit frame-token synchronization is imposed so the decoder conditions only on available partial encoder states, and the alignment is fine-tuned to mitigate inherent latency. An updated inference procedure supporting greedy and beam-search decoding is presented and claimed to be locally optimal. The central result is that the resulting fine-tuned model outperforms existing non-fine-tuned streaming baselines on chunk sizes below 300 ms while using lower complexity; code and models are released.
Significance. If the reported gains can be attributed to the causal adaptation and synchronization mechanism rather than fine-tuning alone, the approach would provide a practical route for adapting strong offline models such as Whisper to real-time ASR with modest added latency and complexity. The public release of training/inference code and fine-tuned checkpoints is a clear strength that supports reproducibility and follow-on work in streaming ASR.
major comments (2)
- [Experiments] Experiments section: The headline claim that the fine-tuned causal model outperforms existing non-fine-tuned streaming approaches at <300 ms chunks is load-bearing, yet the comparison does not report whether the baselines received equivalent fine-tuning on the same alignment data or training distribution. Without such controls or an ablation isolating the synchronization mechanism from the fine-tuning step, it remains unclear whether performance deltas arise from the proposed causal construction or simply from fine-tuning itself.
- [Method / Experiments] The abstract and method description state that fine-tuning is required to handle the latency induced by synchronization, but no quantitative analysis (e.g., latency-accuracy trade-off curves or alignment error metrics before/after fine-tuning) is referenced to show that the fine-tuned alignment remains stable across acoustic conditions or languages.
minor comments (2)
- [Experiments] The claim of 'lower complexity' is stated without accompanying FLOPs, parameter counts, or runtime tables comparing the proposed model to the baselines.
- [Experiments] Dataset splits, exact chunk sizes tested, number of runs, and error bars or statistical tests are not mentioned in the abstract and should be added to the experimental section for verifiability.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our work converting Whisper to a causal streaming model. We address the major comments below and will incorporate revisions to strengthen the experimental controls and analysis as outlined.
- Referee: [Experiments] Experiments section: The headline claim that the fine-tuned causal model outperforms existing non-fine-tuned streaming approaches at <300 ms chunks is load-bearing, yet the comparison does not report whether the baselines received equivalent fine-tuning on the same alignment data or training distribution. Without such controls or an ablation isolating the synchronization mechanism from the fine-tuning step, it remains unclear whether performance deltas arise from the proposed causal construction or simply from fine-tuning itself.
  Authors: We agree that the current comparison is between our fine-tuned causal model and published non-fine-tuned streaming baselines, which may not isolate the contributions fully. The baselines are existing methods without our encoder causality and decoder synchronization mechanism, and our headline result is that the full proposed pipeline (causality + synchronization + fine-tuning) outperforms them at low latency. To address the concern directly, we will add an ablation in the revised manuscript applying equivalent fine-tuning to the baseline models on the same alignment data and training distribution where feasible, allowing clearer isolation of the synchronization mechanism's effect.
  Revision: yes
- Referee: [Method / Experiments] The abstract and method description state that fine-tuning is required to handle the latency induced by synchronization, but no quantitative analysis (e.g., latency-accuracy trade-off curves or alignment error metrics before/after fine-tuning) is referenced to show that the fine-tuned alignment remains stable across acoustic conditions or languages.
  Authors: We acknowledge the absence of explicit quantitative analysis on the fine-tuning step in the current manuscript. In the revision, we will add latency-accuracy trade-off curves for the model before and after fine-tuning across chunk sizes. We will also include alignment error metrics (e.g., average token emission latency and WER deltas) evaluated before/after fine-tuning on multiple languages and acoustic conditions from our test sets to demonstrate stability.
  Revision: yes
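The "average token emission latency" mentioned in the rebuttal is not defined on this page. As a rough illustration only, the sketch below assumes one plausible definition: a token can be emitted no earlier than the end of the chunk that contains its last audio, and its latency is the gap between that chunk boundary and the token's end time. The function names and the example timestamps are hypothetical, and the resulting number is an algorithmic lower bound that ignores compute time.

```python
from typing import List


def earliest_emission_ms(token_end_ms: int, chunk_ms: int) -> int:
    """End time of the first chunk that fully contains the token's audio.

    Times are integer milliseconds to avoid floating-point drift at chunk
    boundaries. This definition is an assumption, not the paper's metric.
    """
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    if token_end_ms < 0:
        raise ValueError("token_end_ms must be non-negative")
    # Ceiling division; at least one chunk must arrive before anything is emitted.
    chunks_needed = max(1, -(-token_end_ms // chunk_ms))
    return chunks_needed * chunk_ms


def mean_emission_latency_ms(token_ends_ms: List[int], chunk_ms: int) -> float:
    """Average wait between a token's end and its earliest possible emission."""
    if not token_ends_ms:
        raise ValueError("token_ends_ms must not be empty")
    delays = [earliest_emission_ms(t, chunk_ms) - t for t in token_ends_ms]
    return sum(delays) / len(delays)


if __name__ == "__main__":
    # Illustrative token end times in milliseconds (not measured data).
    token_ends = [250, 510, 900, 1220, 1500]
    for chunk in (100, 200, 300):
        print(chunk, mean_emission_latency_ms(token_ends, chunk))
```

Sweeping the chunk size as in the usage loop gives the latency axis of a latency-accuracy curve; the accuracy axis would come from WER measured at the same chunk sizes.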
Circularity Check
No circularity; empirical adaptation and comparison are self-contained
The paper describes a practical engineering adaptation: rendering the Whisper encoder causal, enforcing explicit frame-token synchronization, fine-tuning the resulting alignment for latency, and updating inference for greedy/beam search. The headline results consist of direct experimental comparisons on low-latency chunks against existing non-fine-tuned streaming baselines. No derivation chain, equation, or first-principles claim reduces to its own inputs by construction; there are no fitted parameters renamed as predictions, no self-citation load-bearing uniqueness theorems, and no ansatz smuggled through prior work. The method is validated against external benchmarks rather than tautologically defined by its own outputs, so the reported performance deltas stand as independent empirical evidence.
Axiom & Free-Parameter Ledger
free parameters (1)
- chunk size
axioms (2)
- domain assumption A transformer encoder can be made strictly causal by appropriate attention masking while retaining useful representations.
- domain assumption Fine-tuning on aligned partial encoder states will reduce the inherent token-emission latency without catastrophic accuracy loss.
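The second axiom depends on having target labels for partial audio. The Fig. 7 fragment quoted on this page describes sampling points given the chunk size and computing target labels per sample point. The sketch below shows one way that could work, under the assumption that a word becomes a valid target once its end time falls at or before the sampled chunk boundary. The word timings mirror the numbers visible in the Fig. 7 residue, but the pairing of words to times, the function names, and the labeling rule are all assumptions rather than the authors' code.

```python
import random
from typing import List, Tuple

EOT = "<EOT>"


def chunk_boundaries_ms(total_ms: int, chunk_ms: int) -> List[int]:
    """All chunk end times up to and including total_ms."""
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    return list(range(chunk_ms, total_ms + 1, chunk_ms))


def targets_at_boundary(aligned_words: List[Tuple[str, int]], boundary_ms: int) -> List[str]:
    """Words whose end time is at or before the boundary, followed by EOT.

    Assumed labeling rule: a word is a target only once it has fully ended.
    """
    visible = [word for word, end_ms in aligned_words if end_ms <= boundary_ms]
    return visible + [EOT]


def sample_training_points(boundaries: List[int], num_points: int, seed: int) -> List[int]:
    """Pick a random subset of chunk boundaries instead of visiting every one."""
    rng = random.Random(seed)
    return sorted(rng.sample(boundaries, min(num_points, len(boundaries))))


if __name__ == "__main__":
    # (word, end time in ms); pairing inferred from the Fig. 7 residue.
    words = [("The", 250), ("quick", 510), ("brown", 900), ("fox", 1220), ("jumps", 1500)]
    boundaries = chunk_boundaries_ms(total_ms=1500, chunk_ms=300)
    for boundary in sample_training_points(boundaries, num_points=3, seed=0):
        print(boundary, targets_at_boundary(words, boundary))
```

Sampling a few boundaries per utterance, rather than iterating over every possible frame, matches the efficiency argument made in the Fig. 7 caption.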
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "We modify the original non-causal encoder to operate causally and fine-tune both the encoder and decoder using Low-Rank Adaptation (LoRA) on a weakly aligned dataset."
- IndisputableMonolith/Foundation/AlexanderDuality.lean, theorem alexander_duality_circle_linking (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "Theorem 2. ... $[\tilde{Z}_T]_t = [\tilde{Z}_{k\tau}]_t$"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Appendix fragments
Truncated excerpts from the paper's appendix that the page extracted alongside its reference list.
- Appendix A, theorem proofs. Theorem 1. Let $k\tau < T$, where $k$ is the frame index and $\tau$ is the...
- Fig. 7: Fine-tuning process illustration. The example uses an encoder with a chunk size of 300 msec. Figure labels: streaming log-mel input; 2D Conv + GeLU; Encoder + Blockwise Masked Self-Attention; Decoder; Whisper + LoRA Layers; example targets "The" (0.25), "quick" (0.51), "brown" (0.9), "fox" (1.22), "jumps" (1.5), with each target sequence ending in <EOT>. Training steps named in the figure: sample random points given the chunk size and calculate target labels per sample point; iterate through the sample points. This method makes training more efficient, since there is no need to go through each possible frame in the streaming process. Assuming th...
- Proof fragment. If $y_{i-m} = j$ is stable, either $j = \arg\max_{u \in V} P(y_{i-m} = u \mid y_{<i-m}, X_{k\tau})$ (52) or $P(y_{i-m} = j \mid y_{<i-m}, X_{k\tau}) \ge P(y_{i-m} = j \mid y_{<i-m}, X_{(k-1)\tau})$ (53) holds. Either way, the token $y_{i-m}$ has a higher probability than at the last frame. Thus $\rho^{CW}_{k+1} \ge \rho^{G}_{k+1}$.
- Theorem 4. Let $T$ be the input sequence length to the encoder, $d$ the embedding dimension, and $\tau$ the c...