AI-Driven Modular Services for Accessible Multilingual Education in Immersive Extended Reality Settings: Integrating Speech Processing, Translation, and Sign Language Rendering
Pith reviewed 2026-05-10 18:48 UTC · model grok-4.3
The pith
A modular AI platform combines six services to deliver real-time accessible multilingual education with sign language in immersive XR.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
These findings establish the viability of orchestrating cross-modal AI services within XR settings for accessible, multilingual language instruction. The modular design permits independent scaling and adaptation to varied educational contexts, providing a foundation for equitable learning solutions aligned with European Union digital accessibility goals.
What carries the argument
The modular platform that runs and connects six AI components—speech recognition, translation, synthesis, emotion classification, summarization, and mapping International Sign gestures to VR avatars—while validating each through isolated benchmarks.
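As a concrete illustration of what such an orchestration could look like, here is a minimal Python sketch of the six-stage pipeline. Every function body is a placeholder stub; the paper publishes no API, so all names (`recognise_speech`, `process_turn`, and so on) are invented for illustration, not taken from the authors' code.

```python
from dataclasses import dataclass

# Placeholder stubs for the six services. A real deployment would call
# Whisper, NLLB/EuroLLM, Polly, RoBERTa, flan-t5-base-samsum, and MediaPipe.
def recognise_speech(audio: bytes) -> str:
    return "hello class"                      # ASR stub

def translate(text: str, target_lang: str) -> str:
    return f"[{target_lang}] {text}"          # MT stub

def synthesise(text: str) -> bytes:
    return text.encode()                      # TTS stub

def classify_emotion(text: str) -> str:
    return "neutral"                          # emotion-classification stub

def summarise(dialogue: str) -> str:
    return dialogue[:20]                      # dialogue-summarisation stub

def render_sign(text: str) -> list:
    return [(0.0, 0.0, 0.0)] * 21             # 21 hand landmarks per frame

@dataclass
class LessonTurn:
    translation: str
    audio: bytes
    emotion: str
    summary: str
    sign_frames: list

def process_turn(audio: bytes, target_lang: str) -> LessonTurn:
    """Run one utterance through the full modular pipeline."""
    text = recognise_speech(audio)
    translated = translate(text, target_lang)
    return LessonTurn(
        translation=translated,
        audio=synthesise(translated),
        emotion=classify_emotion(text),
        summary=summarise(text),
        sign_frames=render_sign(translated),
    )

turn = process_turn(b"...", "de")
print(turn.translation)   # prints: [de] hello class
```

In a deployed system each stub would wrap a network call to the corresponding service, which is exactly where end-to-end latency and orchestration overhead would accumulate.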
If this is right
- Components can be updated or swapped independently for new languages or user groups.
- The system supports real-time XR use based on the reported latency and BLEU results.
- It creates a path for inclusive education that reaches both multilingual speakers and sign language users.
- The approach aligns with existing digital accessibility requirements in education.
Where Pith is reading between the lines
- Full classroom trials would be needed to check whether the technical numbers translate into actual language learning gains.
- The same modular combination of existing models could apply to accessibility in training simulations or remote collaboration.
- More sign language data could improve the gesture-to-avatar step for greater naturalness.
Load-bearing premise
That separate benchmarks of each AI component on latency and accuracy are enough to confirm the integrated system will perform well in real-time XR without full end-to-end user testing.
What would settle it
Running the complete platform with actual learners in XR and measuring combined response times, sign rendering accuracy, translation quality in conversation, and learning gains. Results below practical thresholds would show the viability claim does not hold.
Original abstract
This work introduces a modular platform that brings together six AI services, automatic speech recognition via OpenAI Whisper, multilingual translation through Meta NLLB, speech synthesis using AWS Polly, emotion classification with RoBERTa, dialogue summarisation via flan-t5-base-samsum, and International Sign (IS) rendering through Google MediaPipe. A corpus of IS gesture recordings was processed to derive hand landmark coordinates, which were subsequently mapped onto three dimensional avatar animations inside a virtual reality (VR) environment. Validation comprised technical benchmarking of each AI component, including comparative assessments of speech synthesis providers and multilingual translation models (NLLB 200 and EuroLLM 1.7B variants). Technical evaluations confirmed the suitability of the platform for real time XR deployment. Speech synthesis benchmarking established that AWS Polly delivers the lowest latency at a competitive price point. The EuroLLM 1.7B Instruct variant attained a higher BLEU score, surpassing NLLB. These findings establish the viability of orchestrating cross modal AI services within XR settings for accessible, multilingual language instruction. The modular design permits independent scaling and adaptation to varied educational contexts, providing a foundation for equitable learning solutions aligned with European Union digital accessibility goals.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a modular platform integrating six AI services—automatic speech recognition via OpenAI Whisper, multilingual translation via Meta NLLB (and EuroLLM variants), speech synthesis via AWS Polly, emotion classification via RoBERTa, dialogue summarization via flan-t5-base-samsum, and International Sign rendering via Google MediaPipe—for accessible multilingual education in immersive XR/VR settings. It describes processing a corpus of IS gesture recordings into hand landmark coordinates mapped to 3D avatar animations and validates the approach through separate technical benchmarks of each component, including latency comparisons for synthesis providers and BLEU score comparisons for translation models, concluding that the platform is suitable for real-time XR deployment and supports equitable learning aligned with EU digital accessibility goals.
Significance. The modular orchestration of cross-modal AI services for XR-based language instruction represents a practical engineering contribution toward accessible education tools. If end-to-end integration and real-time performance were demonstrated, the work could serve as a foundation for scalable, adaptable systems in educational contexts. The explicit component benchmarking and focus on International Sign rendering are positive elements, but the absence of integrated system validation substantially reduces the strength of the viability claims.
Major comments (2)
- [Abstract] The claim that 'Technical evaluations confirmed the suitability of the platform for real time XR deployment' rests solely on isolated component benchmarks (AWS Polly latency, EuroLLM BLEU scores, MediaPipe landmark mapping). No end-to-end pipeline latency from ASR input through translation, synthesis, summarization, emotion detection, and IS avatar rendering; no synchronization error between audio and 3D hand landmarks; and no orchestration overhead are reported, leaving the leap to real-time XR viability unsubstantiated.
- [Abstract and validation description] The manuscript provides no user studies, educational effectiveness metrics, or specified real-time performance thresholds (e.g., maximum acceptable end-to-end latency for immersive XR) to support the stronger claims of enabling 'accessible, multilingual language instruction' and 'equitable learning solutions'. Component-level metrics alone do not establish overall platform suitability for the intended educational use case.
Minor comments (2)
- [Abstract] Inconsistent terminology: 'real time' appears without hyphenation while 'real-time' is conventional; standardize throughout.
- [Abstract] The IS gesture corpus is referenced, but no details on its size, recording conditions, or exact mapping procedure to 3D avatars are supplied, hindering reproducibility.
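To make the reproducibility concern concrete, the missing mapping detail could be specified in a few lines. The sketch below converts MediaPipe-style normalised hand landmarks (21 points per hand, image-space x/y in [0, 1], wrist-relative z) into a hypothetical avatar coordinate frame. The scale and offset constants are invented for illustration and are not figures from the paper.

```python
# Map MediaPipe-style normalised hand landmarks (21 per hand) into a
# hypothetical avatar coordinate frame. Scale/offset values are invented.
AVATAR_SCALE = 0.5              # metres spanned by the normalised [0, 1] range
WRIST_OFFSET = (0.0, 1.2, 0.3)  # avatar wrist position in world space (assumed)

def to_avatar_space(landmarks):
    """landmarks: list of 21 (x, y, z) tuples, wrist first (index 0)."""
    if len(landmarks) != 21:
        raise ValueError("expected 21 hand landmarks")
    wx, wy, wz = landmarks[0]
    mapped = []
    for x, y, z in landmarks:
        # Re-centre on the wrist, flip y (image y grows downward),
        # then scale and translate into the avatar's frame.
        mapped.append((
            WRIST_OFFSET[0] + (x - wx) * AVATAR_SCALE,
            WRIST_OFFSET[1] - (y - wy) * AVATAR_SCALE,
            WRIST_OFFSET[2] + (z - wz) * AVATAR_SCALE,
        ))
    return mapped

# One synthetic frame: the wrist at the image centre, fingers fanned out.
frame = [(0.5, 0.5, 0.0)] + [(0.5 + i * 0.01, 0.5 - i * 0.01, 0.0)
                             for i in range(1, 21)]
avatar_frame = to_avatar_space(frame)
```

Publishing even a short specification like this (landmark order, axis conventions, scaling) would let others reproduce the gesture-to-avatar step.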
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below, acknowledging limitations where they exist and outlining targeted revisions to strengthen the presentation of our technical contribution.
Point-by-point responses
Referee: [Abstract] The claim that 'Technical evaluations confirmed the suitability of the platform for real time XR deployment' rests solely on isolated component benchmarks (AWS Polly latency, EuroLLM BLEU scores, MediaPipe landmark mapping). No end-to-end pipeline latency from ASR input through translation, synthesis, summarization, emotion detection, and IS avatar rendering; no synchronization error between audio and 3D hand landmarks; and no orchestration overhead are reported, leaving the leap to real-time XR viability unsubstantiated.
Authors: We agree that the abstract overstates the evidence by claiming confirmation of real-time XR suitability based solely on component benchmarks. The manuscript does not report integrated end-to-end latency, synchronization metrics, or orchestration overhead. We will revise the abstract to state that the benchmarks demonstrate the potential of individual components for low-latency operation in XR contexts. We will add an explicit limitations paragraph noting the absence of full-pipeline measurements and provide an aggregated latency estimate derived from the reported component times, along with a discussion of synchronization strategies for future integrated testing. revision: yes
Referee: [Abstract and validation description] The manuscript provides no user studies, educational effectiveness metrics, or specified real-time performance thresholds (e.g., maximum acceptable end-to-end latency for immersive XR) to support the stronger claims of enabling 'accessible, multilingual language instruction' and 'equitable learning solutions'. Component-level metrics alone do not establish overall platform suitability for the intended educational use case.
Authors: The manuscript is a technical engineering contribution centered on modular AI service integration and component benchmarking; it does not include user studies or educational outcome metrics. We will revise the abstract and conclusion to moderate the language, framing the work as providing a technical foundation for accessible multilingual XR education rather than claiming validated educational effectiveness or equitable learning solutions. We will also reference established XR latency thresholds from the literature (e.g., sub-100 ms end-to-end for immersion) and map our component results against them. Conducting user studies lies outside the current scope and would require separate ethical and resource considerations, which we will note as a future direction. revision: partial
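The aggregated latency estimate the authors promise could be as simple as summing per-stage latencies and comparing the total against a target budget. The numbers below are invented placeholders, not figures from the paper; only the sub-100 ms immersion threshold cited in the rebuttal is a commonly used rule of thumb.

```python
# Hypothetical per-stage latencies in milliseconds; values are placeholders,
# not figures reported in the paper. A serial pipeline's end-to-end latency
# is at least the sum of its stages plus orchestration overhead.
stage_latency_ms = {
    "asr": 30.0,
    "translation": 25.0,
    "synthesis": 20.0,
    "emotion": 5.0,
    "summarisation": 15.0,
    "sign_rendering": 10.0,
}
ORCHESTRATION_OVERHEAD_MS = 8.0   # network hops, queueing (assumed)
XR_BUDGET_MS = 100.0              # commonly cited immersion threshold

end_to_end_ms = sum(stage_latency_ms.values()) + ORCHESTRATION_OVERHEAD_MS
within_budget = end_to_end_ms <= XR_BUDGET_MS
print(f"{end_to_end_ms:.0f} ms end-to-end; within budget: {within_budget}")
```

Note that with these illustrative numbers the serial sum already exceeds the budget even though every individual stage is fast, which is precisely the gap between component benchmarks and integrated real-time viability that the referee identifies.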
Circularity Check
No circularity; viability claims rest on external service benchmarks
full rationale
The paper describes integration of six pre-existing AI services (OpenAI Whisper, Meta NLLB, AWS Polly, RoBERTa, flan-t5, MediaPipe) and reports separate technical benchmarks drawn from those external providers. No equations, derivations, fitted parameters, or internal predictions appear anywhere in the manuscript. The assertion that 'technical evaluations confirmed the suitability of the platform for real time XR deployment' is grounded in cited third-party metrics (latency, BLEU scores) rather than any self-referential construction or renaming of results. The work therefore contains no load-bearing steps that reduce to their own inputs by definition or self-citation.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean: reality_from_one_distinction (tagged: unclear)
unclear: Relation between the paper passage and the cited Recognition theorem.
"The proposed platform unifies modular AI-driven services... six AI services: automatic speech recognition via OpenAI Whisper, multilingual translation through Meta NLLB, speech synthesis using AWS Polly, emotion classification with RoBERTa, dialogue summarisation via flan-t5-base-samsum, and International Sign (IS) rendering through Google MediaPipe... Technical evaluations confirmed the platform's suitability for real-time XR deployment."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
[1] O. Shliakhtina, T. Kyselova, S. Mudra, Y. Talalay, and A. Oleksiienko. The effectiveness of the grammar translation method for learning English in higher education institutions. Eduweb, 17(3), 2023. doi:10.46502/issn.1856-7576/2023.17.03.12
[2] H. Wu, H. Su, M. Yan, and Q. Zhuang. Perceptions of grammar-translation method and communicative language teaching method used in English classrooms. Journal of English Language Teaching and Applied Linguistics, 5(2), 2023a. doi:10.32996/jeltal.2023.5.2.12
[3] R. Divekar et al. Foreign language acquisition via artificial intelligence and extended reality: Design and evaluation. Computer Assisted Language Learning, 35(9): 2332–2360, 2021. doi:10.1080/09588221.2021.1879162
[4] N. Tegoan, S. Wibowo, and S. Grandhi. Application of the extended reality technology for teaching new languages: A systematic review. Applied Sciences, 11(23): 11360, 2021. doi:10.3390/app112311360
[5] P. Panagiotidis. Virtual reality applications and language learning. International Journal for Cross-Disciplinary Subjects in Education, 12: 4447–4454, 2021. doi:10.20533/ijcdse.2042.6364.2021.0543
[6] R. Godwin-Jones. Presence and agency in real and virtual spaces: The promise of extended reality for language learning. Language Learning & Technology, 27(3): 6–26, 2023. https://hdl.handle.net/10125/73529
[7] Y. Zhi and L. Wu. Extended reality in language learning: A cognitive affective model of immersive learning perspective. Frontiers in Psychology, 14, 2023. doi:10.3389/fpsyg.2023.1109025
[8] ImmerseMe VR. https://immerseme.co/. Accessed: 2025
[9] MondlyAR. https://www.mondly.com/. Accessed: 2025
[10] C. Garcia, A. Guzman, and D. Sánchez Ruano. Binding AI and XR in design education: Challenges and opportunities with emerging technologies. In Proceedings of the 26th International Conference on Engineering and Product Design Education (EPDE), pages 247–251, 2024. doi:10.35199/EPDE.2024.42
[11] A. Hartholt, E. Fast, A. Reilly, W. Whitcup, M. Liewer, and S. Mozgai. Ubiquitous virtual humans: A multi-platform framework for embodied AI agents in XR. In 2019 IEEE International Conference on Artificial Intelligence and Virtual Reality (AIVR), pages 308–3084, 2019. doi:10.1109/AIVR46125.2019.00072
[12] R. Zhang, D. Zou, and G. Cheng. Concepts, affordances, and theoretical frameworks of mixed reality enhanced language learning. Interactive Learning Environments, 32(7): 3624–3637, 2023. doi:10.1080/10494820.2023.2187421
[13] C. L. Taborda, H. Nguyen, and P. Bourdot. Engagement and attention in XR for learning: Literature review. In Virtual Reality and Mixed Reality. EuroXR 2024. Lecture Notes in Computer Science, volume 15445. Springer, Cham, 2025. doi:10.1007/978-3-031-78593-1_13
[14] J. Taborri, P. Fornai, E. Yeguas-Bolivar, M. D. Redel-Macias, M. Hilzensauer, A. Pecher, M. Leisenberg, A. Melis, and S. Rossi. The use of artificial intelligence for sign language recognition in education: From a literature overview to the ISENSE project. In 2023 IEEE International Conference on Metrology for eXtended Reality, Artificial Intelligence and...
[15] G. Strobel, T. Schoormann, L. Banh, and F. Möller. Artificial intelligence for sign language translation – a design science research study. Communications of the Association for Information Systems, 53, 2023. doi:10.17705/1cais.05303
[16] L. Chaudhary, T. Ananthanarayana, E. Hoq, and I. Nwogu. SignNet II: A transformer-based two-way sign language translation model. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45: 12896–12907, 2022. doi:10.1109/TPAMI.2022.3232389
[17] P. A. Rodríguez-Correa, A. Valencia-Arias, O. N. Patiño Toro, Y. Oblitas Díaz, and R. Teodori de la Puente. Benefits and development of assistive technologies for deaf people's communication: A systematic review. Frontiers in Education, 8, 2023. doi:10.3389/feduc.2023.1121597
[18] European Union of the Deaf. EUD position paper: International sign language. https://eud.eu/eud/position-papers/international-signs/, 2018
[19] A. Yin, T. Zhong, L. H. Tang, W. Jin, T. Jin, and Z. Zhao. Gloss attention for gloss-free sign language translation. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 2551–2562, 2023. doi:10.1109/CVPR52729.2023.00251
[20] S. Sylaiou, E. Gkagka, C. Fidas, E. Vlachou, G. Lampropoulos, A. Plytas, and V. Nomikou. Use of XR technologies for fostering visitors' experience and inclusion at industrial museums. In Proceedings of the 2nd International Conference of the ACM Greek SIGCHI Chapter (CHI-GREECE '23), pages 1–5, 2023. doi:10.1145/3609987.3610008
[21] T. Hirzle, F. Müller, F. Draxler, M. Schmitz, P. Knierim, and K. Hornbæk. When XR and AI meet – a scoping review on extended reality and artificial intelligence. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI '23), pages 1–45, 2023. doi:10.1145/3544548.3581072
[22] N. D. Tantaroudas, A. J. McCracken, I. Karachalios, and E. Papatheou. AI-based services to support language-learning for deaf and hearing individuals in immersive XR settings. In Extended Reality. XR Salento 2025. Lecture Notes in Computer Science, volume 15743. Springer, Cham, 2026a. doi:10.1007/978-3-031-97781-7_17
[23] N. D. Tantaroudas, A. J. McCracken, I. Karachalios, and E. Papatheou. Enhancing accessibility and inclusivity in business meetings through AI-driven extended reality solutions. In Extended Reality. XR Salento 2025. Lecture Notes in Computer Science, volume 15743. Springer, Cham, 2026b. doi:10.1007/978-3-031-97781-7_6
[24] N. D. Tantaroudas, A. J. McCracken, I. Karachalios, V. Pastrikakis, and E. Papatheou. Transforming career development through immersive and data-driven solutions. In Extended Reality. XR Salento 2025. Lecture Notes in Computer Science, volume 15742. Springer, Cham, 2026c. doi:10.1007/978-3-031-97778-7_7
[25] N. D. Tantaroudas, A. J. McCracken, I. Karachalios, and E. Papatheou. INTERACT: AI-powered extended reality platform for inclusive communication with real-time sign language translation and sentiment analysis. Open Research Europe, 6:71, 2026d. doi:10.12688/openreseurope.23201.1. Version 1; peer review: awaiting peer review
[26] N. D. Tantaroudas, A. J. McCracken, I. Karachalios, and E. Papatheou. AI-based services for inclusive language learning in immersive XR environments: Speech translation, and sign language integration. Open Research Europe, 6:72, 2026e. doi:10.12688/openreseurope.23214.1. Version 1; peer review: awaiting peer review
[27] A. Radford, J. W. Kim, T. Xu, G. Brockman, C. McLeavey, and I. Sutskever. Robust speech recognition via large-scale weak supervision. In Proceedings of the 40th International Conference on Machine Learning (ICML 2023), PMLR 202, pages 28492–28518, 2023. doi:10.48550/arXiv.2212.04356
[28] NLLB Team et al. No language left behind: Scaling human-centered machine translation. arXiv preprint arXiv:2207.04672, 2022. doi:10.48550/arXiv.2207.04672
[29] Y. Liu, J. Zhu, J. Zhang, and C. Zong. Bridging the modality gap for speech-to-text translation. arXiv preprint arXiv:2010.14920, 2020. doi:10.48550/arXiv.2010.14920
[30] M. S. Anwar, B. Shi, V. Goswami, W. Hsu, J. M. Pino, and C. Wang. MuAViC: A multilingual audio-visual corpus for robust speech recognition and robust speech-to-text translation. arXiv preprint arXiv:2303.00628, 2023. doi:10.48550/arXiv.2303.00628
[31] N. C. Camgöz, O. Koller, S. Hadfield, and R. Bowden. Sign language transformers: Joint end-to-end sign language recognition and translation. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10020–10030, 2020. doi:10.1109/CVPR42600.2020.01004
[32] B. Zhou, Z. Chen, A. Clapés, J. Wan, Y. Liang, and S. Escalera. Gloss-free sign language translation: Improving from visual-language pretraining. In 2023 IEEE/CVF International Conference on Computer Vision (ICCV), pages 20814–20824, 2023. doi:10.1109/ICCV51070.2023.01908
[33] J. Zheng, Y. Wang, C. Tan, S. Li, G. Wang, and J. Xia. CVT-SLR: Contrastive visual-textual transformation for sign language recognition with variational alignment. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 23141–23150, 2023. doi:10.1109/CVPR52729.2023.02216
[34] X. Wu, X. Luo, Z. Song, Y. Bai, B. Zhang, and G. Zhang. Ultra-robust and sensitive flexible strain sensor for real-time and wearable sign language translation. Advanced Functional Materials, 33, 2023b. doi:10.1002/adfm.202303504
[35] C. Lugaresi, J. Tang, H. Nash, C. McClanahan, E. Uboweja, M. Hays, F. Zhang, C.-L. Chang, M.-G. Yong, J. Lee, W.-T. Chang, W. Hua, M. Georg, and M. Grundmann. MediaPipe: A framework for building perception pipelines. arXiv preprint arXiv:1906.08172, 2019. doi:10.48550/arXiv.1906.08172
[36] B. Subramanian, B. Olimov, S. M. Naik, et al. An integrated MediaPipe-optimized GRU model for Indian sign language recognition. Scientific Reports, 12: 11964, 2022. doi:10.1038/s41598-022-15998-7
[37] ASL Transformer. https://github.com/bishal7679/ASL-Transformer. Accessed: 2025
[38] SpreadTheSign. https://www.spreadthesign.com/en.gb/search/. Accessed: 2025
[39] S. Srivastava, S. Singh, Pooja, et al. Continuous sign language recognition system using deep learning with MediaPipe holistic. Wireless Personal Communications, 137: 1455–1468, 2024. doi:10.1007/s11277-024-11356-0
[40] HandSpeak – International Sign Language. https://web.archive.org/web/20150711105152/http://www.handspeak.com/world/isl/index.php?id=151. Accessed: 2025
[41] T. Ahmed and P. Devanbu. Few-shot training LLMs for project-specific code-summarization. In Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering, 2022. doi:10.1145/3551349.3559555
[42] K. Boros and M. Oyamada. Towards large language model organization: A case study on abstractive summarization. In 2023 IEEE International Conference on Big Data (BigData), pages 6109–6112, Sorrento, Italy, 2023. doi:10.1109/BigData59044.2023.10386199
[43] E. Bozkir, S. Özdel, K. H. C. Lau, M. Wang, H. Gao, and E. Kasneci. Embedding large language models into extended reality: Opportunities and challenges for inclusion, engagement, and privacy. In Proceedings of the 6th ACM Conference on Conversational User Interfaces (CUI '24), pages 1–7, 2024. doi:10.1145/3640794.3665563
[44] S. Ramprasad, E. Ferracane, and Z. Lipton. Analyzing LLM behavior in dialogue summarization: Unveiling circumstantial hallucination trends. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024), pages 12549–12561, 2024. doi:10.18653/v1/2024.acl-long.677
[45] F. Barbieri, J. Camacho-Collados, L. Espinosa-Anke, and L. Neves. TweetEval: Unified benchmark and comparative evaluation for tweet classification. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 1644–1650, 2020. doi:10.18653/v1/2020.findings-emnlp.148
[46] Coqui TTS: High-quality text-to-speech synthesis for researchers and developers. https://github.com/coqui-ai/TTS. Accessed: 2025
[47] Rhasspy contributors. Piper: A fast, local neural text to speech system. https://github.com/rhasspy/piper, 2023
[48] Picovoice. TTS latency benchmark. https://picovoice.ai/docs/benchmark/tts-latency/, 2024
[49] P. Schmid. flan-t5-base-samsum. Hugging Face model repository. https://huggingface.co/philschmid/flan-t5-base-samsum, 2022
[50] G. Bradski and A. Kaehler. Learning OpenCV: Computer Vision with the OpenCV Library. O'Reilly Media, Sebastopol, CA, 2008