arxiv: 2512.07571 · v2 · submitted 2025-12-08 · 💻 cs.CL · cs.MM

A Simple Method to Enhance Pre-trained Language Models with Speech Tokens for Classification

Pith reviewed 2026-05-17 00:40 UTC · model grok-4.3

classification 💻 cs.CL cs.MM
keywords audio, language, large, method, model, speech, classification, simple

The pith

Lasso-selected speech tokens enhance text LLMs for multimodal classification by reducing long audio sequences to task-relevant features via self-supervised adaptation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The core idea starts with a speech tokenizer that turns audio into many discrete tokens from a large vocabulary. These tokens are turned into a simple bag-of-words count vector that is combined with the text. Lasso regression then picks out only the audio tokens that matter most for the target classification task, throwing away the rest to keep the input short. The language model is next adapted by training it to predict the selected audio tokens in a self-supervised way, so it learns to treat them as part of its vocabulary. Finally the model is fine-tuned on the actual task such as spotting fallacies in arguments or classifying emotions. The authors also test a random token selection baseline and find it still helps the text-only model. This pipeline is tested on fallacy detection datasets and a standard affective computing benchmark, showing gains over text-only models, larger speech-language models, and other ways of adding audio features.
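The selection step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it builds count vectors from toy audio-token sequences and fits an ℓ1-penalised logistic regression by proximal gradient descent, keeping the tokens whose weights are non-zero. The token names (`a1` to `a4`), the helper names, the hyperparameters, and the toy data are all assumptions made for the example, and the text half of the multimodal Bag-of-Words vector is left out for brevity.

```python
import math
from collections import Counter


def bag_of_words(token_seqs, vocab):
    """Turn each token sequence into a count vector over `vocab`."""
    rows = []
    for seq in token_seqs:
        counts = Counter(seq)
        rows.append([float(counts.get(tok, 0)) for tok in vocab])
    return rows


def soft_threshold(w, t):
    """Proximal operator of the L1 penalty: shrink w toward zero by t."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def l1_logistic(X, y, lam=0.1, lr=0.1, steps=500):
    """Binary logistic regression with an L1 penalty on the weights,
    fitted by proximal gradient descent. The bias is not penalised."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted prob minus label
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b
        # Gradient step followed by shrinkage; small weights become exactly 0.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b


def select_tokens(vocab, weights):
    """Retain the tokens whose weight survived the L1 shrinkage."""
    return [tok for tok, wj in zip(vocab, weights) if wj != 0.0]


# Toy data (hypothetical): "a1" is frequent only in the positive class,
# and "a4" never occurs, so its weight stays exactly zero.
vocab = ["a1", "a2", "a3", "a4"]
seqs = [
    ["a1", "a1", "a1", "a2"],
    ["a1", "a1", "a3", "a1"],
    ["a2", "a3"],
    ["a3", "a2", "a2"],
]
labels = [1, 1, 0, 0]

X = bag_of_words(seqs, vocab)
weights, bias = l1_logistic(X, labels)
selected = select_tokens(vocab, weights)
print("retained audio tokens:", selected)
```

Raising `lam` shrinks more weights to zero and so shortens the retained token list, which is the lever the method uses to keep the audio input short.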

Core claim

By applying a simple lasso-based feature selection on a multimodal Bag-of-Words representation, we retain only the most important audio tokens for the task and adapt the language model to them with a self-supervised language modeling objective before fine-tuning it on the downstream task. We show that this improves performance compared to a unimodal model, to a bigger SpeechLM, or to integrating audio via a learned representation.

Load-bearing premise

That the lasso-selected subset of audio tokens plus self-supervised adaptation is sufficient to capture the speech information that actually helps the classification task without discarding critical cues or adding noise that hurts performance.

Figures

Figures reproduced from arXiv: 2512.07571 by Jose Guillen, Nicolas Calbucura, Valentin Barriere.

Figure 1: Step 1 of our method is an audio token selection pipeline based on an ℓ1 logistic regression over a Bag-of-Words representation. This results in fewer audio tokens being selected for a specific task.
[…] Argumentative Fallacy Detection (AFD) tasks using datasets from Mancini et al. (2024a). As a baseline, we consider the results presented in Mancini et al. (2025), which showed a strong text dominance over audio and di…
Figure 2: Steps 2 and 3 consist of pretraining the audio token embeddings, followed by fine-tuning the multimodal LLM on the downstream task.
[…] respective embeddings, using a causal language modeling cross-entropy loss L_CLM (see Equation 1). All the weights of the LLM stay frozen except for the embeddings of the audio tokens, so as not to disturb the model's initial representations (shown in […
Figure 3: Task prompt for the Qwen2-Audio model for In-Context Learning.
Original abstract

This paper presents a simple method for enhancing textual pre-trained large language models with speech information when they are fine-tuned for a specific classification task. A classic issue when fusing many audio embeddings with text is that the audio sequence is much longer than the text sequence. Our method builds on an existing speech tokenizer trained for Automatic Speech Recognition, which outputs long sequences of tokens from a large vocabulary, making it difficult to integrate at low cost into a large language model. By applying a simple lasso-based feature selection on a multimodal Bag-of-Words representation, we retain only the most important audio tokens for the task and adapt the language model to them with a self-supervised language modeling objective before fine-tuning it on the downstream task. We show that this improves performance compared to a unimodal model, to a bigger SpeechLM, or to integrating audio via a learned representation. We demonstrate its effectiveness on Argumentative Fallacy Detection and Classification tasks, where audio was previously believed to be counterproductive, and on affective computing tasks on a widely used dataset. We also provide an in-depth analysis of the method, showing that even a random audio token selection helps enhance the unimodal model. Our code is available [online](https://github.com/salocinc/EACL26SpeechTokFallacy/).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that a simple pipeline—converting ASR token sequences to a multimodal Bag-of-Words count vector, applying lasso feature selection to retain task-relevant audio tokens, performing self-supervised language-model adaptation on the selected tokens, and then fine-tuning—improves classification performance over a text-only baseline, a larger SpeechLM, and learned audio representations. The approach is evaluated on argumentative fallacy detection (where audio was previously thought counterproductive) and affective computing tasks, with an additional analysis showing that even random audio-token selection yields gains over the unimodal model. Code is released.

Significance. If the empirical gains are robust, the method offers a lightweight way to inject speech information into existing text LLMs without managing long audio sequences or training new fusion modules, which could be practically useful for classification tasks that benefit from paralinguistic cues. The public code release and the observation that random selection also helps are positive contributions that facilitate reproducibility and further analysis.

major comments (2)
  1. [Abstract and §4] Abstract and §4 (Experiments): performance improvements are asserted without reporting dataset sizes, number of runs, statistical significance tests, or full baseline implementation details (e.g., how the larger SpeechLM and learned-representation baselines were trained or adapted). These omissions make it impossible to verify that the reported gains are reliable and attributable to the proposed pipeline rather than implementation differences.
  2. [§3 and analysis section] §3 (Method) and analysis section: the lasso operates on a Bag-of-Words representation that discards token order, co-occurrence, and timing. The paper itself reports that random token selection also improves over the unimodal baseline; this raises the possibility that gains arise from the adaptation step or from simply adding any additional tokens rather than from the lasso-selected speech content. A direct ablation comparing lasso-selected tokens against random and against frequency-based selection on the same downstream metrics is needed to support the central claim that lasso retains the “most important” audio tokens.
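The two comparison strategies named in this comment are easy to state precisely. A minimal sketch, assuming audio tokens are plain strings and that every strategy receives the same budget `k` as the lasso selection; the function names and toy sequences are hypothetical and do not come from the paper's code.

```python
import random
from collections import Counter


def frequency_selection(token_seqs, k):
    """Keep the k most frequent audio tokens across the training set."""
    counts = Counter(tok for seq in token_seqs for tok in seq)
    # Sort by descending count, then by token name so ties are deterministic.
    ranked = sorted(counts, key=lambda tok: (-counts[tok], tok))
    return ranked[:k]


def random_selection(token_seqs, k, seed=0):
    """Keep k tokens drawn uniformly at random from the observed vocabulary."""
    vocab = sorted({tok for seq in token_seqs for tok in seq})
    rng = random.Random(seed)
    return sorted(rng.sample(vocab, k))


def filter_sequence(seq, kept):
    """Drop every token outside the retained set, preserving order."""
    kept = set(kept)
    return [tok for tok in seq if tok in kept]


# Toy sequences (hypothetical): a1 occurs 3 times, a2 twice, a3 and a4 once.
seqs = [["a1", "a2", "a1"], ["a3", "a1", "a2"], ["a4"]]
k = 2  # same budget a lasso run would be matched against

freq = frequency_selection(seqs, k)
rand = random_selection(seqs, k, seed=0)
print("frequency-based:", freq)
print("random:", rand)
print("filtered:", filter_sequence(seqs[1], freq))
```

Matching the budget across strategies matters: if the three selections retain different numbers of tokens, a difference in downstream scores could reflect input length rather than which tokens were kept.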
minor comments (2)
  1. [§3] Notation for the multimodal BoW vector and the lasso objective should be introduced with explicit equations rather than prose descriptions.
  2. [§4] Figure captions and table headers should explicitly state the evaluation metric (accuracy, F1, etc.) and whether results are averaged over multiple seeds.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive comments. We address each major comment below and describe the revisions planned for the next version of the manuscript.

  1. Referee: [Abstract and §4] Abstract and §4 (Experiments): performance improvements are asserted without reporting dataset sizes, number of runs, statistical significance tests, or full baseline implementation details (e.g., how the larger SpeechLM and learned-representation baselines were trained or adapted). These omissions make it impossible to verify that the reported gains are reliable and attributable to the proposed pipeline rather than implementation differences.

    Authors: We agree that these experimental details are essential for assessing reliability and reproducibility. In the revised manuscript we will add explicit dataset sizes to §4, report all results as averages over five random seeds with standard deviations, include statistical significance tests (paired t-tests and McNemar’s test where appropriate), and expand the baseline descriptions to specify training procedures, hyperparameters, and adaptation steps for the larger SpeechLM and learned-representation baselines. revision: yes

  2. Referee: [§3 and analysis section] §3 (Method) and analysis section: the lasso operates on a Bag-of-Words representation that discards token order, co-occurrence, and timing. The paper itself reports that random token selection also improves over the unimodal baseline; this raises the possibility that gains arise from the adaptation step or from simply adding any additional tokens rather than from the lasso-selected speech content. A direct ablation comparing lasso-selected tokens against random and against frequency-based selection on the same downstream metrics is needed to support the central claim that lasso retains the “most important” audio tokens.

    Authors: We acknowledge that the existing analysis already demonstrates gains from random selection, which suggests that the self-supervised adaptation step itself contributes to performance. To isolate the contribution of lasso selection and strengthen the central claim, we will add a direct ablation study comparing lasso-selected tokens, random selection, and frequency-based selection on identical downstream metrics. We will also clarify in §3 that while the Bag-of-Words representation discards order and timing, the subsequent language-model adaptation allows contextual modeling of the retained tokens; this limitation will be discussed explicitly. revision: yes
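Of the significance tests mentioned in the first response, McNemar's test is the one that works directly on paired predictions from two classifiers over the same test set. A minimal sketch of the exact two-sided version, assuming hard label predictions; the function name is hypothetical and the toy labels below are made up for illustration, not results from the paper.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar test on paired predictions.

    b counts examples that only model A gets right, c counts examples
    that only model B gets right. Under the null hypothesis the
    discordant pairs split like fair coin flips, so the p-value is a
    two-sided binomial tail with n = b + c and p = 0.5.
    """
    b = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a == t and m != t)
    c = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a != t and m == t)
    n = b + c
    if n == 0:
        return b, c, 1.0  # the models never disagree on correctness
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2.0 * tail)


# Toy example (hypothetical): ten test items, all with gold label 1.
y_true = [1] * 10
text_only = [1, 0, 0, 0, 0, 0, 1, 1, 1, 1]
with_audio = [0, 1, 1, 1, 1, 1, 1, 1, 1, 1]

b, c, p_value = mcnemar_exact(y_true, text_only, with_audio)
print(f"b={b}, c={c}, p={p_value:.5f}")
```

Only the discordant pairs enter the statistic, so a large accuracy gap on a small test set can still yield a non-significant p-value, as in this toy case.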

Circularity Check

0 steps flagged

No circularity: standard empirical pipeline with independent evaluation

The paper describes a straightforward sequence of standard components—ASR tokenization into a large vocabulary, conversion to multimodal Bag-of-Words counts, lasso-based feature selection on those counts, self-supervised language modeling adaptation on the selected tokens, and downstream fine-tuning—followed by empirical comparisons to unimodal baselines, larger SpeechLMs, and learned audio representations. No equations or derivations are presented that reduce the reported performance gains to a fitted parameter or self-referential definition by construction. The additional analysis that random token selection also yields gains is offered as an empirical observation rather than a core claim that collapses into the selection step. All load-bearing assertions remain externally falsifiable through the reported task accuracies on fallacy detection and affective computing datasets.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The method rests on the standard assumption that lasso regression reliably identifies task-relevant features from high-dimensional bag-of-words vectors and that self-supervised language modeling on the selected tokens produces useful multimodal representations.

free parameters (1)
  • Lasso regularization strength
    Controls how many audio tokens are retained; value chosen to balance relevance and sequence length.
axioms (1)
  • (standard math) Lasso regression selects the most predictive features from a high-dimensional multimodal bag-of-words vector.
    Invoked when reducing the long audio token sequence to a small set of important tokens.
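The role of the regularization strength can be written out explicitly. The following is one common form of ℓ1-penalised logistic regression for a binary task, given as an assumption rather than as the paper's own notation: x_i is the multimodal Bag-of-Words count vector of example i, y_i ∈ {0, 1} is its label, w and b are the weights and bias, and λ ≥ 0 is the regularization strength.

```latex
(\hat{w}, \hat{b}) = \arg\min_{w,\,b} \;
  \frac{1}{N} \sum_{i=1}^{N}
  \Big[ \log\!\big(1 + e^{\,w^\top x_i + b}\big) - y_i \,\big(w^\top x_i + b\big) \Big]
  + \lambda \,\lVert w \rVert_1 ,
\qquad
\mathcal{S} = \{\, j : \hat{w}_j \neq 0 \,\}
```

Here S is the set of retained features. A larger λ drives more coordinates of the weight vector to exactly zero and therefore retains fewer audio tokens; λ = 0 recovers unpenalised logistic regression, which performs no selection.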

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. MultiLinguahah : A New Unsupervised Multilingual Acoustic Laughter Segmentation Method

    cs.CL 2026-05 unverdicted novelty 6.0

    An unsupervised multilingual laughter segmentation method using Isolation Forest on BYOL-A audio representations outperforms existing supervised methods on non-English datasets.

  2. MultiLinguahah : A New Unsupervised Multilingual Acoustic Laughter Segmentation Method

    cs.CL 2026-05 unverdicted novelty 5.0

    An unsupervised multilingual laughter segmentation technique using Isolation Forest on BYOL-A representations outperforms state-of-the-art supervised detectors on non-English audio datasets.

Reference graph

Works this paper leans on

38 extracted references · 38 canonical work pages · cited by 1 Pith paper · 5 internal anchors

  1. [1]

    Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, and 3 others. 2016. TensorFlow: A system for large-scale machine learning ...

  2. [2]

    Jean Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katie Millican, Malcolm Reynolds, Roman Ring, Eliza Rutherford, Serkan Cabi Tengda Han, Zhitao Gong, Sina Samangooei, Marianne Monteiro, Jacob Menick, Sebastian Borgeaud, Andrew Brock, and 7 others. 2022. Flamingo: a Visual Language Model fo...

  3. [3]

    Valentin Barriere and Alexandra Balahur. 2023. https://www.mdpi.com/2227-7390/11/9/2161 Multilingual Multi-target Stance Recognition in Online Public Consultations . MDPI Mathematics -- Special issue on Human Language Technology, 11(9):2161

  4. [4]

    Valentin Barriere and Guillaume Jacquet. 2022. CoFE : A New Dataset of Intra-Multilingual Multi-target Stance Classification from an Online European Participatory Democracy Platform . AACL-IJCNLP

  5. [5]

    Zalán Borsos, Raphaël Marinier, Damien Vincent, Eugene Kharitonov, Olivier Pietquin, Matt Sharifi, Dominik Roblek, Olivier Teboul, David Grangier, Marco Tagliasacchi, and Neil Zeghidour. 2023. https://arxiv.org/abs/2209.03143 Audiolm: a language modeling approach to audio generation . Preprint, arXiv:2209.03143

  6. [6]

    Eva Cantín and Adriana Chust. 2025. https://doi.org/10.18653/v1/2025.argmining-1.36 Argumentative Fallacy Detection in Political Debates . In Proceedings of the 12th Argument mining Workshop, pages 369--373, Vienna, Austria. Association for Computational Linguistics

  7. [7]

    Yunfei Chu, Jin Xu, Qian Yang, Haojie Wei, Xipin Wei, Zhifang Guo, Yichong Leng, Yuanjun Lv, Jinzheng He, Junyang Lin, Chang Zhou, and Jingren Zhou. 2024. http://arxiv.org/abs/2407.10759 Qwen2-Audio Technical Report . pages 1--16

  8. [8]

    Nilaksh Das, Saket Dingliwal, Srikanth Ronanki, Rohit Paturi, Zhaocheng Huang, Prashant Mathur, Jie Yuan, Dhanush Bekal, Xing Niu, Sai Muralidhar Jayanthi, Xilai Li, Karel Mundnich, Monica Sunkara, Sravan Bodapati, Sundararajan Srinivasan, Kyu J Han, and Katrin Kirchhoff. 2024. http://arxiv.org/abs/2405.08295 SpeechVerse: A Large-scale Generalizable Audio...

  9. [9]

    Alexandre Défossez, Laurent Mazaré, Manu Orsini, Amélie Royer, Patrick Pérez, Hervé Jégou, Edouard Grave, and Neil Zeghidour. 2024. http://arxiv.org/abs/2410.00037 Moshi: a speech-text foundation model for real-time dialogue . pages 1--67

  10. [10]

    Gilles Degottex, John Kane, Thomas Drugman, Tuomo Raitio, and Stefan Scherer. 2014. https://doi.org/10.1109/ICASSP.2014.6853739 COVAREP - A collaborative voice analysis repository for speech technologies . In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, pages 960--964

  11. [11]

    Alexandre Défossez, Jade Copet, Gabriel Synnaeve, and Yossi Adi. 2022. https://arxiv.org/abs/2210.13438 High fidelity neural audio compression . Preprint, arXiv:2210.13438

  12. [12]

    Marlen Fröhlich, Christine Sievers, Simon W. Townsend, Thibaud Gruber, and Carel P. van Schaik. 2019. https://doi.org/10.1111/brv.12535 Multimodal communication and language origins: integrating gestures and vocalizations . Biological Reviews, 94(5):1809--1829

  13. [13]

    Pierpaolo Goffredo, Shohreh Haddadan, Vorakit Vorakitphan, Elena Cabrio, and Serena Villata. 2022. https://doi.org/10.24963/ijcai.2022/575 Fallacious Argument Classification in Political Debates . IJCAI International Joint Conference on Artificial Intelligence, pages 4143--4149

  14. [14]

    R. Gray. 1984. https://doi.org/10.1109/MASSP.1984.1162229 Vector quantization . IEEE ASSP Magazine, 1(2):4--29

  15. [15]

    Shohreh Haddadan, Elena Cabrio, and Serena Villata. 2019. https://doi.org/10.18653/v1/p19-1463 Yes, we can! Mining arguments in 50 years of US presidential campaign debates . ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, pages 4684--4690

  16. [16]

    Wei Ning Hsu, Benjamin Bolte, Yao Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, and Abdelrahman Mohamed. 2021. https://doi.org/10.1109/TASLP.2021.3122291 HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units . IEEE/ACM Transactions on Audio Speech and Language Processing, 29(Cv):3451--3460

  17. [17]

    Youngmin Kim, Jiwan Chung, Jisoo Kim, Sunghyun Lee, Sangkyu Lee, Junhyeok Kim, Cheoljong Yang, and Youngjae Yu. 2025. https://doi.org/10.18653/v1/2025.acl-long.112 Speaking Beyond Language: A Large-Scale Multimodal Dataset for Learning Nonverbal Cues from Video-Grounded Dialogues . pages 2247--2265

  18. [18]

    Zhifeng Kong, Arushi Goel, Rohan Badlani, Wei Ping, Rafael Valle, and Bryan Catanzaro. 2024. Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities . Proceedings of Machine Learning Research, 235:25125--25148

  19. [19]

    Marco Lippi and Paolo Torroni. 2016. https://doi.org/10.1609/aaai.v30i1.10384 Argument mining from speech: Detecting claims in political debates . 30th AAAI Conference on Artificial Intelligence, AAAI 2016, pages 2979--2985

  20. [20]

    Wenrui Liu, Qian Chen, Wen Wang, Yafeng Chen, Jin Xu, Zhifang Guo, Guanrou Yang, Weiqin Li, Xiaoda Yang, Tao Jin, Minghui Fang, Jialong Zuo, Bai Jionghao, and Zemin Liu. 2025. https://arxiv.org/abs/2505.24496 Speech token prediction via compressed-to-fine language modeling for speech generation . Preprint, arXiv:2505.24496

  21. [21]

    Eleonora Mancini, Federico Ruggeri, Stefano Colamonaco, Andrea Zecca, Samuele Marro, and Paolo Torroni. 2024a. https://github.com/lt-nlp-lab-unibo/mamkit MAMKit: A Comprehensive Multimodal Argument Mining Toolkit . In Proceedings of the 11th Workshop on Argument Mining (ArgMining 2024), pages 69--82

  22. [22]

    Eleonora Mancini, Federico Ruggeri, Andrea Galassi, and Paolo Torroni. 2022. https://aclanthology.org/2022.argmining-1.15/ Multimodal Argument Mining: A Case Study in Political Debates . Proceedings of the 9th Workshop on Argument Mining, pages 158--170

  23. [23]

    Eleonora Mancini, Federico Ruggeri, and Paolo Torroni. 2024b. Multimodal Fallacy Classification in Political Debates . EACL 2024 - 18th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference, 2:170--178

  24. [24]

    Eleonora Mancini, Federico Ruggeri, Serena Villata, and Paolo Torroni. 2025. https://doi.org/10.18653/v1/2025.argmining-1.35 Overview of MM-ArgFallacy2025 on Multimodal Argumentative Fallacy Detection and Classification in Political Debates . In Proceedings of the 12th Argument mining Workshop, pages 358--368

  25. [25]

    Rafael Mestre, Stuart E. Middleton, Matt Ryan, Masood Gheasi, Timothy J. Norman, and Jiatong Zhu. 2023. https://doi.org/10.18653/v1/2023.findings-eacl.21 Augmenting pre-trained language models with audio feature embedding for argumentation mining in political debates . EACL 2023 - 17th Conference of the European Chapter of the Association for Computationa...

  26. [26]

    Rafael Mestre, Razvan Milicin, Stuart E. Middleton, Matt Ryan, Jiatong Zhu, and Timothy J. Norman. 2021. https://doi.org/10.18653/v1/2021.argmining-1.8 M-Arg: Multimodal Argument Mining Dataset for Political Debates with Audio and Transcripts . 8th Workshop on Argument Mining, ArgMining 2021 - Proceedings, (2014):78--88

  27. [27]

    Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, and Kunio Kashino. 2023. https://doi.org/10.1109/TASLP.2022.3221007 BYOL for Audio: Exploring Pre-Trained General-Purpose Audio Representations . IEEE/ACM Transactions on Audio Speech and Language Processing, 31:137--151

  28. [28]

    Alessio Pittiglio. 2025. https://doi.org/10.18653/v1/2025.argmining-1.39 Leveraging Context for Multimodal Fallacy Classification in Political Debates . In Proceedings of the 12th Argument mining Workshop, pages 388--397, Vienna, Austria. Association for Computational Linguistics

  29. [29]

    Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. 2021. http://arxiv.org/abs/2103.00020 Learning Transferable Visual Models From Natural Language Supervision

  30. [30]

    Abdullah Tahir, Imaan Ibrar, Huma Ameer, Mehwish Fatima, and Seemab Latif. 2025. https://doi.org/10.18653/v1/2025.argmining-1.38 Prompt-Guided Augmentation and Multi-modal Fusion for Argumentative Fallacy Classification in Political Debates . In Proceedings of the 12th Argument mining Workshop, pages 381--387, Vienna, Austria. Association for Computationa...

  31. [31]

    Hugo Thimonier, Antony Perzo, and Renaud Seguier. 2025. http://arxiv.org/abs/2508.14130 EmoSLLM: Parameter-Efficient Adaptation of LLMs for Speech Emotion Recognition . pages 1--18

  32. [32]

    A. Vasuki and P.T. Vanathi. 2006. https://doi.org/10.1109/MP.2006.1664069 A review of vector quantization techniques . IEEE Potentials, 25(4):39--47

  33. [33]

    Alessandro Vinciarelli, Maja Pantic, and Hervé Bourlard. 2009. https://doi.org/10.1016/j.imavis.2008.11.007 Social signal processing: Survey of an emerging domain . Image and Vision Computing, 27(12):1743--1759

  34. [34]

    Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, and Jamie Brew. 2019. http://arxiv.org/abs/1910.03771 HuggingFace's Transformers: State-of-the-art Natural Language Processing

  35. [35]

    Amir Zadeh, Paul Pu Liang, Soujanya Poria, Erik Cambria, and Louis-Philippe Morency. 2018. http://aclweb.org/anthology/P18-1208 Multimodal Language Analysis in the Wild: CMU-MOSEI Dataset and Interpretable Dynamic Fusion Graph . Proceedings of ACL, pages 2236--2246

  36. [36]

    Xin Zhang, Dong Zhang, Shimin Li, Yaqian Zhou, and Xipeng Qiu. 2024. SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models . In 12th International Conference on Learning Representations, ICLR 2024, pages 1--21
