pith. machine review for the scientific record.

arxiv: 2512.21204 · v2 · submitted 2025-12-24 · 💻 cs.CL · cs.AI

Recognition: no theorem link

SpidR-Adapt: A Universal Speech Representation Model for Few-Shot Adaptation

Authors on Pith: no claims yet

Pith reviewed 2026-05-16 19:54 UTC · model grok-4.3

classification 💻 cs.CL cs.AI
keywords speech representation learning · few-shot adaptation · meta-learning · bi-level optimization · phonemic discriminability · spoken language modeling · data efficiency

The pith

SpidR-Adapt adapts universal speech models to new languages using less than one hour of unlabeled audio.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Current self-supervised speech models require far more data than human infants to acquire basic language units, creating a clear efficiency gap. This paper treats adaptation to a new language as a meta-learning problem and solves it with a bi-level optimization setup called MAdaPT. A first-order heuristic keeps the computation feasible, while alternating self-supervised and supervised signals stabilizes training. The result is a single model that rapidly improves phoneme discrimination and language modeling scores on the target language.
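
To make the alternation above concrete, here is a minimal sketch of a first-order bi-level episode in the Reptile/FOMAML style, with a self-supervised inner loop and a supervised outer step. Everything in it is an illustrative assumption (the `ssl_loss` and `sup_loss` callables, plain SGD, a single outer batch); it is not the paper's MAdaPT-FOBLO implementation, which lives in the released code.

```python
import copy
import torch

def first_order_bilevel_episode(model, inner_batches, outer_batch,
                                ssl_loss, sup_loss, inner_lr=1e-4, outer_lr=1e-3):
    """One meta-episode: adapt a copy with a self-supervised inner loop, then update
    the meta-parameters with first-order gradients of a supervised outer objective.

    Illustrative sketch only; `ssl_loss(model, batch)` and `sup_loss(model, batch)`
    are assumed to return scalar losses.
    """
    # Inner loop: adapt a throwaway copy on target-language batches. No second-order
    # terms are tracked, which is what makes the update "first-order".
    adapted = copy.deepcopy(model)
    inner_opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)
    for batch in inner_batches:
        inner_opt.zero_grad()
        ssl_loss(adapted, batch).backward()
        inner_opt.step()

    # Outer step: evaluate the adapted copy with the supervised objective and apply
    # its gradient directly to the original (meta) parameters.
    outer = sup_loss(adapted, outer_batch)
    grads = torch.autograd.grad(outer, list(adapted.parameters()))
    with torch.no_grad():
        for p, g in zip(model.parameters(), grads):
            p -= outer_lr * g
    return float(outer)
```

Because the outer gradient is taken only at the adapted weights, the cost per episode stays close to ordinary training, which is the computational point of a first-order heuristic.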

Core claim

SpidR-Adapt formulates low-resource speech representation learning as a meta-learning problem through a multi-task adaptive pre-training (MAdaPT) protocol expressed as bi-level optimization. The first-order bi-level optimization (FOBLO) heuristic, together with interleaved supervision that alternates self-supervised and supervised objectives, makes this practical, yielding models that surpass in-domain toplines on phonemic and language-modeling metrics after exposure to under one hour of target audio.

What carries the argument

The first-order bi-level optimization heuristic inside the multi-task adaptive pre-training protocol.

Load-bearing premise

The bi-level optimization framework with its first-order heuristic can be trained stably across languages without language-specific tuning or instability.

What would settle it

Train SpidR-Adapt on a held-out language using less than one hour of its unlabeled audio and check whether its ABX error drops below, and its sWUGGY, sBLIMP, and tSC scores rise above, those of models trained directly on that language.
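
A minimal sketch of that comparison, assuming hypothetical per-metric score dictionaries (for example produced by the authors' released evaluation code). The only substantive detail encoded is the direction of each metric: ABX is an error rate (lower is better), while sWUGGY, sBLIMP, and tSC are accuracy-style scores (higher is better).

```python
# Direction of each metric in the paper's evaluation suite.
LOWER_IS_BETTER = {"abx"}                        # phonemic discriminability error
HIGHER_IS_BETTER = {"swuggy", "sblimp", "tsc"}   # lexical / syntactic / semantic scores

def beats_topline(adapted: dict, topline: dict) -> dict:
    """Per metric, does the <1h-adapted model surpass the in-domain topline?

    `adapted` and `topline` are hypothetical dicts mapping metric name to score;
    no numbers from the paper are embedded here.
    """
    verdict = {}
    for metric in LOWER_IS_BETTER | HIGHER_IS_BETTER:
        a, t = adapted[metric], topline[metric]
        verdict[metric] = (a < t) if metric in LOWER_IS_BETTER else (a > t)
    return verdict
```

Running this check over each held-out language, and over several random seeds, is essentially the falsification test described above.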

Figures

Figures reproduced from arXiv: 2512.21204 by Angelo Ortiz, Angel Villar, Charles-Eric Saint-James, Dongyan Lin, Emmanuel Dupoux, Jiayi Shen, Juan Pino, Mahi Luthra, Martin Gleize, Maxime Poli, Phillip Rust, Rashel Moritz, Surya Parimi, Vanessa Stark, Yann LeCun, Yosuke Higuchi, Youssef Benchekroun.

Figure 1
Figure 1. Overview of SpidR-Adapt for few-shot speech adaptation. It consists of three main phases: (1) meta-initialization performs multi-task pre-training with interleaved supervision, learning a robust initialization ϕ0 from a mixture of source domains. (2) meta-training through MAdaPT-FOBLO optimizes this initialization for fast adaptation to Dℓ. Each worker conducts inner-loop adaptation with active forgetting (…
Figure 2
Figure 2. Data-efficiency of SpidR-Adapt on new languages across different adaptation data scales. We report ABX scores (lower is better) averaged across three test languages (French, German, English) for two initialization strategies (a) self-supervision [SSL] and (b) interleaved-supervision [SSL/SL]. Each sub-figure compares our approach with the baselines: In-Domain Mono-Task-PT, the oracle method pretrained on 6k…
Figure 3
Figure 3. Learning rate scheduler for FOBLO. We use blue and orange to represent the learning rate for self-supervised inner-steps and supervised outer-steps, respectively. The overall training has 200,000 steps. The learning rate scheduler alternates between inner-loop and outer-loop steps within each episode, with resets every 2,000 steps. The inner-loop uses a constant rate after a warmup, while the outer-loop fo…
Figure 4
Figure 4. Layer-wise analysis of the model's discriminability over phonemes. We present the ABX scores averaged over the corresponding new languages, and across the two within- and across-speaker conditions: (a) 5 development and (b) 3 test languages. We report results for our proposed MAdaPT-FOBLO method with two types of meta-initialization, Multi-Task-PT[SSL] and Multi-Task-PT[SSL/SL]. The optimal layer for ABX p…
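
Figure 3 describes the FOBLO schedule only at a high level: 200,000 total steps, episodes that reset every 2,000 steps, a warmed-up constant rate for the inner loop, and a truncated description of the outer-loop behavior. The sketch below fills the unstated details with assumptions: the split of each episode into an inner block followed by an outer block, the warmup length, the specific rates, and a linear outer-loop decay are all illustrative, not taken from the paper.

```python
def foblo_lr(step, episode_len=2_000, inner_frac=0.5, warmup=200,
             inner_lr=5e-4, outer_lr=1e-3):
    """Return (phase, learning_rate) for a global training step.

    Assumed structure: each 2,000-step episode starts with inner-loop
    (self-supervised) steps at a constant rate after a linear warmup, then switches
    to outer-loop (supervised) steps whose rate decays linearly within the episode.
    """
    s = step % episode_len                        # schedule resets at every episode
    inner_steps = int(episode_len * inner_frac)
    if s < inner_steps:
        warm = min(1.0, (s + 1) / warmup)         # linear warmup, then constant
        return "inner", inner_lr * warm
    frac_done = (s - inner_steps) / max(episode_len - inner_steps, 1)
    return "outer", outer_lr * (1.0 - frac_done)  # assumed linear decay

# Example: the phase and rate 1,500 steps into the third of the 100 episodes
# implied by 200,000 total steps / 2,000 steps per episode.
phase, lr = foblo_lr(2 * 2_000 + 1_500)
```
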
read the original abstract

Human infants, with only a few hundred hours of speech exposure, acquire basic units of new languages, highlighting a striking efficiency gap compared to the data-hungry self-supervised speech models. To address this gap, this paper introduces SpidR-Adapt for rapid adaptation of speech units to new languages using minimal unlabeled data. We cast such low-resource speech representation learning as a meta-learning problem and construct a multi-task adaptive pre-training (MAdaPT) protocol which formulates the adaptation process as a bi-level optimization framework. To enable scalable meta-training under this framework, we propose a novel heuristic solution, first-order bi-level optimization (FOBLO), avoiding heavy computation costs. Finally, we stabilize meta-training by using a robust initialization through interleaved supervision which alternates self-supervised and supervised objectives. Empirically, SpidR-Adapt achieves rapid gains in phonemic discriminability (ABX) and downstream spoken language modeling scores (sWUGGY, sBLIMP, tSC), surpassing in-domain toplines after training on less than 1h of target-language audio and delivering $100\times$ greater data efficiency than standard multi-task training. These findings highlight a practical, architecture-agnostic path toward biologically inspired, data-efficient representations. We open-source the training code and model checkpoints at https://github.com/facebookresearch/spidr-adapt.
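
The interleaved-supervision idea from the abstract, alternating self-supervised and supervised objectives on the same backbone, can be sketched as a single training loop. The alternation pattern (`ssl_every`), the optimizer, and the `ssl_loss`/`sup_loss` callables below are assumptions for illustration; the paper's exact objectives and mixing schedule live in the released code.

```python
import itertools
import torch

def interleaved_pretraining(model, ssl_loader, sup_loader, ssl_loss, sup_loss,
                            steps=10_000, lr=1e-4, ssl_every=2):
    """Alternate self-supervised and supervised updates on one backbone.

    Sketch only: `ssl_loss(model, batch)` and `sup_loss(model, batch)` are assumed
    to return scalar losses; every `ssl_every`-th step is supervised, the rest
    are self-supervised.
    """
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    ssl_iter, sup_iter = itertools.cycle(ssl_loader), itertools.cycle(sup_loader)
    for step in range(steps):
        supervised = (step % ssl_every == ssl_every - 1)
        batch = next(sup_iter) if supervised else next(ssl_iter)
        loss = sup_loss(model, batch) if supervised else ssl_loss(model, batch)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```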

Editorial analysis

A structured set of objections, weighed in public.

Referee report, simulated author's rebuttal, circularity check, and an axiom and free-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces SpidR-Adapt, a meta-learning method for few-shot adaptation of speech representations to new languages. It casts low-resource adaptation as a bi-level optimization problem solved via the first-order bi-level optimization (FOBLO) heuristic, stabilized by interleaved self-supervised and supervised initialization. The method is evaluated on phonemic discriminability (ABX) and downstream spoken language modeling tasks (sWUGGY, sBLIMP, tSC), claiming to surpass in-domain toplines with <1h of target audio and 100× data efficiency over standard multi-task training.

Significance. If the empirical claims hold under rigorous verification, the work offers a practical route to data-efficient universal speech models that narrow the gap with human infant acquisition. The architecture-agnostic framing and open-sourced code/checkpoints would make the contribution immediately usable for low-resource language applications.

major comments (2)
  1. [§3.2] FOBLO heuristic: the manuscript introduces FOBLO as a first-order approximation to bi-level optimization but supplies no convergence analysis, optimization trajectory plots, or sensitivity study with respect to inner-loop step size and outer-loop learning rate. The central claim that fixed hyperparameters suffice for stable meta-training across held-out languages therefore rests on an unverified assumption.
  2. [§4] Experimental protocol: the reported gains on ABX, sWUGGY, sBLIMP and tSC are presented without an ablation isolating the interleaved initialization from the FOBLO component, without per-language hyperparameter sweeps, and without statistical significance tests or variance across random seeds. These omissions make it impossible to attribute the 100× efficiency claim unambiguously to the proposed framework.
minor comments (1)
  1. [Abstract] The abstract states '100× greater data efficiency' without defining the precise baseline (hours of data, compute, or wall-clock time) or the exact multi-task training protocol used for comparison.
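
The sensitivity study requested in the first major comment amounts to sweeping the two learning rates and checking that held-out performance is flat around the chosen values. A minimal sketch, where `run_meta_training` is a hypothetical callable that runs meta-training for one (inner, outer) pair and returns a validation ABX error (lower is better); the grid values are illustrative.

```python
import itertools

def lr_sensitivity_grid(run_meta_training,
                        inner_lrs=(1e-4, 5e-4, 1e-3),
                        outer_lrs=(1e-4, 1e-3, 1e-2)):
    """Sweep inner/outer learning rates; `run_meta_training` is a hypothetical hook."""
    results = {}
    for inner_lr, outer_lr in itertools.product(inner_lrs, outer_lrs):
        results[(inner_lr, outer_lr)] = run_meta_training(inner_lr=inner_lr,
                                                          outer_lr=outer_lr)
    best = min(results, key=results.get)   # pair with the lowest ABX error
    return results, best
```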

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments. We address each major point below and outline the revisions we will make.

read point-by-point responses
  1. Referee: [§3.2] FOBLO heuristic: the manuscript introduces FOBLO as a first-order approximation to bi-level optimization but supplies no convergence analysis, optimization trajectory plots, or sensitivity study with respect to inner-loop step size and outer-loop learning rate. The central claim that fixed hyperparameters suffice for stable meta-training across held-out languages therefore rests on an unverified assumption.

    Authors: We acknowledge the absence of formal convergence analysis for the FOBLO heuristic, which we present as a practical first-order approximation rather than a theoretically guaranteed solver. Stability is demonstrated empirically through consistent gains on held-out languages, but we agree this is insufficient. In the revision we will add optimization trajectory plots for representative meta-training runs and a sensitivity study over inner-loop step sizes and outer-loop learning rates, confirming that the chosen fixed hyperparameters remain effective across the evaluated language set. revision: yes

  2. Referee: [§4] Experimental protocol: the reported gains on ABX, sWUGGY, sBLIMP and tSC are presented without an ablation isolating the interleaved initialization from the FOBLO component, without per-language hyperparameter sweeps, and without statistical significance tests or variance across random seeds. These omissions make it impossible to attribute the 100× efficiency claim unambiguously to the proposed framework.

    Authors: We agree that dedicated ablations are needed to isolate the interleaved initialization from the FOBLO component and will add them in the revision. We will also report performance across multiple random seeds with variance estimates and include statistical significance tests for the primary comparisons. Regarding per-language hyperparameter sweeps, the method is intentionally designed to use fixed hyperparameters to demonstrate universality and data efficiency without language-specific tuning; a full per-language sweep would undermine this claim. We will instead add a limited sensitivity analysis on a representative subset of languages. revision: partial
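
The promised seed-variance report and significance tests could be summarized with something as small as the sketch below, where `proposed` and `baseline` are hypothetical per-seed score arrays for matched runs; whether the authors will use a paired t-test or a non-parametric alternative is not specified in the rebuttal.

```python
import numpy as np
from scipy import stats

def seed_report(proposed, baseline):
    """Mean and std across seeds plus a paired t-test between matched seed runs."""
    proposed = np.asarray(proposed, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    t_stat, p_value = stats.ttest_rel(proposed, baseline)
    return {
        "proposed_mean_std": (proposed.mean(), proposed.std(ddof=1)),
        "baseline_mean_std": (baseline.mean(), baseline.std(ddof=1)),
        "paired_t": float(t_stat),
        "p_value": float(p_value),
    }
```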

Circularity Check

0 steps flagged

No significant circularity in the meta-learning protocol or empirical claims

full rationale

The paper frames SpidR-Adapt as a new meta-learning protocol (MAdaPT with the FOBLO heuristic and interleaved initialization) whose central claims rest on empirical results: rapid ABX gains and downstream sWUGGY/sBLIMP/tSC improvements after <1 h of target audio, outperforming in-domain toplines and delivering 100× the data efficiency of standard multi-task training. No equations, derivations, or fitted parameters are presented that reduce the reported gains to quantities defined by construction from the inputs. The evaluation uses held-out languages and fixed hyperparameters, with the method described as architecture-agnostic; the claims are therefore tested against external benchmarks rather than following tautologically from the setup.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The abstract names no explicit free parameters, axioms, or invented entities; the approach rests on standard assumptions of self-supervised speech learning and meta-learning stability, with one implicit axiom and one newly introduced construct identified below.

axioms (1)
  • domain assumption: Self-supervised speech objectives produce useful representations that can be adapted via meta-learning
    Implicit in the use of existing SSL pretraining as the base for adaptation
invented entities (1)
  • FOBLO heuristic: no independent evidence
    purpose: Approximate solution to bi-level optimization for scalable meta-training
    New optimization technique introduced to avoid heavy computation

pith-pipeline@v0.9.0 · 5608 in / 1298 out tokens · 20898 ms · 2026-05-16T19:54:18.293056+00:00 · methodology

discussion (0)

