Basic syntax from speech: Spontaneous concatenation in unsupervised deep neural networks

Gašper Beguš; Thomas Lu; Zili Wang

arXiv: 2305.01626 · v4 · submitted 2023-05-02 · cs.CL · cs.AI · cs.SD · eess.AS

Pith reviewed 2026-05-24 09:05 UTC · model grok-4.3

keywords: spontaneous concatenation · unsupervised speech models · syntax from acoustics · compositionality · disinhibition mechanism · raw audio input · generative adversarial networks · word embedding in sequences

The pith

Unsupervised convolutional networks trained only on single spoken words begin generating concatenated two- and three-word sequences.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that ciwGAN and fiwGAN models, which rely on convolutional neural networks, when trained exclusively on acoustic recordings of isolated words, start producing outputs that concatenate two or even three words. This happens even though the training data contains no examples of multi-word speech. A sympathetic reader would care because the result models how the elementary syntactic operation of concatenation could arise directly from raw speech input in a fully unsupervised setting. The work further shows that models trained on pairs of words embed them into novel unobserved combinations and that the outputs contain early indicators of compositionality.

Core claim

Trained exclusively on single-word acoustic data, the models generate novel outputs consisting of two or three concatenated words; networks trained on two words produce embeddings into unobserved combinations; the concatenated outputs contain precursors to compositionality. The authors formalize a neural mechanism called disinhibition that outlines a possible pathway toward concatenation and compositionality in both artificial and biological systems.

What carries the argument

Spontaneous concatenation, the emergence of multi-word outputs from networks whose training data contained only isolated words.

If this is right

Basic syntactic operations such as concatenation can emerge from raw acoustic input without any supervised multi-word examples.
Networks learn to place words into novel combinations never observed during training.
Concatenated outputs already exhibit precursors to compositionality.
The disinhibition mechanism supplies a concrete artificial and biological pathway that can be used to generate testable predictions about spoken-language processing.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If spontaneous concatenation generalizes beyond the reported architectures, unsupervised exposure to single-word speech alone may suffice to bootstrap initial syntactic structure in both machines and biological systems.
The same disinhibition-style mechanism could be searched for in other generative models that operate on raw audio.
Neuroimaging or behavioral experiments could test whether disinhibition-like processes appear during human processing of concatenated speech.

Load-bearing premise

Acoustic inspection of the generated waveforms reliably distinguishes true word concatenations from synthesis artifacts or other training effects.

What would settle it

Quantitative acoustic comparison or blind listening tests that determine whether the generated waveforms match the spectral and temporal signatures of two separate words spoken in sequence versus blended or artifactual sounds.
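
One crude form such a quantitative check could take is sketched below. It counts bursts of short-time energy separated by pauses, on synthetic tones standing in for generated audio. This is not the authors' procedure: the function names, the 20 ms frame, the 10% energy threshold, and the 80 ms minimum pause are all illustrative assumptions, and an energy count cannot by itself tell words from artifacts.

```python
import math

def count_energy_bursts(samples, sample_rate, frame_ms=20,
                        threshold_ratio=0.1, min_gap_ms=80):
    """Count stretches of high short-time energy separated by pauses.

    Hypothetical helper: all thresholds are illustrative assumptions.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(samples) // frame_len
    # Root-mean-square energy of each non-overlapping frame.
    energies = []
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energies.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    peak = max(energies, default=0.0)
    if peak == 0.0:
        return 0
    min_gap = max(1, round(min_gap_ms / frame_ms))
    bursts = 0
    silent_run = min_gap  # treat the start as preceded by silence
    for energy in energies:
        if energy > threshold_ratio * peak:
            if silent_run >= min_gap:
                bursts += 1  # a new burst begins after a long enough pause
            silent_run = 0
        else:
            silent_run += 1
    return bursts

def tone_with_bursts(intervals, sample_rate=16000, duration_s=1.0, freq_hz=220.0):
    """Synthetic stand-in for a generated waveform: a tone gated on in `intervals`."""
    out = []
    for i in range(int(sample_rate * duration_s)):
        t = i / sample_rate
        on = any(start <= t < end for start, end in intervals)
        out.append(math.sin(2 * math.pi * freq_hz * t) if on else 0.0)
    return out

one_word_like = tone_with_bursts([(0.10, 0.35)])
two_word_like = tone_with_bursts([(0.10, 0.35), (0.55, 0.80)])
print(count_energy_bursts(one_word_like, 16000))
print(count_energy_bursts(two_word_like, 16000))
```

A count like this would only be a first filter. Separating lexical concatenation from duration stretching or blending would still need forced alignment, ASR transcription, or blind listening, as described above.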

Figures

Figures reproduced from arXiv: 2305.01626 by Gašper Beguš, Thomas Lu, Zili Wang.

**Figure 2.** The suit year (left) output and the rag year (right) from the one-second one-word model. All spectrograms are created in Praat (Boersma and Weenink, 2015).

**Figure 3.** The three-word concatenated output box under water. Independently, the second word (under) is somewhat difficult to analyze, but given only five training words, it is clearly the closest output to under.

**Figure 4.** The average number of words generated by the two one-word two-second models are plotted as a function …

**Figure 5.** (top) Predicted values of the logistic regression mixed effects model with the proportion of two-word outputs (one-word output = failure, two-word output or more = a success) as the dependent variable and sum of bits 1–5 as the predictor with 95% confidence limits (Experiment 2 in …

**Figure 6.** Counts of word types as transcribed by fine-tuned Whisper. (a) Counts of 7 most frequent outputs that …

**Figure 7.** The frequency of observed suit/greasy pairs is plotted as a function of the latent code, for the fiwGAN two-word model. Samples were generated by varying each bit between the values [-13, -8, -3, -2, -1, 0, 1, 2, 3, 8, 13], taking all possible permutations (11^3). Each bitstring was tested with 10 sets of latent space values, and annotations were performed automatically using Whisper.

**Figure 8.** The suit greasy water output from the two-second two-word model (top). The greasy greasy output from the two-second one-word model (bottom).

**Figure 9.** Waveform and spectrograms (0–5 kHz) of three generated outputs with latent codes [0, -10, -10, 0, 0], [0, …

**Figure 10.** Waveforms and spectrograms (0–5 kHz) of five generated outputs with latent code variables as described …

**Figure 11.** The structure of the Generator. Each variable in the latent space (5 code variables …

**Figure 12.** Averaged values of the Dense layer output before (Avg. Dense) and after ReLU (Avg. ReLU) for positive …
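
The sampling grid described in the Figure 7 caption can be sketched as follows. The eleven code values and the ten latent draws per bitstring come from the caption. The number of bits (three) is an assumption, inferred from reading the caption's garbled count as 11^3. Although the caption says "permutations", the grid is built here as a Cartesian product with repetition, which is what that count implies. All names are hypothetical.

```python
from itertools import product

# Values each code bit is set to, as listed in the Figure 7 caption.
CODE_VALUES = [-13, -8, -3, -2, -1, 0, 1, 2, 3, 8, 13]
N_BITS = 3             # assumption: inferred from reading the count as 11^3
SAMPLES_PER_CODE = 10  # "10 sets of latent space values" per bitstring

def latent_code_grid(values=CODE_VALUES, n_bits=N_BITS):
    """Every assignment of one value to each code bit (Cartesian product)."""
    return list(product(values, repeat=n_bits))

grid = latent_code_grid()
total_generations = len(grid) * SAMPLES_PER_CODE
print(len(grid), total_generations)
```

Under these assumptions the grid holds 11^3 bitstrings. Each would be paired with ten random draws of the remaining latent variables before generation and automatic transcription.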

Original abstract

Computational models of syntax are predominantly text-based. Here we propose that the most basic first step in the evolution of syntax can be modeled directly from raw speech in a fully unsupervised way. We focus on one of the most ubiquitous and elementary suboperations of syntax: concatenation. We introduce spontaneous concatenation: a phenomenon where ciwGAN/fiwGAN models (based on convolutional neural networks) trained on acoustic recordings of individual words start generating outputs with two or even three words concatenated without ever accessing data with multiple words in the training data. We replicate this finding in several independently trained models with different hyperparameters and training data. Additionally, networks trained on two words learn to embed words into novel unobserved word combinations. We also show that the concatenated outputs contain precursors to compositionality. To our knowledge, this is a previously unreported property of CNNs trained in the ciwGAN/fiwGAN setting on raw speech and has implications both for our understanding of how these architectures learn as well as for modeling syntax and its evolution in the brain from raw acoustic inputs. We also propose and formalize a neural mechanism called disinhibition that outlines a possible artificial and biological neural pathway towards concatenation and compositionality and suggests our modeling is useful for generating testable predictions for biological and artificial neural processing of spoken language.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper flags an unreported spontaneous concatenation behavior in ciwGANs on single-word speech data, but the claim rests on unquantified listening tests that leave room for acoustic artifacts.

The main thing here is that ciwGAN and fiwGAN models, trained only on isolated word recordings, appear to generate outputs that sound like two or three words run together. The authors say this happens across several independent runs and that the training data had no multi-word examples. They also note some precursors to compositionality and sketch a disinhibition mechanism. That observation is new for these specific architectures on raw audio, and the replication across hyperparameter sets is a positive step. The work sits at the intersection of unsupervised speech modeling and ideas about how syntax might emerge, which is a reasonable angle to explore. What the paper does cleanly is lay out the training setup and the basic claim without overclaiming prior results in the cited GAN literature.

The soft spots are more substantial. The abstract and description give no numbers on how often the concatenations occur, no forced-alignment scores, no ASR transcription accuracy, and no controls that rule out vocoder blending or duration artifacts that CNN generators can produce. The training-data purity claim is stated but not backed by any check for session-level co-articulation or latent leakage. Without those checks the central empirical result stays qualitative and harder to evaluate. The proposed neural mechanism is presented as a formalization but does not appear to be derived from the model equations or tested against the outputs.

This is the kind of paper that could interest people working on data-efficient speech models or computational approaches to language evolution, provided the listening results hold up under quantitative scrutiny. It is coherent on its own terms and engages the relevant literature without obvious internal contradictions, so it deserves a serious referee to check the full data and any additional controls the authors may have run.

Referee Report

3 major / 1 minor

Summary. The paper claims that ciwGAN/fiwGAN models (CNN-based GANs) trained unsupervised solely on acoustic recordings of individual words spontaneously generate outputs containing two or three concatenated words, without any exposure to multi-word training data. This phenomenon is replicated across independently trained models with varying hyperparameters; models trained on pairs of words further embed them into novel unobserved combinations, and the outputs exhibit precursors to compositionality. The authors propose and formalize a 'disinhibition' neural mechanism as a possible pathway to concatenation and compositionality, with implications for modeling syntax evolution from raw speech.

Significance. If the central observation is quantitatively verified, the result would constitute a novel, previously unreported property of these architectures when applied to raw speech, providing a computational model of basic syntactic concatenation arising from unsupervised learning. The replication across models and the proposed disinhibition mechanism could generate testable predictions for both artificial and biological neural processing of spoken language.

major comments (3)

[Abstract] The central claim that generated waveforms contain identifiable sequences of two or three distinct words rests on qualitative acoustic inspection. No quantitative validation (forced alignment, ASR transcription accuracy, phonetic boundary metrics, or controls for spectral leakage/vocoder artifacts) is reported to establish that these are true lexical concatenations rather than duration extensions or spurious formant transitions produced by the convolutional generator.
[Abstract] The claim that training data consisted strictly of single-word recordings is presented without verification against possible co-articulation, recording-session structure, or latent cues that could introduce implicit multi-word information, undermining the assertion that concatenation arises purely from single-word exposure.
[Abstract] While replication across several models is stated, the absence of error analysis, prevalence statistics, or artifact controls in the reported findings leaves the load-bearing observation without the quantitative grounding needed to support the syntactic-evolution interpretation.

minor comments (1)

The formalization of the 'disinhibition' mechanism would benefit from explicit comparison to existing neural concepts (e.g., gating or disinhibitory circuits in the literature) to clarify its distinct contribution.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments, which identify key areas where additional quantitative support would strengthen the manuscript. We address each point below and commit to revisions that provide the requested validation without altering the core claims.

Referee: [Abstract] The central claim that generated waveforms contain identifiable sequences of two or three distinct words rests on qualitative acoustic inspection. No quantitative validation (forced alignment, ASR transcription accuracy, phonetic boundary metrics, or controls for spectral leakage/vocoder artifacts) is reported to establish that these are true lexical concatenations rather than duration extensions or spurious formant transitions produced by the convolutional generator.

Authors: We agree that the current presentation relies primarily on qualitative inspection and that quantitative metrics are needed to rule out artifacts. In the revised manuscript we will report ASR transcription accuracy on the generated outputs, forced-alignment boundary metrics, and controls comparing concatenated generations against single-word baselines and vocoder-reconstructed artifacts. These additions will be placed in a new results subsection with accompanying statistics.

Revision: yes

Referee: [Abstract] The claim that training data consisted strictly of single-word recordings is presented without verification against possible co-articulation, recording-session structure, or latent cues that could introduce implicit multi-word information, undermining the assertion that concatenation arises purely from single-word exposure.

Authors: The datasets used are standard single-word corpora (e.g., isolated-word subsets of TIMIT-style recordings) where each file is documented as containing one word. To address the concern, the revision will include an explicit verification subsection that inspects file-level metadata, checks for session-level co-articulation via waveform inspection, and reports controls confirming no multi-word cues are present in the training distribution.

Revision: yes

Referee: [Abstract] While replication across several models is stated, the absence of error analysis, prevalence statistics, or artifact controls in the reported findings leaves the load-bearing observation without the quantitative grounding needed to support the syntactic-evolution interpretation.

Authors: We acknowledge the need for prevalence statistics and error analysis. The revised version will report the proportion of generated samples exhibiting clear two- and three-word concatenations across all trained models, include failure-case analysis, and add artifact-control experiments. These quantitative results will directly support the interpretation while preserving the replication already performed.

Revision: yes
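
The prevalence statistic discussed in this exchange could be computed along the following lines. The sketch assumes each generated output has already been transcribed to text (for example by an ASR system). It reports the share of outputs with two or more words, with a Wilson score interval. The function name and the example transcripts are hypothetical and are not data from the paper.

```python
import math

def multiword_prevalence(transcripts, z=1.96):
    """Proportion of transcripts with >= 2 words, with a Wilson score interval.

    `transcripts` is a list of strings; z=1.96 gives roughly 95% coverage.
    """
    n = len(transcripts)
    if n == 0:
        raise ValueError("need at least one transcript")
    multi = sum(1 for text in transcripts if len(text.split()) >= 2)
    p = multi / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)

# Hypothetical transcripts, for illustration only.
example = ["suit", "suit year", "rag year", "water", "box under water",
           "greasy", "year", "suit", "rag", "greasy greasy"]
proportion, low, high = multiword_prevalence(example)
print(round(proportion, 3), round(low, 3), round(high, 3))
```

The Wilson interval is used here because it behaves sensibly for small samples and for proportions near 0 or 1, which is the likely regime if concatenated outputs are rare.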

Circularity Check

0 steps flagged

No circularity: claims rest on direct model training and output inspection

The paper reports an empirical observation: ciwGAN/fiwGAN CNNs trained exclusively on single-word acoustic recordings generate outputs containing two or three concatenated words. This finding is replicated across independent runs with varying hyperparameters and data. No derivation chain, equations, or fitted parameters are presented that reduce the claimed spontaneous concatenation to the training inputs by construction. The proposed 'disinhibition' mechanism is introduced as a formalization of a possible pathway, not as a load-bearing mathematical step whose validity depends on self-citation or redefinition. The central result is therefore self-contained against external benchmarks (model outputs) and receives no circularity flags.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim depends on the assumption that training data contain only isolated words and that generated outputs can be unambiguously interpreted as word concatenations; the disinhibition mechanism is introduced without independent evidence.

axioms (1)

Domain assumption: Training corpora consist exclusively of isolated single-word recordings with no multi-word sequences present.
Stated directly in the abstract as the condition under which concatenation emerges.

invented entities (1)

Disinhibition neural mechanism (no independent evidence)
Purpose: To provide a possible artificial and biological pathway that produces concatenation and compositionality from single-word inputs.
Proposed and formalized in the paper; no independent falsifiable evidence supplied in the abstract.

Reference graph

Works this paper leans on

64 extracted references · 64 canonical work pages · 2 internal anchors

[4] Andreas, J., 2019. Measuring compositionality in representation learning, in: International Conference on Learning Representations. https://openreview.net/forum?id=HJz05o0qK7
[5] Arjovsky, M., Chintala, S., Bottou, L., 2017. Wasserstein generative adversarial networks, in: Precup, D., Teh, Y.W. (Eds.), Proceedings of the 34th International Conference on Machine Learning, PMLR, International Convention Centre, Sydney, Australia. pp. 214–223 ...
[6] Bates, E., Marchman, V., Thal, D., Fenson, L., Dale, P., Reznick, J.S., Reilly, J., Hartung, J., 1994. Developmental and stylistic variation in the composition of early vocabulary. Journal of Child Language 21, 85–123. doi:10.1017/S0305000900008680
[7] Beguš, G., Zhou, A., Zhao, T.C., 2023. Encoding of speech in convolutional layers and the brain stem based on language experience. Scientific Reports 13, 6480. https://doi.org/10.1038/s41598-023-33384-9, doi:10.1038/s41598-023-33384-9
[8] Beguš, G., 2020. Generative adversarial phonology: Modeling unsupervised phonetic and phonological learning with neural networks. Frontiers in Artificial Intelligence 3, 44. https://www.frontiersin.org/article/10.3389/frai.2020.00044, doi:10.3389/frai.2020.00044
[9] Beguš, G., 2021a. CiwGAN and fiwGAN: Encoding information in acoustic data to model lexical learning with generative adversarial networks. Neural Networks 139, 305–325. https://www.sciencedirect.com/science/article/pii/S0893608021001052, doi:10.1016/j.neunet.2021.03.017
[10] Beguš, G., 2021b. Identity-based patterns in deep convolutional networks: Generative adversarial phonology and reduplication. Transactions of the Association for Computational Linguistics 9, 1180–1196. https://aclanthology.org/2021.tacl-1.70, doi:10.1162/tacl_a_00421
[11] Beguš, G., Zhou, A., 2022a. Interpreting intermediate convolutional layers in unsupervised acoustic word classification, in: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8207–8211. doi:10.1109/ICASSP43922.2022.9746849
[12] Beguš, G., Zhou, A., 2022b. Interpreting intermediate convolutional layers of generative CNNs trained on waveforms. IEEE/ACM Transactions on Audio, Speech, and Language Processing 30, 3214–3229. doi:10.1109/TASLP.2022.3209938
[13] Beguš, G., Zhou, A., 2022c. Modeling speech recognition and synthesis simultaneously: Encoding and decoding lexical and sublexical semantic information into speech with no direct access to speech data, in: Proc. Interspeech 2022, pp. 5298–5302. doi:10.21437/Interspeech.2022-11219
[14] Beguš, G., 2022. Local and non-local dependency learning and emergence of rule-like representations in speech data by deep convolutional generative adversarial networks. Computer Speech & Language 71, 101244. https://www.sciencedirect.com/science/article/pii/S0885230821000516, doi:10.1016/j.csl.2021.101244
[15] Berent, I., Bat-El, O., Brentari, D., Dupuis, A., Vaknin-Nusbaum, V., 2016. The double identity of linguistic doubling. Proceedings of the National Academy of Sciences 113, 13702–13707. https://www.pnas.org/doi/abs/10.1073/pnas.1613749113, doi:10.1073/pnas.1613749113, http://arxiv.o...
[16] Berk, S., Lillo-Martin, D., 2012. The two-word stage: Motivated by linguistic or cognitive constraints? Cognitive Psychology 65, 118–140. https://www.sciencedirect.com/science/article/pii/S001002851200014X, doi:10.1016/j.cogpsych.2012.02.002
[17] Berwick, R.C., Chomsky, N., 2019. All or nothing: No half-merge and the evolution of syntax. PLOS Biology 17, 1–5. https://doi.org/10.1371/journal.pbio.3000539, doi:10.1371/journal.pbio.3000539
[18] Berwick, R.C., Okanoya, K., Beckers, G.J., Bolhuis, J.J., 2011. Songs to syntax: the linguistics of birdsong. Trends in Cognitive Sciences 15, 113–121. https://www.sciencedirect.com/science/article/pii/S1364661311000039, doi:10.1016/j.tics.2011.01.002
[19] Boersma, P., Weenink, D., 2015. Praat: doing phonetics by computer [computer program]. Version 5.4.06. Retrieved 21 February 2015 from http://www.praat.org/
[20] Chaabouni, R., Kharitonov, E., Bouchacourt, D., Dupoux, E., Baroni, M., 2020. Compositionality and generalization in emergent languages, in: Jurafsky, D., Chai, J., Schluter, N., Tetreault, J. (Eds.), Proceedings of the 58th Annual Meeting of the Association for ... doi:10.18653/v1/2020.acl-main.407
[21] Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., Abbeel, P., 2016. InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets, in: Lee, D.D., Sugiyama, M., Luxburg, U.V., Guyon, I., Garnett, R. (Eds.)...
[22] Chomsky, N., 2014. The Minimalist Program. The MIT Press. https://doi.org/10.7551/mitpress/9780262527347.001.0001, doi:10.7551/mitpress/9780262527347.001.0001
[23] Chomsky, N., 2020. The UCLA lectures. https://ling.auf.net/lingbuzz/005485. Unpublished; reference: lingbuzz/005485
[24] Christiansen, M.H., Dale, R., 2003. Language evolution and change, in: Arbib, M.A. (Ed.), The Handbook of Brain Theory and Neural Networks. 2nd ed. MIT Press, Cambridge, MA and London, pp. 604–606
[25] Christiansen, M.H., Kirby, S., 2003. Language evolution: consensus and controversies. Trends in Cognitive Sciences 7, 300–307. https://www.sciencedirect.com/science/article/pii/S1364661303001360, doi:10.1016/S1364-6613(03)00136-0
[26] Clark, R., 1977. What’s the use of imitation? Journal of Child Language 4, 341–358. doi:10.1017/S0305000900001732
[27] Diessel, H., Tomasello, M., 2005. A new look at the acquisition of relative clauses. Language 81, 882–906. https://doi.org/10.1353/lan.2005.0169, doi:10.1353/lan.2005.0169
[28] Dolatian, H., Heinz, J., 2020. Computing and classifying reduplication with 2-way finite-state transducers. Journal of Language Modelling 8, 179–250. https://jlm.ipipan.waw.pl/index.php/JLM/article/view/245, doi:10.15398/jlm.v8i1.245
[29] Donahue, C., McAuley, J.J., Puckette, M.S., 2019. Adversarial audio synthesis, in: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019, OpenReview.net. pp. 1–16. https://openreview.net/forum?id=ByMVTsR5KQ
[30] Fitch, W., 2000. The evolution of speech: a comparative review. Trends in Cognitive Sciences 4, 258–267. https://www.sciencedirect.com/science/article/pii/S1364661300014947, doi:10.1016/S1364-6613(00)01494-7
[31] Fox, J., 2003. Effect displays in R for generalised linear models. Journal of Statistical Software 8, 1–27. doi:10.18637/jss.v008.i15
[32] Fox, J., Weisberg, S., 2019. An R Companion to Applied Regression. 3rd ed., Sage, Thousand Oaks CA. https://socialsciences.mcmaster.ca/jfox/Books/Companion/index.html
[33] Garofolo, J.S., Lamel, L., Fisher, W.M., Fiscus, J., Pallett, D.S., Dahlgren, N.L., Zue, V., 1993. TIMIT acoustic-phonetic continuous speech corpus. Linguistic Data Consortium
[34] Giraud, A.L., Poeppel, D., 2012. Cortical oscillations and speech processing: emerging computational principles and operations. Nature Neuroscience 15, 511–517. https://doi.org/10.1038/nn.3063, doi:10.1038/nn.3063
[35] Griffiths, T.L., Kalish, M.L., 2007. Language evolution by iterated learning with Bayesian agents. Cognitive Science 31, 441–480. https://onlinelibrary.wiley.com/doi/abs/10.1080/15326900701326576, doi:10.1080/15326900701326576, http://arxiv.org/abs/https://onlinelibrary.wiley.com/doi/pdf/10...
[36] Hahne, A., Eckstein, K., Friederici, A.D., 2004. Brain Signatures of Syntactic and Semantic Processes during Children's Language Development. Journal of Cognitive Neuroscience 16, 1302–1318. https://doi.org/10.1162/0898929041920504, doi:10.1162/0898929041920504
[37] Jackendoff, R., 1999. Possible stages in the evolution of the language capacity. Trends in Cognitive Sciences 3, 272–279. https://www.sciencedirect.com/science/article/pii/S1364661399013339, doi:10.1016/S1364-6613(99)01333-9
[38] Kirby, S., 2000. Syntax without natural selection: How compositionality emerges from vocabulary in a population of learners, in: Knight, C., Studdert-Kennedy, M., Hurford, J. (Eds.), The Evolutionary Emergence of Language: Social Function and the Origins of Linguistic Form. Cambridge Univers...
[39] Kirby, S., 2001. Spontaneous evolution of linguistic structure: an iterated learning model of the emergence of regularity and irregularity. IEEE Transactions on Evolutionary Computation 5, 102–110. doi:10.1109/4235.918430
[40]

Kuhl, P.K., Meltzoff, A.N., 1996. Infant vocalizations in response to speech: Vocal imitation and developmental change. The Journal of the Acoustical Society of America 100, 2425–2438. https://doi.org/10.1121/1.417951
[41]

Lai, C.I., Shi, F., Peng, P., Kim, Y., Gimpel, K., Chang, S., Chuang, Y.S., Bhati, S., Cox, D.D., Harwath, D., Zhang, Y., Livescu, K., Glass, J.R., 2023. Textless phrase structure induction from visually-grounded speech. https://openreview....
[42]

Lakhotia, K., Kharitonov, E., Hsu, W.N., Adi, Y., Polyak, A., Bolte, B., Nguyen, T.A., Copet, J., Baevski, A., Mohamed, A., Dupoux, E., 2021. On generative spoken language modeling from raw audio. Transactions of the Association for Computational L... https://doi.org/10.1162/tacl_a_00430
[43]

Lin, R., 2022. Analysis on the selection of the appropriate batch size in CNN neural network, in: 2022 International Conference on Machine Learning and Knowledge Engineering (MLKE), pp. 106–109. https://doi.org/10.1109/MLKE55170.2022.00026
[44]

Luuk, E., Luuk, H., 2014. The evolution of syntax: Signs, concatenation and embedding. Cognitive Systems Research 27, 1–10. https://doi.org/10.1016/j.cogsys.2013.01.001
[45]

Marcolli, M., Chomsky, N., Berwick, R., 2023. Mathematical structure of syntactic merge. arXiv:2305.18278. https://arxiv.org/abs/2305.18278
[46]

Masur, E.F., Rodemaker, J.E., 1999. Mothers' and infants' spontaneous vocal, verbal, and action imitation during the second year. Merrill-Palmer Quarterly 45, 392–412. https://www.scopus.com/inward/record.uri?eid=2-s2.0-0033235026&partnerID=40&md5=6d43c615c751588167621ec9057eca48
[47]

Nowak, M.A., Komarova, N.L., 2001. Towards an evolutionary theory of language. Trends in Cognitive Sciences 5, 288–295. https://doi.org/10.1016/S1364-6613(00)01683-1
[48]

Nowak, M.A., Krakauer, D.C., 1999. The evolution of language. Proceedings of the National Academy of Sciences 96, 8028–8033. https://doi.org/10.1073/pnas.96.14.8028
[49]

Okanoya, K., 2013. Finite-State Song Syntax in Bengalese Finches: Sensorimotor Evidence, Developmental Processes, and Formal Procedures for Syntax Extraction, in: Birdsong, Speech, and Language: Exploring the Evolution of Mind and Brain. The MIT Press. https://doi.org/10.7551/mitpress/9322.003.0016
[50]

O’Grady, C., Smith, K., 2018. Models of Language Evolution, in: The Oxford Handbook of Psycholinguistics. Oxford University Press, pp. 899–914. https://doi.org/10.1093/oxfordhb/9780198786825.013.38
[51]

Patel, A.D., 2007. Syntax, in: Music, Language, and the Brain. Oxford University Press. https://doi.org/10.1093/acprof:oso/9780195123753.003.0005
[52]

Perfors, A., Navarro, D.J., 2014. Language evolution can be shaped by the structure of the world. Cognitive Science 38, 775–793. https://doi.org/10.1111/cogs.12102
[53]

Pfeffer, C.K., Xue, M., He, M., Huang, Z.J., Scanziani, M., 2013. Inhibition of inhibition in visual cortex: the logic of connections between molecularly distinct interneurons. Nature Neuroscience 16, 1068–1076. https://doi.org/10.1038/nn.3446
[54]

Pi, H.J., Hangya, B., Kvitsiani, D., Sanders, J.I., Huang, Z.J., Kepecs, A., 2013. Cortical interneurons that specialize in disinhibitory control. Nature 503, 521–524. https://doi.org/10.1038/nature12676
[55]

Progovac, L., 2015. Evolutionary syntax. Oxford Studies in the Evolution of Language 20. First edition. Oxford University Press, Oxford, United Kingdom; New York, NY.
[56]

Radford, A., Kim, J.W., Xu, T., Brockman, G., McLeavey, C., Sutskever, I., 2022. Robust speech recognition via large-scale weak supervision. arXiv:2212.04356. http://arxiv.org/abs/2212.04356
[57]

Radford, A., Metz, L., Chintala, S., 2015. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434
[58]

Rasilo, H., Räsänen, O., 2017. An online model for vowel imitation learning. Speech Communication 86, 1–23. https://doi.org/10.1016/j.specom.2016.10.010
[59]

Rita, M., Michel, P., Chaabouni, R., Pietquin, O., Dupoux, E., Strub, F., 2024. Language evolution with deep learning. arXiv:2403.11958. https://arxiv.org/abs/2403.11958
[60]

Rita, M., Tallec, C., Michel, P., Grill, J.B., Pietquin, O., Dupoux, E., Strub, F., 2022. Emergent communication: Generalization and overfitting in Lewis games, in: Oh, A.H., Agarwal, A., Belgrave, D., Cho, K. (Eds.), Advances in Neural Informatio...
[61]

Singla, Y.K., Shah, J., Chen, C., Shah, R.R., 2022. What do audio transformers hear? Probing their representations for language delivery & structure, in: 2022 IEEE International Conference on Data Mining Workshops (ICDMW), pp. 910–925. https://doi.org/10.1109/ICDMW58026.2022.00120
[62]

Skeide, M.A., Brauer, J., Friederici, A.D., 2014. Syntax gradually segregates from semantics in the developing brain. NeuroImage 100, 106–111. https://doi.org/10.1016/j.neuroimage.2014.05.080
[63]

Youngblood, M., 2024. Language-like efficiency and structure in house finch song. Proceedings of the Royal Society B: Biological Sciences 291, 20240250. https://doi.org/10.1098/rspb.2024.0250
[64]

Zuidema, W., de Boer, B., 2009. The evolution of combinatorial phonology. Journal of Phonetics 37, 125–144. https://doi.org/10.1016/j.wocn.2008.10.003

