Creating ConLangs to Probe the Metalinguistic Grammatical Knowledge of LLMs

Chihiro Taguchi; Richard Sproat

arxiv: 2510.07591 · v3 · submitted 2025-10-08 · 💻 cs.CL

Pith reviewed 2026-05-18 08:37 UTC · model grok-4.3

classification 💻 cs.CL

keywords constructed languages · large language models · metalinguistic knowledge · morphosyntax · typological patterns · language generation · AI evaluation

The pith

A modular system for creating constructed languages shows LLMs handle common grammatical patterns far more easily than rare ones.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces IASC, a system that breaks language creation into separate modules for sounds, word formation, sentence structure, vocabulary, writing systems, and grammar guides, each handled by targeted prompts to an LLM. Modules can refine their outputs using automatically generated commentary on prior steps, creating an interactive process. Experiments concentrate on the morphosyntax module and find substantial differences in how well various LLMs handle different linguistic specifications, with clear advantages for patterns that appear often in natural languages over those that do not. A sympathetic reader would care because the work supplies a concrete way to examine what models understand about language as a system, rather than what they know about particular languages.

Core claim

We present IASC, an interactive agentic system for ConLangs that creates phonology, morphology and syntax, lexicon, orthography, and a grammatical handbook through module-specific prompts, with refinement allowed by automatically generated commentary on previous outputs. Focusing on the morphosyntax module, the experiments demonstrate a fairly wide gulf in capabilities both among different LLMs and among different linguistic specifications, with it being notably easier for systems to deal with more typologically common patterns than rarer ones.

What carries the argument

The IASC modular prompt framework, which generates and refines ConLang components module by module using automatic commentary to isolate and test metalinguistic grammatical knowledge.
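The generate-then-refine loop described here can be pictured with a minimal sketch. It is not the authors' released code: the `LLM` callable type, the prompt wording, the `max_rounds` limit, the "OK" stopping convention, and the `stub_llm` stand-in are all assumptions made for illustration. Only the list of components comes from the paper's abstract.

```python
from typing import Callable

# Hypothetical type: any function mapping a prompt string to a completion.
LLM = Callable[[str], str]

# Components named in the paper's abstract, generated one module at a time.
MODULES = ["phonology", "morphosyntax", "lexicon", "orthography", "handbook"]


def run_module(llm: LLM, module: str, context: dict[str, str], max_rounds: int = 2) -> str:
    """Generate one component, then refine it using LLM-written commentary."""
    prior = "\n".join(f"[{name}]\n{text}" for name, text in context.items())
    output = llm(f"TASK: create the {module} of the ConLang.\nPRIOR:\n{prior}")
    for _ in range(max_rounds):
        commentary = llm(f"TASK: critique this {module} output.\nOUTPUT:\n{output}")
        if commentary.strip().upper().startswith("OK"):
            break  # assumed convention: the critic replies "OK" when satisfied
        output = llm(
            f"TASK: revise the {module} output.\nOUTPUT:\n{output}\nCOMMENTARY:\n{commentary}"
        )
    return output


def build_conlang(llm: LLM) -> dict[str, str]:
    """Run every module in order, passing earlier outputs along as context."""
    context: dict[str, str] = {}
    for module in MODULES:
        context[module] = run_module(llm, module, context)
    return context


def stub_llm(prompt: str) -> str:
    """Deterministic stand-in for a real model, for demonstration only."""
    first_line = prompt.splitlines()[0]
    if "critique" in first_line:
        return "OK" if "(revised)" in prompt else "Needs one more pass."
    if "revise" in first_line:
        draft = prompt.split("OUTPUT:\n", 1)[1].split("\nCOMMENTARY:", 1)[0]
        return draft + " (revised)"
    return "draft v1"


language = build_conlang(stub_llm)
```

Because the commentary comes from an LLM as well, any such loop inherits the critic's errors, which is the concern the referee report raises about the refinement step.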

If this is right

Different LLMs display markedly different success rates when generating consistent morphosyntactic rules for a new language.
LLMs succeed more readily when the required patterns match those frequent across natural languages than when the patterns are typologically uncommon.
The overall system supplies a practical, interactive tool that lets users build complete constructed languages through successive refinements.
This method offers a route to probe LLMs' grasp of general linguistic concepts without relying on facts from any specific existing language.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the observed gaps hold up, they suggest that models' apparent linguistic knowledge largely tracks statistical regularities in training data rather than abstract rule systems.
The same modular approach could be adapted to test other layers of language understanding such as semantic role assignment or discourse coherence.
Model developers could incorporate rare-pattern ConLang tasks as a diagnostic for identifying blind spots in grammatical generalization.

Load-bearing premise

That the modular prompt-based generation plus automatic commentary accurately isolates metalinguistic grammatical knowledge rather than testing the model's ability to follow complex instructions or to continue patterns seen in training data.

What would settle it

Re-running the morphosyntax generation tasks on the same models and linguistic specifications but without any module prompts, commentary steps, or ConLang framing and checking whether the performance advantage for typologically common patterns over rare ones disappears.

Figures

Figures reproduced from arXiv: 2510.07591 by Chihiro Taguchi, Richard Sproat.

Original abstract

We present a system that uses LLMs as a tool in the development of Constructed Languages -- ConLangs, which we call IASC (Interactive Agentic System for ConLangs). The system is modular in that it creates each of the components -- phonology, morphology and syntax, lexicon, orthography, and grammatical handbook, using module-specific sets of prompts. The approach is agentic in that various modules allow for refining the output given automatically-generated commentary on a previous step. Our main goals are twofold. First, we aim to provide tools that facilitate an engaging and enjoyable experience in creating artificially constructed languages. Second, the focus of this paper is on using our ConLang framework as a novel way to explore what LLMs 'know' about language -- not what they know about any particular language or encyclopedic facts, but how much they know about and understand language and linguistic concepts. In the experiments, we particularly focus on the morphosyntax module and show that there is a fairly wide gulf in capabilities both among different LLMs and among different linguistic specifications, with it being notably easier for systems to deal with more typologically common patterns than rarer ones. All code is released.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper gives us IASC, a modular LLM pipeline for building full ConLangs with refinement steps, and uses the morphosyntax part to show models handle common patterns better than rare ones.

The main point is that they built IASC to generate constructed languages in pieces (phonology, morphology, syntax, lexicon, orthography, and a handbook), where each module uses targeted prompts and then refines its output using auto-generated commentary. The experiments zero in on morphosyntax and report noticeable gaps: models do better with typologically common specifications than with rarer ones, and different LLMs vary considerably in overall performance. All the code is released, which is straightforward to check.

Referee Report

3 major / 2 minor

Summary. The paper introduces IASC, a modular agentic LLM-based system for generating constructed languages (ConLangs) by separately prompting for phonology, morphology/syntax, lexicon, orthography, and a grammatical handbook, with iterative refinement driven by automatically generated commentary on prior outputs. The central empirical focus is the morphosyntax module, where experiments across multiple LLMs reveal performance differences both between models and between typologically common versus rare linguistic patterns, which the authors interpret as evidence of varying metalinguistic grammatical knowledge.

Significance. If the interpretation holds after addressing confounds, the work supplies a practical tool for ConLang creation while offering a novel probe for abstract linguistic knowledge in LLMs that goes beyond language-specific facts. The open release of code is a clear strength that enables reproducibility and extension. The significance is tempered by the need to demonstrate that observed gaps reflect conceptual understanding rather than instruction-following or pretraining biases.

major comments (3)

[Experiments] Morphosyntax experiments: The central claim that performance differences reflect metalinguistic grammatical knowledge depends on the modular prompt system isolating linguistic concepts, yet no control conditions (e.g., non-linguistic rule-generation tasks or novel feature combinations) are reported to rule out confounds from prompt complexity, multi-step instruction following, or differential exposure to common vs. rare structures in pretraining data.
[Experiments] Evaluation of results: The reported 'wide gulf' across models and pattern types is presented qualitatively without quantitative metrics, error bars, number of trials, or statistical tests, making it difficult to assess robustness or to compare effect sizes between typologically common and rare specifications.
[IASC system description] Agentic refinement: The automatic commentary module that drives refinement is itself LLM-generated and not validated against human linguistic judgments or inter-annotator agreement, raising the possibility that observed differences partly reflect commentary quality or error accumulation rather than the target model's grammatical knowledge.

minor comments (2)

[Abstract] The abstract and introduction could more explicitly separate the tool-building contribution from the probing contribution to clarify the paper's primary focus for readers.
Notation for linguistic features and pattern types should be standardized and defined in a single table or section to improve readability when comparing common vs. rare specifications.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback on our manuscript. We value the opportunity to address the concerns raised regarding experimental controls, quantitative evaluation, and validation of the agentic components. We respond to each major comment below and indicate planned revisions.

Referee: [Experiments] Morphosyntax experiments: The central claim that performance differences reflect metalinguistic grammatical knowledge depends on the modular prompt system isolating linguistic concepts, yet no control conditions (e.g., non-linguistic rule-generation tasks or novel feature combinations) are reported to rule out confounds from prompt complexity, multi-step instruction following, or differential exposure to common vs. rare structures in pretraining data.

Authors: We agree that additional controls would help isolate whether differences stem from metalinguistic knowledge rather than other factors. Our design varies only the linguistic specifications (common vs. rare patterns) while holding the modular prompt structure constant across conditions, which provides some control for prompt complexity and instruction following. Nevertheless, we acknowledge the value of explicit controls such as non-linguistic rule tasks. In the revised manuscript we will add a dedicated limitations subsection discussing these potential confounds and include one simple control experiment comparing linguistic versus arbitrary symbolic rule generation.
revision: partial

Referee: [Experiments] Evaluation of results: The reported 'wide gulf' across models and pattern types is presented qualitatively without quantitative metrics, error bars, number of trials, or statistical tests, making it difficult to assess robustness or to compare effect sizes between typologically common and rare specifications.

Authors: We accept this criticism. The current draft relies on qualitative descriptions of observed differences. For the revision we will report the exact number of trials per condition, success rates as percentages with standard error bars, and apply appropriate statistical tests (e.g., paired t-tests or Fisher's exact tests) to quantify differences both across models and between common versus rare linguistic patterns. These metrics will be added to the results section and figures.
revision: yes

Referee: [IASC system description] Agentic refinement: The automatic commentary module that drives refinement is itself LLM-generated and not validated against human linguistic judgments or inter-annotator agreement, raising the possibility that observed differences partly reflect commentary quality or error accumulation rather than the target model's grammatical knowledge.

Authors: This is a fair observation. The commentary module is intentionally LLM-generated to enable fully automated iterative refinement. We will revise the system description to explicitly state this design choice and its scalability benefits while adding a limitations paragraph noting the absence of human validation. We will also outline future work that could include human annotation studies to measure commentary quality and inter-annotator agreement.
revision: partial
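The quantities promised in the second response (success rates with standard errors, plus a Fisher's exact test on common versus rare specifications) could be computed along the following lines. This is a sketch using only the Python standard library. The function names are hypothetical, and the trial counts are invented placeholders, not results from the paper.

```python
from math import comb, sqrt


def success_rate(successes: int, trials: int) -> tuple[float, float]:
    """Return the success proportion and its binomial standard error."""
    p = successes / trials
    return p, sqrt(p * (1 - p) / trials)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    return sum(
        p for p in map(prob, range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Placeholder counts for illustration only (NOT data from the paper):
# successes and failures for typologically common vs. rare specifications.
common_ok, common_fail = 18, 2
rare_ok, rare_fail = 9, 11

common_rate, common_se = success_rate(common_ok, common_ok + common_fail)
rare_rate, rare_se = success_rate(rare_ok, rare_ok + rare_fail)
p_value = fisher_exact_two_sided(common_ok, common_fail, rare_ok, rare_fail)

print(f"common: {common_rate:.0%} (SE {common_se:.3f})")
print(f"rare:   {rare_rate:.0%} (SE {rare_se:.3f})")
print(f"Fisher's exact test, two-sided p = {p_value:.4f}")
```

Fisher's exact test suits small per-condition trial counts, where a chi-squared approximation would be unreliable. A paired test would be the better fit if the same specifications are run under each model.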

Circularity Check

0 steps flagged

No circularity: empirical system and experimental results are self-contained

The paper introduces an agentic modular LLM system (IASC) for generating ConLang components via prompt-based modules and automatic refinement, then reports experimental performance gaps on morphosyntax tasks across LLMs and linguistic specifications. These gaps are measured directly from system outputs on typologically common vs. rare patterns and do not reduce to any fitted parameters, self-defined quantities, or self-citation chains. No equations, uniqueness theorems, or ansatzes are present that loop back to the authors' inputs or prior work. The claims rest on observable experimental results with released code, making the work externally falsifiable and independent of its own construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the assumption that prompt-driven generation plus auto-commentary isolates genuine metalinguistic knowledge. No free parameters are fitted to data in the reported experiments. No new physical or mathematical entities are postulated.

axioms (1)

domain assumption: LLM outputs under modular prompts plus automatic commentary reflect metalinguistic grammatical knowledge rather than instruction-following or training-data continuation.
This premise is required for the performance differences to be interpreted as evidence about what the models 'know' about language.

invented entities (1)

IASC (Interactive Agentic System for ConLangs) · no independent evidence
purpose: Modular pipeline that generates and refines ConLang components to serve as a probe.
The system is the main technical contribution; it is not claimed to be a new fundamental entity but a practical tool.

