GLeMM: A large-scale multilingual dataset for morphological research

Nabil Hathout (CLLE; Comue de Toulouse); Basilio Calderone (CLLE; UBM); Fiammetta Namer (ATILF; UL); Franck Sajous (CLLE-ERSS)

arxiv: 2604.12442 · v1 · submitted 2026-04-14 · 💻 cs.CL

Pith reviewed 2026-05-10 15:06 UTC · model grok-4.3

classification 💻 cs.CL

keywords: derivational morphology, multilingual dataset, Wiktionary, word formation, morphological annotation, European languages, computational morphology

The pith

GLeMM supplies a large automated multilingual dataset of derivational morphology drawn from Wiktionary to support data-driven analysis of word formation across languages.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents GLeMM as a new resource for investigating how form and meaning interact when new words are created. It is constructed through an identical automated process applied to Wiktionary for seven languages, with added annotations for morphological features and semantic descriptions on many entries. This scale and consistency allow studies of derivational morphology to move past small hand-selected examples toward replicable, generalizable results. A sympathetic reader would care because prior work on these questions has often relied on intuition or small samples, which restricts what can be firmly established.

Core claim

GLeMM is a derivational resource of large size with coverage of seven European languages, a fully automated design that is identical across languages, automatic annotation of morphological features on each entry, and encoding of semantic descriptions for a significant subset. Created from Wiktionary articles, it enables researchers to address questions such as the role of form and meaning in word-formation and to develop and test computational methods that identify the structures of derivational morphology.

What carries the argument

The automated extraction and annotation pipeline applied identically to Wiktionary articles, which generates entries carrying morphological annotations and partial semantic descriptions.

If this is right

Researchers can now examine the role of form and meaning in word-formation with large-scale, consistent data instead of limited observations.
Computational methods for identifying derivational morphology structures can be developed and tested experimentally on the same resource.
Morphological studies can be replicated and generalized across German, English, Spanish, French, Italian, Polish, and Russian.
Data-driven description in morphology becomes feasible beyond intuition-based approaches.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The dataset could serve as training material for machine learning systems that process word formation in new texts.
Direct comparisons of annotated patterns across the seven languages might highlight both shared and language-specific tendencies in derivation.
Applying the same pipeline to additional languages would extend the scope for testing claims about universal aspects of morphology.

Load-bearing premise

The automated extraction and annotation pipeline applied identically to Wiktionary articles produces accurate and consistent morphological information across all seven languages without significant language-specific errors or coverage gaps.

What would settle it

A manual verification of randomly sampled entries from each of the seven languages that finds frequent inaccuracies or inconsistencies in the morphological annotations would show the resource cannot support reliable research.
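
A spot check of this kind can be sketched in a few lines. The sketch below assumes entries are available as (language, entry) pairs and that a human annotator judges each sampled entry as correct or incorrect. The function names `sample_per_language` and `wilson_interval`, the record layout, and the sample size are illustrative assumptions, not part of GLeMM or the paper. It draws a fixed-size random sample per language and computes a Wilson score interval for the observed annotation accuracy.

```python
import math
import random
from collections import defaultdict


def sample_per_language(entries, k, seed=0):
    """Draw up to k entries per language, uniformly at random.

    `entries` is an iterable of (language, entry) pairs; this layout is
    a hypothetical stand-in for whatever format the resource uses.
    """
    by_lang = defaultdict(list)
    for lang, entry in entries:
        by_lang[lang].append(entry)
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return {
        lang: rng.sample(items, min(k, len(items)))
        for lang, items in sorted(by_lang.items())
    }


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is ~95%)."""
    if total == 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half
```

For example, if 180 of 200 sampled entries in one language were judged correct, `wilson_interval(180, 200)` gives an interval of roughly 0.85 to 0.93. Whether that counts as "frequent inaccuracies" is a judgment the check itself cannot make.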

Original abstract

In derivational morphology, what mechanisms govern the variation in form-meaning relations between words? The answers to this type of question are typically based on intuition and on observations drawn from limited data, even when a wide range of languages is considered. Many of these studies are difficult to replicate and generalize. To address this issue, we present GLeMM, a new derivational resource designed for experimentation and data-driven description in morphology. GLeMM is characterized by (i) its large size, (ii) its extensive coverage (currently amounting to seven European languages, i.e., German, English, Spanish, French, Italian, Polish, Russian), (iii) its fully automated design, identical across all languages, (iv) the automatic annotation of morphological features on each entry, as well as (v) the encoding of semantic descriptions for a significant subset of these entries. It enables researchers to address difficult questions, such as the role of form and meaning in word-formation, and to develop and experimentally test computational methods that identify the structures of derivational morphology. The article describes how GLeMM is created using Wiktionary articles and presents various case studies illustrating possible applications of the resource.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

GLeMM scales up derivational morphology data across seven languages with a uniform Wiktionary pipeline, but the paper provides no quantitative checks on extraction accuracy.

The paper's main offering is GLeMM, a new dataset of derivational morphology entries for German, English, Spanish, French, Italian, Polish, and Russian. It pulls the data automatically from Wiktionary using one pipeline for all languages and adds morphological feature tags plus semantic descriptions for a subset of entries. This setup aims to support replicable work on form-meaning patterns in word formation and to test computational methods for identifying derivational structures.

Referee Report

3 major / 3 minor

Summary. The manuscript presents GLeMM, a large-scale multilingual dataset for derivational morphology covering seven European languages (German, English, Spanish, French, Italian, Polish, Russian). It is constructed via a fully automated, identical pipeline from Wiktionary articles, with automatic annotation of morphological features on each entry and semantic descriptions for a significant subset. The paper describes the creation process and includes case studies to illustrate applications for studying form-meaning relations in word-formation and for developing/testing computational methods in derivational morphology.

Significance. If the automated pipeline produces accurate and consistent annotations, GLeMM would be a valuable resource: its scale, uniform cross-lingual design, and coverage of derivational data address a gap in replicable, data-driven morphological research. The automated construction and inclusion of semantic encodings are strengths that could support large-scale experiments on form-meaning mappings and method evaluation.

major comments (3)

[§3 (Construction Pipeline)] Construction section (described in the abstract and §3): The central claim that the identical automated extraction and annotation pipeline yields reliable morphological information across all seven languages lacks any quantitative validation. No precision, recall, error rates, coverage statistics, or gold-standard comparisons are reported, despite acknowledged differences in Wiktionary article structure and quality by language. This directly undermines the weakest assumption that the resource reliably supports the stated research questions.
[§4 (Annotations)] Annotation and semantic encoding (abstract and §4): While morphological features are automatically annotated and semantic descriptions are provided for a subset, the paper provides no details on validation of these annotations (e.g., inter-annotator agreement or per-language accuracy metrics). This is load-bearing for claims about enabling form-meaning analyses.
[§6 (Case Studies)] Case studies (§6): The applications for addressing questions on word-formation and testing computational methods are illustrated but without any empirical assessment of dataset quality or utility in those tasks, such as baseline experiments or error analysis on the extracted data.
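
The inter-annotator agreement mentioned in the second comment is commonly reported as Cohen's kappa. The sketch below computes it for two annotators labelling the same sampled entries. The function name `cohen_kappa` and the toy labels are illustrative assumptions and do not come from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators over the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability that both annotators pick the same
    # label if each labelled independently with their own distribution.
    expected = sum(freq_a[lab] * freq_b[lab] for lab in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Toy judgments ("ok" / "err") on ten hypothetical entries.
ann_1 = ["ok", "ok", "ok", "err", "ok", "err", "ok", "ok", "err", "ok"]
ann_2 = ["ok", "ok", "err", "err", "ok", "err", "ok", "ok", "ok", "ok"]
print(round(cohen_kappa(ann_1, ann_2), 3))  # prints 0.524
```

Here the annotators agree on 8 of 10 items (0.8 observed agreement), but chance agreement is 0.58, so kappa is noticeably lower than the raw agreement rate.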

minor comments (3)

[Introduction] The abstract and introduction would benefit from explicit comparison to existing morphological resources (e.g., UniMorph or derivational databases) to clarify novelty in scale and automation.
[§4] Notation for morphological features and semantic encodings should be defined more clearly, perhaps with an example table entry for each language.
[§6] The paper mentions 'various case studies' but the text would be improved by a summary table linking each study to specific dataset properties used.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback and positive evaluation of GLeMM's potential significance. We address each major comment below and describe the revisions we will make to the manuscript.

Referee: [§3 (Construction Pipeline)] Construction section (described in the abstract and §3): The central claim that the identical automated extraction and annotation pipeline yields reliable morphological information across all seven languages lacks any quantitative validation. No precision, recall, error rates, coverage statistics, or gold-standard comparisons are reported, despite acknowledged differences in Wiktionary article structure and quality by language. This directly undermines the weakest assumption that the resource reliably supports the stated research questions.

Authors: We agree that quantitative validation metrics would strengthen the presentation of the pipeline's reliability. The manuscript's primary focus is on documenting the uniform, fully automated construction process that enables cross-lingual comparability. In the revision, we will add to §3 coverage statistics (entries and features per language), precision/recall estimates from manual inspection of random samples (200 entries per language), and a discussion of how the pipeline handles Wiktionary structural differences. These additions will provide concrete support for the resource's usability.
revision: yes
Referee: [§4 (Annotations)] Annotation and semantic encoding (abstract and §4): While morphological features are automatically annotated and semantic descriptions are provided for a subset, the paper provides no details on validation of these annotations (e.g., inter-annotator agreement or per-language accuracy metrics). This is load-bearing for claims about enabling form-meaning analyses.

Authors: We acknowledge that the current version lacks explicit validation details for the automatic annotations. We will expand §4 to describe the annotation heuristics and rules in greater detail, report per-language accuracy figures obtained by comparing a held-out sample against manual gold standards, and clarify the extraction and coverage of semantic descriptions. This will better substantiate the dataset's value for form-meaning research.
revision: yes
Referee: [§6 (Case Studies)] Case studies (§6): The applications for addressing questions on word-formation and testing computational methods are illustrated but without any empirical assessment of dataset quality or utility in those tasks, such as baseline experiments or error analysis on the extracted data.

Authors: The case studies are illustrative of potential uses rather than exhaustive evaluations. We agree that adding empirical elements would improve the section. In the revision, we will incorporate a baseline experiment in one case study (e.g., a rule-based derivational relation identifier evaluated on GLeMM data) together with performance metrics and error analysis. This will demonstrate practical utility while remaining within the scope of a dataset paper.
revision: yes
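
A baseline of the kind described in the last response could look like the following sketch: a suffix rule proposes derivational pairs, and the proposals are scored against gold pairs with precision, recall, and F1. Everything here is hypothetical. The single English rule (base + -er), the toy lexicon, and the gold pairs are illustrative assumptions, not GLeMM data or the authors' planned experiment.

```python
def propose_pairs(lexicon, suffix="er"):
    """Propose (base, derivative) pairs where derivative == base + suffix."""
    words = set(lexicon)
    pairs = set()
    for word in words:
        if not word.endswith(suffix) or len(word) <= len(suffix):
            continue
        base = word[: -len(suffix)]
        if base in words:
            pairs.add((base, word))
    return pairs


def precision_recall_f1(predicted, gold):
    """Score predicted pairs against gold pairs (both are sets)."""
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy lexicon and gold standard, invented for illustration only.
lexicon = ["teach", "teacher", "sing", "singer",
           "corn", "corner", "write", "writer"]
gold = {("teach", "teacher"), ("sing", "singer"), ("write", "writer")}

predicted = propose_pairs(lexicon)
precision, recall, f1 = precision_recall_f1(predicted, gold)
print(sorted(predicted))
print(precision, recall, f1)
```

On this toy data the rule makes one false proposal (corn/corner) and misses one gold pair (write/writer, where the base-final e is dropped), so precision and recall are both 2/3. These two error types are what an error analysis on real data would need to quantify per language.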

Circularity Check

0 steps flagged

No circularity: dataset construction is self-contained

The paper describes the automated extraction of GLeMM from Wiktionary articles via a uniform pipeline, with automatic morphological annotation and partial semantic encoding. No equations, fitted parameters, predictions, or derivations appear in the abstract or described content. Claims about enabling research on form-meaning relations follow directly from the stated size, coverage, and identical processing across languages, without any reduction to self-defined quantities or load-bearing self-citations. This matches the expected non-circular outcome for a resource paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the quality and consistency of Wiktionary source data plus the correctness of the automated extraction pipeline; no free parameters or invented entities are introduced.

axioms (1)

domain assumption: Wiktionary articles contain sufficiently accurate and structured morphological information that can be extracted automatically and uniformly across languages.
The entire resource is built from Wiktionary using an identical automated design.


work page 2022

[12] [12]

Bauer, Laurie. 2017. Compounds and compounding. Cambridge University Press

work page 2017

[13] [13]

Beniamine, Sacha. 2018. Classifications flexionnelles. \'E tude quantitative des structures de paradigmes : Univeristé Paris Diderot Thèse de doctorat

work page 2018

[14] [14]

Beniamine, Sacha & Mat \'i as Guzm \'a n Naranjo. 2021. Multiple alignments of inflectional paradigms. In Proceedings of the society for computation in linguistics 2021, 216--227

work page 2021

[15] [15]

Bobkova, Natalia. 2025. La concurrence suffixale dans la construction des adjectifs dénominaux en russe : analyse des suffixes -n- , -sk- et -ov : Université de Toulouse Thèse de doctorat

work page 2025

[16] [16]

Bobkova, Natalia & Fabio Montermini. 2023. A quantitative approach to doublets in R ussian denominal adjective construction. Word Structure 16(1). 63--86. doi:10.3366/word.2023.0221

work page doi:10.3366/word.2023.0221 2023

[17] [17]

Bonami, Olivier & Sacha Beniamine. 2016. Joint predictiveness in inflectional paradigms. Word Structure 9(2). 156--182

work page 2016

[18] [18]

Calderone, Basilio, Franck Sajous & Nabil Hathout. 2016. GLAW-IT : A free large I talian dictionary encoded in a fine-grained XML format. In Proceedings of the 49th annual meeting of the societas linguistica europaea (sle 2016), 43--45. Naples, Italy

work page 2016

[19] [19]

Cardillo, Alberto Franco, Marcello Ferro, Claudia Marzi & Vito Pirrelli. 2018. Deep learning of inflection and the cell-filling problem. Italian Journal of Computational Linguistics 4(1). 57--75

work page 2018

[20] [20]

Cotterell, Ryan & Hinrich Schütze. 2018. Joint semantic synthesis and morphological analysis of the derived word. Transactions of the Association for Computational Linguistics 6. 33--48

work page 2018

[21] [21]

Creutz, Mathias & Krista Lagus. 2002. Unsupervised discovery of morphemes. In Proceedings of the ACL workshop on morphological and phonological learning , 21--30. Philadelphia, PA: ACL

work page 2002

[22] [22]

Creutz, Mathias & Krista Lagus. 2004. Induction of a simple morphology for highly-inflecting languages. In Proceedings of the 7th meeting of the ACL special interest group in computational phonology: Current themes in computational phonology and morphology , 43--51. Barcelona, Spain

work page 2004

[23] [23]

Creutz, Mathias & Krista Lagus. 2005. Unsupervised morpheme segmentation and morphology induction from text corpora using M orfessor 1.0. Tech. Rep. A81 Helsinki University of Technology

work page 2005

[24] [24]

Dal, Georgette & Fiammetta Namer. 2022. \'Eco- lave plus vert, et il lave toute la famille. Neologica 16. 111--128. doi:10.48611/isbn.978-2-406-13219-6.p.0111

work page doi:10.48611/isbn.978-2-406-13219-6.p.0111 2022

[25] [25]

Dendien, Jacques & Jean-Marie Pierrel. 2003. Le T résor de la L angue F rançaise informatisé: un exemple d'informatisation d'un dictionnaire de langue de référence. Traitement automatique des langues 44(2). 11--37

work page 2003

[26] [26]

Fellbaum, Christiane. 1998. Wordnet: An electronic lexical database. MIT Press

work page 1998

[27] [27]

Fellbaum, Christiane (ed.). 1999. Wordnet: an electronic lexical database. Cambridge, MA: MIT Press

work page 1999

[28] [28]

Fellbaum, Christiane, Anne Osherson & Peter E. Clark. 2009. Putting semantics into W ord N et's ``morphosemantic'' links. In Human language technology. challenges of the information society, vol. 5603 Lecture Notes in Computer Science Volume, 350--358. Springer

work page 2009

[29] [29]

Fradin, Bernard. 2019. Competition in derivation: What can we learn from F rench doublets in -age and -ment ? In Franz Rainer, Francesco Gardani, Wolfgang U. Dressler & Hans Christian Luschützky (eds.), Competition in inflection and word-formation, 67--93. Springer

work page 2019

[30] [30]

Gage, Philip. 1994. A new algorithm for data compression. C Users Journal 12(2). 23–38

work page 1994

[31] [31]

Goldsmith, John. 2001. Unsupervised learning of the morphology of natural language. Computational Linguistics 27(2). 153--198

work page 2001

[32] [32]

Goldsmith, John. 2006. An algorithm for the unsupervised learning of morphology. Natural Language Engineering 12(4). 353--371

work page 2006

[33] [33]

Guzm \'a n Naranjo, Mat \' as. 2020. Analogy, complexity and predictability in the R ussian nominal inflection system. Morphology 30(3). 219--262

work page 2020

[34] [34]

Habash, Nizar & Bonnie Dorr. 2003. A categorial variation database for E nglish. In Proceedings of the human language technology and north american association for computational linguistics conference (naacl/hlt 2003), 96--102. Edmonton: ACL

work page 2003

[35] [35]

Hathout, Nabil. 2001. Analogies morpho-synonymiques. U ne méthode d'acquisition automatique de liens morphologiques à partir d'un dictionnaire de synonymes. In Denis Maurel (ed.), Actes de la 8 \ conférence annuelle sur le traitement automatique des langues naturelles (taln-2001), 223--232. Tours: ATALA

work page 2001

[36] [36]

Hathout, Nabil. 2002. From WordNet to CELEX : A cquiring morphological links from dictionaries of synonyms. In Proceedings of the third international conference on language resources and evaluation, 1478--1484. Las Palmas de Gran Canaria: ELRA

work page 2002

[37] [37]

Hathout, Nabil. 2005. Exploiter la structure analogique du lexique construit : U ne approche computationnelle. Cahiers de lexicologie 87(2). 5--28

work page 2005

[38] [38]

Hathout, Nabil. 2008. Acquisition of the morphological structure of the lexicon based on lexical similarity and formal analogy. In Proceedings of the coling workshop textgraphs-3, 1--8. Manchester: ACL

work page 2008

[39] [39]

Hathout, Nabil. 2009 a . Acquisition morphologique à partir d'un dictionnaire informatisé. In Actes de la 16 \ conférence sur le traitement automatique des langues naturelles (taln-2009), Senlis: ATALA

work page 2009

[40] [40]

Hathout, Nabil. 2009 b . Acquisition of morphological families and derivational series from a machine readable dictionary. In Fabio Montermini, Gilles Boyé & Jesse Tseng (eds.), Selected proceedings of the 6th décembrettes: Morphology in bordeaux, Somerville, MA: Cascadilla Proceedings Project

work page 2009

[41] [41]

Hathout, Nabil. 2011 a . Morphonette: a paradigm-based morphological network. Lingue e linguaggio 2011(2). 243--262

work page 2011

[42] [42]

Hathout, Nabil. 2011 b . Une approche topologique de la construction des mots : propositions théoriques et application à la préfixation en anti- . In Michel Roché, Gilles Boyé, Nabil Hathout, Stéphanie Lignon & Marc Plénat (eds.), Des unités morphologiques au lexique, 251--318. Hermès Science-Lavoisier

work page 2011

[43] [43]

Hathout, Nabil. 2014. Phonotactics in morphological similarity metrics. Language Sciences 46. 71--83

work page 2014

[44] [44]

Hathout, Nabil. 2016. La question des données en morphologie. Cahiers de l'ILSL 45. 123--160

work page 2016

[45] [45]

Hathout, Nabil, Basilio Calderone, Franck Sajous & Fiammetta Namer. 2025. Form and meaning in word-formation: Who does what? Manuscript

work page 2025

[46] [46]

Hathout, Nabil, Fabio Montermini & Ludovic Tanguy. 2008. Extensive data for morphology: U sing the W orld W ide W eb. Journal of F rench Language Studies 18(1). 67--85

work page 2008

[47] [47]

Hathout, Nabil & Fiammetta Namer. 2014. Démonette, a F rench derivational morpho-semantic network. Linguistic Issues in Language Technology 11(5). 125--168

work page 2014

[48] [48]

Hathout, Nabil & Fiammetta Namer. 2016. Giving lexical resources a second life: D émonette, a multi-sourced morpho-semantic network for F rench. In Proceedings of the tenth international conference on language resources and evaluation ( LREC 2016) , Portorož, Slovenia

work page 2016

[49] [49]

Hathout, Nabil & Fiammetta Namer. 2018. La parasynthèse à travers les modèles : des RCL au P ara D is. In Olivier Bonami, Gilles Boyé, Georgette Dal, Hélène Giraudo & Fiammetta Namer (eds.), The lexeme in descriptive and theorical morphology, 365--399. Langage Sciences Press

work page 2018

[50] [50]

Hathout, Nabil & Fiammetta Namer. 2025. What do derivational paradigms tell us about back-formation and what does back-formation tell us about derivational paradigms? Word Structure 18(3). 239--280

work page 2025

[51] [51]

Hathout, Nabil, Fiammetta Namer, Marc Plénat & Ludovic Tanguy. 2009. La collecte et l'utilisation des données en morphologie. In Bernard Fradin, Françoise Kerleroux & Marc Plénat (eds.), Aperçus de morphologie du français, 267--287. Saint-Denis: Presses universitaires de Vincennes

work page 2009

[52] [52]

Hathout, Nabil & Franck Sajous. 2016. Wiktionnaire's W ikicode GLAWI fied: a workable F rench machine-readable dictionary. In Proceedings of the tenth international conference on language resources and evaluation ( LREC 2016) , Portorož, Slovenia

work page 2016

[53] [53]

Hathout, Nabil, Franck Sajous & Basilio Calderone. 2014. Acquisition and enrichment of morphological and morphosemantic knowledge from the F rench W iktionary. In Proceedings of the COLING workshop on lexical and grammatical resources for language processing , 65--74. Dublin, Ireland

work page 2014

[54] [54]

Hathout, Nabil, Franck Sajous, Basilio Calderone & Fiammetta Namer. 2020. G lawinette: a linguistically motivated derivational description of F rench acquired from GLAWI . In Proceedings of the twelfth international conference on language resources and evaluation ( LREC 2020) , 3870--3878. Marseille

work page 2020

[55] [55]

Hay, Jennifer & Harald Baayen. 2003. Phonotactics, parsing and productivity. Italian Journal of Linguistics 15(1). 99–130

work page 2003

[56] [56]

Hledíková, Hana & Magda Ševčíková. 2024. Conversion in languages with different morphological structures: a semantic comparison of E nglish and C zech. Morphology 34(1). 73--102. doi:10.1007/s11525-024-09422-1

work page doi:10.1007/s11525-024-09422-1 2024

[57] [57]

Huguin, Mathilde, Lucie Barque, Pauline Haas & Delphine Tribout. 2023. Typage sémantique des noms dans la ressource morphologique D émonette. Lexique 33. 41--56. doi:10.54563/lexique.1086. ://www.peren-revues.fr/lexique/1086

work page doi:10.54563/lexique.1086 2023

[58] [58]

Huyghe, Richard & Rossella Varvara. 2023. Affix rivalry: Theoretical and methodological challenges. Word Structure 16(1). 1--23

work page 2023

[59] [59]

Harald Baayen

de Jong, Nivja H., Robert Schreuder & R. Harald Baayen. 2000. The morphological family size effect and morphology. Language and cognitive processes 15(4/5). 329--365

work page 2000

[60] [60]

Kann, Katharina & Hinrich Sch \"u tze. 2016. Single-model encoder-decoder with explicit morphological representation for reinflection. In Katrin Erk & Noah A. Smith (eds.), Proceedings of the 54th annual meeting of the association for computational linguistics (volume 2: Short papers), 555--560. Berlin, Germany: Association for Computational Linguistics

work page 2016

[61] [61]

Kelling, Carmen. 2001. Agentivity and suffix selection. In Proceedings of the LFG conference , 147--162. Stanford, CA: CSLI

work page 2001

[62] [62]

Koehl, Aurore. 2012. La construction morphologique des noms désadjectivaux suffixés en français. Nancy: Université de L orraine Thèse de doctorat

work page 2012

[63] [63]

Koehl, Aurore & Stéphanie Lignon. 2014. Property nouns with -ité and -itude: formal alternation and morphopragmatics or the sad-itude of the A ité _ N . Morphology 24(4). 351--376

work page 2014

[64] [64]

Kyj \' a nek, Luk \' a s . 2018. Morphological resources of derivational word-formation relations. Tech. Rep. 61 \' U FAL - Charles University Prague

work page 2018

[65] [65]

Kyj \'a nek, Luk \'a s , Olga Lyashevskaya, Anna Nedoluzhko, Daniil Vodolazsky & Zden e k Z abokrtsk \'y . 2022. Constructing a lexical resource of R ussian derivational morphology. In Proceedings of the thirteenth language resources and evaluation conference, 2788--2797. Marseille, France

work page 2022

[66] [66]

Kyj \'a nek, Luk \'a s , Zden e k Z abokrtsk \'y , Jon \'a s Vidra & Magda S ev c \' kov \'a . 2021. Universal derivations v1.1. LINDAT / CLARIAH - CZ digital library at the Institute of Formal and Applied Linguistics ( \'U FAL ), Faculty of Mathematics and Physics, Charles University

work page 2021

[67] [67]

Kyjánek, Lukáš, Zdenĕk Žabokrtský, Magda Ševčíková & Jonáš Vidra. 2020. U niversal D erivations 1.0, a growing collection of harmonised word-formation resources. The Prague Bulletin of Mathematical Linguistics 115. 5--30

work page 2020

[68] [68]

Langlais, Philippe & François Yvon. 2008. Scaling up analogical learning. In Proceedings of the 22nd international conference on computational linguistics (coling 2008), 51–54. Manchester

work page 2008

[69] [69]

Lango, Mateusz, Magda S ev c \'i kov \'a & Zden e k Z abokrtsk \'y . 2018. Semi-automatic construction of word-formation networks (for P olish and S panish). In Proceedings of the eleventh international conference on language resources and evaluation ( LREC 2018) , Miyazaki, Japan

work page 2018

[70] [70]

Lango, Mateusz, Zdenĕk Žabokrtský & Magda Ševčíková. 2021. Semi-automatic construction of word-formation networks. Language Resources and Evaluation 55. 3--32. doi:10.1007/s10579-019-09484-2

work page doi:10.1007/s10579-019-09484-2 2021

[71] [71]

Lavallée, Jean-François & Philippe Langlais. 2009. Morphological acquisition by formal analogy. In Working notes for the morphochallenge at clef 2009, Corfu, Greece

work page 2009

[72] [72]

Lepage, Yves. 1998. Solving analogies on words: A n algorithm. In Proceedings of the 36th annual meeting of the association for computational linguistics and of the 17th international conference on computational linguistics, vol. 2, 728--735. Montréal

work page 1998

[73] [73]

Lepage, Yves. 2003. De l'analogie rendant compte de la commutation en linguistique. Grenoble: Université Joseph Fourier Habilitation à diriger des recherches

work page 2003

[74] [74]

Lepage, Yves. 2004. Analogy and formal languages. Electronic Notes in Theoretical Computer Science 53. 180--191. Proceedings of the the 6th Conference on Formal Grammar and the 7th on the Mathematics of Language (FG/MOL-2001)

work page 2004

[75] [75]

Lignon, Stéphanie, Georgette Dal, Nabil Hathout & Fiammetta Namer. 2025. La morphophonologie est-elle paradigmatique ? P hononette vous répond. Langue Française 228. 59--16

work page 2025

[76] [76]

Lignon, Stéphanie, Fiammetta Namer & Florence Villoing. 2014. De l'agglutination à la triangulation ou comment expliquer certaines séries morphologiques. In Actes du 4 \ congrès mondial de linguistique française ( CMLF 2014) , 1813--1836

work page 2014

[77] [77]

Lignon, Stéphanie & Michel Roché. 2011. Entre histoire et morphophonologie, quelle distribution pour -éen vs -ien ? In Michel Roché, Gilles Boyé, Nabil Hathout, Stéphanie Lignon & Marc Plénat (eds.), Des unités morphologiques au lexique, 191--250. Hermès Science-Lavoisier

work page 2011

[78] [78]

Lindsay, Mark & Mark Aronoff. 2013. Natural selection in self-organizing morphological systems. In Nabil Hathout, Fabio Montermini & Jesse Tseng (eds.), M orphology in T oulouse , 133--153. München: Lincom Europa

work page 2013

[79] [79]

Malouf, Rob. 2017. Abstractive morphological learning with a recurrent neural network. Morphology 27(4). 431–458

work page 2017

[80] [80]

Marchand, Hans. 1969. The categories and types of present-day E nglish word-formation: A synchronic-diachronic approach . Beck

work page 1969