{"paper":{"title":"Lost in Translation? Exploring the Shift in Grammatical Gender from Latin to Occitan","license":"http://creativecommons.org/licenses/by/4.0/","headline":"An interpretable neural framework shows grammatical gender cues shifting from Latin word forms to Occitan sentence context.","cross_cats":["cs.AI"],"primary_cat":"cs.CL","authors_text":"Ahan Chatterjee, Esteban Garces Arias, Marinus Wiedner, Matthias Aßenmacher, Matthias Schöffel","submitted_at":"2026-05-09T20:36:49Z","abstract_excerpt":"The diachronic evolution from Latin to the Romance languages involved a restructuring of the grammatical gender system from a tripartite configuration (masculine, feminine, neuter) to a bipartite one (masculine, feminine) in most Romance languages. In this work, we introduce an interpretable deep learning framework to investigate this phenomenon at both lexical and contextual levels. First, we show that conventional tokenization strategies are insufficiently robust for this low-resource historical setting, and that our proposed tokenizer improves performance over these baselines. At the lexica"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"we introduce an interpretable deep learning framework to investigate this phenomenon at both lexical and contextual levels... 
Together, these analyses characterize the distribution of gender information between the lemma and its sentential context.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That neural network predictions and feature attributions on limited historical data accurately reflect genuine diachronic linguistic processes rather than artifacts of tokenization, data scarcity, or model biases.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"An interpretable deep learning framework with a new tokenizer is used to quantify how grammatical gender information is distributed between lemmas and sentential context during the Latin-to-Occitan transition.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"An interpretable neural framework shows grammatical gender cues shifting from Latin word forms to Occitan sentence context.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"5cb8325c9b8a254152101c611fea30b259790f4a63c4358e05e789c0e4970586"},"source":{"id":"2605.09156","kind":"arxiv","version":2},"verdict":{"id":"347af29b-766e-4c10-9a7c-32b80b78dc1e","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-12T03:51:09.540267Z","strongest_claim":"we introduce an interpretable deep learning framework to investigate this phenomenon at both lexical and contextual levels... 
Together, these analyses characterize the distribution of gender information between the lemma and its sentential context.","one_line_summary":"An interpretable deep learning framework with a new tokenizer is used to quantify how grammatical gender information is distributed between lemmas and sentential context during the Latin-to-Occitan transition.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That neural network predictions and feature attributions on limited historical data accurately reflect genuine diachronic linguistic processes rather than artifacts of tokenization, data scarcity, or model biases.","pith_extraction_headline":"An interpretable neural framework shows grammatical gender cues shifting from Latin word forms to Occitan sentence context."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2605.09156/integrity.json","findings":[],"available":true,"detectors_run":[{"name":"claim_evidence","ran_at":"2026-05-20T08:22:01.618854Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"ai_meta_artifact","ran_at":"2026-05-19T20:36:33.381928Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"doi_title_agreement","ran_at":"2026-05-19T13:31:18.609681Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"doi_compliance","ran_at":"2026-05-19T10:29:37.262432Z","status":"completed","version":"1.0.0","findings_count":0}],"snapshot_sha256":"c8bbebd9bc236c92fde6abb071e4bfd7f8e4fdec8586f87159fd2d8ef4e2be96"},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}