Modeling Narrative Structure in Latin Epic Poetry with Automatically Generated Story Grammars

Abigail Swenor; John James; Neil Coffee; Walter Scheirer

arxiv: 2502.12276 · v2 · submitted 2025-02-17 · 💻 cs.CL

Pith reviewed 2026-05-23 02:35 UTC · model grok-4.3

classification 💻 cs.CL

keywords story grammar · narrative structure · Latin epic poetry · large language models · few-shot learning · computational literary analysis · story elements · narrative comprehension

The pith

Large language models with few-shot learning can automatically label story grammar elements in Latin epic poetry to analyze narrative structure and style.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a method that uses an LLM pipeline and few-shot learning to assign story element labels to Latin epic texts. These labels serve as an interpretable representation that simulates aspects of human narrative comprehension. The output is applied directly to examine story structure and style across works. The approach is positioned as useful both to literary scholars seeking new patterns and to machine learning tasks that need structured features from literary text.

Core claim

An LLM pipeline using few-shot learning generates story grammar labels for Latin epic poetry, and these labels are used directly to support analysis of narrative structure and style in a way that remains interpretable to humanists and technologists.

What carries the argument

LLM pipeline with few-shot learning that produces story element labels for input texts.
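As a rough illustration of what few-shot labeling involves, the sketch below assembles a prompt from labeled example passages and filters a model reply down to known labels. It is not the authors' pipeline: the label inventory `LABELS`, the `Example` type, and both helper functions are hypothetical, the labels attached to the Latin example are illustrative only, and no real LLM API is called.

```python
from dataclasses import dataclass

# Hypothetical label inventory; the paper's actual story grammar is not given here.
LABELS = ("setting", "event", "goal", "outcome")


@dataclass(frozen=True)
class Example:
    """One few-shot demonstration: a passage and its story element labels."""
    passage: str
    labels: tuple


def build_prompt(examples, passage):
    """Assemble a few-shot prompt: instruction, demonstrations, then the query."""
    lines = ["Label the passage with story elements from: " + ", ".join(LABELS) + "."]
    for ex in examples:
        lines.append(f"Passage: {ex.passage}")
        lines.append("Labels: " + ", ".join(ex.labels))
    lines.append(f"Passage: {passage}")
    lines.append("Labels:")
    return "\n".join(lines)


def parse_labels(reply):
    """Keep only labels from the inventory, de-duplicated, in first-seen order."""
    seen = []
    for token in reply.split(","):
        label = token.strip().lower()
        if label in LABELS and label not in seen:
            seen.append(label)
    return seen
```

Filtering the reply against a fixed inventory is one simple way to keep the output interpretable: anything the model emits outside the grammar is dropped rather than silently accepted.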

If this is right

Literary scholars gain a tool to discover new areas of interest across multiple texts.
The labels supply a new feature set usable in downstream machine learning tasks.
Computational analysis of literary text shifts from abstract embeddings toward context-rich story elements.
The method extends interpretable narrative modeling to classical languages like Latin.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same pipeline could be tested on other poetic traditions or prose genres to check if story grammar labels transfer.
If labels prove consistent, manual annotation efforts for narrative datasets might decrease.
Integration with existing digital humanities tools could allow scholars to query labeled corpora for structural patterns without coding expertise.

Load-bearing premise

Few-shot prompting of an LLM produces story grammar labels that accurately and meaningfully capture narrative structure in Latin epic poetry.

What would settle it

Human experts independently label the same passages of Latin epic poetry with story grammar elements and the resulting label sets show low agreement with the LLM outputs.
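One common way to quantify such agreement is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below is a minimal version under a simplifying assumption that each passage receives exactly one label from each annotator; the function name and the sample labels are hypothetical, and the paper does not report this or any other agreement statistic.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who each assign one label per passage."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of passages where both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for four passages: human expert vs. LLM output.
human = ["event", "event", "goal", "goal"]
llm = ["event", "goal", "goal", "goal"]
print(cohen_kappa(human, llm))  # 0.5
```

A kappa near zero would mean the LLM labels agree with experts no better than chance, which is the outcome that would undercut the load-bearing premise.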

Figures

Figures reproduced from arXiv: 2502.12276 by Abigail Swenor, John James, Neil Coffee, Walter Scheirer.

**Figure 1.** The SGSM computational pipeline that processes two texts $T_1$ and $T_2$ by first passing them through the BERT model $M$. The output of the model $M$ produces a feature set $f_i$ for every text passage $t_i$. Each of these feature sets $f_i$ is a set of story grammar labels $\ell$, which ends the labeling stage. Lastly, the matching stage compares each text passage $t_i$ from the source text $T_1$ against every text passage $t_i$ fro…
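The matching stage described in the Figure 1 caption can be pictured as an all-pairs comparison of label sets. The sketch below is only an assumption about how such a comparison might look: the caption is truncated and does not name a similarity measure, so Jaccard similarity and the `threshold` parameter are stand-ins chosen for illustration, and all function names are hypothetical.

```python
def jaccard(labels_a, labels_b):
    """Jaccard similarity of two label sets (0.0 when both are empty)."""
    a, b = set(labels_a), set(labels_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_passages(features_1, features_2, threshold=0.5):
    """Compare every passage of text 1 against every passage of text 2.

    features_1 and features_2 are lists of label sets, one per passage.
    Returns (i, j, score) triples at or above the threshold, best first.
    """
    matches = []
    for i, f1 in enumerate(features_1):
        for j, f2 in enumerate(features_2):
            score = jaccard(f1, f2)
            if score >= threshold:
                matches.append((i, j, score))
    return sorted(matches, key=lambda m: m[2], reverse=True)


# Hypothetical label sets for two passages in each text.
source = [{"event", "goal"}, {"setting"}]
target = [{"event", "goal", "outcome"}, {"setting"}]
result = match_passages(source, target)
```

Because the features are sets of named labels rather than dense vectors, each match can be explained by listing the labels the two passages share.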

Original abstract

Computational methods for analyzing prose and poetry utilize word embeddings and other abstract representations that sometimes obscure context-rich literary text. Inspired by the psychology of reading, we utilize story structure and elements to simulate human narrative comprehension to produce a more comprehensive representation of literary text. We present a method for automatically generating story grammar labels for input texts as a means of analysis that is interpretable and accessible by humanists and technologists alike. Using a large language model (LLM) pipeline and few-shot learning, we label Latin epic poetry with story element labels and use this output directly to aid an analysis of the story structure and style. Our method guides literary scholars to discover new areas of interest across texts and provides a new feature set for further study for downstream machine learning tasks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper applies few-shot LLM prompting to generate story grammar labels on Latin epic poetry but reports no validation of label quality or accuracy.

The main thing here is a pipeline that feeds Latin epic texts into an LLM with few-shot examples to assign story element labels, then uses the output for narrative structure and style analysis. This combination applied to classical Latin is not in the cited prior work, so the concrete setup is new. The paper does well in choosing labels meant to be readable by humanists instead of opaque vectors, and in linking the idea back to psychology of reading for motivation. The goal of producing something that could also serve as features for downstream ML tasks is reasonable on paper. The soft spot is the complete absence of any grounding for the labels. The abstract describes the method but gives no accuracy numbers, no expert agreement scores, no error analysis, and no sample output on actual Latin passages. That makes the claim that the labels aid meaningful analysis rest on an untested assumption about how well few-shot prompting works in this low-resource domain. The citation pattern looks standard and the circularity burden is low since there are no fitted parameters or self-referential quantities. This is for digital humanities researchers who want interpretable computational tools for classical texts. A reader already working at the classics-NLP boundary could pick up the prompting approach as a starting point. It deserves peer review because the domain and the interpretability angle are worth referee time even if the current version needs a validation section added.

Referee Report

1 major / 1 minor

Summary. The paper claims that an LLM pipeline with few-shot learning can automatically generate story-grammar labels for Latin epic poetry; these labels are then used directly to analyze narrative structure and style, yielding results that are interpretable by both humanists and technologists and that supply new features for downstream ML tasks.

Significance. If the generated labels prove reliable, the work would supply an accessible, human-readable representation of narrative structure for low-resource classical texts, potentially enabling new comparative analyses across epics and a novel feature set for computational literary studies. The absence of any reported validation, however, prevents assessment of whether this potential is realized.

major comments (1)

[Abstract, §3] Abstract and §3 (method description): the central claim that the LLM-generated labels 'aid an analysis of the story structure and style' and are 'interpretable and accessible by humanists' rests on the untested premise that the labels accurately capture narrative elements. No human-expert comparison, inter-annotator agreement, accuracy metrics, or even qualitative error analysis on the Latin output is reported anywhere in the manuscript.

minor comments (1)

[Abstract, Introduction] The abstract and introduction repeatedly use 'story grammar labels' and 'story element labels' without an explicit definition or example set of the label inventory; a short table or appendix listing the grammar would improve clarity.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive report. The primary concern is the lack of validation for the generated labels, which we address point-by-point below. We agree this is a substantive gap and will revise accordingly.

Referee: [Abstract, §3] Abstract and §3 (method description): the central claim that the LLM-generated labels 'aid an analysis of the story structure and style' and are 'interpretable and accessible by humanists' rests on the untested premise that the labels accurately capture narrative elements. No human-expert comparison, inter-annotator agreement, accuracy metrics, or even qualitative error analysis on the Latin output is reported anywhere in the manuscript.

Authors: We agree that the manuscript reports no quantitative validation (accuracy, IAA) or systematic qualitative error analysis of the Latin labels. The presented work centers on the pipeline design and its direct application to exploratory analysis of narrative patterns across texts; the resulting labels are treated as an interpretable representation whose utility is illustrated through the downstream structural and stylistic observations. To strengthen the claims, the revised version will include a new section with qualitative error analysis: selected passages from the Latin epics will be shown with LLM-generated labels alongside brief expert commentary on label fidelity, highlighting both successful captures of narrative elements and common error types. This addition will directly support the assertions of humanist accessibility and analytical utility without altering the core method.

revision: yes

Circularity Check

0 steps flagged

No circularity: LLM labeling pipeline is self-contained without self-referential reductions

The paper presents a methodological pipeline that takes Latin epic texts as input, applies few-shot LLM prompting to generate story grammar labels, and uses those labels for narrative analysis. No equations, parameter fitting, or derivations are described that would make any output equivalent to its inputs by construction. The abstract and described method contain no self-definitional loops, fitted inputs renamed as predictions, or load-bearing self-citations that reduce the central claim to unverified prior work by the same authors. The approach is presented as a direct application of existing LLM capabilities to a new domain, remaining independent of the circularity patterns enumerated.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim depends on the untested assumption that current LLMs can perform reliable story-element labeling on ancient poetic texts via few-shot examples.

axioms (1)

domain assumption: Large language models can be effectively prompted with few-shot examples to label narrative elements in ancient texts.
This assumption underpins the entire LLM pipeline described in the abstract.

pith-pipeline@v0.9.0 · 5655 in / 1141 out tokens · 34689 ms · 2026-05-23T02:35:37.477791+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean LogicNat inductive structure; Peano axioms recovered from Law of Logic echoes

echoes
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

We adapt a modified story grammar... <dispute> ⟶ {<event>} {<document>} ... with <dispute> as our start symbol and all other elements as non-terminals.
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

unclear
Relation between the paper passage and the cited Recognition theorem.

Using a large language model (LLM) pipeline and few-shot learning, we label Latin epic poetry with story element labels

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

7 extracted references · 7 canonical work pages · 1 internal anchor

[1]

Department of Computer Science and Engineering, University of Notre Dame, South Bend, USA

work page
[2]

Abstract

Department of Classics, University at Buffalo, Buffalo, USA. Abstract. In Natural Language Processing (NLP), semantic matching algorithms have traditionally relied on the feature of word co-occurrence to measure semantic similarity. While this feature approach has proven valuable in many contexts, its simplistic nature limits its analytical and explanat...

work page
[3]

In this work, we focus on semantic matching in order to capture similarities in literary texts beyond exact or near-exact quotation

Introduction. Methods of discovering meaningful textual similarities in Natural Language Processing (NLP) tend to fall into two categories: lexical matching and semantic matching. In this work, we focus on semantic matching in order to capture similarities in literary texts beyond exact or near-exact quotation. Many designs for modern semantic ...

work page 1999
[4]

Related Work. Computational semantic matching is important to the study of literature, especially in areas such as allusion detection. Allusion detection (Bamman and Crane 2008) has found a place among digital classicists who wish to track references between classical texts to find instances of intertextuality (Evans 1988; Hinds 1998). There ha...

work page 2008
[9]

Data Availability. Data can be found here: https://anonymous.4open.science/r/Semantic-Grammar-D19D

work page
[10]

Software Availability. Software can be found here: https://anonymous.4open.science/r/Semantic-Grammar-D19D

work page
[11]

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Author Contributions. Abigail Swenor: Methodology, Data curation, Software, Writing – original draft. Neil Coffee: Validation, Writing – review & editing. Walter Scheirer: Conceptualization, Supervision, Writing – review & editing. References. Amiran, Eyal (1992). “Proofs of Origin: Stephen’s Intertextual Art in ‘Ulysses’”. In: James J...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48694/jcls.xxx 1992
