
arxiv: 2605.21623 · v1 · pith:XK6GBVOZnew · submitted 2026-05-20 · 💻 cs.AI

The Shape of Testimony: A Scalable Framework for Oral History Archive Comparison

Pith reviewed 2026-05-22 09:25 UTC · model grok-4.3

classification 💻 cs.AI
keywords holocaust studies · oral history · computational humanities · topic modeling · discourse analysis · structuredness · archive comparison · survivor testimonies

The pith

Computational analysis of more than 1,600 Holocaust testimonies reveals significant overlaps in narrative structure between the USC Shoah Foundation and Yale Fortunoff archives.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether the commonly accepted distinction between structured USC Shoah Foundation interviews and free-form Yale Fortunoff testimonies holds up under large-scale computational scrutiny. It applies discourse segmentation, topic modeling, and LLM analysis to quantify structuredness via topic coherence, interviewer-survivor dynamics, and question-type distributions. The results generally support the reported structural differences but also find substantial overlaps, both within individual interviews and in shared narrative patterns. A sympathetic reader cares because this refines our understanding of how survivor stories are collected and preserved, with implications for historical research and archive design. The study also offers a replicable method for comparing oral history collections more broadly.

Core claim

While earlier research has distinguished the USC Shoah Foundation's structured, interviewer-guided format from the Yale Fortunoff Video Archive's free-form, open-ended style, a large-scale analysis of more than 1,600 testimonies using discourse segmentation, topic modeling, and LLM-based analysis shows both structural differences and significant overlaps within interviews and across common narrative patterns, complicating the simple dichotomy.

What carries the argument

Quantification of testimony structuredness through proxies including topic coherence, interviewer-survivor dynamics, and distribution of question types, applied via discourse segmentation, topic modeling, and large language model analysis.
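
As a rough illustration of how such proxies can be computed per interview, the sketch below derives mean turn length, an intervention-density figure, and a question-type distribution from a list of speaker turns. The `Turn` structure, the speaker labels, the "open"/"closed" question types, and the definition of intervention density (interviewer turns per 100 survivor tokens) are all assumptions made for illustration; the paper's own definitions and its seven question categories are not reproduced on this page.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Turn:
    speaker: str                         # "interviewer" or "survivor" (hypothetical labels)
    text: str
    question_type: Optional[str] = None  # hypothetical label, e.g. from an LLM classifier


def mean_length(turns, speaker):
    """Mean number of whitespace-separated tokens per turn for one speaker."""
    lengths = [len(t.text.split()) for t in turns if t.speaker == speaker]
    return sum(lengths) / len(lengths) if lengths else 0.0


def intervention_density(turns):
    """Interviewer turns per 100 survivor tokens (an assumed definition)."""
    survivor_tokens = sum(len(t.text.split()) for t in turns if t.speaker == "survivor")
    interviewer_turns = sum(1 for t in turns if t.speaker == "interviewer")
    return 100.0 * interviewer_turns / survivor_tokens if survivor_tokens else 0.0


def question_type_distribution(turns):
    """Relative frequency of each question type among labeled interviewer turns."""
    counts = Counter(
        t.question_type
        for t in turns
        if t.speaker == "interviewer" and t.question_type
    )
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}


# Toy interview, invented for illustration only.
interview = [
    Turn("interviewer", "Where were you born?", "closed"),
    Turn("survivor", "I was born in a small town near the border"),
    Turn("interviewer", "Tell me about your family.", "open"),
    Turn("survivor", "We were five children and my father had a shop"),
]

print(mean_length(interview, "interviewer"))
print(mean_length(interview, "survivor"))
print(intervention_density(interview))
print(question_type_distribution(interview))
```

Computing each value per interview, rather than pooling turns across an archive, keeps the two collections comparable even though they differ in size.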

If this is right

  • The binary classification of testimonies as structured or free-form should be reconsidered in favor of recognizing overlaps and gradients.
  • The provided framework enables scalable and replicable comparisons across other oral history archives.
  • Common narrative patterns across collections suggest shared elements in how survivors recount experiences regardless of interview style.
  • This approach can inform the design of future oral history projects and digital tools for analysis.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Future oral history archives might benefit from intentionally incorporating elements from both styles to capture richer testimonies.
  • Computational tools like this could be integrated into archive management systems to identify atypical interviews for special handling.
  • Similar methods might apply to comparing oral histories from other historical events or cultural contexts.
  • Overlaps could indicate that interviewer training or survivor preferences play larger roles than archive protocols alone.

Load-bearing premise

That the computational measures of topic coherence, interviewer-survivor dynamics, and question type distribution validly represent the qualitative idea of structuredness in Holocaust testimony scholarship.

What would settle it

Finding a subset of interviews where manual expert coding shows markedly different levels of structure than predicted by the topic coherence and question distribution metrics.
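
One way to run such a check is to rank-correlate expert structure ratings with a computed metric over the same interviews. The sketch below uses Spearman's rank correlation, implemented with the standard library only. The ratings and scores are invented toy values, and the choice of Spearman's coefficient is an assumption here, not something the paper reports.

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: expert structure ratings and a computed metric
# for the same five interviews.
expert_codes = [1, 2, 3, 4, 5]
metric_scores = [0.2, 0.1, 0.5, 0.7, 0.9]

rho = spearman(expert_codes, metric_scores)
print(round(rho, 3))
```

A low or negative coefficient on a real expert-coded subset would be the kind of evidence described above; a high one would lend the proxies some support.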

Figures

Figures reproduced from arXiv: 2605.21623 by Amit Pinchevski, Itamar Trainin, Renana Keydar.

Figure 1. (a) Number of interviews conducted (individually or jointly) by each interviewer in each archive. (b) [caption truncated in source]
Figure 2. Mean (left) and SD (right) of answer length over time for USC and Yale archives. Statistically [caption truncated in source]
Figure 3. Mean (left) and SD (right) of question length over time for USC and Yale archives. Statistically [caption truncated in source]
Figure 4. Intervention Density for USC and Yale archives. Statistically significant (t-test) differences are marked with an asterisk (*).
Figure 5. Distribution of question types for USC and Yale archives. Adjacent body text captured with the caption: "… “other”). To test this hypothesis, we extracted all interviewer questions from both testimony corpora and employed an LLM to classify each into one of the seven categories (see Prompt 7.3 in the Appendix). Model predictions were validated through manual review of a random subset of 50 examples per category (3,500 questions in total). The overall a…"
Figure 6. Distribution of question types over time for USC and Yale archives.
Figure 7. The topical sequence extraction pipeline.
Abstract

Researchers in Holocaust studies have often distinguished between two styles of oral survivor testimony: the USC Shoah Foundation's interviews tend to follow a structured, interviewer-guided format, whereas the Yale Fortunoff Video Archive generally favors a more free-form, open-ended style. This distinction has influenced both scholarly research and the development of later archives. In this study, we critically examine that claim by conducting a large-scale computational analysis of more than 1,600 testimonies from both collections. Leveraging discourse segmentation, topic modeling, and large language model (LLM) based analysis, we quantify the "structuredness" level of testimonies through topic coherence, interviewer-survivor dynamics, and the distribution of question types. Our results generally corroborate the structural differences identified in earlier research, while also revealing significant overlaps between the collections, both within individual interviews and across common narrative patterns. This complicates the simple "structured vs. free-form" dichotomy often applied to these oral histories. Beyond revisiting a foundational claim in Holocaust studies, our work provides a scalable, replicable framework for comparative corpus analysis. As a proof of concept, it suggests broader applications for digital oral history, narrative analysis, and the design of citizen-science annotation platforms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 3 minor

Summary. The paper claims that a large-scale computational analysis of over 1,600 testimonies from the USC Shoah Foundation and Yale Fortunoff archives, using discourse segmentation, topic modeling, and LLM analysis, quantifies 'structuredness' through topic coherence, interviewer-survivor dynamics, and question-type distributions. Results are said to corroborate prior structural differences while revealing significant overlaps within interviews and across narrative patterns, thereby complicating the 'structured vs. free-form' dichotomy and providing a scalable framework for oral history comparison.

Significance. If the computational proxies are demonstrated to validly measure the qualitative notion of structuredness used in Holocaust scholarship, the work would offer a replicable, scalable method for archive comparison with direct implications for revising foundational distinctions in the field and broader applications in digital oral history and narrative analysis. The provision of a proof-of-concept framework is a strength if implementation details support reproducibility.

major comments (3)
  1. [Methods (metric definition and LLM prompting)] Methods section on metric construction: No validation, calibration, or correlation is reported between the computational proxies (topic coherence scores, interviewer-survivor dynamics, and LLM-derived question-type distributions) and expert human classifications of testimony structuredness from Holocaust scholarship. This mapping is load-bearing for the central claim that overlaps complicate the established dichotomy, as ungrounded metrics may not correspond to the qualitative distinction being revisited.
  2. [Results (topic modeling and overlap analysis)] Results section on overlaps and narrative patterns: The reported 'significant overlaps' and cross-collection patterns lack details on statistical controls for differing collection sizes, sensitivity analysis for topic count in modeling, or robustness checks against parameter choices in discourse segmentation. Without these, the evidence for complicating the simple dichotomy rests on potentially unstable quantitative differences.
  3. [Discussion] Discussion of prior research corroboration: The claim that results 'generally corroborate' earlier qualitative distinctions is not supported by direct quantitative comparison or re-analysis of the same testimonies using the new metrics; this leaves open whether the computational findings align with or merely restate the original scholarly observations.
minor comments (3)
  1. [Methods] Clarify the exact prompting strategy and few-shot examples used for the LLM question-type classifier, including any post-processing rules for discourse segmentation.
  2. [Limitations] Add explicit discussion of limitations regarding the generalizability of the framework beyond these two archives and potential biases in LLM outputs for historical narrative content.
  3. [Figures] Ensure all figures include axis labels, error bars where applicable, and legends that distinguish the two collections clearly.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for their constructive and detailed report. The comments identify important opportunities to improve methodological transparency, statistical robustness, and the precision of our claims regarding corroboration with prior work. We address each major comment below and indicate planned revisions.

  1. Referee: Methods section on metric construction: No validation, calibration, or correlation is reported between the computational proxies (topic coherence scores, interviewer-survivor dynamics, and LLM-derived question-type distributions) and expert human classifications of testimony structuredness from Holocaust scholarship. This mapping is load-bearing for the central claim that overlaps complicate the established dichotomy, as ungrounded metrics may not correspond to the qualitative distinction being revisited.

    Authors: We acknowledge that the manuscript does not include a direct empirical validation study correlating our proxies with expert human ratings of structuredness. Each proxy draws on established NLP literature linking topic coherence to narrative organization, question-type distributions to interviewer guidance, and turn-taking patterns to interactional dynamics. In the revised version we will add a new subsection in Methods that explicitly grounds each metric in this literature and states the assumptions involved. We will also add a limitations paragraph noting the absence of calibration against expert classifications and identifying this as a priority for future collaborative work with Holocaust scholars. These changes will make the evidential basis for the proxies clearer without overstating their current validation status. revision: yes

  2. Referee: Results section on overlaps and narrative patterns: The reported 'significant overlaps' and cross-collection patterns lack details on statistical controls for differing collection sizes, sensitivity analysis for topic count in modeling, or robustness checks against parameter choices in discourse segmentation. Without these, the evidence for complicating the simple dichotomy rests on potentially unstable quantitative differences.

    Authors: We agree that additional robustness information is needed. The original analysis selected topic count via coherence optimization and used sentence-level segmentation. In revision we will expand the Results section to include: normalized per-interview metrics with statistical tests (t-tests and effect sizes) that account for unequal collection sizes; sensitivity plots varying topic count from 5 to 20 and showing stability of the reported overlap statistics; and a brief robustness check repeating the overlap analysis under alternative segmentation schemes (paragraph boundaries and LLM-assisted segmentation). These additions will demonstrate that the core finding of substantial within- and cross-collection overlap is not an artifact of the chosen parameters. revision: yes

  3. Referee: Discussion of prior research corroboration: The claim that results 'generally corroborate' earlier qualitative distinctions is not supported by direct quantitative comparison or re-analysis of the same testimonies using the new metrics; this leaves open whether the computational findings align with or merely restate the original scholarly observations.

    Authors: We accept that the language 'generally corroborate' should be qualified. Our quantitative patterns (greater interviewer dominance and more closed questions in the USC collection) are directionally consistent with the qualitative descriptions in the cited prior scholarship. However, because we lack access to the precise interview identifiers or expert annotations used in those earlier studies, a direct re-analysis on identical data is not feasible. In the revised Discussion we will replace 'generally corroborate' with 'are qualitatively consistent with' and explicitly note the limitation regarding direct quantitative alignment. This revision preserves the contribution while accurately reflecting the nature of the comparison. revision: partial
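
The per-interview tests proposed in the second response (t-tests and effect sizes that account for unequal collection sizes) could look roughly like the following sketch. It uses Welch's unequal-variance t statistic and Cohen's d with a pooled standard deviation. Both choices are assumptions, since the rebuttal does not say which variants the authors intend, and the sample values are invented.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Does not assume equal variances or equal group sizes.
    """
    var_a = variance(a) / len(a)  # squared standard error of mean(a)
    var_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(a) - 1) + var_b ** 2 / (len(b) - 1)
    )
    return t, df


def cohens_d(a, b):
    """Cohen's d using the pooled sample standard deviation."""
    n_a, n_b = len(a), len(b)
    pooled = sqrt(
        ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    )
    return (mean(a) - mean(b)) / pooled


# Hypothetical per-interview metric values for two collections of unequal size.
collection_a = [2.0, 4.0, 6.0]
collection_b = [1.0, 2.0, 3.0, 4.0, 5.0]

t_stat, dof = welch_t(collection_a, collection_b)
d = cohens_d(collection_a, collection_b)
print(round(t_stat, 3), round(dof, 3), round(d, 3))
```

Turning the statistic into a p-value requires the t distribution's CDF, which the standard library does not provide; a statistics package would normally supply that step.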

standing simulated objections not resolved
  • Direct quantitative re-analysis of the identical testimonies examined in prior qualitative studies, as the required interview-level identifiers and expert structuredness labels are not publicly available or linked to the current corpora.

Circularity Check

0 steps flagged

No circularity: standard computational methods applied to independent data


The paper applies off-the-shelf discourse segmentation, topic modeling, and LLM-based classification to compute topic coherence, interviewer dynamics, and question-type distributions across two oral-history corpora. These quantities are treated as direct empirical measurements rather than quantities fitted to or defined by the target qualitative distinction; no equations, self-citations, or parameter-tuning steps are shown that would make the reported overlaps or corroborations tautological with the input labels. The central claim therefore rests on the external validity of the chosen proxies, not on any internal reduction to the paper's own definitions or prior results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that LLM-based question classification and topic coherence scores are faithful proxies for scholarly notions of interview structure. No free parameters are explicitly listed in the abstract, but typical topic-modeling pipelines contain several (number of topics, coherence threshold, prompt templates). No new physical or mathematical entities are introduced.

axioms (1)
  • Domain assumption: Topic coherence and question-type distributions are valid quantitative stand-ins for the qualitative 'structuredness' concept used in Holocaust studies literature.
    Invoked when the paper states it quantifies structuredness through these measures.

pith-pipeline@v0.9.0 · 5745 in / 1306 out tokens · 31909 ms · 2026-05-22T09:25:20.771710+00:00 · methodology

