The Shape of Testimony: A Scalable Framework for Oral History Archive Comparison

Amit Pinchevski; Itamar Trainin; Renana Keydar

arxiv: 2605.21623 · v1 · pith:XK6GBVOZnew · submitted 2026-05-20 · 💻 cs.AI

Pith reviewed 2026-05-22 09:25 UTC · model grok-4.3

classification 💻 cs.AI

keywords holocaust studies · oral history · computational humanities · topic modeling · discourse analysis · structuredness · archive comparison · survivor testimonies

The pith

Computational analysis of more than 1,600 Holocaust testimonies reveals significant overlaps in narrative structure between the USC Shoah Foundation and Yale Fortunoff archives.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether the commonly accepted distinction between structured USC Shoah Foundation interviews and free-form Yale Fortunoff testimonies holds up under large-scale computational scrutiny. It applies discourse segmentation, topic modeling, and LLM analysis to quantify structuredness via topic coherence, interviewer-survivor dynamics, and question-type distributions. The results largely confirm the expected structural differences but also find substantial overlaps within individual interviews and in shared narrative patterns. This matters because it refines our understanding of how survivor stories are collected and preserved, with implications for historical research and archive design. The study also offers a replicable method for comparing oral history collections more broadly.

Core claim

While earlier research has distinguished the USC Shoah Foundation's structured, interviewer-guided format from the Yale Fortunoff Video Archive's free-form, open-ended style, a large-scale analysis of more than 1,600 testimonies using discourse segmentation, topic modeling, and LLM-based analysis shows both structural differences and significant overlaps within interviews and across common narrative patterns, complicating the simple dichotomy.

What carries the argument

Quantification of testimony structuredness through proxies including topic coherence, interviewer-survivor dynamics, and distribution of question types, applied via discourse segmentation, topic modeling, and large language model analysis.
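
As a rough illustration of the interviewer-survivor dynamics proxy, the sketch below computes mean question length, mean answer length, and an intervention density from a list of speaker turns. The `Turn` class, the speaker labels, and the definition of intervention density (interviewer turns per 100 survivor words) are assumptions made for this example; the paper's own definitions may differ.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Turn:
    speaker: str  # hypothetical labels: "interviewer" or "survivor"
    text: str


def dynamics_proxies(turns):
    """Return simple interviewer-survivor dynamics measures for one interview."""
    # Lengths are counted in whitespace-separated words.
    q_lens = [len(t.text.split()) for t in turns if t.speaker == "interviewer"]
    a_lens = [len(t.text.split()) for t in turns if t.speaker == "survivor"]
    survivor_words = sum(a_lens)
    return {
        "mean_question_len": mean(q_lens) if q_lens else 0.0,
        "mean_answer_len": mean(a_lens) if a_lens else 0.0,
        # Assumed definition: interviewer turns per 100 survivor words.
        "intervention_density": (
            100 * len(q_lens) / survivor_words if survivor_words else 0.0
        ),
    }


# Toy transcript, invented for illustration only.
interview = [
    Turn("interviewer", "Where were you born?"),
    Turn("survivor", "I was born in a small town near the river"),
    Turn("interviewer", "And your family?"),
    Turn("survivor", "We were six children"),
]
proxies = dynamics_proxies(interview)
print(proxies)
```

Computing the measures per interview, rather than pooling all turns in an archive, keeps long interviews from dominating the comparison.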

If this is right

The binary classification of testimonies as structured or free-form should be reconsidered in favor of recognizing overlaps and gradients.
The provided framework enables scalable and replicable comparisons across other oral history archives.
Common narrative patterns across collections suggest shared elements in how survivors recount experiences regardless of interview style.
This approach can inform the design of future oral history projects and digital tools for analysis.
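
One way to make "overlaps and gradients" concrete is to compare the per-interview score distributions of the two archives instead of only their means. The sketch below is an illustration, not the paper's method: it computes a histogram overlap coefficient, and the function name, bin count, and toy scores are all hypothetical.

```python
def overlap_coefficient(scores_a, scores_b, bins=10):
    """Shared area of two normalised histograms on a common range.

    Returns 0.0 for disjoint distributions and 1.0 for identical ones.
    """
    lo = min(min(scores_a), min(scores_b))
    hi = max(max(scores_a), max(scores_b))
    if hi == lo:
        return 1.0  # every score is identical
    width = (hi - lo) / bins

    def hist(scores):
        counts = [0] * bins
        for s in scores:
            # Clamp the maximum value into the last bin.
            idx = min(int((s - lo) / width), bins - 1)
            counts[idx] += 1
        return [c / len(scores) for c in counts]

    ha, hb = hist(scores_a), hist(scores_b)
    return sum(min(a, b) for a, b in zip(ha, hb))


# Invented per-interview structuredness scores for two archives.
archive_a = [1, 2, 3, 4]
archive_b = [3, 4, 5, 6]
print(overlap_coefficient(archive_a, archive_b, bins=5))
```

A value well above zero would indicate that many interviews in one archive look like interviews in the other, even when the archive means differ. The result depends on the bin count, so it is best reported for several choices.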

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Future oral history archives might benefit from intentionally incorporating elements from both styles to capture richer testimonies.
Computational tools like this could be integrated into archive management systems to identify atypical interviews for special handling.
Similar methods might apply to comparing oral histories from other historical events or cultural contexts.
Overlaps could indicate that interviewer training or survivor preferences play larger roles than archive protocols alone.

Load-bearing premise

That the computational measures of topic coherence, interviewer-survivor dynamics, and question type distribution validly represent the qualitative idea of structuredness in Holocaust testimony scholarship.

What would settle it

Finding a subset of interviews where manual expert coding shows markedly different levels of structure than predicted by the topic coherence and question distribution metrics.
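
A check of this kind could be sketched as a rank correlation between expert structuredness ratings and a computational metric on the same interviews. The code below is a minimal standard-library illustration under that assumption; the ratings and metric values are invented, and no such expert labels are reported in the paper.

```python
from math import sqrt


def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# Hypothetical data: expert ratings (1 = free-form, 5 = highly structured)
# and a computed metric for the same five interviews.
expert_rating = [1, 2, 3, 4, 5]
metric_score = [0.10, 0.40, 0.35, 0.80, 0.90]
rho = spearman(expert_rating, metric_score)
print(round(rho, 3))
```

A correlation near zero, or a clearly negative one, on a reasonably sized sample would be the kind of evidence that undermines the proxies.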

Figures

Figures reproduced from arXiv: 2605.21623 by Amit Pinchevski, Itamar Trainin, Renana Keydar.

**Figure 2.** Mean (left) and SD (right) of answer length over time for USC and Yale archives. Statistically…

**Figure 3.** Mean (left) and SD (right) of question length over time for USC and Yale archives. Statistically…

**Figure 4.** Intervention Density for USC and Yale archives. Statistically significant (t-test) differences are marked with an asterisk (*).

**Figure 5.** Distribution of question types for USC and Yale archives. […] “other”). To test this hypothesis, we extracted all interviewer questions from both testimony corpora and employed an LLM to classify each into one of the seven categories (see Prompt 7.3 in the Appendix). Model predictions were validated through manual review of a random subset of 50 examples per category (3,500 questions in total). The overall a…

**Figure 6.** Distribution of question types over time for USC and Yale archives.

**Figure 7.** The topical sequence extraction pipeline.
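
The "over time" curves in Figures 2 and 3 can be approximated by binning each turn according to its relative position in the interview and then aggregating across interviews. The sketch below assumes position is measured by turn index (the paper may use timestamps or token offsets instead) and uses invented lengths; `length_profile` is a hypothetical helper name.

```python
from statistics import mean, pstdev


def length_profile(interviews, n_bins=10):
    """Mean and SD of turn length per normalised-position bin.

    `interviews` is a list of interviews, each a list of turn lengths
    (for example, answer lengths in words) in chronological order.
    """
    bins = [[] for _ in range(n_bins)]
    for lengths in interviews:
        n = len(lengths)
        for i, length in enumerate(lengths):
            # Map turn i of n onto one of n_bins equal-width position bins.
            bins[i * n_bins // n].append(length)
    # Empty bins (possible when interviews are very short) yield (None, None).
    return [(mean(b), pstdev(b)) if b else (None, None) for b in bins]


# Two toy interviews with answer lengths in words.
toy_archive = [[10, 20, 30, 40], [20, 40]]
profile = length_profile(toy_archive, n_bins=2)
print(profile)
```

Normalising by position lets interviews of different durations contribute to the same curve, at the cost of hiding differences in absolute interview length.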

Abstract

Researchers in Holocaust studies have often distinguished between two styles of oral survivor testimony: the USC Shoah Foundation's interviews tend to follow a structured, interviewer-guided format, whereas the Yale Fortunoff Video Archive generally favors a more free-form, open-ended style. This distinction has influenced both scholarly research and the development of later archives. In this study, we critically examine that claim by conducting a large-scale computational analysis of more than 1,600 testimonies from both collections. Leveraging discourse segmentation, topic modeling, and large language model (LLM) based analysis, we quantify the "structuredness" level of testimonies through topic coherence, interviewer-survivor dynamics, and the distribution of question types. Our results generally corroborate the structural differences identified in earlier research, while also revealing significant overlaps between the collections, both within individual interviews and across common narrative patterns. This complicates the simple "structured vs. free-form" dichotomy often applied to these oral histories. Beyond revisiting a foundational claim in Holocaust studies, our work provides a scalable, replicable framework for comparative corpus analysis. As a proof of concept, it suggests broader applications for digital oral history, narrative analysis, and the design of citizen-science annotation platforms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper scales a computational comparison of two Holocaust archives and finds overlaps that soften the usual structured vs free-form split, but the metrics are not shown to track the historical notion of structuredness.

The main thing to know is that this work takes the existing qualitative distinction between the Shoah Foundation and Fortunoff collections and tests it at scale with topic modeling, discourse segmentation, and LLM-based question typing on more than 1,600 interviews. It reports both the expected differences and notable overlaps in narrative patterns and interviewer dynamics, which undercuts treating the two archives as cleanly opposite styles.

Referee Report

3 major / 3 minor

Summary. The paper claims that a large-scale computational analysis of over 1,600 testimonies from the USC Shoah Foundation and Yale Fortunoff archives, using discourse segmentation, topic modeling, and LLM analysis, quantifies 'structuredness' through topic coherence, interviewer-survivor dynamics, and question-type distributions. Results are said to corroborate prior structural differences while revealing significant overlaps within interviews and across narrative patterns, thereby complicating the 'structured vs. free-form' dichotomy and providing a scalable framework for oral history comparison.

Significance. If the computational proxies are demonstrated to validly measure the qualitative notion of structuredness used in Holocaust scholarship, the work would offer a replicable, scalable method for archive comparison with direct implications for revising foundational distinctions in the field and broader applications in digital oral history and narrative analysis. The provision of a proof-of-concept framework is a strength if implementation details support reproducibility.

major comments (3)

[Methods (metric definition and LLM prompting)] Methods section on metric construction: No validation, calibration, or correlation is reported between the computational proxies (topic coherence scores, interviewer-survivor dynamics, and LLM-derived question-type distributions) and expert human classifications of testimony structuredness from Holocaust scholarship. This mapping is load-bearing for the central claim that overlaps complicate the established dichotomy, as ungrounded metrics may not correspond to the qualitative distinction being revisited.
[Results (topic modeling and overlap analysis)] Results section on overlaps and narrative patterns: The reported 'significant overlaps' and cross-collection patterns lack details on statistical controls for differing collection sizes, sensitivity analysis for topic count in modeling, or robustness checks against parameter choices in discourse segmentation. Without these, the evidence for complicating the simple dichotomy rests on potentially unstable quantitative differences.
[Discussion] Discussion of prior research corroboration: The claim that results 'generally corroborate' earlier qualitative distinctions is not supported by direct quantitative comparison or re-analysis of the same testimonies using the new metrics; this leaves open whether the computational findings align with or merely restate the original scholarly observations.

minor comments (3)

[Methods] Clarify the exact prompting strategy and few-shot examples used for the LLM question-type classifier, including any post-processing rules for discourse segmentation.
[Limitations] Add explicit discussion of limitations regarding the generalizability of the framework beyond these two archives and potential biases in LLM outputs for historical narrative content.
[Figures] Ensure all figures include axis labels, error bars where applicable, and legends that distinguish the two collections clearly.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for their constructive and detailed report. The comments identify important opportunities to improve methodological transparency, statistical robustness, and the precision of our claims regarding corroboration with prior work. We address each major comment below and indicate planned revisions.

Referee: Methods section on metric construction: No validation, calibration, or correlation is reported between the computational proxies (topic coherence scores, interviewer-survivor dynamics, and LLM-derived question-type distributions) and expert human classifications of testimony structuredness from Holocaust scholarship. This mapping is load-bearing for the central claim that overlaps complicate the established dichotomy, as ungrounded metrics may not correspond to the qualitative distinction being revisited.

Authors: We acknowledge that the manuscript does not include a direct empirical validation study correlating our proxies with expert human ratings of structuredness. Each proxy draws on established NLP literature linking topic coherence to narrative organization, question-type distributions to interviewer guidance, and turn-taking patterns to interactional dynamics. In the revised version we will add a new subsection in Methods that explicitly grounds each metric in this literature and states the assumptions involved. We will also add a limitations paragraph noting the absence of calibration against expert classifications and identifying this as a priority for future collaborative work with Holocaust scholars. These changes will make the evidential basis for the proxies clearer without overstating their current validation status.
revision: yes

Referee: Results section on overlaps and narrative patterns: The reported 'significant overlaps' and cross-collection patterns lack details on statistical controls for differing collection sizes, sensitivity analysis for topic count in modeling, or robustness checks against parameter choices in discourse segmentation. Without these, the evidence for complicating the simple dichotomy rests on potentially unstable quantitative differences.

Authors: We agree that additional robustness information is needed. The original analysis selected topic count via coherence optimization and used sentence-level segmentation. In revision we will expand the Results section to include: normalized per-interview metrics with statistical tests (t-tests and effect sizes) that account for unequal collection sizes; sensitivity plots varying topic count from 5 to 20 and showing stability of the reported overlap statistics; and a brief robustness check repeating the overlap analysis under alternative segmentation schemes (paragraph boundaries and LLM-assisted segmentation). These additions will demonstrate that the core finding of substantial within- and cross-collection overlap is not an artifact of the chosen parameters.
revision: yes

Referee: Discussion of prior research corroboration: The claim that results 'generally corroborate' earlier qualitative distinctions is not supported by direct quantitative comparison or re-analysis of the same testimonies using the new metrics; this leaves open whether the computational findings align with or merely restate the original scholarly observations.

Authors: We accept that the language 'generally corroborate' should be qualified. Our quantitative patterns (greater interviewer dominance and more closed questions in the USC collection) are directionally consistent with the qualitative descriptions in the cited prior scholarship. However, because we lack access to the precise interview identifiers or expert annotations used in those earlier studies, a direct re-analysis on identical data is not feasible. In the revised Discussion we will replace 'generally corroborate' with 'are qualitatively consistent with' and explicitly note the limitation regarding direct quantitative alignment. This revision preserves the contribution while accurately reflecting the nature of the comparison.
revision: partial
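
The per-interview tests and effect sizes promised in the second response could look roughly like the following. This is a sketch under assumptions: it uses Welch's unequal-variance t statistic and Cohen's d with a pooled standard deviation, neither of which the rebuttal specifies, and the sample values are invented. It stops short of a p-value, which needs the t distribution (for instance via SciPy's `scipy.stats.ttest_ind` with `equal_var=False`).

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Does not assume equal variances or equal sample sizes.
    """
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def cohens_d(a, b):
    """Standardised mean difference using the pooled sample SD."""
    na, nb = len(a), len(b)
    pooled = sqrt(
        ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    )
    return (mean(a) - mean(b)) / pooled


# Invented per-interview metric values for two collections of unequal size.
collection_a = [2, 4, 6, 8]
collection_b = [1, 2, 3]
t_stat, dof = welch_t(collection_a, collection_b)
effect = cohens_d(collection_a, collection_b)
print(round(t_stat, 3), round(dof, 3), round(effect, 3))
```

Reporting the effect size alongside the test matters here because, with more than 1,600 interviews, even small differences between collections can reach statistical significance.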

standing simulated objections not resolved

Direct quantitative re-analysis of the identical testimonies examined in prior qualitative studies, as the required interview-level identifiers and expert structuredness labels are not publicly available or linked to the current corpora.

Circularity Check

0 steps flagged

No circularity: standard computational methods applied to independent data

The paper applies off-the-shelf discourse segmentation, topic modeling, and LLM-based classification to compute topic coherence, interviewer dynamics, and question-type distributions across two oral-history corpora. These quantities are treated as direct empirical measurements rather than quantities fitted to or defined by the target qualitative distinction; no equations, self-citations, or parameter-tuning steps are shown that would make the reported overlaps or corroborations tautological with the input labels. The central claim therefore rests on the external validity of the chosen proxies, not on any internal reduction to the paper's own definitions or prior results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that LLM-based question classification and topic coherence scores are faithful proxies for scholarly notions of interview structure. No free parameters are explicitly listed in the abstract, but typical topic-modeling pipelines contain several (number of topics, coherence threshold, prompt templates). No new physical or mathematical entities are introduced.

axioms (1)

domain assumption: Topic coherence and question-type distributions are valid quantitative stand-ins for the qualitative 'structuredness' concept used in Holocaust studies literature.
Invoked when the paper states that it quantifies structuredness through these measures.

pith-pipeline@v0.9.0 · 5745 in / 1306 out tokens · 31909 ms · 2026-05-22T09:25:20.771710+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: "we quantify the 'structuredness' level of testimonies through topic coherence, interviewer-survivor dynamics, and the distribution of question types"

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "LLM-based topic extraction... staged prompting strategy"

Each tag describes the relation between the paper passage and the cited Recognition theorem.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

