Machine-Assisted Script Curation

Alex Hedges; Dong-Ho Lee; Joseph Cummings; Manuel R. Ciosici; Marjorie Freedman; Mitchell DeHaven; Ralph Weischedel; Yash Kankanampati

arXiv: 2101.05400 · v2 · pith:3XHAHTCHnew · submitted 2021-01-14 · 💻 cs.CL · cs.AI · cs.LG


Pith reviewed 2026-05-25 08:41 UTC · model grok-4.3


keywords: script curation · event scripts · human-machine collaboration · Wikidata links · sub-event suggestion · complex event representation · collaborative authoring


The pith

MASC automates parts of script authoring by suggesting event types, Wikidata links, and missing sub-events for human writers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Machine-Aided Script Curator (MASC), a collaborative system that helps humans build scripts for complex events. Scripts contain English sub-event descriptions, assigned event types, tracked entities across sub-events, and temporal orderings. MASC supplies machine-generated suggestions for event types, external knowledge-base links, and overlooked sub-events. The authors show these features in operation through several worked examples of script creation. A reader would care if the approach reduces the manual effort needed to produce structured, complete accounts of multi-step processes.

Core claim

MASC is a human-machine system for script authoring that produces four elements: English descriptions of sub-events making up a larger event, event types for each sub-event, records of entities expected to appear in multiple sub-events, and temporal sequencing among the sub-events. The machine component automates suggestions for event types, links to Wikidata, and sub-events that may have been omitted, while the human supplies the core descriptions and sequencing decisions. The authors illustrate the value of these automations through a small set of case-study scripts.
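The four script elements listed above can be pictured as a small data structure. The following is a minimal sketch under stated assumptions, not MASC's actual schema: the class names, field names, role labels, and event-type strings are all hypothetical, and the Wikidata links are left unset rather than guessed. The "buying a car" scenario is borrowed from the paper's figures.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Optional, Tuple


@dataclass
class SubEvent:
    event_id: str
    description: str                   # English description written by the curator
    event_type: Optional[str] = None   # hypothetical type label, e.g. from a suggestion
    participants: Dict[str, str] = field(default_factory=dict)  # role -> entity name


@dataclass
class Script:
    name: str
    sub_events: List[SubEvent] = field(default_factory=list)
    # entity name -> Wikidata identifier, or None while unlinked
    entities: Dict[str, Optional[str]] = field(default_factory=dict)
    # temporal constraints as (earlier_id, later_id) pairs
    before: List[Tuple[str, str]] = field(default_factory=list)

    def shared_entities(self) -> List[str]:
        """Entities that participate in more than one sub-event."""
        counts: Dict[str, int] = {}
        for ev in self.sub_events:
            for entity in set(ev.participants.values()):
                counts[entity] = counts.get(entity, 0) + 1
        return sorted(e for e, n in counts.items() if n > 1)

    def ordered_ids(self) -> List[str]:
        """Sub-event ids in an order consistent with the temporal constraints."""
        graph = {ev.event_id: set() for ev in self.sub_events}
        for earlier, later in self.before:
            graph[later].add(earlier)  # 'later' depends on 'earlier'
        return list(TopologicalSorter(graph).static_order())


# Illustrative content only; type labels and roles are made up for this sketch.
script = Script(
    name="buying a car",
    sub_events=[
        SubEvent("e1", "The buyer visits the dealership.", "Visit",
                 {"agent": "buyer", "place": "dealership"}),
        SubEvent("e2", "The buyer test-drives the car.", "TestDrive",
                 {"agent": "buyer", "item": "car"}),
        SubEvent("e3", "The buyer pays the seller for the car.", "Payment",
                 {"payer": "buyer", "payee": "seller", "item": "car"}),
    ],
    entities={"buyer": None, "seller": None, "car": None, "dealership": None},
    before=[("e2", "e3"), ("e1", "e2")],
)
```

Calling `script.shared_entities()` picks out the entities that recur across sub-events, and `script.ordered_ids()` returns the sub-events in an order that respects the recorded sequencing. Storing sequencing as pairwise constraints rather than a fixed list is a design choice of this sketch; it allows partial orders, which the paper's summary neither confirms nor rules out.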

What carries the argument

Machine-Aided Script Curator (MASC), a collaborative authoring interface that supplies automated suggestions for event types, Wikidata links, and forgotten sub-events during script construction.

If this is right

Scripts produced with MASC contain both natural-language sub-event descriptions and assigned event types.
Entities that participate across multiple sub-events are explicitly recorded in the output.
Temporal sequencing between sub-events is maintained as part of the script structure.
Machine suggestions can surface sub-events the human writer might otherwise omit.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same suggestion mechanism could be applied to other structured narrative tasks such as process modeling or recipe formalization.
Direct integration with additional knowledge bases beyond Wikidata might increase the coverage of the type and link suggestions.
The current reliance on case studies leaves open the question of how suggestion acceptance rates vary across different event domains.

Load-bearing premise

A handful of case-study scripts are enough to show that the machine suggestions are generally useful to script authors.

What would settle it

A controlled comparison in which writers produce scripts of comparable quality and completeness more slowly or with more errors when given MASC suggestions than when working unaided.

Figures

Figures reproduced from arXiv: 2101.05400 by Alex Hedges, Dong-Ho Lee, Joseph Cummings, Manuel R. Ciosici, Marjorie Freedman, Mitchell DeHaven, Ralph Weischedel, Yash Kankanampati.

**Figure 1.** Adding events to the buying a car script.

**Figure 2.** Adding details to events. For each event on the left, curators can add arguments.

**Figure 3.** Reviewing the Wikidata link suggestions.

**Figure 4.** GPT-2 recommendations for buying a car.

**Figure 5.** Mixed-Initiative: GPT-2’s suggestions for …

Original abstract

We describe Machine-Aided Script Curator (MASC), a system for human-machine collaborative script authoring. Scripts produced with MASC include (1) English descriptions of sub-events that comprise a larger, complex event; (2) event types for each of those events; (3) a record of entities expected to participate in multiple sub-events; and (4) temporal sequencing between the sub-events. MASC automates portions of the script creation process with suggestions for event types, links to Wikidata, and sub-events that may have been forgotten. We illustrate how these automations are useful to the script writer with a few case-study scripts.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper describes the Machine-Aided Script Curator (MASC), a human-machine collaborative system for authoring scripts of complex events. Scripts include English sub-event descriptions, event types, records of recurring entities, and temporal sequencing. MASC provides automated suggestions for event types, Wikidata links, and potentially forgotten sub-events; the authors claim these suggestions are useful to script writers and illustrate the claim with a few case-study scripts.

Significance. If the utility of the suggestions were demonstrated, MASC could contribute to more efficient construction of structured event representations with potential downstream uses in narrative modeling and knowledge extraction. The current manuscript, however, provides no quantitative evidence, so the significance cannot yet be assessed.

major comments (1)

[Abstract and §4] The central claim that the automations (event-type suggestions, Wikidata links, forgotten sub-events) are useful rests exclusively on qualitative inspection of a few case-study scripts. No acceptance rates, time/quality metrics, baseline comparisons to unaided authoring, coverage statistics, or user studies are reported, leaving the utility assertion without measurable support.
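To make the requested acceptance-rate measurement concrete, here is a minimal sketch of how it might be computed per suggestion class. It assumes a hypothetical interaction log in which each shown suggestion is recorded with its kind and whether the curator accepted it; the paper does not describe such a log, and the entries below are invented solely to exercise the function.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class SuggestionRecord:
    kind: str       # hypothetical labels: "event_type", "wikidata_link", "sub_event"
    accepted: bool  # True if the curator kept the suggestion


def acceptance_rates(log: Iterable[SuggestionRecord]) -> Dict[str, float]:
    """Fraction of shown suggestions that were accepted, per suggestion kind."""
    records = list(log)
    shown = Counter(r.kind for r in records)
    taken = Counter(r.kind for r in records if r.accepted)
    return {kind: taken[kind] / shown[kind] for kind in shown}


# Made-up log for illustration; these are NOT results from the paper.
example_log = [
    SuggestionRecord("event_type", True),
    SuggestionRecord("event_type", False),
    SuggestionRecord("wikidata_link", True),
    SuggestionRecord("sub_event", False),
]
rates = acceptance_rates(example_log)
```

A per-kind breakdown matters here because the three automations could differ widely in usefulness, and a single pooled rate would hide that.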

minor comments (2)

The generation mechanism for each class of suggestion (event types, Wikidata links, sub-event proposals) is not described, which limits reproducibility and technical clarity.
[§4] No details are given on the size or selection criteria of the case-study scripts, nor on how the 'forgotten' sub-events were identified as omissions.

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for the review. We agree that the manuscript's claims regarding the utility of the automations rest on qualitative case studies without quantitative support, and we will revise the text to align the claims with the evidence presented.


Referee: [Abstract and §4] The central claim that the automations (event-type suggestions, Wikidata links, forgotten sub-events) are useful rests exclusively on qualitative inspection of a few case-study scripts. No acceptance rates, time/quality metrics, baseline comparisons to unaided authoring, coverage statistics, or user studies are reported, leaving the utility assertion without measurable support.

Authors: We agree that the paper provides no quantitative evidence (acceptance rates, time/quality metrics, baselines, coverage statistics, or user studies) for the utility of the suggestions. The manuscript is a system description that illustrates the automations via case studies rather than evaluating them. We will revise the abstract and §4 to remove the assertion that the automations 'are useful' and instead describe them as providing suggestions 'as illustrated in the case-study scripts.' This change will be made in the next version.

Revision: yes

Circularity Check

0 steps flagged

No circularity; system description paper with no derivations or fitted predictions


The paper is a descriptive account of the MASC system for collaborative script authoring, illustrated via qualitative case studies. No equations, predictions, first-principles derivations, or parameter-fitting steps are present anywhere in the manuscript. Claims rest on system functionality and example outputs rather than on any chain that could reduce to self-definition, fitted inputs, or load-bearing self-citation. This is the expected non-finding for a non-mathematical system paper whose central content is self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an applied system-description paper with no mathematical derivations, fitted parameters, background axioms, or new postulated entities.


