Script2Screen: Supporting Dialogue Scriptwriting with Interactive Audiovisual Generation

Bryan Wang; Eitan Grinspun; Jiaju Ma; Tovi Grossman; Zhecheng Wang

arXiv: 2504.14776 · v2 · submitted 2025-04-21 · 💻 cs.HC


Pith reviewed 2026-05-22 19:32 UTC · model grok-4.3


keywords scriptwriting · dialogue writing · AI-assisted creation · audiovisual generation · interactive interfaces · user study · human-computer interaction · iterative refinement


The pith

Script2Screen links text dialogue scripts to AI-generated audiovisual scenes so writers can see and adjust emotional speech, gestures, and camera angles during iteration.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Script2Screen as a tool that unifies scriptwriting with real-time audiovisual generation focused on dialogues. It uses a text-to-audiovisual pipeline to create scenes with animated characters delivering emotional speeches, then lets users fine-tune gestures, emotions, and camera angles through the interface. A user study with novice and professional writers across domains found that this setup supports iterative refinement of scripts while complementing rather than replacing the writer's creative process. Traditional text-only scriptwriting gives only a partial sense of the final audiovisual result, so connecting the two modalities can aid ideation, especially for dialogue-heavy work.

Core claim

Script2Screen integrates scriptwriting with audiovisual scene creation through a novel text-to-audiovisual-scene pipeline that generates expressive scenes featuring emotional speeches and animated characters; the interface supplies fine-grained controls over character gestures, speech emotions, and camera angles, and a user study with novice and professional writers from various domains shows the approach enhances the scriptwriting process by enabling iterative refinement while complementing creative efforts.

What carries the argument

The text-to-audiovisual-scene pipeline that converts dialogue text into synchronized scenes with emotional speech and animated characters, paired with an interface for direct adjustment of gestures, emotions, and camera angles.
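To make the coupling between script text and adjustable audiovisual parameters concrete, here is a minimal sketch of a data structure that such an interface might hold per dialogue line. Everything in it is an assumption for illustration: the `DialogueLine` fields, the `EMOTIONS` vocabulary, and the `SPEAKER: line` script format are hypothetical, and the regular-expression parser is a simple stand-in for the LLM-based annotation the paper describes, not a reproduction of it.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical emotion vocabulary; the paper's actual option set is not given here.
EMOTIONS = {"neutral", "happy", "sad", "angry"}


@dataclass
class DialogueLine:
    """One spoken line plus the controls a writer could adjust (assumed fields)."""
    speaker: str
    text: str
    emotion: str = "neutral"
    gesture: str = "idle"
    camera: str = "medium"


# Matches lines such as "ANNA: I never said that." (assumed script format).
LINE_PATTERN = re.compile(r"^\s*([A-Z][A-Z0-9 ]*):\s*(.+?)\s*$")


def parse_script(script: str) -> List[DialogueLine]:
    """Extract dialogue lines; scene headings and stage directions are skipped."""
    lines = []
    for raw in script.splitlines():
        match = LINE_PATTERN.match(raw)
        if match:
            lines.append(DialogueLine(speaker=match.group(1).strip(),
                                      text=match.group(2)))
    return lines


def set_emotion(line: DialogueLine, emotion: str) -> DialogueLine:
    """Apply a writer's emotion choice, rejecting values outside the vocabulary."""
    if emotion not in EMOTIONS:
        raise ValueError(f"unknown emotion: {emotion!r}")
    line.emotion = emotion
    return line
```

Keeping the text and the per-line controls in one record is what lets an edit to either side trigger regeneration of only the affected line, which is the kind of iteration loop the section describes.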

If this is right

Writers gain the ability to iterate on dialogue by immediately viewing and tweaking the corresponding audiovisual elements rather than relying on imagination alone.
The interactive controls allow fine adjustments to emotional tone and visual framing without leaving the scriptwriting environment.
Both novice and experienced writers can use the system to explore scene ideas while retaining full creative control over the final text.
The workflow treats audiovisual generation as a feedback aid that supports refinement cycles instead of an automated replacement for human authorship.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Extending the same synchronized pipeline to non-dialogue elements like action descriptions or scene transitions could broaden the tool's reach beyond its current dialogue focus.
Embedding the controls into existing professional screenwriting software might reduce the learning curve and increase adoption among working writers.
Collecting logged failure cases from actual use sessions would let developers target specific improvements in generation quality that the current formative study left unspecified.

Load-bearing premise

The generated audiovisual outputs for speech emotion, gestures, and camera angles are accurate and controllable enough to give useful feedback instead of misleading or distracting the writer.

What would settle it

The central claim would be shown false by a controlled study in which writers using the tool report that inaccurate or uncontrollable generations cost them more time, or produced poorer final scripts, than text-only writing.

Figures

Figures reproduced from arXiv: 2504.14776 by Bryan Wang, Eitan Grinspun, Jiaju Ma, Tovi Grossman, Zhecheng Wang.

**Figure 1.** The Script2Screen interface, captured from a writing session. Script2Screen allows screenwriters to seamlessly visualize and hear their scenes. Writers can select a workspace from the Scenes navigation bar (A), where the corresponding script loads into the Script Editor (B). After the script is processed, the Screen Preview (C) offers a live 3D animation with synced audio, complete with customizable camera…

**Figure 2.** Design space of audiovisual storytelling systems positioned by gran…

**Figure 4.** In each dialogue card, when the user hovers over the text area, …

**Figure 5.** Overview of the Script2Screen text-to-audiovisual generation pipeline. The pipeline begins with the user’s written script (1), which is processed by the …

**Figure 6.** The text annotation module parses the script using Large Language Models (LLMs) to extract dialogue, speaker names, and other narrative elements …

**Figure 7.** The system generates rhythmic character gestures synchronized with spoken dialogue, enhancing the expressiveness of the scene. The highlighted …

**Figure 8.** Comparison of average ratings between baseline and Script2Screen across the Creative Support Index. Error bars represent the standard deviation.

**Figure 9.** This bar plot illustrates the feedback from participants on various fea…

Original abstract

Scriptwriting has traditionally been text-centric, a modality that only partially conveys the produced audiovisual experience. A formative study with professional writers informed us that connecting textual and audiovisual modalities can aid ideation and iteration, especially for writing dialogues. In this work, we present Script2Screen, an AI-assisted tool that integrates scriptwriting with audiovisual scene creation in a unified, synchronized workflow. Focusing on dialogues in scripts, Script2Screen generates expressive scenes with emotional speeches and animated characters through a novel text-to-audiovisual-scene pipeline. The user interface provides fine-grained controls, allowing writers to fine-tune audiovisual elements such as character gestures, speech emotions, and camera angles. A user study with both novice and professional writers from various domains demonstrated that Script2Screen's interactive audiovisual generation enhances the scriptwriting process, facilitating iterative refinement while complementing, rather than replacing, their creative efforts.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

Script2Screen provides an interactive way to generate and refine audiovisual dialogue scenes from scripts, though the user study lacks the controls needed to strongly support its effectiveness claims.


The main thing to know is that Script2Screen is a tool that generates audiovisual scenes from dialogue scripts and lets writers interactively adjust gestures, speech emotions, and camera angles in a unified interface, with a user study claiming it helps with iterative refinement.

What the paper does well is present a practical system that addresses the gap between text-based scriptwriting and the audiovisual nature of the final product. The formative study with professionals seems to have guided the design toward useful controls, and testing with both novices and pros shows some thought about different user groups. The pipeline for text-to-audiovisual generation with synchronization is a solid engineering contribution for this domain.

The soft spots are in the evaluation. The abstract and description provide no details on the number of participants, specific metrics used, statistical analysis, or a control condition like writing without the audiovisual features. This makes it difficult to separate the tool's actual impact from effects like novelty or demand characteristics. Without objective measures of script quality or iteration efficiency, the claim that it enhances the process rests heavily on unblinded feedback.

This paper is for HCI researchers and developers building creative support tools, particularly those interested in multimodal AI for media. A reader working on scriptwriting software or generative interfaces would get value from the workflow integration and control mechanisms. It deserves a serious referee because it offers a concrete implementation and user feedback on a relevant problem.

I would recommend engaging with this work in peer review. The core idea is sound and the system appears functional, even if the study needs more structure to fully support the conclusions.

Referee Report

2 major / 2 minor

Summary. The paper presents Script2Screen, an AI-assisted tool that integrates textual scriptwriting with a text-to-audiovisual-scene pipeline to generate expressive dialogue scenes featuring emotional speech, animated characters, gestures, and camera angles. Fine-grained UI controls allow writers to adjust these elements. A formative study with professional writers informed the design, and a subsequent user study with both novice and professional writers from various domains is claimed to demonstrate that the interactive audiovisual generation enhances the scriptwriting process, supports iterative refinement, and complements rather than replaces creative efforts.

Significance. If the empirical claims hold under closer scrutiny, the work could contribute to HCI by showing how synchronized multimodal AI generation can support creative workflows in scriptwriting, a domain traditionally limited to text. The emphasis on user-controllable elements like gestures and emotions represents a practical strength that could generalize to other narrative tools.

major comments (2)

[User Study] User Study section: The abstract and available description state that the user study demonstrated enhancement of the scriptwriting process, but provide no sample size, quantitative metrics, study design details (e.g., within- or between-subjects control condition such as text-only baseline), objective measures (e.g., expert-rated dialogue quality, revision counts, or completion time), or statistical tests. This leaves the central claim dependent on unblinded self-report and vulnerable to novelty or demand effects.
[Abstract and System Description] Abstract and System Description: The claim that AI-generated audiovisual outputs (emotional speech, gestures, camera angles) provide useful feedback rests on an assumption of sufficient accuracy and controllability, informed by a formative study. However, no details on generation quality, failure cases, or validation of the pipeline's reliability are supplied, which is load-bearing for whether the interactive loop aids rather than misleads writers.
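The kind of reporting the first major comment asks for can be sketched briefly. The following is a minimal illustration, assuming a within-subjects design in which each participant rates both a text-only baseline and the tool on the same scale. The function name `paired_summary` is hypothetical, and a paired t statistic is used here only as one common choice; it is not a description of the analysis the authors ran.

```python
import math
import statistics
from typing import Dict, Sequence


def paired_summary(baseline: Sequence[float], tool: Sequence[float]) -> Dict[str, float]:
    """Summarize paired ratings: per-condition mean and SD plus a paired t statistic.

    Each index must refer to the same participant in both conditions.
    """
    if len(baseline) != len(tool):
        raise ValueError("conditions must have one rating per participant")
    n = len(baseline)
    if n < 2:
        raise ValueError("need at least two participants")

    # Per-participant differences carry the within-subjects comparison.
    diffs = [t - b for b, t in zip(baseline, tool)]
    sd_diff = statistics.stdev(diffs)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t statistic is undefined")

    mean_diff = statistics.mean(diffs)
    return {
        "n": n,
        "mean_baseline": statistics.mean(baseline),
        "sd_baseline": statistics.stdev(baseline),
        "mean_tool": statistics.mean(tool),
        "sd_tool": statistics.stdev(tool),
        "mean_diff": mean_diff,
        "t_statistic": mean_diff / (sd_diff / math.sqrt(n)),
        "degrees_of_freedom": n - 1,
    }
```

Reporting the sample size, both standard deviations, and the test statistic together would let a reader judge effect size and uncertainty directly. With ordinal questionnaire data, a non-parametric alternative such as the Wilcoxon signed-rank test may be the more defensible choice.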

minor comments (2)

[Related Work] Consider expanding the related work section to include more citations on prior multimodal creative tools or dialogue generation systems to better contextualize the novelty of the synchronized workflow.
[Figures] Ensure that any figures depicting the UI or pipeline include detailed captions that label all interactive controls and generation stages for clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback on our manuscript. We address each major comment below and indicate planned revisions to improve the clarity, rigor, and transparency of the work.


Referee: [User Study] User Study section: The abstract and available description state that the user study demonstrated enhancement of the scriptwriting process, but provide no sample size, quantitative metrics, study design details (e.g., within- or between-subjects control condition such as text-only baseline), objective measures (e.g., expert-rated dialogue quality, revision counts, or completion time), or statistical tests. This leaves the central claim dependent on unblinded self-report and vulnerable to novelty or demand effects.

Authors: We agree that the user study presentation would benefit from greater detail and explicit discussion of its limitations. The study was exploratory and qualitative in focus, involving both novice and professional writers who completed dialogue writing tasks with the tool to observe iterative refinement. Data were gathered primarily through interviews, think-aloud protocols, and session observations rather than through controlled quantitative metrics or statistical testing. We will revise the User Study section to specify the participant count and backgrounds, clarify the within-subjects structure with opportunities for comparison to text-only writing, report any observable patterns (such as iteration frequency), and add an explicit discussion of reliance on self-reports along with potential biases including novelty and demand effects. This will better contextualize the claims without overstating the evidence.

revision: partial

Referee: [Abstract and System Description] Abstract and System Description: The claim that AI-generated audiovisual outputs (emotional speech, gestures, camera angles) provide useful feedback rests on an assumption of sufficient accuracy and controllability, informed by a formative study. However, no details on generation quality, failure cases, or validation of the pipeline's reliability are supplied, which is load-bearing for whether the interactive loop aids rather than misleads writers.

Authors: The formative study with professional writers directly shaped the choice of controllable parameters in the text-to-audiovisual pipeline. Subsequent user study participants reported that the generated outputs supported their creative iteration despite imperfections. We acknowledge that the manuscript currently lacks explicit discussion of generation quality, failure modes, and pipeline validation. We will add a dedicated subsection (likely within System Description or Limitations) that describes the underlying models, reports observed quality characteristics from the studies, and enumerates common failure cases such as emotion-speech mismatches or gesture desynchronization, along with how the fine-grained UI controls allow users to correct or work around them. This will make the assumptions more transparent and demonstrate how the interactive loop remains useful even when outputs are imperfect.

revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain


This is an applied HCI system paper describing an interactive tool (Script2Screen) and its evaluation via a user study with novice and professional writers. The abstract and available text contain no equations, mathematical derivations, fitted parameters, predictions, or first-principles results. Central claims rest on empirical user feedback about process enhancement and iterative refinement rather than any derivational chain. No self-citation load-bearing steps, ansatz smuggling, or reductions of outputs to inputs by construction are present. The paper is self-contained against external benchmarks (user study results) and receives a normal non-finding for circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper rests on two domain assumptions: that current generative models can produce usable, expressive audiovisual scenes from dialogue text, and that a user study can validate creative support even though the provided summary reports no detailed quantitative validation.

axioms (1)

domain assumption: Connecting textual and audiovisual modalities aids ideation and iteration in dialogue scriptwriting.
Stated as informed by the formative study with professional writers.


[64]

Bryan Wang, Meng Yu Yang, and Tovi Grossman. 2021. Soloist: Generating mixed-initiative tutorials from existing guitar instructional videos through au- dio processing. InProceedings of the 2021 CHI Conference on Human Factors in Computing Systems. 1–14

work page 2021
[65]

Yuxuan Wang, RJ Skerry-Ryan, Daisy Stanton, Yonghui Wu, Ron J Weiss, Navdeep Jaitly, Zongheng Yang, Ying Xiao, Zhifeng Chen, Samy Bengio, et al . 2017. Tacotron: Towards end-to-end speech synthesis.arXiv preprint arXiv:1703.10135 (2017)

work page internal anchor Pith review Pith/arXiv arXiv 2017
[66]

Zhijie Wang, Yuheng Huang, Da Song, Lei Ma, and Tianyi Zhang. 2024. PromptCharm: Text-to-Image Generation through Multi-modal Prompting and Refinement. InProceedings of the 2024 CHI Conference on Human Factors in Com- puting Systems

work page 2024
[67]

Haijun Xia. 2020. Crosspower: Bridging Graphics and Linguistics. InProceedings of the 33rd Annual ACM Symposium on User Interface Software and Technology. 722–734

work page 2020
[68]

Haijun Xia, Jennifer Jacobs, and Maneesh Agrawala. 2020. Crosscast: Adding Visuals to Audio Travel Podcasts. InProceedings of the 33rd Annual ACM Sympo- sium on User Interface Software and Technology(Virtual Event, USA)(UIST ’20). Association for Computing Machinery, New York, NY, USA, 735–746. https: //doi.org/10.1145/3379337.3415882

work page doi:10.1145/3379337.3415882 2020
[69]

Ann Yuan, Andy Coenen, Emily Reif, and Daphne Ippolito. 2022. Wordcraft: story writing with large language models. InProceedings of the 27th International Conference on Intelligent User Interfaces. 841–852

work page 2022
[70]

id " :

Yeshuang Zhu, Shichao Yue, Chun Yu, and Yuanchun Shi. 2017. CEPT: Collabora- tive editing tool for non-native authors. InProceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing. 273–285. 16•Wang, et al. A PROMPT DESIGN AND EXAMPLES In this appendix, we present the detailed prompts and examples used for suggesting...

work page 2017

[1] Saleema Amershi, Dan Weld, Mihaela Vorvoreanu, Adam Fourney, Besmira Nushi, Penny Collisson, Jina Suh, Shamsi Iqbal, Paul N. Bennett, Kori Inkpen, Jaime Teevan, Ruth Kikin-Gil, and Eric Horvitz. 2019. Guidelines for Human-AI Interaction. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (Glasgow, Scotland Uk) (CHI ’19). Assoc... https://doi.org/10.1145/3290605.3300233

[2] Tenglong Ao, Qingzhe Gao, Yuke Lou, Baoquan Chen, and Libin Liu. 2022. Rhythmic Gesticulator: Rhythm-Aware Co-Speech Gesture Synthesis with Hierarchical Neural Embeddings. ACM Trans. Graph. 41, 6, Article 209 (Nov. 2022), 19 pages. https://doi.org/10.1145/3550454.3555435

[3] Michael S. Bernstein, Greg Little, Robert C. Miller, Björn Hartmann, Mark S. Ackerman, David R. Karger, David Crowell, and Katrina Panovich. 2015. Soylent: a word processor with a crowd inside. Commun. ACM 58, 8 (July 2015), 85–94. https://doi.org/10.1145/2791285

[4] Stephen Brade, Bryan Wang, Mauricio Sousa, Sageev Oore, and Tovi Grossman. 2023. Promptify: Text-to-Image Generation through Interactive Prompt Exploration with Large Language Models. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (San Francisco, CA, USA) (UIST ’23). Association for Computing Machinery, New York, NY, USA, Article 96, 14 pages. https://doi.org/10.1145/3586183.3606725

[6] Bill Buxton. 2010. Sketching user experiences: getting the design right and the right design. Morgan Kaufmann.

[7] Michelle Cavaleri and Saib Dianati. 2016. You want me to check your grammar again?: The usefulness of an online grammar checker as perceived by students. Journal of Academic Language and Learning 10, 1 (2016), A223–A236.

[8] Kang Chen, Zhipeng Tan, Jin Lei, Song-Hai Zhang, Yuan-Chen Guo, Weidong Zhang, and Shi-Min Hu. 2021. ChoreoMaster: choreography-oriented music-driven dance synthesis. ACM Trans. Graph. 40, 4, Article 145 (July 2021), 13 pages. https://doi.org/10.1145/3450626.3459932

[9] Erin Cherry and Celine Latulipe. 2014. Quantifying the Creativity Support of Digital Tools through the Creativity Support Index. ACM Trans. Comput.-Hum. Interact. 21, 4, Article 21 (June 2014), 25 pages. https://doi.org/10.1145/2617588

[10] Jean-Peïc Chou, Alexa Fay Siu, Nedim Lipka, Ryan Rossi, Franck Dernoncourt, and Maneesh Agrawala. 2023. TaleStream: Supporting Story Ideation with Trope Knowledge. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology. 1–12.

[11] John Joon Young Chung, Wooseok Kim, Kang Min Yoo, Hwaran Lee, Eytan Adar, and Minsuk Chang. 2022. TaleBrush: Sketching Stories with Generative Pretrained Language Models. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (New Orleans, LA, USA) (CHI ’22). Association for Computing Machinery, New York, NY, USA, Article 209, 19 pag... https://doi.org/10.1145/3491102.3501819

[12] John Joon Young Chung and Max Kreminski. 2024. Patchview: LLM-Powered Worldbuilding with Generative Dust and Magnet Visualization. arXiv preprint arXiv:2408.04112 (2024).

[13] Marc Davis. 2003. Active capture: integrating human-computer interaction and computer vision/audition to automate media capture. In 2003 International Conference on Multimedia and Expo. ICME’03. Proceedings (Cat. No. 03TH8698), Vol. 2. IEEE, II–185.

[14] Rudresh Dwivedi, Devam Dave, Het Naik, Smiti Singhal, Rana Omer, Pankesh Patel, Bin Qian, Zhenyu Wen, Tejal Shah, Graham Morgan, et al. 2023. Explainable AI (XAI): Core ideas, techniques, and solutions. Comput. Surveys 55, 9 (2023), 1–33.

[15] ElevenLabs. 2024. ElevenLabs. https://elevenlabs.io TTS generation web service.

[16] Michael L Epstein, Amber D Lazarus, Tammy B Calvano, Kelly A Matthews, Rachel A Hendel, Beth B Epstein, and Gary M Brosvic. 2002. Immediate feedback assessment technique promotes learning and corrects inaccurate first responses. The Psychological Record 52 (2002), 187–201.

[17] Syd Field. 2005. Screenplay: The foundations of screenwriting. Delta.

[18] Saeed Ghorbani, Ylva Ferstl, Daniel Holden, Nikolaus F. Troje, and Marc-André Carbonneau. 2023. ZeroEGGS: Zero-shot Example-based Gesture Generation from Speech. Computer Graphics Forum 42, 1 (2023), 206–216. https://doi.org/10.1111/cgf.14734 arXiv:https://onlinelibrary.wiley.com/doi/pdf/10.1111/cgf.14734

[19] Chieh-Yang Huang, Shih-Hong Huang, and Ting-Hao Kenneth Huang. 2020. Heteroglossia: In-situ story ideation with the crowd. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. 1–12.

[20] Qingqiu Huang, Yu Xiong, Anyi Rao, Jiaze Wang, and Dahua Lin. 2020. MovieNet: A Holistic Dataset for Movie Understanding. In Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part IV (Glasgow, United Kingdom). Springer-Verlag, Berlin, Heidelberg, 709–727. https://doi.org/10.1007/978-3-030-58548-8_41

[21] Paul Huffman and James Hutson. 2024. Enhancing History Education with Google NotebookLM: Case Study of Mary Easton Sibley’s Diary for Multimedia Content and Podcast Creation. ISRG Journal of Arts, Humanities and Social Sciences 2, 5 (2024).

[22] Maurice Jakesch, Advait Bhat, Daniel Buschek, Lior Zalmanson, and Mor Naaman. 2023. Co-writing with opinionated language models affects users’ views. In Proceedings of the 2023 CHI conference on human factors in computing systems. 1–15.

[24] Zeyu Jin, Gautham J Mysore, Stephen Diverdi, Jingwan Lu, and Adam Finkelstein. 2017. Voco: Text-based insertion and replacement in audio narration. ACM Transactions on Graphics (TOG) 36, 4 (2017), 1–13.

[26] Hye-Young Jo, Ryo Suzuki, and Yoonji Kim. 2024. CollageVis: Rapid Previsualization Tool for Indie Filmmaking using Video Collages. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA) (CHI ’24). Association for Computing Machinery, New York, NY, USA, Article 164, 16 pages. https://doi.org/10.1145/3613904.3642575

[27] Jun Kato, Kenta Hara, and Nao Hirasawa. 2024. Griffith: A Storyboarding Tool Designed with Japanese Animation Professionals. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA) (CHI ’24). Association for Computing Machinery, New York, NY, USA, Article 233, 14 pages. https://doi.org/10.1145/3613904.3642121

[28] Eugene Kharitonov, Damien Vincent, Zalán Borsos, Raphaël Marinier, Sertan Girgin, Olivier Pietquin, Matt Sharifi, Marco Tagliasacchi, and Neil Zeghidour. 2023. Speak, read and prompt: High-fidelity text-to-speech with minimal supervision. Transactions of the Association for Computational Linguistics 11 (2023), 1703–1718.

[30] Han-Jong Kim, Chang Min Kim, and Tek-Jin Nam. 2018. SketchStudio: Experience Prototyping with 2.5-Dimensional Animated Design Scenarios. In Proceedings of the 2018 Designing Interactive Systems Conference (Hong Kong, China) (DIS ’18). Association for Computing Machinery, New York, NY, USA, 831–843. https://doi.org/10.1145/3196709.3196736

[31] Philippe Laban, Elicia Ye, Srujay Korlakunta, John Canny, and Marti Hearst. 2022. Newspod: Automatic and interactive news podcasts. In Proceedings of the 27th International Conference on Intelligent User Interfaces. 691–706.

[32] Stefan Larsson and Fredrik Heintz. 2020. Transparency in artificial intelligence. Internet policy review 9, 2 (2020).

[33] Mackenzie Leake, Abe Davis, Anh Truong, and Maneesh Agrawala. 2017. Computational video editing for dialogue-driven scenes. ACM Trans. Graph. 36, 4, Article 130 (July 2017), 14 pages. https://doi.org/10.1145/3072959.3073653

[34] Mackenzie Leake, Hijung Valentina Shin, Joy O. Kim, and Maneesh Agrawala. 2020. Generating Audio-Visual Slideshows from Text Articles Using Word Concreteness. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA) (CHI ’20). Association for Computing Machinery, New York, NY, USA, 1–11. https://doi.org/10.1145/3313831.3376519

[35] Mina Lee, Percy Liang, and Qian Yang. 2022. CoAuthor: Designing a Human-AI Collaborative Writing Dataset for Exploring Language Model Capabilities. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (New Orleans, LA, USA) (CHI ’22). Association for Computing Machinery, New York, NY, USA, Article 388, 19 pages. https://doi.org/10.1145/3491102.3502030

[36] Susan Lin, Jeremy Warner, JD Zamfirescu-Pereira, Matthew G Lee, Sauhard Jain, Shanqing Cai, Piyawat Lertvittayakumjorn, Michael Xuelin Huang, Shumin Zhai, Björn Hartmann, et al. 2024. Rambler: Supporting Writing With Speech via LLM-Assisted Gist Manipulation. In Proceedings of the CHI Conference on Human Factors in Computing Systems. 1–19.

[37] Vivian Liu, Tao Long, Nathan Raw, and Lydia Chilton. 2023. Generative disco: Text-to-video generation for music visualization. arXiv preprint arXiv:2304.08551 (2023).

[38] Jiaju Ma, Li-Yi Wei, and Rubaiat Habib Kazi. 2022. A Layered Authoring Tool for Stylized 3D animations. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (New Orleans, LA, USA) (CHI ’22). Association for Computing Machinery, New York, NY, USA, Article 383, 14 pages. https://doi.org/10.1145/3491102.3501894

[39] Marcel Marti, Jodok Vieli, Wojciech Witoń, Rushit Sanghrajka, Daniel Inversini, Diana Wotruba, Isabel Simo, Sasha Schriber, Mubbasir Kapadia, and Markus Gross. 2018. CARDINAL: Computer Assisted Authoring of Movie Scripts. In Proceedings of the 23rd International Conference on Intelligent User Interfaces (Tokyo, Japan) (IUI ’18). Association for Computing Machinery, New York, NY, USA, 509–519. https://doi.org/10.1145/3172944.3172972

[41] Justin Matejka, Michael Glueck, Erin Bradner, Ali Hashemi, Tovi Grossman, and George Fitzmaurice. 2018. Dream Lens: Exploration and Visualization of Large-Scale Generative Design Datasets. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (Montreal QC, Canada) (CHI ’18). Association for Computing Machinery, New York, NY, USA, 1... https://doi.org/10.1145/3173574.3173943

[42] Robert McKee. 1997. Substance, structure, style, and the principles of screenwriting. Alba Editorial (1997).

[43] Piotr Mirowski, Kory W. Mathewson, Jaylen Pittman, and Richard Evans. 2023. Co-Writing Screenplays and Theatre Scripts with Language Models: Evaluation by Industry Professionals. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (Hamburg, Germany) (CHI ’23). Association for Computing Machinery, New York, NY, USA, Article 355, 34... https://doi.org/10.1145/3544548.3581225

[44] Max Morrison, Lucas Rencker, Zeyu Jin, Nicholas J. Bryan, Juan-Pablo Caceres, and Bryan Pardo. 2021. Context-Aware Prosody Correction for Text-Based Speech Editing. arXiv:2102.08328 [eess.AS] https://arxiv.org/abs/2102.08328

[45] Daniel Naber et al. 2003. A rule-based style and grammar checker. (2003).

[46] OpenAI. 2023. GPT-4 Technical Report. arXiv:2303.08774 [cs.CL]

[47] Amy Pavel, Gabriel Reyes, and Jeffrey P Bigham. 2020. Rescribe: Authoring and automatically editing audio descriptions. In Proceedings of the 33rd Annual ACM Symposium on User Interface Software and Technology. 747–759.

[48] Anyi Rao, Jean-Peïc Chou, and Maneesh Agrawala. 2024. ScriptViz: A Visualization Tool to Aid Scriptwriting based on a Large Movie Database. arXiv:2410.03224 [cs.HC] https://arxiv.org/abs/2410.03224

[49] Anyi Rao, Jean-Peïc Chou, and Maneesh Agrawala. 2024. ScriptViz: A Visualization Tool to Aid Scriptwriting based on a Large Movie Database. In Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology.

[50] Anyi Rao, Xuekun Jiang, Yuwei Guo, Linning Xu, Lei Yang, Libiao Jin, Dahua Lin, and Bo Dai. 2023. Dynamic Storyboard Generation in an Engine-based Virtual Environment for Video Production. In ACM SIGGRAPH 2023 Posters (Los Angeles, CA, USA) (SIGGRAPH ’23). Association for Computing Machinery, New York, NY, USA, Article 40, 2 pages. https://doi.org/10.1145/3588028.3603647

[51] Yi Ren, Yangjun Ruan, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, and Tie-Yan Liu. 2019. Fastspeech: Fast, robust and controllable text to speech. Advances in neural information processing systems 32 (2019).

[52] Mohi Reza, Nathan M Laundry, Ilya Musabirov, Peter Dushniku, Zhi Yuan “Michael” Yu, Kashish Mittal, Tovi Grossman, Michael Liut, Anastasia Kuzminykh, and Joseph Jay Williams. 2024. ABScribe: Rapid Exploration & Organization of Multiple Writing Variations in Human-AI Co-Writing Tasks using Large Language Models. In Proceedings of the 2024 CHI Conference on ... https://doi.org/10.1145/3613904.3641899

[53] Steve Rubin and Maneesh Agrawala. 2014. Generating emotionally relevant musical scores for audio stories. In Proceedings of the 27th annual ACM symposium on User interface software and technology. 439–448.

[54] Steve Rubin, Floraine Berthouzoz, Gautham J Mysore, and Maneesh Agrawala. 2015. Capture-time feedback for recording scripted narration. In Proceedings of the 28th Annual ACM Symposium on User Interface Software & Technology. 191–199.

[56] Steve Rubin, Floraine Berthouzoz, Gautham J Mysore, Wilmot Li, and Maneesh Agrawala. 2013. Content-based tools for editing audio stories. In Proceedings of the 26th annual ACM symposium on User interface software and technology. 113–122.

[57] Prem Seetharaman, Gautham Mysore, Bryan Pardo, Paris Smaragdis, and Celso Gomes. 2019. Voiceassist: Guiding users to high-quality voice recordings. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. 1–6.

[58] Hijung Valentina Shin, Wilmot Li, and Frédo Durand. 2016. Dynamic authoring of audio with linked scripts. In Proceedings of the 29th Annual Symposium on User Interface Software and Technology. 509–516.

[59] Daxin Tan, Liqun Deng, Yu Ting Yeung, Xin Jiang, Xiao Chen, and Tan Lee. 2021. Editspeech: A text based speech editing system using partial inference and bidirectional fusion. In 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). IEEE, 626–633.

[61] Shrikant Venkataramani, Paris Smaragdis, and Gautham Mysore. 2017. AutoDub: Automatic Redubbing for Voiceover Editing. In Proceedings of the 30th Annual ACM Symposium on User Interface Software and Technology. 533–538.

[62] Bryan Wang, Zeyu Jin, and Gautham Mysore. 2022. Record Once, Post Everywhere: Automatic Shortening of Audio Stories for Social Media. In Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology. 1–11.

[63] Bryan Wang, Yuliang Li, Zhaoyang Lv, Haijun Xia, Yan Xu, and Raj Sodhi. 2024. LAVE: LLM-Powered Agent Assistance and Language Augmentation for Video Editing. In Proceedings of the 29th International Conference on Intelligent User Interfaces (Greenville, SC, USA) (IUI ’24). Association for Computing Machinery, New York, NY, USA, 699–714. https://doi.org/10.1145/3640543.3645143

[64] Bryan Wang, Meng Yu Yang, and Tovi Grossman. 2021. Soloist: Generating mixed-initiative tutorials from existing guitar instructional videos through audio processing. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. 1–14.

[65] Yuxuan Wang, RJ Skerry-Ryan, Daisy Stanton, Yonghui Wu, Ron J Weiss, Navdeep Jaitly, Zongheng Yang, Ying Xiao, Zhifeng Chen, Samy Bengio, et al. 2017. Tacotron: Towards end-to-end speech synthesis. arXiv preprint arXiv:1703.10135 (2017).

[66] Zhijie Wang, Yuheng Huang, Da Song, Lei Ma, and Tianyi Zhang. 2024. PromptCharm: Text-to-Image Generation through Multi-modal Prompting and Refinement. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems.

[67] Haijun Xia. 2020. Crosspower: Bridging Graphics and Linguistics. In Proceedings of the 33rd Annual ACM Symposium on User Interface Software and Technology. 722–734.

[68] Haijun Xia, Jennifer Jacobs, and Maneesh Agrawala. 2020. Crosscast: Adding Visuals to Audio Travel Podcasts. In Proceedings of the 33rd Annual ACM Symposium on User Interface Software and Technology (Virtual Event, USA) (UIST ’20). Association for Computing Machinery, New York, NY, USA, 735–746. https://doi.org/10.1145/3379337.3415882

[69] Ann Yuan, Andy Coenen, Emily Reif, and Daphne Ippolito. 2022. Wordcraft: story writing with large language models. In Proceedings of the 27th International Conference on Intelligent User Interfaces. 841–852.

[70] Yeshuang Zhu, Shichao Yue, Chun Yu, and Yuanchun Shi. 2017. CEPT: Collaborative editing tool for non-native authors. In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing. 273–285.

A PROMPT DESIGN AND EXAMPLES

In this appendix, we present the detailed prompts and examples used for suggesting...