A Good Talk Does not Look Like a Summary, It Teaches You! Measuring Takeaways from Paper-to-Video Talks

Ananya Sai; Aparna Garimella; Ishani Mondal; Jordan Boyd-Graber; Pannaga Shivaswamy

arxiv: 2606.28531 · v1 · pith:CZ6AT6TMnew · submitted 2026-06-26 · 💻 cs.MM · cs.CL

Ishani Mondal, Aparna Garimella, Ananya Sai, Pannaga Shivaswamy, Jordan Boyd-Graber

Pith reviewed 2026-06-30 00:57 UTC · model grok-4.3

classification 💻 cs.MM cs.CL

keywords paper-to-video generation · instructional quality evaluation · scientific presentation videos · EffectivePresentationScorer · explanatory content · background concepts · video evaluation metrics

The pith

Generated videos from papers mention key topics and follow structure but fail to explain background concepts or why methods work.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces EffectivePresentationScorer to measure whether paper-to-video talks actually teach viewers rather than merely summarizing content. It evaluates videos on three criteria: clear explanation of main ideas, introduction of prerequisite background concepts, and explicit connection of technical details back to the paper's core contribution. When applied to existing generation systems, the scorer finds that videos pass topic-presence and structure checks yet routinely omit needed explanations. This gap matters because such videos are promoted for education and research dissemination, where viewers need understanding rather than lists of points. Existing metrics that focus only on visual quality or keyword overlap therefore miss the instructional shortfall.

Core claim

EffectivePresentationScorer shows that current paper-to-video systems produce outputs that mention the correct topics and follow the paper's outline but fail to introduce prerequisite concepts or clarify the rationale behind the method, and these instructional failures are not detected by prior evaluation metrics that emphasize content presence instead of explanatory quality.

What carries the argument

EffectivePresentationScorer, a framework that applies three checks for instructional quality: clear explanation of main ideas, introduction of needed background concepts, and connection of technical details to the main contribution.
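To make the distinction between topic presence and instructional quality concrete, here is a minimal sketch. It assumes each of the three checks reduces to a boolean, which the source does not confirm; the names `InstructionalChecks` and `covers_topics_but_does_not_teach` are hypothetical and are not part of the paper's implementation.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class InstructionalChecks:
    """Hypothetical container for the three checks named in the summary."""
    explains_main_ideas: bool
    introduces_background: bool
    links_details_to_contribution: bool

    def failed(self) -> list[str]:
        # Names of the checks that did not pass, in declaration order.
        return [name for name, ok in asdict(self).items() if not ok]


def covers_topics_but_does_not_teach(topics_covered: bool,
                                     checks: InstructionalChecks) -> bool:
    # The failure mode described above: a presence-style metric passes,
    # yet at least one instructional check fails.
    return topics_covered and bool(checks.failed())


if __name__ == "__main__":
    video = InstructionalChecks(
        explains_main_ideas=True,
        introduces_background=False,
        links_details_to_contribution=False,
    )
    print(video.failed())
    print(covers_topics_but_does_not_teach(True, video))
```

A video that passes a presence metric can still be flagged here, which is the gap the scorer is meant to expose.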

If this is right

Video generation systems must add explicit mechanisms for background explanation and rationale rather than relying on paper structure alone.
Evaluation of scientific videos should incorporate instructional checks instead of stopping at topic coverage or visual metrics.
Educational use of automated paper videos requires new quality thresholds that penalize missing prerequisites.
Designers of paper-to-video tools need to model viewer knowledge gaps that the original paper assumes are already filled.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same scorer could be adapted to evaluate other formats such as slide decks or podcast scripts from papers.
If generation models were trained with an auxiliary loss that rewards background introduction, the instructional scores might rise without changing topic coverage.
Longer videos that insert short explanatory segments before technical sections could address the observed gaps.

Load-bearing premise

The three checks for clear explanations, background concepts, and contribution links are enough to judge whether a video teaches effectively.

What would settle it

A controlled study in which viewers watch the generated videos and then answer questions about prerequisites and method rationale, compared against the scorer's pass/fail decisions on the same videos.
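One way such a comparison could be summarized is chance-corrected agreement between the scorer's pass/fail decisions and a binarized viewer outcome. The sketch below uses Cohen's kappa as an illustrative choice; the paper does not say which statistic it would use, and the function name and sample data are invented for the example.

```python
def cohens_kappa(scorer_pass: list[bool], viewer_pass: list[bool]) -> float:
    """Cohen's kappa between two binary ratings of the same videos."""
    if len(scorer_pass) != len(viewer_pass) or not scorer_pass:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(scorer_pass)
    # Observed agreement: fraction of videos where both ratings match.
    observed = sum(s == v for s, v in zip(scorer_pass, viewer_pass)) / n
    # Expected agreement by chance, from each rater's marginal pass rate.
    p_scorer = sum(scorer_pass) / n
    p_viewer = sum(viewer_pass) / n
    expected = p_scorer * p_viewer + (1 - p_scorer) * (1 - p_viewer)
    if expected == 1.0:
        return 1.0  # both raters are constant and identical
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Made-up data: scorer decision vs. whether viewers answered the
    # prerequisite and rationale questions correctly for each video.
    scorer = [True, True, False, False, True, False]
    viewers = [True, False, False, False, True, False]
    print(round(cohens_kappa(scorer, viewers), 3))
```

A kappa near zero would suggest the scorer's decisions carry little information about viewer comprehension, while a value near one would support the scorer as a proxy.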

Figures

Figures reproduced from arXiv: 2606.28531 by Ananya Sai, Aparna Garimella, Ishani Mondal, Jordan Boyd-Graber, Pannaga Shivaswamy.

**Figure 1.** Step-by-step illustration of EffectivePresentationScorer using a running DefSent example. The pipeline starts from a source paper and an evaluation question (“Why does DefSent improve generalization?”), which is decomposed into a dependency-structured set of explanatory claims (C1 → C2 → C3). For each video variant (vA, vB, vC) and for each claim, claim-wise agents assess Presence, Faithfulness, and Claim…

**Figure 2.** Prompt used to generate instructional-level…

**Figure 4.** Prompt used to detect whether a paper-grounded claim is explicitly present in the video timeline based on frame-level visual descriptions and aligned narration.

Table text extracted alongside this caption:

| Component / Agent | Calls/Q | Notes |
| --- | --- | --- |
| Claim Decomposition | 1 | Per (paper, question) |
| Importance Agent | 1 | Per question |
| Presence Agent | 1 | Checks all claims jointly |
| Faithfulness Agent | 1 | Checks all claims jointly |
| Coherence Agent | 1 | Ordering evaluation |
| Delivery Ag… | | |

**Figure 3.** Prompt used for paper-grounded question decomposition into dependency-structured claims.

Prompt text extracted alongside this caption: "Task: Claim Presence Detection from Video. You are given a single claim derived from a scientific paper and a time-ordered video timeline. Each video segment contains: (1) a textual description of the visual frame(s); (2) the narration spoken during the same timestamp. Your task is to determine whether the claim…"

**Figure 6.** Prompt used to evaluate cross-claim coherence…

**Figure 8.** Prompt used for single-agent question answer…

**Figure 7.** Prompt used by the Meta-Evaluator to aggre…

**Figure 9.** Prompt used for single-agent question answer…

**Figure 10.** Prompt used to score model-generated an…

**Figure 12.** Prompt used to map human-written feedback…

Original abstract

Automatically generated videos from scientific papers are increasingly used for education and research dissemination. However, existing evaluation metrics mainly measure visual quality or whether key points from the paper appear in the video without assessing whether the video actually helps viewers understand the ideas. We introduce EffectivePresentationScorer, a framework for evaluating the instructional quality of scientific presentation videos. It checks whether a video explains the main ideas clearly, introduces needed background concepts, and connects technical details to the main contribution of the paper. When we apply EffectivePresentationScorer to the existing paper-to-video generation systems, we find that generated videos mention the correct topics and follow the structure of the paper but fail to explain prerequisite concepts or clarify why the method works. These failures are often ignored by existing video evaluation metrics, which focus on content presence rather than explanatory quality.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

Private letter to a colleague.

The scorer flags that generated paper-to-video talks list topics but skip explanations, yet it rests on unvalidated checks, and no human-correlation data are shown.

The paper's main point is that current paper-to-video systems produce videos that hit the right topics and follow paper structure but skip explaining prerequisites or why a method works. EffectivePresentationScorer tries to catch this with three checks: clear main-idea explanation, background introduction, and linking technical details to the contribution.

What is new is the shift from content-presence or visual-quality metrics to instructional quality. The abstract shows the authors applied it to existing systems and found the explanatory gaps that prior metrics miss. That distinction is useful for anyone building these tools.

The soft spot is the lack of any reported validation. The three checks are defined, but there are no details on implementation, no inter-rater numbers, and no tests against actual viewer comprehension. Without that, it is unclear whether the scorer measures teaching value or just rephrases presence metrics in new language. The central claim depends on the scorer being reliable, so this gap matters.

This is for researchers working on AI-generated educational content or science communication tools. A reader who needs a concrete way to evaluate explanatory quality could get value once the method is fleshed out.

It deserves peer review because it names a practical evaluation problem that existing work overlooks. The authors should be asked to add validation experiments and implementation specifics before acceptance.

Referee Report

2 major / 0 minor

Summary. The paper introduces EffectivePresentationScorer, a framework for evaluating the instructional quality of automatically generated scientific presentation videos. It defines three checks—whether the video explains main ideas clearly, introduces needed background concepts, and connects technical details to the paper's main contribution—and applies the scorer to existing paper-to-video systems. The central finding is that generated videos mention correct topics and follow paper structure but fail to explain prerequisites or clarify why methods work, a shortcoming not captured by prior metrics that focus on content presence rather than explanatory quality.

Significance. If the scorer can be shown to correlate with human judgments of teaching effectiveness, the work would provide a needed distinction between content-coverage metrics and instructional value, potentially guiding future paper-to-video systems toward better educational outcomes. The absence of any validation data or implementation details in the current manuscript leaves this potential unrealized.

major comments (2)

[Abstract] The central claim that generated videos 'fail to explain prerequisite concepts or clarify why the method works' rests entirely on EffectivePresentationScorer being a reliable indicator of instructional quality. No implementation details, inter-rater agreement, correlation with viewer comprehension tests, or human validation data are reported for the three checks, so the reported distinction from existing content-presence metrics is unsupported.
[Abstract] The three checks are presented as sufficient indicators of teaching value, yet the manuscript supplies no evidence that they are independent of simple topic-presence detection or that they predict actual viewer understanding; this makes the finding that existing systems 'mention the correct topics... but fail to explain' circular with the scorer definition itself.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback emphasizing the need for validation of EffectivePresentationScorer. We address each major comment below and outline planned revisions to strengthen the manuscript.

Referee: [Abstract] The central claim that generated videos 'fail to explain prerequisite concepts or clarify why the method works' rests entirely on EffectivePresentationScorer being a reliable indicator of instructional quality. No implementation details, inter-rater agreement, correlation with viewer comprehension tests, or human validation data are reported for the three checks, so the reported distinction from existing content-presence metrics is unsupported.

Authors: We agree that the current manuscript does not report inter-rater agreement, correlation with comprehension tests, or human validation data for the three checks. The implementation details of the checks (clear explanation of ideas, prerequisite introduction, and contribution linkage) are defined in Section 3 based on established educational principles from the learning sciences literature. However, the absence of empirical validation against human judgments is a limitation. We will add a new subsection detailing the operational definitions with examples and include a small-scale human validation study correlating scorer outputs with viewer comprehension scores in the revised version.

revision: yes

Referee: [Abstract] The three checks are presented as sufficient indicators of teaching value, yet the manuscript supplies no evidence that they are independent of simple topic-presence detection or that they predict actual viewer understanding; this makes the finding that existing systems 'mention the correct topics... but fail to explain' circular with the scorer definition itself.

Authors: The checks are intended to assess explanatory depth rather than mere presence (e.g., by requiring that prerequisites appear before technical details and that method steps are explicitly tied to the paper's core claim). We acknowledge that without supporting evidence this distinction can appear definitional. In revision we will add concrete video examples demonstrating cases where topic coverage is high yet the checks flag explanatory gaps, and we will report initial correlation results from the planned human study to show independence from presence-only metrics.

revision: yes
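The ordering requirement mentioned in the second response (prerequisites appearing before the technical details that depend on them) can be illustrated with a small sketch. It assumes claims form a dependency graph such as C1 → C2 → C3 and that each claim has a first-mention timestamp in the video, or `None` if it never appears. The function name and data layout are hypothetical and do not reflect the authors' implementation.

```python
from typing import Optional


def ordering_violations(
    depends_on: dict[str, list[str]],
    first_mention: dict[str, Optional[float]],
) -> list[tuple[str, str, str]]:
    """Return (claim, prerequisite, reason) for each unmet prerequisite.

    reason is "missing" if the prerequisite never appears in the video,
    or "late" if it first appears at or after the claim that needs it.
    """
    violations = []
    for claim, prereqs in depends_on.items():
        t_claim = first_mention.get(claim)
        if t_claim is None:
            # The claim itself is absent: a presence failure, not an ordering one.
            continue
        for prereq in prereqs:
            t_pre = first_mention.get(prereq)
            if t_pre is None:
                violations.append((claim, prereq, "missing"))
            elif t_pre >= t_claim:
                violations.append((claim, prereq, "late"))
    return violations


if __name__ == "__main__":
    # Illustrative data: C1 is never explained, and C3 is shown before C2.
    deps = {"C1": [], "C2": ["C1"], "C3": ["C2"]}
    times = {"C1": None, "C2": 40.0, "C3": 25.0}
    print(ordering_violations(deps, times))
```

In this example every mentioned topic counts toward coverage, yet both C2 and C3 are flagged, which is the kind of case a presence-only metric would not separate from a well-ordered explanation.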

Circularity Check

0 steps flagged

No circularity: new scorer defined independently and applied to external systems

The paper defines EffectivePresentationScorer via three explicit checks (clear main-idea explanation, background introduction, technical-to-contribution linkage) and applies it to outputs from prior paper-to-video systems. No equations, parameters, or claims reduce by construction to the inputs; the finding that videos mention topics but fail explanatory checks follows directly from the new definition without self-referential fitting, renaming, or self-citation chains. The derivation remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract, no specific free parameters, axioms, or invented entities are identifiable or extractable; the scorer criteria are presented at a conceptual level without implementation details.

Reference graph

Works this paper leans on

80 extracted references · 35 canonical work pages · 6 internal anchors

[1]

Aho and Jeffrey D

Alfred V. Aho and Jeffrey D. Ullman , title =. 1972

1972
[3]

2025 , eprint=

Paper2Video: Automatic Video Generation from Scientific Papers , author=. 2025 , eprint=

2025
[5]

Bloom's taxonomy of cognitive learning objectives

Adams, Nancy E. Bloom's taxonomy of cognitive learning objectives. J Med Libr Assoc
[6]

2024 , eprint=

EvalCrafter: Benchmarking and Evaluating Large Video Generation Models , author=. 2024 , eprint=

2024
[7]

Huang, Ziqi and He, Yinan and Yu, Jiashuo and Zhang, Fan and Si, Chenyang and Jiang, Yuming and Zhang, Yuanhan and Wu, Tianxing and Jin, Qingyang and Chanpaisit, Nattapol and Wang, Yaohui and Chen, Xinyuan and Wang, Limin and Lin, Dahua and Qiao, Yu and Liu, Ziwei , booktitle=
[8]

CLIPS core: A Reference-free Evaluation Metric for Image Captioning

Hessel, Jack and Holtzman, Ari and Forbes, Maxwell and Le Bras, Ronan and Choi, Yejin. CLIPS core: A Reference-free Evaluation Metric for Image Captioning. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 2021. doi:10.18653/v1/2021.emnlp-main.595

work page doi:10.18653/v1/2021.emnlp-main.595 2021
[9]

Publications Manual , year = "1983", publisher =

1983
[10]

Chandra and Dexter C

Ashok K. Chandra and Dexter C. Kozen and Larry J. Stockmeyer , year = "1981", title =. doi:10.1145/322234.322243

work page doi:10.1145/322234.322243 1981
[11]

Scalable training of

Andrew, Galen and Gao, Jianfeng , booktitle=. Scalable training of
[12]

Dan Gusfield , title =. 1997

1997
[13]

Tetreault , title =

Mohammad Sadegh Rasooli and Joel R. Tetreault , title =. Computing Research Repository , volume =. 2015 , url =

2015
[14]

A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data , Volume =

Ando, Rie Kubota and Zhang, Tong , Issn =. A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data , Volume =. Journal of Machine Learning Research , Month = dec, Numpages =
[15]

2019 , eprint=

Towards Accurate Generative Models of Video: A New Metric & Challenges , author=. 2019 , eprint=

2019
[16]

2021 , eprint=

Item Response Theory -- A Statistical Framework for Educational and Psychological Measurement , author=. 2021 , eprint=

2021
[17]

2025 , eprint=

CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer , author=. 2025 , eprint=

2025
[18]

Proceedings of the Computer Vision and Pattern Recognition Conference , pages=

Lingen: Towards high-resolution minute-length text-to-video generation with linear computational complexity , author=. Proceedings of the Computer Vision and Pattern Recognition Conference , pages=
[19]

arXiv preprint arXiv:2502.11079 , year=

Phantom: Subject-consistent video generation via cross-modal alignment , author=. arXiv preprint arXiv:2502.11079 , year=

work page arXiv
[20]

Proceedings of the Computer Vision and Pattern Recognition Conference , pages=

Identity-preserving text-to-video generation by frequency decomposition , author=. Proceedings of the Computer Vision and Pattern Recognition Conference , pages=
[21]

2022 , eprint=

Make-A-Video: Text-to-Video Generation without Text-Video Data , author=. 2022 , eprint=

2022
[22]

2021 , eprint=

Learning Transferable Visual Models From Natural Language Supervision , author=. 2021 , eprint=

2021
[23]

2021 , eprint=

Emerging Properties in Self-Supervised Vision Transformers , author=. 2021 , eprint=

2021
[25]

2024 , eprint=

VideoPhy: Evaluating Physical Commonsense for Video Generation , author=. 2024 , eprint=

2024
[27]

2024 , eprint=

Video-LLaVA: Learning United Visual Representation by Alignment Before Projection , author=. 2024 , eprint=

2024
[28]

2025 , eprint=

Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model , author=. 2025 , eprint=

2025
[30]

2023 , eprint=

Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena , author=. 2023 , eprint=

2023
[31]

2024 , eprint=

GPT-4 Technical Report , author=. 2024 , eprint=

2024
[32]

Annotating Educational Questions for Student Response Analysis

Godea, Andreea and Nielsen, Rodney. Annotating Educational Questions for Student Response Analysis. Proceedings of the Eleventh International Conference on Language Resources and Evaluation ( LREC 2018). 2018

2018
[33]

Sustainability , VOLUME =

Timbi-Sisalima, Cristian and Sánchez-Gordón, Mary and Otón-Tortosa, Salvador and Mendoza-González, Ricardo , TITLE =. Sustainability , VOLUME =. 2024 , NUMBER =

2024
[34]

Proceedings of the 11th International Conference on Intelligent Tutoring Systems , pages =

Long, Yanjin and Aleven, Vincent , title =. Proceedings of the 11th International Conference on Intelligent Tutoring Systems , pages =. 2012 , isbn =. doi:10.1007/978-3-642-30950-2_115 , abstract =

work page doi:10.1007/978-3-642-30950-2_115 2012
[35]

C. P. Ormell , title =. Educational Research , volume =. 1974 , publisher =. doi:10.1080/0013188740170101 , URL =

work page doi:10.1080/0013188740170101 1974
[36]

2024 , eprint=

VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models , author=. 2024 , eprint=

2024
[38]

Niekrenz, Lukas and Spreckelsen, Cord. How to design effective educational videos for teaching evidence-based medicine to undergraduate learners - systematic review with complementing qualitative research to develop a practicable guide. Med Educ Online
[43]

Students' Perceptions of Creating Educational Videos as a Teaching and Learning Strategy

Ram \'o n-Arbu \'e s, Enrique and Bl \'a zquez-Ornat, Isabel and Sagarra-Romero, Luc \' a and Benito-Ruiz, Eva and Ant \'o n-Solanas, Isabel and G \'o mez-Torres, Piedad. Students' Perceptions of Creating Educational Videos as a Teaching and Learning Strategy. Nurse Educ
[45]

2024 , eprint=

Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models , author=. 2024 , eprint=

2024
[47]

2025 , eprint=

Video models are zero-shot learners and reasoners , author=. 2025 , eprint=

2025
[48]

ArXiv , year =

VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation , author =. ArXiv , year =
[49]

Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation

Reimers, Nils and Gurevych, Iryna. Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing. 2020

2020
[50]

2008--2025 , archivePrefix =

GROBID , howpublished =. 2008--2025 , archivePrefix =

2008
[53]

Psychology of learning and motivation , volume=

Multimedia learning , author=. Psychology of learning and motivation , volume=. 2002 , publisher=

2002
[55]

2017 , school=

The Influence of Testing and Content Presentation Method on Mandatory Federal Employee Training , author=. 2017 , school=

2017
[56]

Lysne and Brant G

Steven J. Lysne and Brant G. Miller , title =. Journal of College Science Teaching , volume =. 2017 , publisher =. doi:10.2505/4/jcst17\_046\_06\_100 , URL =

work page doi:10.2505/4/jcst17 2017
[58]

and Liu, Lei and Ober, Teresa M

Watts, Field M. and Liu, Lei and Ober, Teresa M. and Song, Yi and Jusino-Del Valle, Euvelisse and Zhai, Xiaoming and Wang, Yun and Liu, Ninghao , TITLE =. Education Sciences , VOLUME =. 2025 , NUMBER =

2025
[59]

2026 , eprint=

Developing Authentic Simulated Learners for Mathematics Teacher Learning: Insights from Three Approaches with Large Language Models , author=. 2026 , eprint=

2026
[60]

Journal of Geoscience Education , year=

Instructional Utility and Learning Efficacy of Common Active Learning Strategies , author=. Journal of Geoscience Education , year=
[61]

2025 , eprint=

LecEval: An Automated Metric for Multimodal Knowledge Acquisition in Multimedia Learning , author=. 2025 , eprint=

2025
[62]

de Koning , and Halszka Jarodzka

Kevin Ackermans, Björn B. de Koning , and Halszka Jarodzka. 2025. https://doi.org/10.1016/j.learninstruc.2025.102137 Instructional videos and deeper processing: Insights and applications . Learning and Instruction, 98:102137

work page doi:10.1016/j.learninstruc.2025.102137 2025
[63]

Nancy E Adams. 2015. Bloom's taxonomy of cognitive learning objectives. J Med Libr Assoc, 103(3):152--153

2015
[64]

Paul Ayres and Kevin Ackermans. 2025. https://doi.org/10.1016/j.learninstruc.2024.102077 Some do's and don'ts of educational videos . Learning and Instruction, 96:102077

work page doi:10.1016/j.learninstruc.2024.102077 2025
[65]

Hritik Bansal, Zongyu Lin, Tianyi Xie, Zeshun Zong, Michal Yarom, Yonatan Bitton, Chenfanfu Jiang, Yizhou Sun, Kai-Wei Chang, and Aditya Grover. 2024. https://arxiv.org/abs/2406.03520 Videophy: Evaluating physical commonsense for video generation . Preprint, arXiv:2406.03520

work page internal anchor Pith review Pith/arXiv arXiv 2024
[66]

Jie Cao, Ha Nguyen, Selim Yavuz, Boran Yu, Shuguang Wang, Pavneet Kaur Bharaj, and Dionne Cross Francis. 2026. https://arxiv.org/abs/2604.04361 Developing authentic simulated learners for mathematics teacher learning: Insights from three approaches with large language models . Preprint, arXiv:2604.04361

work page internal anchor Pith review Pith/arXiv arXiv 2026
[67]

Xuan He, Dongfu Jiang, Ge Zhang, Max Ku, Achint Soni, Sherman Siu, Haonan Chen, Abhranil Chandra, Ziyan Jiang, Aaran Arulraj, Kai Wang, Quy Duc Do, Yuansheng Ni, Bohan Lyu, Yaswanth Narsupalli, Rongqi Fan, Zhiheng Lyu, Bill Yuchen Lin, and Wenhu Chen. 2024 a . https://doi.org/10.18653/v1/2024.emnlp-main.127 V ideo S core: Building automatic metrics to sim...

work page doi:10.18653/v1/2024.emnlp-main.127 2024
[68]

Xuan He, Dongfu Jiang, Ge Zhang, Max Ku, Achint Soni, Sherman Siu, Haonan Chen, Abhranil Chandra, Ziyan Jiang, Aaran Arulraj, Kai Wang, Quy Duc Do, Yuansheng Ni, Bohan Lyu, Yaswanth Narsupalli, Rongqi Fan, Zhiheng Lyu, Yuchen Lin, and Wenhu Chen. 2024 b . https://arxiv.org/abs/2406.15252 Videoscore: Building automatic metrics to simulate fine-grained huma...

work page arXiv 2024
[69]

Ziqi Huang, Yinan He, Jiashuo Yu, Fan Zhang, Chenyang Si, Yuming Jiang, Yuanhan Zhang, Tianxing Wu, Qingyang Jin, Nattapol Chanpaisit, Yaohui Wang, Xinyuan Chen, Limin Wang, Dahua Lin, Yu Qiao, and Ziwei Liu. 2024 a . VBench : Comprehensive benchmark suite for video generative models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Patter...

2024
[70]

Ziqi Huang, Fan Zhang, Xiaojie Xu, Yinan He, Jiashuo Yu, Ziyue Dong, Qianli Ma, Nattapol Chanpaisit, Chenyang Si, Yuming Jiang, Yaohui Wang, Xinyuan Chen, Ying-Cong Chen, Limin Wang, Dahua Lin, Yu Qiao, and Ziwei Liu. 2024 b . https://arxiv.org/abs/2411.13503 Vbench++: Comprehensive and versatile benchmark suite for video generative models . Preprint, arX...

work page arXiv 2024
[71]

Max Ku, Dongfu Jiang, Cong Wei, Xiang Yue, and Wenhu Chen. 2024. https://doi.org/10.18653/v1/2024.acl-long.663 VIES core: Towards explainable metrics for conditional image synthesis evaluation . In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 12268--12290, Bangkok, Thailand. Associa...

work page doi:10.18653/v1/2024.acl-long.663 2024
[72]

Mingxiang Liao, Hannan Lu, Xinyu Zhang, Fang Wan, Tianyu Wang, Yuzhong Zhao, Wangmeng Zuo, Qixiang Ye, and Jingdong Wang. 2024. Evaluation of text-to-video generation models: A dynamics perspective. arXiv preprint arXiv:2407.01094

work page arXiv 2024
[73]

Dongqi Liu, Chenxi Whitehouse, Xi Yu, Louis Mahon, Rohit Saxena, Zheng Zhao, Yifu Qiu, Mirella Lapata, and Vera Demberg. 2025. https://doi.org/10.18653/v1/2025.acl-long.310 What is that talk about? a video-to-text summarization dataset for scientific presentations . In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics...

work page doi:10.18653/v1/2025.acl-long.310 2025
[74]

Yaofang Liu, Xiaodong Cun, Xuebo Liu, Xintao Wang, Yong Zhang, Haoxin Chen, Yang Liu, Tieyong Zeng, Raymond Chan, and Ying Shan. 2024 a . https://arxiv.org/abs/2310.11440 Evalcrafter: Benchmarking and evaluating large video generation models . Preprint, arXiv:2310.11440

work page arXiv 2024
[75]

Yixin Liu, Kai Zhang, Yuan Li, Zhiling Yan, Chujie Gao, Ruoxi Chen, Zhengqing Yuan, Yue Huang, Hanchi Sun, Jianfeng Gao, Lifang He, and Lichao Sun. 2024 b . https://arxiv.org/abs/2402.17177 Sora: A review on background, technology, limitations, and opportunities of large vision models . Preprint, arXiv:2402.17177

work page internal anchor Pith review Pith/arXiv arXiv 2024
[76]

Richard E Mayer. 2002. Multimedia learning. In Psychology of learning and motivation, volume 41, pages 85--139. Elsevier

2002
[77]

McConnell, LeeAnna Young Chapman, Charles Douglas Czajka, Jason P

David A. McConnell, LeeAnna Young Chapman, Charles Douglas Czajka, Jason P. Jones, Katherine Ryker, and Jennifer Wiggen. 2017. https://api.semanticscholar.org/CorpusID:85462730 Instructional utility and learning efficacy of common active learning strategies . Journal of Geoscience Education, 65:604 -- 625

2017
[78]

Ishani Mondal, Shwetha S, Anandhavelu Natarajan, Aparna Garimella, Sambaran Bandyopadhyay, and Jordan Boyd-Graber. 2024. https://doi.org/10.18653/v1/2024.eacl-long.163 Presentations by the humans and for the humans: Harnessing LLM s for generating persona-aware slides from documents . In Proceedings of the 18th Conference of the European Chapter of the As...

work page doi:10.18653/v1/2024.eacl-long.163 2024
[79]

Lukas Niekrenz and Cord Spreckelsen. 2024. How to design effective educational videos for teaching evidence-based medicine to undergraduate learners - systematic review with complementing qualitative research to develop a practicable guide. Med Educ Online, 29(1):2339569

2024
[80]

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. 2021. https://arxiv.org/abs/2103.00020 Learning transferable visual models from natural language supervision . Preprint, arXiv:2103.00020

work page internal anchor Pith review Pith/arXiv arXiv 2021
[81]

Enrique Ram \'o n-Arbu \'e s, Isabel Bl \'a zquez-Ornat, Luc \' a Sagarra-Romero, Eva Benito-Ruiz, Isabel Ant \'o n-Solanas, and Piedad G \'o mez-Torres. 2025. Students' perceptions of creating educational videos as a teaching and learning strategy. Nurse Educ, 50(4):E219--E224

2025
[82]

Nils Reimers and Iryna Gurevych. 2019. https://doi.org/10.18653/v1/D19-1410 Sentence- BERT : Sentence embeddings using S iamese BERT -networks . In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3982--3992, Hong Kong, Chi...

work page doi:10.18653/v1/d19-1410 2019
[83]

Nils Reimers and Iryna Gurevych. 2020. https://arxiv.org/abs/2004.09813 Making monolingual sentence embeddings multilingual using knowledge distillation . In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics

work page arXiv 2020
[84]

Ehud Reiter. 2025. https://doi.org/10.1162/coli.a.18 We should evaluate real-world impact . Computational Linguistics, 51(4):1419--1431

work page doi:10.1162/coli.a.18 2025
[85]

Edward Sun, Yufang Hou, Dakuo Wang, Yunfeng Zhang, and Nancy X. R. Wang. 2021. https://doi.org/10.18653/v1/2021.naacl-main.111 D 2 S : Document-to-slide generation via query-based text summarization . In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 140...

work page doi:10.18653/v1/2021.naacl-main.111 2021
[86]

John Sweller. 1994. https://doi.org/10.1016/0959-4752(94)90003-5 Cognitive load theory, learning difficulty, and instructional design . Learning and Instruction, 4(4):295--312

work page doi:10.1016/0959-4752(94)90003-5 1994
[87]

John Sweller. 2024. https://doi.org/10.1016/j.lindif.2024.102423 Cognitive load theory and individual differences . Learning and Individual Differences, 110:102423

work page doi:10.1016/j.lindif.2024.102423 2024
[88]

Hayato Tsukagoshi, Ryohei Sasano, and Koichi Takeda. 2021. https://doi.org/10.18653/v1/2021.acl-short.52 D ef S ent: Sentence embeddings using definition sentences . In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers),...

work page doi:10.18653/v1/2021.acl-short.52 2021
[89]

Thomas Unterthiner, Sjoerd van Steenkiste, Karol Kurach, Raphael Marinier, Marcin Michalski, and Sylvain Gelly. 2019. https://arxiv.org/abs/1812.01717 Towards accurate generative models of video: A new metric & challenges . Preprint, arXiv:1812.01717

work page internal anchor Pith review Pith/arXiv arXiv 2019
[90]

Field M. Watts, Lei Liu, Teresa M. Ober, Yi Song, Euvelisse Jusino-Del Valle, Xiaoming Zhai, Yun Wang, and Ninghao Liu. 2025. A framework for designing an AI chatbot to support scientific argumentation. Education Sciences, 15(11). https://doi.org/10.3390/educsci15111507

[91]

Thaddäus Wiedemer, Yuxuan Li, Paul Vicol, Shixiang Shane Gu, Nick Matarese, Kevin Swersky, Been Kim, Priyank Jaini, and Robert Geirhos. 2025. Video models are zero-shot learners and reasoners. Preprint, arXiv:2509.20328. https://arxiv.org/abs/2509.20328

[92]

Yuxiang Wu, Pasquale Minervini, Pontus Stenetorp, and Sebastian Riedel. 2021. Training adaptive computation for open-domain question answering with computational constraints. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference ... https://doi.org/10.18653/v1/2021.acl-short.57

[93]

Yuhang Yang, Ke Fan, Shangkun Sun, Hongxiang Li, Ailing Zeng, FeiLin Han, Wei Zhai, Wei Liu, Yang Cao, and Zheng-Jun Zha. 2025. Videogen-eval: Agent-based system for video generation evaluation. arXiv preprint arXiv:2503.23452

[94]

Joy Lim Jia Yin, Daniel Zhang-Li, Jifan Yu, Haoxuan Li, Shangqing Tu, Yuanchun Wang, Zhiyuan Liu, Huiqin Liu, Lei Hou, Juanzi Li, and Bin Xu. 2025. LecEval: An automated metric for multimodal knowledge acquisition in multimedia learning. Preprint, arXiv:2505.02078. https://arxiv.org/abs/2505.02078

[95]

Hao Zheng, Xinyan Guan, Hao Kong, Wenkai Zhang, Jia Zheng, Weixiang Zhou, Hongyu Lin, Yaojie Lu, Xianpei Han, and Le Sun. 2025. PPTAgent: Generating and evaluating presentations beyond text-to-slides. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 14413--14... https://doi.org/10.18653/v1/2025.emnlp-main.728

[96]

Zeyu Zhu, Kevin Qinghong Lin, and Mike Zheng Shou. 2025. Paper2Video: Automatic video generation from scientific papers. Preprint, arXiv:2510.05096. https://arxiv.org/abs/2510.05096

