Restructure This: Using AI to Restructure Onboarding Documents to Reduce Cognitive Overload

Anita Sarma; Igor Steinmacher; Marco Aurelio Gerosa; Prashant Tandan; Zixuan Feng

arxiv: 2605.19174 · v1 · pith:7P5UN7O3new · submitted 2026-05-18 · 💻 cs.SE


Zixuan Feng, Prashant Tandan, Igor Steinmacher, Marco Aurelio Gerosa, Anita Sarma

Pith reviewed 2026-05-20 08:33 UTC · model grok-4.3

classification 💻 cs.SE

keywords: onboarding documentation · open source software · cognitive load · generative AI · documentation restructuring · newcomer experience · multimedia learning


The pith

Restructuring OSS onboarding documents with AI and cognitive principles improves newcomer task success and reduces cognitive load.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Open source software projects often lose potential contributors because their onboarding documentation is dense, fragmented, and inconsistent. This paper tests whether applying principles from the Cognitive Theory of Multimedia Learning (CTML) through a generative AI pipeline can fix that. The resulting prototype, VisDoc, breaks documents into task-focused segments, removes repetition, infers workflows, and adds multimodal explanations such as visual task flows. In the evaluations, four experts judged the output complete, accurate, and adoptable. In a between-subjects study with 14 newcomers, those who used the restructured documents succeeded at tasks more often, reported lower cognitive load, and rated the materials as more usable than a comparison group that used the original documentation together with ChatGPT.

Core claim

A generative AI pipeline called VisDoc, which applies Cognitive Theory of Multimedia Learning strategies to restructure open source onboarding documentation, produces materials that experts judge complete and accurate, and that allow newcomers to achieve higher task success, lower cognitive load, and higher perceived usability than newcomers using the original documentation with ChatGPT.

What carries the argument

VisDoc, the GenAI prototype that segments documentation into task-based units, infers workflows, removes redundancy, and generates multimodal explanations.

If this is right

Newcomers achieve higher rates of task success when using the restructured documentation.
Users of VisDoc report significantly lower cognitive load during onboarding tasks.
The restructured documents receive higher usability ratings from participants.
Expert evaluators confirm that the restructured documents maintain completeness and accuracy.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

This restructuring method could apply to documentation in other domains where newcomers face dense technical materials.
Long-term studies might reveal whether the reduced cognitive load leads to better retention and continued contribution.
Similar GenAI pipelines could be developed for maintaining documentation as projects evolve.

Load-bearing premise

The generative AI pipeline correctly applies the cognitive learning strategies without creating new inaccuracies or confusing content in the documents.

What would settle it

A larger study that finds no difference in task success or cognitive load between newcomers using the original documentation (with ChatGPT, as in the paper's baseline condition) and newcomers using VisDoc-restructured documents would indicate that the approach does not deliver the claimed benefits.

Figures

Figures reproduced from arXiv: 2605.19174 by Anita Sarma, Igor Steinmacher, Marco Aurelio Gerosa, Prashant Tandan, Zixuan Feng.

**Figure 1.** VisDoc Task Tree UI with tagged features. Excerpt: …layout using the Clear button (F), returning the interface to a clean, collapsed state. (Section 4.2, CTML-Guided Design Strategies) Segmenting and Pretraining for mitigating C1. To reduce essential overload (C1), VisDoc applies CTML's segmenting and pretraining strategies by breaking complex onboarding documentation into short, task-based units and generating a high-level…
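The segmenting strategy above depends on deciding where one task-based unit ends and the next begins. The Figure 2 excerpt reports that LangChain's Semantic Chunker outperformed RoBERTa on a Kubernetes CONTRIBUTING.md. The sketch below is a simplified, self-contained illustration of the general idea behind such embedding-based chunkers: start a new chunk where the distance between adjacent sentences exceeds a percentile threshold. It is not LangChain's code or VisDoc's implementation. `embed` is a toy bag-of-words stand-in for a real sentence-embedding model, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(sentence: str) -> Counter:
    # Toy stand-in for a sentence-embedding model (assumption): word counts.
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def percentile(values: list[float], q: float) -> float:
    # Linear interpolation between closest ranks, q in [0, 100].
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def semantic_chunks(sentences: list[str], q: float = 95.0) -> list[list[str]]:
    """Start a new chunk wherever adjacent sentences are unusually dissimilar."""
    if not sentences:
        return []
    vecs = [embed(s) for s in sentences]
    # Cosine distance between each sentence and the one that follows it.
    distances = [1.0 - cosine(vecs[i], vecs[i + 1]) for i in range(len(vecs) - 1)]
    if not distances:
        return [list(sentences)]
    threshold = percentile(distances, q)
    chunks, current = [], [sentences[0]]
    for sentence, dist in zip(sentences[1:], distances):
        if dist > threshold:  # breakpoint: likely topic (task) shift
            chunks.append(current)
            current = []
        current.append(sentence)
    chunks.append(current)
    return chunks


steps = ["fork the repo", "clone the repo", "run unit tests", "fix failing unit tests"]
print(semantic_chunks(steps))
# [['fork the repo', 'clone the repo'], ['run unit tests', 'fix failing unit tests']]
```

The percentile `q` trades the number of chunks against their size. A real embedding model would also catch paraphrases that the word-count stand-in misses.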

**Figure 2.** VisDoc Infrastructure Overview. Excerpt: …Chunker (Langchain 2025). We used a ground-truth segmentation of the CONTRIBUTING.md of an OSS project (Kubernetes), annotated independently by two researchers (93.8% agreement (McHugh 2012)). We compared both methods using Pk (Beeferman et al. 1999) and WinDiff (Pevzner and Hearst 2002). LangChain's Semantic Chunker performed better than RoBERTa (Pk = 0.33 vs. 0.36; WinDi…
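Both metrics named in the excerpt compare a proposed segmentation against the ground truth by sliding a window of k units across the document, and lower is better for both. The following is a minimal sketch based on their standard definitions (Beeferman et al. 1999; Pevzner and Hearst 2002). It assumes each segmentation is given as a list of segment labels, one per unit (for example, a sentence or paragraph), and takes the default k as half the mean reference segment length. The function names are hypothetical, and this is not the paper's evaluation code.

```python
from typing import Optional, Sequence


def boundaries(labels: Sequence[int]) -> list[int]:
    """b[j] == 1 iff a segment boundary falls between unit j and unit j + 1."""
    return [int(labels[j] != labels[j + 1]) for j in range(len(labels) - 1)]


def default_window(ref: Sequence[int]) -> int:
    """Half the mean reference segment length (the usual convention), at least 1."""
    n_segments = sum(boundaries(ref)) + 1
    return max(1, round(len(ref) / (2 * n_segments)))


def _windows(ref: Sequence[int], hyp: Sequence[int], k: Optional[int]) -> list[tuple[int, int]]:
    if len(ref) != len(hyp):
        raise ValueError("reference and hypothesis must cover the same units")
    k = k if k is not None else default_window(ref)
    if not 0 < k < len(ref):
        raise ValueError("window size must be between 1 and len(ref) - 1")
    rb, hb = boundaries(ref), boundaries(hyp)
    # Window i spans units i .. i + k, i.e. boundary slots i .. i + k - 1.
    return [(sum(rb[i:i + k]), sum(hb[i:i + k])) for i in range(len(ref) - k)]


def pk(ref: Sequence[int], hyp: Sequence[int], k: Optional[int] = None) -> float:
    """Share of windows where ref and hyp disagree on whether the ends share a segment."""
    wins = _windows(ref, hyp, k)
    return sum((r > 0) != (h > 0) for r, h in wins) / len(wins)


def window_diff(ref: Sequence[int], hyp: Sequence[int], k: Optional[int] = None) -> float:
    """Share of windows where ref and hyp contain a different number of boundaries."""
    wins = _windows(ref, hyp, k)
    return sum(r != h for r, h in wins) / len(wins)


ref = [0, 0, 0, 0, 1, 1, 1, 1]        # ground truth: one boundary after unit 3
near_miss = [0, 0, 0, 1, 1, 1, 1, 1]  # boundary placed one unit early
extra = [0, 0, 0, 1, 2, 2, 2, 2]      # correct boundary plus a spurious one
print(pk(ref, near_miss), window_diff(ref, near_miss))  # 0.3333333333333333 0.3333333333333333
print(pk(ref, extra), window_diff(ref, extra))          # 0.16666666666666666 0.3333333333333333
```

The second example shows why WindowDiff was proposed. When two boundaries fall inside one window, Pk cannot tell them apart from a single boundary. WindowDiff counts the boundaries, so it penalizes the spurious one.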

**Figure 3.** Two-Phase Evaluation: Expert Evaluation and Between-subject User Study. Excerpt: …development and our formative evaluation to promote transferability and adaptability across OSS contexts (Guizani et al. 2025). We chose the Transformers project because: (1) it belongs to the AI/ML domain, a very different domain from the Kubernetes-based project, allowing us to assess generalization across technical ecosystems. …

**Figure 4.** Task success rates for each task (T1–T3) for the VisDoc group (cyan) and the documentation+ChatGPT group (blue). Excerpt: Participants' reflections helped explain the lower failure rates in the VisDoc group. They emphasized that VisDoc's structured, visual layout and guided task flows reduced uncertainty and steered them away from common errors…

**Figure 5.** Item-level SUS comparison using half-violin and box plots for VisDoc (cyan) and documentation+ChatGPT (blue). The Y-axis shows normalized Likert ratings (1–5; higher = better), with negatively worded items reverse-scored. Black dots indicate mean scores for each group; hollow dots are outliers. Excerpt: …easily" [P9]. Others highlighted that VisDoc felt coherent and well-structured: "I could visualize the hierarchy.…
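The reverse-scoring in the caption matches the standard 10-item System Usability Scale (Brooke 1996), where odd-numbered items are worded positively and even-numbered items negatively. The sketch below covers both the item-level normalization plotted here and the usual 0–100 SUS total. It assumes that standard item order. The function names and the sample responses are hypothetical, and the paper's own scoring scripts may differ.

```python
from typing import Sequence


def item_score(item: int, rating: int) -> int:
    """Reverse-score negatively worded (even-numbered) items so higher is better on 1..5."""
    if not 1 <= item <= 10 or not 1 <= rating <= 5:
        raise ValueError("SUS has items 1..10 rated on a 1..5 scale")
    return 6 - rating if item % 2 == 0 else rating


def sus_score(ratings: Sequence[int]) -> float:
    """Standard SUS total on a 0..100 scale."""
    if len(ratings) != 10:
        raise ValueError("SUS needs exactly 10 ratings")
    # After reverse-scoring, each item contributes (score - 1), i.e. 0..4 points.
    contributions = [item_score(i, r) - 1 for i, r in enumerate(ratings, start=1)]
    return sum(contributions) * 2.5


responses = [4, 2, 5, 1, 4, 2, 4, 2, 5, 1]  # one hypothetical participant
print([item_score(i, r) for i, r in enumerate(responses, start=1)])  # [4, 4, 5, 5, 4, 4, 4, 4, 5, 5]
print(sus_score(responses))  # 85.0
```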

Original abstract

Onboarding documentation is critical for attracting and retaining newcomers in open source software (OSS). However, it is often dense, inconsistently structured, and fragmented, which makes it difficult to understand and creates cognitive overload leading to frustration, errors, and abandonment. Here, we investigate how Cognitive Theory of Multimedia Learning (CTML) strategies can be used to restructure OSS documentation. We use a GenAI-based pipeline to operationalize these strategies to restructure OSS documentation through our prototype VisDoc. VisDoc segments documentation into task-based units, infers workflows, removes redundancy, and generates multimodal explanations. An expert evaluation (N=4) affirmed VisDoc's completeness, accuracy, and adoptability; a between-subjects evaluation (N=14) with newcomers found that VisDoc participants achieved higher task success, had significantly lower cognitive load, and perceived higher usability. The contributions of this work include a CTML-grounded analysis of onboarding challenges, a GenAI-based documentation restructuring pipeline, and empirical evidence that cognitively informed documentation restructuring reduces cognitive load and improves usability and task performance in OSS.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

VisDoc gives a workable GenAI pipeline for applying CTML to OSS onboarding docs, and a small study shows lower cognitive load, but the AI outputs lack thorough checks for errors and the samples are too small to show that the gains come from the CTML restructuring itself.


The paper's main takeaway is a concrete GenAI pipeline, VisDoc, that breaks OSS onboarding docs into task-based segments, removes redundancy, infers workflows, and adds multimodal explanations based on the Cognitive Theory of Multimedia Learning. A between-subjects study with 14 newcomers reported higher task success, lower cognitive load, and better usability for the restructured versions than for a baseline of the original documentation used with ChatGPT. An expert review with four people also backed the completeness and accuracy of the outputs. That combination of theory-driven restructuring and empirical results is the new piece here, and it directly targets a documented pain point in open source newcomer retention.

The work does a solid job turning CTML principles into an operational pipeline and testing it on real documentation rather than staying at the level of abstract recommendations. The expert and user evaluations provide at least initial evidence that the approach can improve outcomes.

The soft spots are mostly around scale and validation. The user study sits at N=14 with no power analysis or detailed statistical reporting, and the expert check is even smaller. More critically, there is no systematic review for GenAI-specific problems like hallucinated steps, introduced inconsistencies, or misleading multimodal content. If those exist, the measured gains could come from something other than the intended CTML restructuring. No open data or code is shared either, which limits how far others can verify or extend the results.

This paper is aimed at software engineering researchers who work on open source onboarding, and at HCI researchers interested in AI-supported documentation tools. A reader who needs practical examples of theory applied to real OSS docs would get usable ideas from the pipeline description and the study setup.
It is coherent enough and grounded enough to deserve peer review, even though it will need larger samples and explicit checks on the generated content before the central claims can be taken as settled.

Referee Report

2 major / 2 minor

Summary. The paper claims that a GenAI-based pipeline called VisDoc can operationalize Cognitive Theory of Multimedia Learning (CTML) strategies to restructure OSS onboarding documentation, thereby reducing cognitive overload. This is supported by an expert evaluation (N=4) that affirmed completeness, accuracy, and adoptability of the outputs, plus a between-subjects user study (N=14) with newcomers showing higher task success, significantly lower cognitive load, and higher perceived usability for VisDoc-restructured documents than for the original documentation used with ChatGPT. Contributions include a CTML-grounded analysis of onboarding challenges, the restructuring pipeline, and empirical evidence of benefits.

Significance. If the results hold, the work is significant for software engineering and HCI, offering a scalable, theory-grounded approach to improving OSS newcomer onboarding. Better documentation could reduce frustration and abandonment, aiding retention and productivity in open-source communities. The combination of CTML with GenAI provides a practical method for addressing cognitive issues in technical docs, with the user-study evidence strengthening real-world applicability.

major comments (2)

[GenAI Pipeline and Evaluation sections] The central empirical claim (higher task success, lower cognitive load, higher usability) in the between-subjects evaluation (N=14) depends on the restructured documents being faithful applications of CTML without GenAI-induced artifacts. The paper reports only an expert evaluation (N=4) on completeness/accuracy/adoptability but provides no systematic validation for hallucinated steps, introduced redundancies, misleading multimodal elements, or inconsistencies. This validation gap is load-bearing for attributing benefits to CTML restructuring rather than other content properties.
[User Study / Between-subjects evaluation] The user study reports positive outcomes but with N=14, no reported statistical details (e.g., specific tests, p-values, effect sizes), power analysis, or open data/code. These omissions limit confidence that the findings reliably support the claim of reduced cognitive overload, especially in a between-subjects design where individual differences could confound results.
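The referee's request for specific tests and effect sizes can be made concrete for a sample this small. One reasonable option is an exact permutation test on the group difference, paired with Cliff's delta as a nonparametric effect size. Both avoid normality assumptions that are hard to check with only a handful of participants per group. This is offered only as an illustration, not as the analysis the authors ran. The function names and the toy scores in the usage line are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def cliffs_delta(a: Sequence[float], b: Sequence[float]) -> float:
    """P(a > b) - P(a < b) over all cross-group pairs; ranges from -1 to 1."""
    greater = sum(x > y for x in a for y in b)
    less = sum(x < y for x in a for y in b)
    return (greater - less) / (len(a) * len(b))


def exact_permutation_p(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sided exact permutation test on the difference in group means.

    Every relabelling of the pooled scores into groups of the original sizes is
    enumerated, which stays cheap for small samples (a 7 + 7 split has 3,432).
    """
    pooled = list(a) + list(b)
    n, n_a = len(pooled), len(a)
    total = sum(pooled)
    observed = abs(sum(a) / n_a - sum(b) / len(b))
    extreme = count = 0
    for idx in combinations(range(n), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / (n - n_a))
        extreme += diff >= observed - 1e-12  # tolerance for float round-off
        count += 1
    return extreme / count


print(exact_permutation_p([1, 2, 3], [4, 5, 6]), cliffs_delta([1, 2, 3], [4, 5, 6]))  # 0.1 -1.0
```

Each outcome (for example, per-participant cognitive-load or SUS scores) could be run through both functions. The resulting p-value and effect size would then be reported alongside the descriptive statistics.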

minor comments (2)

[Pipeline Description] Clarify the exact CTML strategies implemented in the pipeline (e.g., which principles for segmentation, redundancy removal, and multimodal generation) and how they map to specific GenAI prompts or steps.
[Abstract and Results] The abstract states 'significantly lower cognitive load' without qualifiers; ensure the full text reports the exact measure (e.g., NASA-TLX) and any limitations of the small-sample comparison.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the positive assessment of our work's significance and for the constructive major comments. We appreciate the opportunity to strengthen the manuscript by addressing the validation of the GenAI pipeline outputs and the statistical reporting in the user study.

Point-by-point responses

Referee: [GenAI Pipeline and Evaluation sections] The central empirical claim (higher task success, lower cognitive load, higher usability) in the between-subjects evaluation (N=14) depends on the restructured documents being faithful applications of CTML without GenAI-induced artifacts. The paper reports only an expert evaluation (N=4) on completeness/accuracy/adoptability but provides no systematic validation for hallucinated steps, introduced redundancies, misleading multimodal elements, or inconsistencies. This validation gap is load-bearing for attributing benefits to CTML restructuring rather than other content properties.

Authors: We agree that more targeted validation is needed to rule out GenAI-induced artifacts and support attribution to CTML strategies. The existing expert evaluation (N=4) focused on high-level completeness, accuracy, and adoptability but did not explicitly probe for hallucinated workflow steps, introduced redundancies, or misleading multimodal elements. In the revised manuscript we will augment the evaluation protocol with specific items addressing these issues (e.g., expert ratings on presence of inconsistencies or misleading content) and report the results to provide stronger evidence that benefits derive from the CTML-informed restructuring. revision: yes
Referee: [User Study / Between-subjects evaluation] The user study reports positive outcomes but with N=14, no reported statistical details (e.g., specific tests, p-values, effect sizes), power analysis, or open data/code. These omissions limit confidence that the findings reliably support the claim of reduced cognitive overload, especially in a between-subjects design where individual differences could confound results.

Authors: We acknowledge the limitations of the small sample and the absence of detailed statistical reporting. In the revision we will specify the exact statistical tests performed, report p-values and effect sizes, include a post-hoc power analysis, and make anonymized data and analysis code available via a public repository. We will also expand the limitations section to discuss potential confounds from individual differences in the between-subjects design and how random assignment was used to mitigate them. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical results independent of any derivation or self-referential fit

Rationale

The paper describes a GenAI pipeline that applies CTML strategies to restructure OSS onboarding documents, followed by an expert review (N=4) for completeness/accuracy/adoptability and a separate between-subjects user study (N=14) measuring task success, cognitive load, and usability. No equations, parameter fitting, or predictive models are presented whose outputs reduce by construction to the inputs. Central claims rest on these independent empirical evaluations rather than self-definition, self-citation chains, or renamed known results. The claims are tested against measurements collected from experts and newcomers, not against quantities the pipeline itself produces.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The work rests on established CTML as background theory and introduces one new software artifact; no numerical free parameters are fitted and no new physical entities are postulated.

axioms (1)

Domain assumption: Cognitive Theory of Multimedia Learning strategies can be effectively operationalized by generative AI to reduce cognitive overload in technical documentation.
This premise underpins the design of the VisDoc pipeline and the claim that restructuring improves usability and task performance.

invented entities (1)

VisDoc (no independent evidence)
Purpose: Generative AI prototype that segments, deduplicates, and multimodalizes OSS onboarding documents
New system developed and evaluated in the paper; no independent evidence outside this work is provided.

pith-pipeline@v0.9.0 · 5736 in / 1334 out tokens · 42953 ms · 2026-05-20T08:33:44.244007+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean, BranchSelection.lean, AlexanderDuality.lean
Theorems: reality_from_one_distinction; branch_selection; alexander_duality_circle_linking
Relation between the paper passage and the cited Recognition theorem: unclear

VisDoc segments documentation into task-based units, infers workflows, removes redundancy, and generates multimodal explanations... CTML strategies... Segmenting, Pretraining, Aligning, Eliminating, Signaling, Weeding, Off-loading, Synchronizing, Individualizing

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

180 extracted references · 180 canonical work pages · 4 internal anchors

[1] Increasing the self-efficacy of newcomers to Open Source Software projects. 2015 29th Brazilian Symposium on Software Engineering, 2015.
[2] From First Patch to Long-Term Contributor: Evaluating Onboarding Recommendations for OSS Newcomers. IEEE Transactions on Software Engineering.
[3] Dual coding theory and education. Educational Psychology Review, 1991.
[4] Barriers faced by newcomers to open source projects: a systematic review. IFIP International Conference on Open Source Systems, 2014.
[5] Cognitive load theory. Psychology of Learning and Motivation, 2011.
[6] Do contributing files provide information about OSS newcomers' onboarding barriers? Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering.
[7] How gender-biased tools shape newcomer experiences in OSS projects. IEEE Transactions on Software Engineering, 2020.
[8] Evaluating the quality of the documentation of open source software. International Conference on Evaluation of Novel Approaches to Software Engineering, 2017.
[9] Evaluating software documentation quality. 2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR), 2023.
[10] Beyond accuracy: Assessing software documentation quality. Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering.
[11] Transformers: State-of-the-Art Machine Learning Library.
[12] Why Do Developers Engage with ChatGPT in Issue-Tracker? Investigating Usage and Reliance on ChatGPT-Generated Code. 2025 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 2025.
[13] How many participants are really enough for usability studies? 2014 Science and Information Conference, 2014.
[14] A heuristic evaluation of a World Wide Web prototype. interactions, 1996.
[15] Towards a documentation maturity model. Proceedings of the 21st Annual International Conference on Documentation.
[16] Automatic code documentation generation using GPT-3. Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering.
[17] Addressing OSS Community Managers' Challenges in Contributor Retention. ACM Transactions on Software Engineering and Methodology.
[18] DocAgent: A Multi-Agent System for Automated Code Documentation Generation. arXiv preprint arXiv:2504.08725.
[19] Automatic documentation generation via source code summarization of method context. Proceedings of the 22nd International Conference on Program Comprehension.
[20] Using thematic analysis in psychology. Qualitative Research in Psychology, 2006.
[21] Evaluating usage and quality of technical software documentation: an empirical study. Proceedings of the 17th International Conference on Evaluation and Assessment in Software Engineering.
[22] Towards Leveraging LLMs for Reducing Open Source Onboarding Information Overload. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering.
[23] When learning is just a click away: Does simple user interaction foster deeper understanding of multimedia messages? Journal of Educational Psychology, 2001.
[24] Multimedia learning principles in different learning environments: A systematic review. Smart Learning Environments, 2022.
[25] Using commonly-available technologies to create online multimedia lessons through the application of the Cognitive Theory of Multimedia Learning. Educational Technology Research and Development, 2023.
[26] The implementation of the cognitive theory of multimedia learning in the design and evaluation of an AI educational video assistant utilizing large language models. Heliyon, 2024.
[27] Applying the cognitive theory of multimedia learning: an analysis of medical animations. Medical Education, 2013.
[28] The past, present, and future of the cognitive theory of multimedia learning. Educational Psychology Review, 2024.
[29] The Cambridge handbook of multimedia learning. 2005.
[30] A field study of API learning obstacles. Empirical Software Engineering, 2011.
[31] Overcoming open source project entry barriers with a portal for newcomers. Proceedings of the 38th International Conference on Software Engineering.
[32] Measuring the cognitive load of software developers: A systematic mapping study. 2019 IEEE/ACM 27th International Conference on Program Comprehension (ICPC), 2019.
[33] Usefulness and usability of heuristic walkthroughs for evaluating domain-specific developer tools in industry: Evidence from four field simulations. Information and Software Technology, 2023.
[34] SUS: A quick and dirty usability scale. Usability Evaluation in Industry, 1996.
[35] GPTs and hallucination: why do large language models hallucinate? Queue, 2024.
[36] Feng, Zixuan. doi:10.5281/zenodo.17859471.
[37] The Effects of GitHub Copilot on Computing Students' Programming Effectiveness, Efficiency, and Processes in Brownfield Coding Tasks. Proceedings of the 2025 ACM Conference on International Computing Education Research V. 1, 2025.
[38] Comprehension-Performance Gap in GenAI-Assisted Brownfield Programming: A Replication and Extension. arXiv preprint arXiv:2511.02922 (listed on arXiv as: Code Comprehension with GitHub Copilot: Performance Gains, Comprehension Trade-offs, and Behavioral Predictors in Brownfield Programming).
[39] A Systematic Literature Review of the Use of GenAI Assistants for Code Comprehension: Implications for Computing Education Research and Practice. arXiv preprint arXiv:2510.17894.
[40] Comparative analysis based on DeepSeek, ChatGPT, and Google Gemini: Features, techniques, performance, future prospects. Systems and Soft Computing, 2025.
[41] Gemini: A Family of Highly Capable Multimodal Models. arXiv preprint arXiv:2312.11805.
[42] Kosower, David A., Lopez-Villarejo, Juan J., and Roubtsov, Serguei. Flowgen: Flowchart-Based Documentation Framework for C++.
[43] Documentation Practices in Agile Software Development: A Systematic Literature Review. 2023 IEEE/ACIS 21st International Conference on Software Engineering Research, Management and Applications (SERA), 2023.
[44] Let me in: Guidelines for the successful onboarding of newcomers to open source projects. IEEE Software, 2018.
[45] Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems.
[46] ChatGPT for good? On opportunities and challenges of large language models for education. Learning and Individual Differences, 2023.
[47] Natural language processing. Fundamentals of Artificial Intelligence, 2020.
[48] What's in a GitHub Repository? A Software Documentation Perspective. arXiv preprint arXiv:2102.12727.
[49] A systematic literature review on the barriers faced by newcomers to open source software projects. Information and Software Technology, 2015.
[50] Guiding the way: A systematic literature review on mentoring practices in open source software projects. Information and Software Technology, 2024.
[51] The Introduction of README and CONTRIBUTING Files in Open Source Software Development. 2025 IEEE/ACM 18th International Conference on Cooperative and Human Aspects of Software Engineering (CHASE), 2025.
[52] Community Tapestry: An actionable tool to track turnover and diversity in OSS. Information and Software Technology, 2025.
[53] Software documentation: the practitioners' perspective. Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering.
[54] Nine ways to reduce cognitive load in multimedia learning. Educational Psychologist, 2003.
[55] A layered reference model of the brain (LRMB). IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 2006.
[56] The cognitive process of comprehension. The Second IEEE International Conference on Cognitive Informatics, 2003. Proceedings, 2003.
[57] Make It Make Sense! Understanding and Facilitating Sensemaking in Computational Notebooks. arXiv preprint arXiv:2312.11431.
[58] The types, roles, and practices of documentation in data analytics open source software libraries: a collaborative ethnography of documentation work. Computer Supported Cooperative Work (CSCW), 2018.
[59] Learning from maps and diagrams. Educational Psychology Review, 1991.
[60] Summarizing and measuring development activity. Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, 2015.
[61] Max-Min semantic chunking of documents for RAG application. Discover Computing, 2025.
[62] GitHub REST API Documentation (version 2022-11-28). 2022.
[63] Discourse segmentation of multi-party conversation. Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics.
[64] Interrater reliability: the kappa statistic. Biochemia Medica, 2012.
[65] The Kubernetes …, 2025.
[66] The signaling (or cueing) principle in multimedia learning. The Cambridge Handbook of Multimedia Learning, 2021.
[67] Effects of Example-Problem Pairs on Students' Mathematics Achievements: A Mixed-Method Study. International Education Studies, 2021.
[68]

Information and software technology , volume=

A mapping study on documentation in Continuous Software Development , author=. Information and software technology , volume=. 2022 , publisher=

work page 2022
[69]

Challenges and Solutions of Free and Open Source Software Documentation: A Systematic Mapping Study , author=. Simp. 2024 , publisher=

work page 2024
[70]

Low-code LLM: Graphical user interface over large language models , author=. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: System Demonstrations) , pages=

work page 2024
[71]

Frontiers in Psychology , volume=

Rethinking pre-training: cognitive load implications for learners with varying prior knowledge , author=. Frontiers in Psychology , volume=. 2025 , publisher=

work page 2025
[72]

Procedia Computer Science , volume=

Automating Software Diagram Generation with Large Language Models , author=. Procedia Computer Science , volume=. 2025 , publisher=

work page 2025
[73]

Information and Software Technology , volume=

Is this GitHub project maintained? Measuring the level of maintenance activity of open-source projects , author=. Information and Software Technology , volume=. 2020 , publisher=

work page 2020
[74]

1991 , publisher=

A theory of goal setting and task performance , author=. 1991 , publisher=

work page 1991
[75]

Procedia Computer Science , volume=

Open source software (OSS) quality assurance: A survey paper , author=. Procedia Computer Science , volume=. 2015 , publisher=

work page 2015
[76]

Informing Science , volume=

A cognitive approach to instructional design for multimedia learning , author=. Informing Science , volume=. 2005 , publisher=

work page 2005
[77]

The handbook of educational theories , pages=

The Cognitive Theory , author=. The handbook of educational theories , pages=. 2013 , publisher=

work page 2013
[78]

The Cambridge handbook of multimedia learning , volume=

The split-attention principle in multimedia learning , author=. The Cambridge handbook of multimedia learning , volume=

work page
[79]

, author=

Reducing cognitive load by mixing auditory and visual presentation modes. , author=. Journal of educational psychology , volume=. 1995 , publisher=

work page 1995
[80]

The Cambridge handbook of multimedia learning , volume=

The modality principle in multimedia learning , author=. The Cambridge handbook of multimedia learning , volume=

work page

Showing first 80 references.
