Restructure This: Using AI to Restructure Onboarding Documents to Reduce Cognitive Overload
Pith reviewed 2026-05-20 08:33 UTC · model grok-4.3
The pith
Restructuring OSS onboarding documents with AI and cognitive principles improves newcomer task success and reduces cognitive load.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A generative AI pipeline called VisDoc that applies Cognitive Theory of Multimedia Learning strategies to restructure open source onboarding documentation produces materials that experts judge complete and accurate, and that allow newcomers to achieve higher task success rates with lower cognitive load and higher perceived usability.
What carries the argument
VisDoc, the GenAI prototype that segments documentation into task-based units, infers workflows, removes redundancy, and generates multimodal explanations.
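To make two of the four stages concrete, here is a minimal sketch of segmentation and redundancy removal. It is a deterministic stand-in, not VisDoc's implementation: the paper uses a GenAI pipeline, whereas this toy version assumes markdown headings mark task boundaries and treats exact repeated lines as redundancy. The function names and the sample document are hypothetical.

```python
import re


def segment_by_heading(markdown: str) -> dict[str, list[str]]:
    """Split a markdown document into units keyed by heading text."""
    units: dict[str, list[str]] = {}
    current = "Preamble"  # bucket for any text that precedes the first heading
    for line in markdown.splitlines():
        match = re.match(r"^#{1,6}\s+(.*)$", line)
        if match:
            current = match.group(1).strip()
            units.setdefault(current, [])
        elif line.strip():
            units.setdefault(current, []).append(line.strip())
    return units


def drop_repeated_lines(units: dict[str, list[str]]) -> dict[str, list[str]]:
    """Remove lines that already appeared in an earlier unit (case-insensitive)."""
    seen: set[str] = set()
    cleaned: dict[str, list[str]] = {}
    for title, lines in units.items():
        kept = []
        for line in lines:
            key = re.sub(r"\s+", " ", line.lower())
            if key not in seen:
                seen.add(key)
                kept.append(line)
        cleaned[title] = kept
    return cleaned


# Hypothetical onboarding document, invented for illustration only.
SAMPLE_DOC = """\
# Set up
Fork the repository.
Install dependencies with the project's package manager.

# Submit a change
Fork the repository.
Open a pull request.
"""

units = drop_repeated_lines(segment_by_heading(SAMPLE_DOC))
for title, steps in units.items():
    print(title, steps)
```

A real pipeline would need semantic rather than literal matching, since redundant instructions in OSS documentation are rarely word-for-word duplicates.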
If this is right
- Newcomers achieve higher rates of task success when using the restructured documentation.
- Users of VisDoc report significantly lower cognitive load during onboarding tasks.
- The restructured documents receive higher usability ratings from participants.
- Expert evaluators confirm that the restructured documents maintain completeness and accuracy.
Where Pith is reading between the lines
- This restructuring method could apply to documentation in other domains where newcomers face dense technical materials.
- Long-term studies might reveal whether the reduced cognitive load leads to better retention and continued contribution.
- Similar GenAI pipelines could be developed for maintaining documentation as projects evolve.
Load-bearing premise
The generative AI pipeline correctly applies the cognitive learning strategies without creating new inaccuracies or confusing content in the documents.
What would settle it
A larger study that finds no difference in task success or cognitive load between groups using original versus VisDoc-restructured documents would indicate the approach does not deliver the claimed benefits.
Original abstract
Onboarding documentation is critical for attracting and retaining newcomers in open source software (OSS). However, it is often dense, inconsistently structured, and fragmented, which makes it difficult to understand and creates cognitive overload that leads to frustration, errors, and abandonment. Here, we investigate how Cognitive Theory of Multimedia Learning (CTML) strategies can be used to restructure OSS documentation. We operationalize these strategies in a GenAI-based pipeline through our prototype VisDoc. VisDoc segments documentation into task-based units, infers workflows, removes redundancy, and generates multimodal explanations. An expert evaluation (N=4) affirmed VisDoc's completeness, accuracy, and adoptability; a between-subjects evaluation (N=14) with newcomers found that VisDoc participants achieved higher task success, had significantly lower cognitive load, and perceived higher usability. The contributions of this work include a CTML-grounded analysis of onboarding challenges, a GenAI-based documentation restructuring pipeline, and empirical evidence that cognitively informed documentation restructuring reduces cognitive load and improves usability and task performance in OSS.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that a GenAI-based pipeline called VisDoc can operationalize Cognitive Theory of Multimedia Learning (CTML) strategies to restructure OSS onboarding documentation, thereby reducing cognitive overload. This is supported by an expert evaluation (N=4) that affirmed completeness, accuracy, and adoptability of the outputs, plus a between-subjects user study (N=14) with newcomers showing higher task success, significantly lower cognitive load, and higher perceived usability for VisDoc-restructured documents versus originals. Contributions include a CTML-grounded analysis of onboarding challenges, the restructuring pipeline, and empirical evidence of benefits.
Significance. If the results hold, the work is significant for software engineering and HCI, offering a scalable, theory-grounded approach to improving OSS newcomer onboarding. Better documentation could reduce frustration and abandonment, aiding retention and productivity in open-source communities. The combination of CTML with GenAI provides a practical method for addressing cognitive issues in technical docs, with the user-study evidence strengthening real-world applicability.
major comments (2)
- [GenAI Pipeline and Evaluation sections] The central empirical claim (higher task success, lower cognitive load, higher usability) in the between-subjects evaluation (N=14) depends on the restructured documents being faithful applications of CTML without GenAI-induced artifacts. The paper reports only an expert evaluation (N=4) on completeness/accuracy/adoptability but provides no systematic validation for hallucinated steps, introduced redundancies, misleading multimodal elements, or inconsistencies. This validation gap is load-bearing for attributing benefits to CTML restructuring rather than other content properties.
- [User Study / Between-subjects evaluation] The user study reports positive outcomes but with N=14, no reported statistical details (e.g., specific tests, p-values, effect sizes), power analysis, or open data/code. These omissions limit confidence that the findings reliably support the claim of reduced cognitive overload, especially in a between-subjects design where individual differences could confound results.
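The first major comment asks for systematic validation against GenAI-induced artifacts. One inexpensive automated check, sketched below, flags inline code spans in a restructured document that never occur in the original. This is an illustrative heuristic, not a procedure from the paper; the function names and both sample strings are hypothetical, and a flagged span is only a candidate for human review, since a legitimate rewrite may introduce new wording.

```python
import re


def inline_code_spans(text: str) -> set[str]:
    """Collect the contents of `backtick` spans, with whitespace normalized."""
    return {" ".join(span.split()) for span in re.findall(r"`([^`\n]+)`", text)}


def unsupported_spans(original: str, restructured: str) -> list[str]:
    """Return code spans in the restructured text that never occur in the original."""
    normalized_original = " ".join(original.split())
    return sorted(
        span
        for span in inline_code_spans(restructured)
        if span not in normalized_original
    )


# Hypothetical before/after snippets, invented for illustration only.
ORIGINAL = "Clone the repo, then run `make test` before opening a pull request."
RESTRUCTURED = "Step 1: run `make test`. Step 2: run `make lint`."

flagged = unsupported_spans(ORIGINAL, RESTRUCTURED)
print(flagged)  # spans a reviewer should check against the source
```

The check covers only commands and identifiers written in backticks; hallucinated prose steps or misleading diagrams would still need the expert rating items the referee requests.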
minor comments (2)
- [Pipeline Description] Clarify the exact CTML strategies implemented in the pipeline (e.g., which principles for segmentation, redundancy removal, and multimodal generation) and how they map to specific GenAI prompts or steps.
- [Abstract and Results] The abstract states 'significantly lower cognitive load' without qualifiers; ensure the full text reports the exact measure (e.g., NASA-TLX) and any limitations of the small-sample comparison.
Simulated Author's Rebuttal
Thank you for the positive assessment of our work's significance and for the constructive major comments. We appreciate the opportunity to strengthen the manuscript by addressing the validation of the GenAI pipeline outputs and the statistical reporting in the user study.
- Referee: [GenAI Pipeline and Evaluation sections] The central empirical claim (higher task success, lower cognitive load, higher usability) in the between-subjects evaluation (N=14) depends on the restructured documents being faithful applications of CTML without GenAI-induced artifacts. The paper reports only an expert evaluation (N=4) on completeness/accuracy/adoptability but provides no systematic validation for hallucinated steps, introduced redundancies, misleading multimodal elements, or inconsistencies. This validation gap is load-bearing for attributing benefits to CTML restructuring rather than other content properties.
  Authors: We agree that more targeted validation is needed to rule out GenAI-induced artifacts and support attribution to CTML strategies. The existing expert evaluation (N=4) focused on high-level completeness, accuracy, and adoptability but did not explicitly probe for hallucinated workflow steps, introduced redundancies, or misleading multimodal elements. In the revised manuscript we will augment the evaluation protocol with specific items addressing these issues (e.g., expert ratings on presence of inconsistencies or misleading content) and report the results to provide stronger evidence that benefits derive from the CTML-informed restructuring.
  Revision: yes
- Referee: [User Study / Between-subjects evaluation] The user study reports positive outcomes but with N=14, no reported statistical details (e.g., specific tests, p-values, effect sizes), power analysis, or open data/code. These omissions limit confidence that the findings reliably support the claim of reduced cognitive overload, especially in a between-subjects design where individual differences could confound results.
  Authors: We acknowledge the limitations of the small sample and the absence of detailed statistical reporting. In the revision we will specify the exact statistical tests performed, report p-values and effect sizes, include a post-hoc power analysis, and make anonymized data and analysis code available via a public repository. We will also expand the limitations section to discuss potential confounds from individual differences in the between-subjects design and how random assignment was used to mitigate them.
  Revision: yes
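For readers wondering what the requested statistical reporting could look like at this sample size, the sketch below runs an exact two-sided Mann-Whitney U test and reports a rank-biserial effect size using only the standard library. Everything here is an assumption for illustration: the paper does not state which test was used, the 7-versus-7 split is a guess from N=14, and the scores are invented, not the study's data.

```python
from itertools import combinations


def midranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney(group_a, group_b):
    """Return (U for group_a, exact two-sided p-value, rank-biserial r)."""
    n_a, n_b = len(group_a), len(group_b)
    n = n_a + n_b
    ranks = midranks(list(group_a) + list(group_b))

    def u_from(indices):
        return sum(ranks[i] for i in indices) - n_a * (n_a + 1) / 2

    u_obs = u_from(range(n_a))
    centre = n_a * n_b / 2
    # Exact permutation distribution: every way to label n_a of the n ranks as group A.
    extreme = total = 0
    for subset in combinations(range(n), n_a):
        total += 1
        if abs(u_from(subset) - centre) >= abs(u_obs - centre) - 1e-9:
            extreme += 1
    # r = P(a > b) - P(a < b); negative means group A tends to score lower.
    r = 2 * u_obs / (n_a * n_b) - 1
    return u_obs, extreme / total, r


# Hypothetical cognitive-load scores (0-100), invented for illustration only.
control = [72, 65, 80, 58, 69, 77, 61]
treatment = [45, 52, 60, 38, 66, 49, 55]

u, p, r = mann_whitney(treatment, control)
print(f"U = {u}, exact p = {p:.4f}, rank-biserial r = {r:.2f}")
```

Enumerating all labellings is practical only for small groups (3,432 cases for 7 versus 7), which is exactly the regime where a normal approximation would be least trustworthy.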
Circularity Check
No circularity: empirical results independent of any derivation or self-referential fit
The paper describes a GenAI pipeline that applies CTML strategies to restructure OSS onboarding documents, followed by an expert review (N=4) for completeness/accuracy/adoptability and a separate between-subjects user study (N=14) measuring task success, cognitive load, and usability. No equations, parameter fitting, or predictive models are presented whose outputs reduce by construction to the inputs. Central claims rest on these independent empirical evaluations rather than self-definition, self-citation chains, or renamed known results. The work is self-contained against external user-study benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Cognitive Theory of Multimedia Learning strategies can be effectively operationalized by generative AI to reduce cognitive overload in technical documentation
invented entities (1)
- VisDoc: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean, BranchSelection.lean, AlexanderDuality.lean: reality_from_one_distinction; branch_selection; alexander_duality_circle_linking
  unclear: Relation between the paper passage and the cited Recognition theorem.
  VisDoc segments documentation into task-based units, infers workflows, removes redundancy, and generates multimodal explanations... CTML strategies... Segmenting, Pretraining, Aligning, Eliminating, Signaling, Weeding, Off-loading, Synchronizing, Individualizing
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.