Restructure This: Using AI to Restructure Onboarding Documents to Reduce Cognitive Overload

Anita Sarma; Igor Steinmacher; Marco Aurelio Gerosa; Prashant Tandan; Zixuan Feng

arxiv: 2605.19174 · v2 · pith:7P5UN7O3new · submitted 2026-05-18 · 💻 cs.SE


Zixuan Feng, Prashant Tandan, Igor Steinmacher, Marco Aurelio Gerosa, Anita Sarma

Pith reviewed 2026-05-20 08:33 UTC · model grok-4.3

classification 💻 cs.SE

keywords: onboarding documentation · open source software · cognitive load · generative AI · documentation restructuring · newcomer experience · multimedia learning


The pith

Restructuring OSS onboarding documents with AI and cognitive principles improves newcomer task success and reduces cognitive load.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Open source software projects often lose potential contributors because their onboarding documentation is dense, fragmented, and inconsistent. This paper tests whether applying principles from the Cognitive Theory of Multimedia Learning through a generative AI pipeline can fix that. The resulting VisDoc prototype breaks documents into task-focused segments, removes repeated content, infers workflows, and adds visual and other multimodal explanations. In the evaluations, four experts judged the output complete, accurate, and adoptable, while newcomers using the restructured documents succeeded more often, reported lower cognitive load, and rated the materials as more usable than a comparison group working from the documentation with ChatGPT.

Core claim

A generative AI pipeline called VisDoc that applies Cognitive Theory of Multimedia Learning strategies to restructure open source onboarding documentation produces materials that experts judge complete and accurate, and that allow newcomers to achieve higher task success rates with lower cognitive load and higher perceived usability.

What carries the argument

VisDoc, the GenAI prototype that segments documentation into task-based units, infers workflows, removes redundancy, and generates multimodal explanations.
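A toy sketch of two of these stages (segmentation and redundancy removal) may make the shape of such a pipeline concrete. It is not VisDoc's implementation: the paper compares LangChain's Semantic Chunker, an embedding-based splitter, against a RoBERTa-based segmenter, whereas this sketch swaps in a plain split at Markdown headings and drops exact repeated sentences after normalizing case and whitespace. The names `Segment`, `segment_by_heading`, and `remove_redundant_sentences` are hypothetical. Workflow inference and multimodal generation are left out because they depend on the GenAI step.

```python
import re
from dataclasses import dataclass, field

# Hypothetical helpers for illustration only; not VisDoc's code.
_HEADING = re.compile(r"^#{1,6}\s+(.*)$")
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


@dataclass
class Segment:
    """One task-based unit: a heading plus the sentences under it."""
    title: str
    sentences: list[str] = field(default_factory=list)


def segment_by_heading(markdown: str) -> list[Segment]:
    """Split a Markdown document into units at each heading line.

    Headings with no body text are skipped in this simplified version.
    """
    segments: list[Segment] = []
    current = Segment(title="(preamble)")
    for line in markdown.splitlines():
        match = _HEADING.match(line)
        if match:
            if current.sentences:
                segments.append(current)
            current = Segment(title=match.group(1).strip())
        elif line.strip():
            # Naive sentence split on whitespace after ., ! or ?
            current.sentences.extend(
                s.strip() for s in _SENTENCE_END.split(line.strip()) if s.strip()
            )
    if current.sentences:
        segments.append(current)
    return segments


def remove_redundant_sentences(segments: list[Segment]) -> list[Segment]:
    """Drop sentences already seen in an earlier unit (case/whitespace-insensitive)."""
    seen: set[str] = set()
    result: list[Segment] = []
    for seg in segments:
        kept = []
        for sentence in seg.sentences:
            key = " ".join(sentence.lower().split())
            if key not in seen:
                seen.add(key)
                kept.append(sentence)
        result.append(Segment(seg.title, kept))
    return result


doc = "# Setup\nFork the repo. Clone your fork.\n## Build\nRun make. Fork the repo.\n"
for seg in remove_redundant_sentences(segment_by_heading(doc)):
    print(seg.title, seg.sentences)
```

Exact-match deduplication only catches verbatim repeats; paraphrased redundancy, which is the harder case in real onboarding docs, would need semantic similarity or an LLM pass.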

If this is right

Newcomers achieve higher rates of task success when using the restructured documentation.
Users of VisDoc report significantly lower cognitive load during onboarding tasks.
The restructured documents receive higher usability ratings from participants.
Expert evaluators confirm that the restructured documents maintain completeness and accuracy.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

This restructuring method could apply to documentation in other domains where newcomers face dense technical materials.
Long-term studies might reveal whether the reduced cognitive load leads to better retention and continued contribution.
Similar GenAI pipelines could be developed for maintaining documentation as projects evolve.

Load-bearing premise

The generative AI pipeline correctly applies the cognitive learning strategies without creating new inaccuracies or confusing content in the documents.

What would settle it

A larger study that finds no difference in task success or cognitive load between groups using original versus VisDoc-restructured documents would indicate the approach does not deliver the claimed benefits.

Figures

Figures reproduced from arXiv: 2605.19174 by Anita Sarma, Igor Steinmacher, Marco Aurelio Gerosa, Prashant Tandan, Zixuan Feng.

**Figure 1.** VisDoc Task Tree UI with tagged features.
…layout using the Clear button (F), returning the interface to a clean, collapsed state. Section 4.2, CTML-Guided Design Strategies. Segmenting and Pretraining for mitigating C1: To reduce essential overload (C1), VisDoc applies CTML's segmenting and pretraining strategies by breaking complex onboarding documentation into short, task-based units and generating a high-level…
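The Task Tree in Figure 1 can be pictured as a tree of short task units whose children are shown only when their parent is expanded, with a Clear action collapsing everything back to a clean state. A minimal data-structure sketch, assuming exactly that behaviour; the class name, fields, and task titles are hypothetical and not taken from VisDoc's code:

```python
from dataclasses import dataclass, field


@dataclass
class TaskNode:
    """Hypothetical node in a task tree: one short, task-based unit."""
    title: str
    children: list["TaskNode"] = field(default_factory=list)
    expanded: bool = False

    def find(self, title: str) -> "TaskNode | None":
        """Depth-first search for a node by title."""
        if self.title == title:
            return self
        for child in self.children:
            hit = child.find(title)
            if hit is not None:
                return hit
        return None

    def clear(self) -> None:
        """Collapse this node and every descendant (a 'Clear' action)."""
        self.expanded = False
        for child in self.children:
            child.clear()

    def visible(self, depth: int = 0) -> list[str]:
        """Titles currently shown; a node's children appear only if it is expanded."""
        lines = ["  " * depth + self.title]
        if self.expanded:
            for child in self.children:
                lines.extend(child.visible(depth + 1))
        return lines


root = TaskNode("Contribute a fix", [
    TaskNode("Set up environment", [TaskNode("Fork repository"),
                                    TaskNode("Install dependencies")]),
    TaskNode("Open a pull request"),
])
root.expanded = True
root.find("Set up environment").expanded = True
print("\n".join(root.visible()))
root.clear()
print("\n".join(root.visible()))
```

Keeping the collapsed state as the default is one way to reflect the pretraining idea: the reader sees the high-level task outline first and opens detail on demand.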

**Figure 2.** VisDoc Infrastructure Overview.
…Chunker (Langchain 2025). We used a ground-truth segmentation of the CONTRIBUTING.md of an OSS project (Kubernetes), annotated independently by two researchers (93.8% agreement (McHugh 2012)). We compared both methods using Pk (Beeferman et al. 1999) and WinDiff (Pevzner and Hearst 2002). LangChain's Semantic Chunker performed better than RoBERTa (Pk = 0.33 vs. 0.36; WinDi…
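Pk and WindowDiff, the two segmentation metrics named in the Figure 2 excerpt, are both error rates over a sliding window of size k, so lower is better (which is why Pk = 0.33 beats 0.36). The sketch below implements them in plain Python, assuming each segmentation is an equal-length string with one character per position and "1" marking a boundary. The encoding, function names, and default window heuristic (roughly half the mean reference segment length) are assumptions for illustration, not details taken from the paper.

```python
from typing import Optional


def default_k(ref: str, boundary: str = "1") -> int:
    """Common heuristic: about half the mean segment length of the reference."""
    n_boundaries = ref.count(boundary)
    if n_boundaries == 0:
        raise ValueError("reference has no boundaries; pass k explicitly")
    return max(1, round(len(ref) / (2 * n_boundaries)))


def _num_windows(ref: str, hyp: str, k: int) -> int:
    if len(ref) != len(hyp):
        raise ValueError("segmentations must have the same length")
    if not 1 <= k <= len(ref):
        raise ValueError("window size k must be between 1 and len(ref)")
    return len(ref) - k + 1


def pk(ref: str, hyp: str, k: Optional[int] = None, boundary: str = "1") -> float:
    """Fraction of windows where ref and hyp disagree on whether ANY boundary occurs."""
    k = default_k(ref, boundary) if k is None else k
    windows = _num_windows(ref, hyp, k)
    errors = sum(
        (boundary in ref[i:i + k]) != (boundary in hyp[i:i + k])
        for i in range(windows)
    )
    return errors / windows


def windowdiff(ref: str, hyp: str, k: Optional[int] = None, boundary: str = "1") -> float:
    """Fraction of windows where ref and hyp contain a different NUMBER of boundaries."""
    k = default_k(ref, boundary) if k is None else k
    windows = _num_windows(ref, hyp, k)
    errors = sum(
        ref[i:i + k].count(boundary) != hyp[i:i + k].count(boundary)
        for i in range(windows)
    )
    return errors / windows


print(pk("0110", "0100", k=2), windowdiff("0110", "0100", k=2))
```

The usage line shows why both metrics are reported: the hypothesis misses one of two adjacent boundaries, which Pk only partly penalizes because it checks for the presence of a boundary, while WindowDiff also counts how many boundaries fall in each window. NLTK provides comparable `pk` and `windowdiff` functions in `nltk.metrics.segmentation`.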

**Figure 3.** Two-Phase Evaluation: Expert Evaluation and Between-subject User Study.
…ing development and our formative evaluation to promote transferability and adaptability across OSS contexts (Guizani et al. 2025). We chose the Transformers project because: (1) it belongs to the AI/ML domain, a very different domain from the Kubernetes-based project, allowing us to assess generalization across technical ecosystems. …

**Figure 4.** Task success rates for each task (T1–T3): VisDoc group (cyan) and documentation+ChatGPT group (blue).
Participants' reflections helped explain the lower failure rates in the VisDoc group. They emphasized that VisDoc's structured, visual layout and guided task flows reduced uncertainty and steered them away from common errors…
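Figure 4 reports per-task success rates for each group. With groups this small, one common way to compare a binary outcome (task completed or not) between two groups is Fisher's exact test on a 2x2 table. The summary does not say which test, if any, the authors applied to task success, so this is only a sketch; the function name and the counts in the usage line are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Rows are groups (e.g. VisDoc, comparison group); columns are
    (succeeded, failed). Sums the probabilities of every table with the
    same margins that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def weight(x: int) -> int:
        # Ways to get x successes in row 1 with all margins fixed; dividing
        # by comb(n, col1) gives the hypergeometric probability.
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Comparing integer weights avoids floating-point ties between equally likely tables.
    extreme = sum(w for w in (weight(x) for x in range(lo, hi + 1)) if w <= observed)
    return extreme / comb(n, col1)


# Hypothetical counts for one task: 6 of 7 succeeded vs. 3 of 7 succeeded.
print(fisher_exact_two_sided(6, 1, 3, 4))
```

`scipy.stats.fisher_exact` computes the same two-sided p-value and is a convenient cross-check; with a handful of participants per group, even large differences in success rate can fail to reach conventional significance.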

**Figure 5.** Item-level SUS comparison using half-violin and box plots for VisDoc (cyan) and documentation+ChatGPT (blue). The Y-axis shows normalized Likert ratings (1–5; higher = better), with negatively worded items reverse-scored. Black dots indicate mean scores for each group. Hollow dots are outliers.
…easily” [P9]. Others highlighted that VisDoc felt coherent and well-structured: “I could visualize the hierarchy.…
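For reference, the standard System Usability Scale has ten items on a 1-5 scale; odd-numbered items are positively worded and even-numbered items negatively worded. The overall score sums (response - 1) for odd items and (5 - response) for even items, then multiplies by 2.5 to give a 0-100 score. Figure 5 instead plots items on the original 1-5 scale with negative items reverse-scored, which 6 - response achieves. A minimal sketch assuming the standard item order; the function names are hypothetical.

```python
def sus_score(responses: list[int]) -> float:
    """Standard 0-100 SUS score from ten responses on a 1-5 scale."""
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS expects ten responses on a 1-5 scale")
    total = 0
    for item, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (r - 1) if item % 2 == 1 else (5 - r)
    return total * 2.5


def item_level(responses: list[int]) -> list[int]:
    """Keep the 1-5 scale but reverse-score even (negative) items so higher = better."""
    return [r if item % 2 == 1 else 6 - r for item, r in enumerate(responses, start=1)]


answers = [4, 2, 4, 2, 4, 2, 4, 2, 4, 2]  # hypothetical single participant
print(sus_score(answers), item_level(answers))
```

The overall score is what SUS benchmarks refer to; the item-level view in Figure 5 shows which aspects of usability drive a difference between groups.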

Original abstract

Onboarding documentation is critical for attracting and retaining newcomers in open source software (OSS). However, it is often dense, inconsistently structured, and fragmented, which makes it difficult to understand and creates cognitive overload leading to frustration, errors, and abandonment. Here, we investigate how Cognitive Theory of Multimedia Learning (CTML) strategies can be used to restructure OSS documentation. We use a GenAI-based pipeline to operationalize these strategies to restructure OSS documentation through our prototype VisDoc. VisDoc segments documentation into task-based units, infers workflows, removes redundancy, and generates multimodal explanations. An expert evaluation (N=4) affirmed VisDoc's completeness, accuracy, and adoptability; a between-subjects evaluation (N=14) with newcomers found that VisDoc participants achieved higher task success, had significantly lower cognitive load, and perceived higher usability. The contributions of this work include a CTML-grounded analysis of onboarding challenges, a GenAI-based documentation restructuring pipeline, and empirical evidence that cognitively informed documentation restructuring reduces cognitive load and improves usability and task performance in OSS.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note private letter to a colleague

VisDoc gives a workable GenAI pipeline for applying CTML to OSS onboarding docs and a small study shows lower cognitive load, but the AI outputs lack thorough checks for errors and the samples are too small to pin down the cause.


The paper's main takeaway is a concrete GenAI pipeline, VisDoc, that breaks OSS onboarding docs into task-based segments, removes redundancy, infers workflows, and adds multimodal explanations based on the Cognitive Theory of Multimedia Learning. A between-subjects study with 14 newcomers reported higher task success, lower cognitive load, and better usability for VisDoc than for a comparison group working from the documentation with ChatGPT. An expert review with four people also backed the completeness and accuracy of the outputs. That combination of theory-driven restructuring and empirical results is the new piece here, and it directly targets a documented pain point in open source newcomer retention.

The work does a solid job turning CTML principles into an operational pipeline and testing it on real documentation rather than staying at the level of abstract recommendations. The expert and user evaluations provide at least initial evidence that the approach can improve outcomes.

The soft spots are mostly around scale and validation. The user study sits at N=14 with no power analysis or detailed statistical reporting, and the expert check is even smaller. More critically, there is no systematic review for GenAI-specific problems like hallucinated steps, introduced inconsistencies, or misleading multimodal content. If those exist, the measured gains could come from something other than the intended CTML restructuring. No open data or code is shared either, which limits how far others can verify or extend the results.

This paper is aimed at software engineering researchers who work on open source onboarding or HCI folks interested in AI-supported documentation tools. A reader who needs practical examples of theory applied to real OSS docs would get usable ideas from the pipeline description and the study setup.
It is coherent enough and grounded enough to deserve peer review, even though it will need larger samples and explicit checks on the generated content before the central claims can be taken as settled.

Referee Report

2 major / 2 minor

Summary. The paper claims that a GenAI-based pipeline called VisDoc can operationalize Cognitive Theory of Multimedia Learning (CTML) strategies to restructure OSS onboarding documentation, thereby reducing cognitive overload. This is supported by an expert evaluation (N=4) that affirmed completeness, accuracy, and adoptability of the outputs, plus a between-subjects user study (N=14) with newcomers showing higher task success, significantly lower cognitive load, and higher perceived usability for the VisDoc-restructured documents than for a comparison group given the original documentation plus ChatGPT. Contributions include a CTML-grounded analysis of onboarding challenges, the restructuring pipeline, and empirical evidence of benefits.

Significance. If the results hold, the work is significant for software engineering and HCI, offering a scalable, theory-grounded approach to improving OSS newcomer onboarding. Better documentation could reduce frustration and abandonment, aiding retention and productivity in open-source communities. The combination of CTML with GenAI provides a practical method for addressing cognitive issues in technical docs, with the user-study evidence strengthening real-world applicability.

major comments (2)

[GenAI Pipeline and Evaluation sections] The central empirical claim (higher task success, lower cognitive load, higher usability) in the between-subjects evaluation (N=14) depends on the restructured documents being faithful applications of CTML without GenAI-induced artifacts. The paper reports only an expert evaluation (N=4) on completeness/accuracy/adoptability but provides no systematic validation for hallucinated steps, introduced redundancies, misleading multimodal elements, or inconsistencies. This validation gap is load-bearing for attributing benefits to CTML restructuring rather than other content properties.
[User Study / Between-subjects evaluation] The user study reports positive outcomes but with N=14, no reported statistical details (e.g., specific tests, p-values, effect sizes), power analysis, or open data/code. These omissions limit confidence that the findings reliably support the claim of reduced cognitive overload, especially in a between-subjects design where individual differences could confound results.
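As an illustration of the kind of reporting the referee asks for: with two small independent groups and ordinal or non-normal scores, one common choice is the Mann-Whitney U test paired with the rank-biserial correlation as an effect size. The summary does not say which test the authors ran, so this is a sketch, not their analysis. It computes U for the first group using average ranks for ties and r = 2U/(n1·n2) - 1, which is positive when the first group's values tend to be larger. Function names are hypothetical.

```python
def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(group1: list[float], group2: list[float]) -> tuple[float, float]:
    """Return (U for group1, rank-biserial correlation)."""
    n1, n2 = len(group1), len(group2)
    ranks = average_ranks(list(group1) + list(group2))
    rank_sum1 = sum(ranks[:n1])
    u1 = rank_sum1 - n1 * (n1 + 1) / 2
    r_rb = 2 * u1 / (n1 * n2) - 1
    return u1, r_rb


# Hypothetical cognitive-load scores (lower = less load), not data from the paper.
treatment = [30.0, 35.0, 28.0, 40.0]
control = [45.0, 50.0, 38.0, 55.0]
print(mann_whitney_u(treatment, control))
```

A negative r here would mean the first group tended to report lower load. For a p-value, `scipy.stats.mannwhitneyu(x, y, alternative="two-sided")` returns the same U for `x` together with the test's p-value, and reporting U, p, and r together addresses the referee's point about effect sizes.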

minor comments (2)

[Pipeline Description] Clarify the exact CTML strategies implemented in the pipeline (e.g., which principles for segmentation, redundancy removal, and multimodal generation) and how they map to specific GenAI prompts or steps.
[Abstract and Results] The abstract states 'significantly lower cognitive load' without qualifiers; ensure the full text reports the exact measure (e.g., NASA-TLX) and any limitations of the small-sample comparison.
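If the cognitive-load measure is NASA-TLX, as the referee's example suggests (the summary does not confirm the instrument), the two common scorings are easy to state: Raw TLX is the unweighted mean of the six 0-100 subscale ratings, and the weighted score multiplies each rating by how often that subscale was chosen in 15 pairwise comparisons and divides the total by 15. A minimal sketch under that assumption; the subscale keys and function names are hypothetical.

```python
SUBSCALES = ("mental", "physical", "temporal", "performance", "effort", "frustration")


def _validate(ratings: dict[str, float]) -> None:
    if set(ratings) != set(SUBSCALES):
        raise ValueError(f"expected ratings for exactly {SUBSCALES}")
    if not all(0 <= v <= 100 for v in ratings.values()):
        raise ValueError("ratings must lie on the 0-100 scale")


def raw_tlx(ratings: dict[str, float]) -> float:
    """Raw TLX: the unweighted mean of the six subscale ratings."""
    _validate(ratings)
    return sum(ratings[s] for s in SUBSCALES) / len(SUBSCALES)


def weighted_tlx(ratings: dict[str, float], weights: dict[str, int]) -> float:
    """Weighted TLX: each weight counts how often a subscale was picked in
    the 15 pairwise comparisons, so the weights sum to 15."""
    _validate(ratings)
    if set(weights) != set(SUBSCALES) or sum(weights.values()) != 15:
        raise ValueError("weights must cover all six subscales and sum to 15")
    return sum(ratings[s] * weights[s] for s in SUBSCALES) / 15


# Hypothetical single participant.
ratings = {"mental": 60, "physical": 0, "temporal": 30,
           "performance": 40, "effort": 50, "frustration": 20}
weights = {"mental": 5, "physical": 0, "temporal": 2,
           "performance": 3, "effort": 4, "frustration": 1}
print(raw_tlx(ratings), weighted_tlx(ratings, weights))
```

Stating which variant was used matters for the referee's point, since raw and weighted scores for the same participant can differ noticeably when one subscale dominates the pairwise choices.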

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the positive assessment of our work's significance and for the constructive major comments. We appreciate the opportunity to strengthen the manuscript by addressing the validation of the GenAI pipeline outputs and the statistical reporting in the user study.


Referee: [GenAI Pipeline and Evaluation sections] The central empirical claim (higher task success, lower cognitive load, higher usability) in the between-subjects evaluation (N=14) depends on the restructured documents being faithful applications of CTML without GenAI-induced artifacts. The paper reports only an expert evaluation (N=4) on completeness/accuracy/adoptability but provides no systematic validation for hallucinated steps, introduced redundancies, misleading multimodal elements, or inconsistencies. This validation gap is load-bearing for attributing benefits to CTML restructuring rather than other content properties.

Authors: We agree that more targeted validation is needed to rule out GenAI-induced artifacts and support attribution to CTML strategies. The existing expert evaluation (N=4) focused on high-level completeness, accuracy, and adoptability but did not explicitly probe for hallucinated workflow steps, introduced redundancies, or misleading multimodal elements. In the revised manuscript we will augment the evaluation protocol with specific items addressing these issues (e.g., expert ratings on presence of inconsistencies or misleading content) and report the results to provide stronger evidence that benefits derive from the CTML-informed restructuring. revision: yes
Referee: [User Study / Between-subjects evaluation] The user study reports positive outcomes but with N=14, no reported statistical details (e.g., specific tests, p-values, effect sizes), power analysis, or open data/code. These omissions limit confidence that the findings reliably support the claim of reduced cognitive overload, especially in a between-subjects design where individual differences could confound results.

Authors: We acknowledge the limitations of the small sample and the absence of detailed statistical reporting. In the revision we will specify the exact statistical tests performed, report p-values and effect sizes, include a post-hoc power analysis, and make anonymized data and analysis code available via a public repository. We will also expand the limitations section to discuss potential confounds from individual differences in the between-subjects design and how random assignment was used to mitigate them. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical results independent of any derivation or self-referential fit

full rationale

The paper describes a GenAI pipeline that applies CTML strategies to restructure OSS onboarding documents, followed by an expert review (N=4) for completeness/accuracy/adoptability and a separate between-subjects user study (N=14) measuring task success, cognitive load, and usability. No equations, parameter fitting, or predictive models are presented whose outputs reduce by construction to the inputs. Central claims rest on these independent empirical evaluations rather than self-definition, self-citation chains, or renamed known results. The work is self-contained against external user-study benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The work rests on established CTML as background theory and introduces one new software artifact; no numerical free parameters are fitted and no new physical entities are postulated.

axioms (1)

domain assumption: Cognitive Theory of Multimedia Learning strategies can be effectively operationalized by generative AI to reduce cognitive overload in technical documentation
This premise underpins the design of the VisDoc pipeline and the claim that restructuring improves usability and task performance.

invented entities (1)

VisDoc (no independent evidence)
purpose: Generative AI prototype that segments, deduplicates, and multimodalizes OSS onboarding documents
New system developed and evaluated in the paper; no independent evidence outside this work is provided.

pith-pipeline@v0.9.0 · 5736 in / 1334 out tokens · 42953 ms · 2026-05-20T08:33:44.244007+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean, BranchSelection.lean, AlexanderDuality.lean reality_from_one_distinction; branch_selection; alexander_duality_circle_linking unclear


unclear
Relation between the paper passage and the cited Recognition theorem.

VisDoc segments documentation into task-based units, infers workflows, removes redundancy, and generates multimodal explanations... CTML strategies... Segmenting, Pretraining, Aligning, Eliminating, Signaling, Weeding, Off-loading, Synchronizing, Individualizing

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.