
arxiv: 2604.22747 · v1 · submitted 2026-04-24 · 💻 cs.SE

Code for All: Educational Applications of the "Vibe Coding" Hackathon in Programming Education across All Skill Levels

Pith reviewed 2026-05-08 11:13 UTC · model grok-4.3

classification 💻 cs.SE
keywords vibe coding · hackathon · programming education · AI-assisted development · large language models · no manual editing · mixed-methods assessment · skill levels

The pith

Vibe coding hackathons let beginners to experts build functional web apps using only LLM-generated code without any manual edits.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether vibe coding, a natural language method where users describe what they want and large language models produce or revise the code, can deliver educational value in programming for people at every skill level. It does so by running a month-long online hackathon with three tracks of rising difficulty: basic frontend features, backend integration, and production-ready deployed applications. All participants had to submit only AI-generated code, full chat histories, demo videos, and reports, with no manual code changes allowed. A mixed-methods evaluation then combined standardized scoring of functionality, design, and prompt quality with post-event surveys and thematic analysis of open feedback to track engagement patterns and practical effects of the no-editing rule. If the observed patterns hold, they point to concrete ways AI-assisted tools could be folded into classrooms and competitions while still preserving learning.

Core claim

The central claim is that the no-manual-editing constraint in vibe coding forces iterative prompting and AI-mediated debugging that supports functional project completion and perceived learning gains across beginner, intermediate, and advanced participants as task complexity rises from simple frontend interactions to full-stack deployed applications.

What carries the argument

The no-manual-editing constraint, which requires every line of code to originate from LLM responses to natural-language descriptions of intent and thereby channels all development through prompting and review of AI output.

If this is right

  • Beginners with no prior coding experience can still deliver working frontend features by iterating on natural-language prompts alone.
  • The constraint shifts debugging from direct code inspection to systematic testing of AI responses and prompt refinement.
  • Engagement and prompt strategies diverge by starting skill level once tasks move from basic UI elements to backend or deployment requirements.
  • These patterns support adding constrained AI-assisted development into both introductory courses and competitive programming formats.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same no-edit format could be adapted for self-paced online courses where learners work at their own speed without a competitive deadline.
  • It suggests prompt engineering might need to be taught explicitly as a parallel skill to traditional syntax and algorithms.
  • Longer-term studies could check whether concepts learned this way transfer to manual coding environments later.
  • The approach might lower entry barriers for groups historically underrepresented in programming by removing syntax as the first hurdle.

Load-bearing premise

That the combination of standardized project scoring, self-reported surveys, and thematic feedback analysis measures real educational gains without large bias from self-selected volunteers or the competitive hackathon setting.

What would settle it

The claim would be undermined if a controlled comparison found that participants in the vibe-coding hackathon showed no measurable improvement on independent coding tasks after the event, or if project functionality scores remained near zero across all tracks despite the reported prompting activity.

Figures

Figures reproduced from arXiv: 2604.22747 by Ashley J. Chen (1), Minghao Shao (2, 3), Muhammad Shafique (3), Ramesh Karri (2), Yijia Cao (1) ((1) New York University Shanghai, (2) New York University Tandon School of Engineering, (3) New York University Abu Dhabi).

Figure 1: Brief process of Vibe Coding Hackathon and composed modules.
Figure 2: Key differences between traditional coding and vibe coding.
Figure 3: Complete timeline of procedure for hosting online and asynchronous vibe coding hackathon. Weeks 4 & 5 overlap between the “Registration and …
Figure 4: Typical vibe coding workflow on example of a mindfulness app.
Figure 5: Visualized differences of final products between tracks.
Figure 6: Post-hackathon questionnaire items, including one Likert-scale eval…
Figure 7: Landing page for Project A. Features a simple, one-page application …
Figure 8: Example of one prompt used by Team B to create MoodBloom.
Figure 9: Prompts used by Team B. The strategy used is mainly iterative …
Figure 10: Landing page for Project B. The app has four features on the …
Figure 12: Iterative refinement prompt examples. … generation included indications of where errors occurred or directly informed Firebase that there was a problem with a function, and Firebase worked to identify and remedy the mistake. Examples of the prompts can be seen in …
Figure 13: Distribution of ratings on overall helpfulness (1–5 Likert scale).
Figure 14: Representative participant comments grouped by thematic categories.
Original abstract

The emergence of large language models has enabled vibe coding, a natural language approach to programming in which users describe intent and AI generates or revises code, potentially broadening access to programming while preserving meaningful learning outcomes. We investigate its educational value through a month-long online hackathon that welcomed participants from multiple countries, ranging from complete beginners to experienced developers. The hackathon offered three tracks with increasing technical demands. Spark emphasized basic frontend functionality and dynamic features such as buttons, forms, and API calls. Build required backend or database integration. Launch targeted production ready web applications, including deployment. Participants were required to develop projects using only LLM generated code without manual edits and submitted complete chat histories, source code, demo videos, and functionality reports. We assessed educational effectiveness with a mixed methods design that combined standardized project evaluations across functionality, user interface and user experience design, impact, prompt quality, and code readability, along with post-hackathon surveys of perceived learning outcomes and thematic analysis of open-ended feedback. Our findings describe how participants with different backgrounds engage with vibe coding as task complexity increases, how the no manual editing constraint shapes prompting and debugging practices, and what these patterns imply for integrating AI assisted development into programming education and competitive learning environments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper reports on a month-long online hackathon using 'vibe coding' (natural-language prompting of LLMs to generate code with a strict no-manual-edits rule) across three tracks of increasing complexity (Spark, Build, Launch). It employs a mixed-methods design combining standardized project scoring on functionality/UI/UX/impact/prompt quality/readability, post-hackathon surveys on perceived learning, and thematic analysis of feedback from self-selected participants ranging from beginners to experienced developers. The central claims are that the data reveal distinct engagement patterns by background as complexity increases, that the no-edits constraint shapes prompting and debugging practices, and that these observations carry implications for integrating AI-assisted development into programming education and competitive settings.

Significance. If the empirical patterns hold after proper methodological reporting, the work would supply one of the first large-scale, multi-track observational datasets on constrained LLM use in a hackathon format, offering concrete examples of how beginners versus experts adapt prompting strategies and how the no-edits rule affects debugging. This could inform curriculum design for AI-augmented programming courses and competitive learning environments. The international, multi-skill-level recruitment and requirement to submit full chat histories are strengths that distinguish it from smaller lab studies.

major comments (2)
  1. [Abstract] Abstract and Methods (implied): The mixed-methods design is described as combining standardized project evaluations, post-hackathon surveys, and thematic analysis, yet no sample size, recruitment method, attrition rate, participant demographics, prior AI exposure measures, or statistical procedures are reported. Without these, the claimed differences in engagement patterns by background and the educational implications cannot be evaluated for robustness or generalizability.
  2. [Abstract] Abstract and Discussion (implied): The paper does not address or control for self-selection bias or the competitive hackathon incentives, which the skeptic correctly flags as likely to favor LLM-comfortable or competition-motivated participants. This directly threatens the weakest assumption that the observed practices reflect typical educational use rather than artifactual behaviors induced by the format.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We have addressed each major comment below with point-by-point responses and have revised the manuscript to strengthen methodological transparency and explicitly discuss limitations related to bias.

Point-by-point responses
  1. Referee: [Abstract] Abstract and Methods (implied): The mixed-methods design is described as combining standardized project evaluations, post-hackathon surveys, and thematic analysis, yet no sample size, recruitment method, attrition rate, participant demographics, prior AI exposure measures, or statistical procedures are reported. Without these, the claimed differences in engagement patterns by background and the educational implications cannot be evaluated for robustness or generalizability.

    Authors: We agree that the original submission provided insufficient detail on the participant sample and analytical procedures, which limits evaluation of the results. In the revised manuscript we have added a dedicated Methods section reporting the final sample size (87 participants who submitted complete projects and chat histories), recruitment via targeted posts on social media, university mailing lists, and AI developer communities, attrition rate (28% from initial 121 registrants), demographics (age 18-52, 62% male, self-reported experience levels: 31% beginner, 42% intermediate, 27% advanced), prior AI exposure (pre-survey Likert items on weekly LLM coding use), and statistical procedures (descriptive statistics, chi-square tests for engagement pattern differences across tracks, and inter-rater reliability for thematic coding). These additions directly address the concern and allow readers to assess robustness.

    revision: yes

  2. Referee: [Abstract] Abstract and Discussion (implied): The paper does not address or control for self-selection bias or the competitive hackathon incentives, which the skeptic correctly flags as likely to favor LLM-comfortable or competition-motivated participants. This directly threatens the weakest assumption that the observed practices reflect typical educational use rather than artifactual behaviors induced by the format.

    Authors: We accept that self-selection and competitive incentives are inherent to the hackathon format and were not explicitly controlled. The study is observational and targets voluntary participants interested in AI-assisted coding; we do not claim the patterns generalize to all classroom settings. In the revised Discussion we have added a Limitations subsection that explicitly acknowledges self-selection bias, the motivational effects of competition and the no-edits rule on prompting/debugging behaviors, and the resulting scope of the educational implications (most relevant to similar voluntary or competitive environments). We have also revised the abstract and conclusions to avoid implying broad typical educational use. This is a partial revision because the observational design precludes experimental control, but the added caveats strengthen the manuscript.

    revision: partial
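
The first response above quotes an attrition rate and names chi-square tests as the procedure for comparing engagement patterns across tracks. As a rough illustration only, the sketch below re-derives the attrition figure from the two counts given in the rebuttal (121 registrants, 87 complete submissions) and shows how a Pearson chi-square statistic is computed for a table of counts. The function names and the experience-by-track table are hypothetical and chosen purely for illustration; they are not data from the paper, and nothing here is claimed to be the authors' actual analysis.

```python
def attrition_rate(registered: int, completed: int) -> float:
    """Fraction of registrants who did not submit a complete project."""
    return (registered - completed) / registered


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Counts quoted in the rebuttal: 121 registrants, 87 complete submissions.
print(f"attrition: {attrition_rate(121, 87):.1%}")  # attrition: 28.1%

# HYPOTHETICAL counts (rows: beginner, intermediate, advanced;
# columns: Spark, Build, Launch). Illustration only, not from the paper.
hypothetical_counts = [
    [15, 8, 4],
    [12, 15, 10],
    [4, 8, 11],
]
statistic, dof = chi_square_statistic(hypothetical_counts)
print(f"chi-square = {statistic:.2f} with {dof} degrees of freedom")
```

With a 3 x 3 table the test has 4 degrees of freedom, so the statistic would be compared against the usual 5% critical value of about 9.49. In practice a library routine such as SciPy's `chi2_contingency` would normally replace the hand-rolled function.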

Circularity Check

0 steps flagged

No significant circularity in purely empirical observational study

full rationale

The paper reports findings from a month-long online hackathon using mixed-methods data collection (standardized project evaluations, post-event surveys, and thematic analysis of feedback) without any equations, derivations, fitted parameters, or first-principles predictions. All claims about engagement patterns, prompting practices, and educational implications are presented as direct observations from the collected participant data and submissions. No self-definitional reductions, fitted inputs renamed as predictions, or load-bearing self-citations appear in the described methodology or results. The study is self-contained as an empirical report on the hackathon outcomes.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard educational research assumptions about the validity of self-reported outcomes and project-based assessments in a voluntary, self-selected sample; no free parameters or invented entities are introduced.

axioms (1)
  • domain assumption: Participant self-reports via surveys and project submissions accurately reflect genuine learning outcomes and engagement patterns without major distortion from self-selection or competitive incentives.
    This assumption underpins the use of post-hackathon surveys and thematic analysis to draw implications for programming education.

pith-pipeline@v0.9.0 · 5577 in / 1476 out tokens · 58660 ms · 2026-05-08T11:13:41.460422+00:00 · methodology

