Scalable and Personalized Oral Assessments Using Voice AI

2026 · cs.CY · arXiv 2603.18221

1 Pith paper cites this work. Polarity classification is still being indexed.



abstract

Students in our AI/ML course submitted polished, well-argued project analyses. Then, in class discussion, we asked them to walk through a single choice from their own work. Many could not. The writing looked great; the understanding often wasn't there. Oral examinations retain an evidentiary link that written work no longer does: a student who can reason aloud, defend a decision under follow-up questioning, and adapt when pushed demonstrates something no submitted document can certify. The obstacle has always been cost. A 25-minute oral exam reviewed by two graders takes roughly 30 combined instructor and TA hours for 36 students; at 100 students the format is untenable. Voice AI and automated grading change the arithmetic. We built Viva, a system that conducts a personalized oral exam and then grades the transcript with a panel of three LLMs that score independently, read each other's assessments, and revise. Across two undergraduate cohorts at NYU Stern (36 students in Fall 2025, 37 in Spring 2026), grading-LLM cost stayed under one dollar per exam, with voice minutes covered by our ElevenLabs subscription; for deployments that exceed an equivalent credit pool, budget about a dollar per ten minutes of graded exam time, which is practical for weekly assignments, not just finals. The system also broke instructively: the agent asked several questions at once, it failed to randomize topics across the cohort, and a voice cloned from the professor's came across as harsh (it was replaced in Spring 2026 with a calm preset). These failures, together with an earlier finding that a monolithic agent handling both examination and grading proved unreliable, point to five candidate transferable patterns: decompose into single-purpose modules, constrain behavior with code rather than prompts, keep randomization out of the LLM, grade with a multi-model panel whose members disagree, and choose voice characteristics with the same care as question design.
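Three mechanisms in the abstract are easy to state in code: the human-grading cost arithmetic, topic randomization handled by code rather than by the LLM, and a grading panel that scores independently before revising. The Python sketch below illustrates them under stated assumptions. Every function name and topic label is hypothetical, the LLM graders are replaced by stub callables, each grader's view of its peers is reduced to their numeric scores (the real panel reads full assessments), and taking the median of revised scores is our assumption, not an aggregation rule the abstract specifies.

```python
import random
from statistics import median
from typing import Callable, Optional


def human_grading_hours(students: int, exam_minutes: float, graders: int) -> float:
    """Total instructor/TA hours when every exam is reviewed by `graders` people."""
    return students * exam_minutes * graders / 60


def assign_topics(student_ids: list[str], topics: list[str],
                  per_student: int, seed: int) -> dict[str, list[str]]:
    """Hypothetical helper: code, not the prompt, picks each student's topics.

    Topics are dealt from a seeded deck that is reshuffled whenever it runs out,
    so coverage stays roughly balanced across the cohort and is reproducible.
    """
    if per_student > len(set(topics)):
        raise ValueError("per_student exceeds the number of distinct topics")
    rng = random.Random(seed)
    deck: list[str] = []
    assignment: dict[str, list[str]] = {}
    for sid in student_ids:
        picked: list[str] = []
        while len(picked) < per_student:
            if not deck:
                deck = topics[:]
                rng.shuffle(deck)
            topic = deck.pop()
            if topic not in picked:  # skip repeats within one student's exam
                picked.append(topic)
        assignment[sid] = picked
    return assignment


# A grader takes the transcript and, in the revision round, its peers' scores.
Grader = Callable[[str, Optional[list[float]]], float]


def panel_grade(transcript: str, graders: list[Grader]) -> float:
    """Round 1: independent scores. Round 2: each grader sees peers and may revise."""
    first = [g(transcript, None) for g in graders]
    revised = [g(transcript, first[:i] + first[i + 1:]) for i, g in enumerate(graders)]
    return median(revised)  # assumed aggregation rule


def make_stub_grader(own_score: float, weight_on_peers: float) -> Grader:
    """Stand-in for an LLM call: moves a fixed fraction toward the peers' mean."""
    def grade(transcript: str, peer_scores: Optional[list[float]]) -> float:
        if peer_scores is None:
            return own_score
        peer_mean = sum(peer_scores) / len(peer_scores)
        return (1 - weight_on_peers) * own_score + weight_on_peers * peer_mean
    return grade


if __name__ == "__main__":
    print(human_grading_hours(36, 25, 2))  # 30.0 hours, as in the abstract
    print(assign_topics(["s1", "s2", "s3"],
                        ["data leakage", "model choice", "metric choice"], 2, seed=42))
    graders = [make_stub_grader(8, 0.5), make_stub_grader(6, 0.5), make_stub_grader(9, 0.0)]
    print(panel_grade("(transcript text)", graders))  # 7.75
```

The first line reproduces the abstract's figure: 36 students × 25 minutes × 2 graders is 1,800 minutes, or 30 hours. Dealing topics from a seeded deck, rather than asking the model to "pick a random topic", is one way to follow the keep-randomization-out-of-the-LLM pattern, because the assignment becomes reproducible and auditable.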

representative citing papers

Reshaping Undergraduate Computer Science Education in the Generative AI Era

cs.CY · 2026-05-02 · unverdicted · novelty 3.0 · cites this work as ref. 20

The paper recommends shifting CS education from implementation skills to AI-native competencies, fundamental concepts, and critical evaluation based on workshop consensus.
