How do people watch AI-generated videos of physical scenes?
Pith reviewed 2026-05-16 07:50 UTC · model grok-4.3
The pith
Gaze when watching videos of physical scenes tracks perceived authenticity more than actual real or AI origin.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that participants' gaze behavior in both the video-understanding and AI-detection tasks was driven primarily by their inferred perception of authenticity rather than by whether the video was actually real or AI-generated. Because the selected AI videos were realistic enough to avoid obvious low-level artifacts, cognitive expectations about possible generation overrode objective differences in directing fixations and scan paths.
What carries the argument
Eye-tracking recordings of fixation locations and scan paths compared across real and AI-generated videos in two tasks, with perception of authenticity inferred from task performance accuracy.
If this is right
- Viewers actively search for anomalies once they suspect a video may be AI-generated.
- Gaze patterns could serve as an implicit behavioral signal for perceived authenticity in future detection systems.
- Video-generation models must remove not only low-level artifacts but also higher-level perceptual cues that trigger suspicion.
- Media platforms that label possible AI content may increase user scrutiny and reduce passive acceptance.
Where Pith is reading between the lines
- Platform warnings that a video might be synthetic could permanently heighten baseline vigilance even in later unlabeled viewing.
- The same perception-driven attention shift might appear in still images or audio, suggesting a general mechanism across synthetic media.
- Training people on AI detection could alter their default gaze behavior on all videos, not only during explicit detection tasks.
Load-bearing premise
Participants' beliefs about whether a video is AI-generated can be accurately inferred from how well they perform the understanding and detection tasks.
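The inference this premise relies on can be sketched for the detection task, where the response is binary. This is an illustration only, not the paper's procedure: the function name `perceived_label`, the string labels, and the trial records are all hypothetical. It assumes a wrong answer means the viewer perceived the opposite category, which is the very step the premise asks us to accept.

```python
def perceived_label(actual: str, judged_correctly: bool) -> str:
    """Return the category the viewer is assumed to have perceived.

    Assumption (hypothetical, not from the paper): a correct detection
    response means perception matched reality, and an incorrect one
    means the viewer perceived the opposite category.
    """
    if actual not in ("real", "ai"):
        raise ValueError(f"unknown label: {actual!r}")
    if judged_correctly:
        return actual
    return "ai" if actual == "real" else "real"


# Hypothetical trial records; the values are made up for illustration.
trials = [
    {"actual": "real", "correct": True},
    {"actual": "real", "correct": False},
    {"actual": "ai", "correct": True},
    {"actual": "ai", "correct": False},
]

labels = [perceived_label(t["actual"], t["correct"]) for t in trials]
print(labels)
```

The sketch makes the weak point visible: a guess, a lapse of attention, or a graded sense of doubt all collapse into the same binary label.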
What would settle it
Eye-tracking data showing that gaze patterns differ systematically between actually real and actually AI-generated videos even when task performance indicates the viewer perceives both categories the same way.
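One way to picture that test is a two-by-two breakdown of trials by actual and perceived origin, then a comparison of actual origins within each perceived category. The sketch below uses invented numbers and hypothetical names (`cell_means`, `actual_effect_within_perceived`, mean fixation duration as the gaze metric). It does not reproduce the paper's data or analysis.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical trials: (actual origin, perceived origin, mean fixation duration in ms).
# The numbers are invented to illustrate the comparison, not taken from the paper.
trials = [
    ("real", "real", 310.0),
    ("real", "real", 290.0),
    ("real", "ai", 240.0),
    ("real", "ai", 260.0),
    ("ai", "real", 305.0),
    ("ai", "real", 295.0),
    ("ai", "ai", 245.0),
    ("ai", "ai", 255.0),
]


def cell_means(rows):
    """Average the gaze metric within each (actual, perceived) cell."""
    cells = defaultdict(list)
    for actual, perceived, value in rows:
        cells[(actual, perceived)].append(value)
    return {key: mean(values) for key, values in cells.items()}


def actual_effect_within_perceived(means):
    """Difference (AI minus real) in the metric at each perceived level."""
    return {p: means[("ai", p)] - means[("real", p)] for p in ("real", "ai")}


means = cell_means(trials)
effects = actual_effect_within_perceived(means)
print(means)
print(effects)
```

In this made-up data the within-perception differences are zero, which is the pattern the core claim predicts. Reliably non-zero differences in real data would be the disconfirming evidence described above.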
Original abstract
The growing prevalence of realistic AI-generated videos on media platforms increasingly blurs the line between fact and fiction, eroding public trust. Understanding how people watch AI-generated videos offers a human-centered perspective for improving AI detection and guiding advancements in video generation. However, existing studies have not investigated human gaze behavior in response to AI-generated videos of physical scenes. Here, we collect and analyze the eye movements from 40 participants during video understanding and AI detection tasks involving a mix of real-world and AI-generated videos. We find that given the high realism of AI-generated videos, gaze behavior is driven less by the video's actual authenticity and more by the viewer's perception of its authenticity. Our results demonstrate that the mere awareness of potential AI generation may alter media consumption from passive viewing into an active search for anomalies.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper reports an eye-tracking experiment with 40 participants who viewed a mix of real-world and AI-generated videos of physical scenes while performing video-understanding and AI-detection tasks. The central claim is that, given the high realism of current AI videos, gaze patterns are driven primarily by viewers' perception of authenticity rather than by actual authenticity, and that mere awareness of possible AI generation shifts viewing from passive consumption to active anomaly search.
Significance. If the result holds after methodological clarification, the work supplies timely empirical evidence on human attention to realistic synthetic media, with direct relevance to media literacy, AI detection interfaces, and video-generation research. The use of objective gaze data rather than self-report alone is a positive feature.
major comments (3)
- [Abstract] Abstract: the claim that gaze follows perceived rather than actual authenticity requires showing that gaze patterns align with subjective judgments even on trials where those judgments are incorrect; no such analysis or independent validation of perception (separate from the binary detection response) is described.
- [Methods] Experimental design: the two-task protocol (understanding vs. detection) lacks a passive-viewing baseline without detection instructions or post-viewing realism ratings, so any gaze shift could be produced by the anomaly-search instruction itself rather than by spontaneous perception of authenticity.
- [Results] Results: without reported statistical details, participant demographics, video-selection criteria, error bars, or the precise analysis linking gaze metrics to perceived vs. actual authenticity, it is impossible to assess whether the data support the central claim.
minor comments (1)
- [Abstract] Abstract should include a brief statement of key statistical outcomes and sample characteristics to allow readers to evaluate the strength of the reported effects.
Simulated Author's Rebuttal
We appreciate the referee's detailed feedback on our manuscript. We address each major comment below and will make revisions to clarify and strengthen the presentation of our findings.
- Referee: [Abstract] Abstract: the claim that gaze follows perceived rather than actual authenticity requires showing that gaze patterns align with subjective judgments even on trials where those judgments are incorrect; no such analysis or independent validation of perception (separate from the binary detection response) is described.
  Authors: We agree that explicitly demonstrating the alignment on incorrect trials would bolster the claim. Our results section does compare gaze behavior between accurate and inaccurate AI detections, showing differences consistent with perception-driven viewing. To address this directly, we will add a new analysis in the revised manuscript that examines gaze metrics specifically on trials where participants' judgments were incorrect, correlating them with their perceived authenticity.
  Revision: yes
- Referee: [Methods] Experimental design: the two-task protocol (understanding vs. detection) lacks a passive-viewing baseline without detection instructions or post-viewing realism ratings, so any gaze shift could be produced by the anomaly-search instruction itself rather than by spontaneous perception of authenticity.
  Authors: The video-understanding task was designed to serve as a baseline for natural viewing without explicit AI-detection instructions. However, we recognize the value of a purely passive condition and post-viewing realism ratings. We will revise the methods to include post-viewing realism ratings collected after each video and discuss the understanding task as approximating passive viewing. We will also add a limitations section addressing the absence of a no-task baseline.
  Revision: partial
- Referee: [Results] Results: without reported statistical details, participant demographics, video-selection criteria, error bars, or the precise analysis linking gaze metrics to perceived vs. actual authenticity, it is impossible to assess whether the data support the central claim.
  Authors: We apologize for the omission of these details in the initial submission. The revised manuscript will include: full statistical tests with p-values and effect sizes, participant demographics table, detailed video selection criteria (including how AI videos were generated and matched to real ones), error bars on all figures, and an expanded methods/results section describing the exact gaze metrics (e.g., fixation duration, saccade patterns) and how they were linked to perceived authenticity via regression or correlation analyses.
  Revision: yes
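As a rough illustration of the correlation analysis mentioned in the last response, the sketch below correlates a gaze metric with actual origin and with perceived origin, both coded as 0/1. Everything here is an assumption: the data are invented, the helper `pearson` is written out by hand, and pooling trials across participants ignores the repeated-measures structure that a real analysis (for example a mixed-effects model) would need to handle.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-trial data (invented values, not from the paper).
fixation_ms = [310.0, 290.0, 240.0, 260.0, 305.0, 295.0, 245.0, 255.0]
actual_is_ai = [0, 0, 0, 0, 1, 1, 1, 1]      # 1 = video actually AI-generated
perceived_is_ai = [0, 0, 1, 1, 0, 0, 1, 1]   # 1 = viewer judged it AI-generated

r_actual = pearson(actual_is_ai, fixation_ms)
r_perceived = pearson(perceived_is_ai, fixation_ms)
print(f"r(actual, fixation)    = {r_actual:.3f}")
print(f"r(perceived, fixation) = {r_perceived:.3f}")
```

With a binary predictor this is a point-biserial correlation. In the invented data the metric tracks perceived origin strongly and actual origin not at all, which is the shape of result the paper's claim would require.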
Circularity Check
No circularity: purely empirical behavioral study
The paper collects and analyzes eye-tracking data from 40 participants performing video understanding and AI detection tasks on real and AI-generated videos. No equations, parameters, derivations, or theoretical models are present that could reduce to self-definition or fitted inputs. Claims about gaze being driven by perceived authenticity rest on direct measurements of task performance and gaze patterns, not on any chain that equates outputs to inputs by construction. Self-citations, if present, are not load-bearing for any derivation since none exists.
Axiom & Free-Parameter Ledger
axioms (1)
- standard math: Standard assumptions of eye-tracking data analysis and statistical inference hold for the collected gaze data.