Do vision models perceive illusory motion in static images like humans?
Pith reviewed 2026-05-10 17:48 UTC · model grok-4.3
The pith
Most optical flow models fail to perceive the Rotating Snakes illusion as humans do, but a human-inspired Dual-Channel model succeeds under simulated saccadic eye movements.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Representative optical flow models mostly fail to produce flow fields consistent with human perception of the Rotating Snakes illusion in static images; under simulated saccadic eye movements, only the human-inspired Dual-Channel model exhibits the expected rotational motion, with closest correspondence during the saccade simulation itself.
What carries the argument
Comparison of model-generated optical flow fields to human illusory motion on the Rotating Snakes illusion, using a simulated saccade condition and ablation of the Dual-Channel model's luminance, color-feature, and recurrent-attention components.
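One way to picture the simulated saccade condition is as a short frame sequence made by translating the static image along a brief displacement, which is then fed to an optical flow model. The following is a minimal sketch under stated assumptions, not the paper's protocol: the function names, the smoothstep position profile, the integer-pixel shift, and the edge clamping are all hypothetical choices made for illustration.

```python
import math

def saccade_offsets(amplitude_px, n_frames, angle_deg=0.0):
    """Cumulative (dx, dy) image offsets for one simulated saccade.

    Assumption: a smoothstep (slow-fast-slow) position profile stands in
    for a real saccade velocity profile.
    """
    if n_frames < 2:
        raise ValueError("need at least two frames")
    theta = math.radians(angle_deg)
    offsets = []
    for i in range(n_frames):
        t = i / (n_frames - 1)
        s = 3 * t**2 - 2 * t**3  # smoothstep: 0 at t=0, 1 at t=1
        offsets.append((amplitude_px * s * math.cos(theta),
                        amplitude_px * s * math.sin(theta)))
    return offsets

def shift_image(image, dx, dy):
    """Translate a 2D list image by integer-rounded (dx, dy), clamping at edges."""
    h, w = len(image), len(image[0])
    ix, iy = round(dx), round(dy)
    return [[image[min(max(y - iy, 0), h - 1)][min(max(x - ix, 0), w - 1)]
             for x in range(w)]
            for y in range(h)]

def simulate_saccade_frames(image, amplitude_px, n_frames, angle_deg=0.0):
    """Frame sequence of a static image viewed through one simulated saccade."""
    return [shift_image(image, dx, dy)
            for dx, dy in saccade_offsets(amplitude_px, n_frames, angle_deg)]
```

Consecutive frames from `simulate_saccade_frames` could then be passed pairwise to a flow model. Converting a saccade amplitude in degrees to pixels depends on viewing geometry, which this sketch leaves to the caller.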
Load-bearing premise
That optical flow fields output by the models can be directly equated to the motion signals underlying human perception of the illusion, and that the saccade simulation faithfully recreates the eye-movement conditions that trigger the effect in people.
What would settle it
Running the Dual-Channel model on the Rotating Snakes image with a different or more physiologically accurate eye-movement simulation and finding no rotational flow, or finding that other standard models produce matching flow without any saccade simulation.
Original abstract
Understanding human motion processing is essential for building reliable, human-centered computer vision systems. Although deep neural networks (DNNs) achieve strong performance in optical flow estimation, they remain less robust than humans and rely on fundamentally different computational strategies. Visual motion illusions provide a powerful probe into these mechanisms, revealing how human and machine vision align or diverge. While recent DNN-based motion models can reproduce dynamic illusions such as reverse-phi, it remains unclear whether they can perceive illusory motion in static images, exemplified by the Rotating Snakes illusion. We evaluate several representative optical flow models on Rotating Snakes and show that most fail to generate flow fields consistent with human perception. Under simulated conditions mimicking saccadic eye movements, only the human-inspired Dual-Channel model exhibits the expected rotational motion, with the closest correspondence emerging during the saccade simulation. Ablation analyses further reveal that both luminance-based and higher-order color-feature-based motion signals contribute to this behavior and that a recurrent attention mechanism is critical for integrating local cues. Our results highlight a substantial gap between current optical-flow models and human visual motion processing, and offer insights for developing future motion-estimation systems with improved correspondence to human perception and human-centric AI.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that most optical flow models fail to generate flow fields consistent with human perception of the Rotating Snakes illusion in static images, whereas the human-inspired Dual-Channel model produces the expected rotational motion under simulated saccadic eye movements, with the closest correspondence during the saccade simulation itself. Ablation studies indicate that luminance-based motion signals, higher-order color-feature-based motion signals, and a recurrent attention mechanism all contribute to this behavior. The work points to a gap between current optical flow models and human visual motion processing.
Significance. If the central results hold, the paper provides evidence of a substantial difference in how current DNN optical flow models and human vision handle static motion illusions, with the Dual-Channel model offering a closer alignment. The inclusion of ablation analyses to identify key components (luminance, color features, recurrent attention) is a strength that offers concrete insights for improving future motion estimation systems to better match human perception. This could advance the development of human-centric AI in computer vision.
major comments (3)
- [§4 (Results on Rotating Snakes and Saccade Simulation)] The claim that the Dual-Channel model exhibits the expected rotational motion with closest correspondence during the saccade simulation (Abstract and Results) is not supported by any quantitative metric correlating the generated flow fields with human perceptual reports (e.g., no reported Pearson correlation or agreement rate with direction judgments). This is load-bearing for the assertion that it outperforms other models.
- [§3.2 (Saccade Simulation Protocol)] The simulation of saccadic eye movements lacks an explicit validation step comparing the simulated displacements to recorded human saccade statistics on the Rotating Snakes stimulus (Methods). Without this check, the protocol may not accurately replicate the eye-movement conditions that trigger the illusion, weakening the interpretation of the results.
- [§5 (Ablation Analyses)] The ablation results on luminance-based vs. color-feature-based signals and the recurrent attention mechanism are described qualitatively but without specific quantitative impacts on flow consistency metrics before and after ablation (Ablation Analyses), making it difficult to assess their individual contributions to the model's behavior.
minor comments (2)
- [Abstract] The abstract refers to 'several representative optical flow models' without listing them explicitly, which would improve clarity on the breadth of the evaluation.
- [Figures] The visualizations of flow fields could include quantitative annotations, such as average flow magnitude or direction histograms, to facilitate direct comparison across models and conditions.
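The quantitative annotations suggested for the figures can be computed directly from a flow field. The following is a minimal sketch, assuming the flow is available as a list of `(u, v)` vectors; the function name `flow_summary`, the bin count, and the magnitude cutoff for near-zero vectors are hypothetical choices, not values from the paper.

```python
import math

def flow_summary(flow, n_bins=8, min_magnitude=1e-6):
    """Return (mean magnitude, direction histogram) for (u, v) flow vectors.

    Directions are binned over [0, 2*pi). Vectors shorter than
    min_magnitude count toward the mean but not toward the histogram,
    because their direction is not meaningful.
    """
    vectors = list(flow)
    if not vectors:
        raise ValueError("flow field is empty")
    magnitudes = [math.hypot(u, v) for u, v in vectors]
    histogram = [0] * n_bins
    bin_width = 2 * math.pi / n_bins
    for (u, v), magnitude in zip(vectors, magnitudes):
        if magnitude < min_magnitude:
            continue
        angle = math.atan2(v, u) % (2 * math.pi)
        histogram[int(angle / bin_width) % n_bins] += 1
    return sum(magnitudes) / len(magnitudes), histogram
```

A model that outputs near-zero flow on the static image would show a small mean magnitude and a mostly empty histogram, which makes the contrast between models visible at a glance.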
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments, which help clarify how to strengthen the presentation of our results. We address each major comment point by point below.
- Referee: [§4 (Results on Rotating Snakes and Saccade Simulation)] The claim that the Dual-Channel model exhibits the expected rotational motion with closest correspondence during the saccade simulation (Abstract and Results) is not supported by any quantitative metric correlating the generated flow fields with human perceptual reports (e.g., no reported Pearson correlation or agreement rate with direction judgments). This is load-bearing for the assertion that it outperforms other models.
  Authors: We acknowledge that explicit quantitative correlation with human reports would provide stronger support. The original manuscript presented the comparison through direct visualization of flow fields, where only the Dual-Channel model produces coherent rotational flow matching the well-documented human perception of the illusion (clockwise or counterclockwise rotation depending on the stimulus variant), while other models produce near-zero or incoherent motion. To address the concern, we will add a quantitative directional agreement metric in the revised Results section: the fraction of flow vectors whose direction aligns with the expected illusory rotation, computed over the stimulus region. This will be reported for all models during the saccade simulation phase, allowing direct numerical comparison.
  revision: yes
- Referee: [§3.2 (Saccade Simulation Protocol)] The simulation of saccadic eye movements lacks an explicit validation step comparing the simulated displacements to recorded human saccade statistics on the Rotating Snakes stimulus (Methods). Without this check, the protocol may not accurately replicate the eye-movement conditions that trigger the illusion, weakening the interpretation of the results.
  Authors: We appreciate this observation. The protocol adopts standard saccade parameters (amplitude range 1–5°, velocity profiles, and inter-saccade intervals) drawn from the human eye-movement literature on natural scene viewing. We did not, however, include a direct empirical validation against eye-tracking recordings collected specifically on Rotating Snakes images. In the revision we will expand the Methods section to cite the precise human saccade statistics used, discuss their relevance to conditions known to elicit the illusion, and explicitly note the absence of stimulus-specific validation as a limitation that would require new eye-tracking data.
  revision: partial
- Referee: [§5 (Ablation Analyses)] The ablation results on luminance-based vs. color-feature-based signals and the recurrent attention mechanism are described qualitatively but without specific quantitative impacts on flow consistency metrics before and after ablation (Ablation Analyses), making it difficult to assess their individual contributions to the model's behavior.
  Authors: We agree that quantitative before-and-after metrics would make the ablation results more precise. The current manuscript supports the ablations with comparative flow-field visualizations showing the loss or retention of rotational motion. For the revised version we will augment the Ablation Analyses section with numerical measures, including (i) mean flow magnitude in the expected rotational direction and (ii) a directional consistency score (average cosine similarity of flow vectors to the illusory rotation field) computed on the full stimulus before and after each ablation. These values will be tabulated for the luminance channel, color-feature channel, and recurrent attention components.
  revision: yes
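The metrics promised in the responses (directional agreement, mean flow magnitude in the expected rotational direction, and average cosine similarity to the rotation field) can be sketched as follows. This is an illustrative reading under stated assumptions, not the authors' implementation: the expected rotation field is taken to be the unit tangent around a single hypothetical center, "aligned" is taken to mean a positive cosine, near-zero vectors are scored as cosine 0, and the names `tangent_unit` and `rotation_scores` are invented here.

```python
import math

def tangent_unit(x, y, cx, cy, clockwise=False):
    """Unit tangent of a rotation about (cx, cy) at point (x, y).

    Counterclockwise is defined in a y-up coordinate system; in image
    coordinates (y down) the same field appears clockwise.
    """
    rx, ry = x - cx, y - cy
    r = math.hypot(rx, ry)
    if r == 0:
        return None  # direction undefined at the center
    tx, ty = -ry / r, rx / r
    return (-tx, -ty) if clockwise else (tx, ty)

def rotation_scores(samples, center, clockwise=False, min_magnitude=1e-6):
    """Score flow samples (x, y, u, v) against an expected rotation field.

    Returns a dict with:
      agreement   - fraction of vectors with positive cosine to the tangent
      consistency - mean cosine similarity to the tangent
      tangential  - mean signed flow magnitude along the tangent
    """
    cx, cy = center
    cosines, tangentials = [], []
    for x, y, u, v in samples:
        tangent = tangent_unit(x, y, cx, cy, clockwise)
        if tangent is None:
            continue
        projection = u * tangent[0] + v * tangent[1]
        magnitude = math.hypot(u, v)
        tangentials.append(projection)
        # Assumption: near-zero vectors have no direction, scored as 0.
        cosines.append(projection / magnitude if magnitude >= min_magnitude else 0.0)
    if not cosines:
        raise ValueError("no scorable samples")
    n = len(cosines)
    return {
        "agreement": sum(c > 0 for c in cosines) / n,
        "consistency": sum(cosines) / n,
        "tangential": sum(tangentials) / n,
    }
```

Running the same function on a model's flow before and after removing a component would give the before-and-after comparison the referee asks for. A real Rotating Snakes stimulus contains several discs, so each disc would need its own center and expected direction.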
Circularity Check
Empirical model evaluation on illusory motion shows no circular derivation chain
The paper performs direct empirical comparisons: it runs several optical flow models on static Rotating Snakes images, applies a saccade simulation, and checks which outputs produce flow fields matching human perceptual reports of rotation. No mathematical derivation, equation, or parameter-fitting step is presented that reduces to its own inputs by construction. Ablation analyses are likewise direct removals of components followed by re-testing. The Dual-Channel model is labeled human-inspired, but its superiority is established by the observed outputs rather than by any self-referential definition or load-bearing self-citation that would make the result tautological. This is a standard model-comparison study whose central claim rests on external data (model computations and human reports) rather than on any of the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Humans perceive rotational motion in the Rotating Snakes static image.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AlexanderDuality.lean: alexander_duality_circle_linking (D=3 forces 8-tick via 2^D)
  Tag: echoes. This paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
  Passage: "The first-order channel (E1) applies spatiotemporally separable Gabor filters across 8 spatial scales to grayscale input, producing 256-dimensional motion-energy maps"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Appendix 1. Details on model parameters and architectures
We evaluated ten motion-estimation models (Table S1): PWC-Net. A convolutional neural network (CNN) for optical flow estimation thr...