Pith · machine review for the scientific record

arxiv: 2604.09853 · v2 · submitted 2026-04-10 · 💻 cs.CV

Recognition: 1 theorem link

· Lean Theorem

Do vision models perceive illusory motion in static images like humans?

Authors on Pith · no claims yet

Pith reviewed 2026-05-10 17:48 UTC · model grok-4.3

classification 💻 cs.CV
keywords optical flow · Rotating Snakes · motion illusions · deep neural networks · saccadic eye movements · human vision · computer vision
0 comments

The pith

Most optical flow models fail to perceive the Rotating Snakes illusion as humans do, but a human-inspired Dual-Channel model succeeds under simulated saccadic eye movements.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether deep neural networks for optical flow estimation can detect illusory motion in a static image known as the Rotating Snakes, which appears to rotate to human observers. Most tested models produce flow fields that do not match human reports of the illusion. Only the Dual-Channel model, which draws from human visual processing, generates the expected rotational flow, and this occurs most clearly when the input includes simulated saccadic eye movements. Ablations show that both luminance signals and higher-order color features contribute, with recurrent attention needed to combine local cues. The results point to a persistent difference between current machine motion estimators and human vision.

Core claim

Representative optical flow models mostly fail to produce flow fields consistent with human perception of the Rotating Snakes illusion in static images; under simulated saccadic eye movements, only the human-inspired Dual-Channel model exhibits the expected rotational motion, with closest correspondence during the saccade simulation itself.

What carries the argument

Comparison of model-generated optical flow fields to human illusory motion on the Rotating Snakes illusion, using a simulated saccade condition and ablation of the Dual-Channel model's luminance, color-feature, and recurrent-attention components.

Load-bearing premise

That optical flow fields output by the models can be directly equated to the motion signals underlying human perception of the illusion, and that the saccade simulation faithfully recreates the eye-movement conditions that trigger the effect in people.

What would settle it

Running the Dual-Channel model on the Rotating Snakes image with a different or more physiologically accurate eye-movement simulation and finding no rotational flow, or finding that other standard models produce matching flow without any saccade simulation.
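The saccade-simulation condition discussed above can be illustrated with a minimal sketch: a static illusion image is displaced by a fixed pixel offset (Figure 3 mentions a microsaccadic 30-pixel shift), and the resulting two-frame pair is what an optical flow model would consume. The array sizes and helper name here are illustrative assumptions, not the paper's code.

```python
import numpy as np

def make_saccade_pair(image: np.ndarray, shift: int = 30) -> tuple[np.ndarray, np.ndarray]:
    """Return (frame0, frame1) where frame1 is the image displaced by `shift`
    pixels, approximating the global retinal slip of a small saccade."""
    frame0 = image
    frame1 = np.roll(image, shift=shift, axis=1)  # horizontal displacement
    return frame0, frame1

static = np.random.rand(256, 256).astype(np.float32)  # stand-in for the stimulus
f0, f1 = make_saccade_pair(static, shift=30)
```

A flow model run on (f0, f1) sees only a uniform translation; any rotational component in its output is then attributable to the illusion rather than to the shift itself, which is what makes this condition diagnostic.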

Figures

Figures reproduced from arXiv: 2604.09853 by Zitang Sun (2), Fan L. Cheng (1), Isabella Elaine Rosario (1), Nikolaus Kriegeskorte (1) ((1) Columbia University, (2) Kyoto University).

Figure 1
Figure 1. Rotating Snakes illusory and control images. Top row: illusory images in grayscale (G), blue–yellow (B-Y), and red–green (R-G) color schemes, each producing a robust perception of counterclockwise rotation in human observers. Bottom row: corresponding control images created by permuting the luminance or color sequence within each repeating unit, which fail to induce perception of motion while preserving g… view at source ↗
Figure 2
Figure 2. Schematic illustrations of three model architecture categories. (a) Multi-scale architectures estimate motion using pyramidal feature hierarchies. (b) Recurrent-decoding architectures iteratively refine optical flow over multiple update steps. (c) Bio-inspired architectures incorporate motion-energy units or integration mechanisms motivated by primate visual motion pathways. view at source ↗
Figure 3
Figure 3. Visualization of normalized model-predicted optical flow for grayscale stimuli across central simulated viewing conditions. Static (top row), onset (second row), and microsaccadic 30-pixel shift (third row). Colored dots correspond to model architectures listed in Appendix 1. The fourth row shows model responses to rotated control stimuli, allowing qualitative comparison with veridical motion. Under static… view at source ↗
Figure 4
Figure 4. Correlation between model-predicted and expected human illusory percepts. Color types: grayscale (G), blue–yellow (B–Y), and red–green (R–G). (a) Correlations are shown for illusion (left) and control (right) stimuli under viewing conditions: static, onset, microsaccade, and saccade (all shift magnitudes included). Among all models, the Dual model exhibits the strongest alignment with human illusory percep… view at source ↗
Figure 5
Figure 5. Ablation analysis of the Dual model to identify the critical components for reproducing human illusory motion percepts. (a) Schematic of the Dual architecture showing the first-order (E1) and higher-order (E2) motion pathways, their fusion into Em, and subsequent recurrent integration (Stage II). (b) Probing analysis. Optical flow predictions for the grayscale illusion, control, and veridical rotation stim… view at source ↗
Figure 6
Figure 6. Spatial response profiles of two exemplar E1 units sensitive to rotational motion. Activations are normalized per unit to highlight spatial structure. For each unit (rows), spatial activation maps are shown for the illusion, control, and veridical rotation conditions. The preferred motion direction (θ) of each unit is indicated by the black arrow overlaid on each map. view at source ↗
Figure 7
Figure 7. Optical flow predictions for other anomalous-motion illusion stimuli. (a) Peripheral drift illusion: observers typically experience smooth clockwise rotation following a blink or small eye movement. (b) Central drift illusion: the white elements appear to rotate slowly in a clockwise direction. (c) Ouchi illusion: eye movements induce a relative sliding motion between the center and surround. Representativ… view at source ↗
read the original abstract

Understanding human motion processing is essential for building reliable, human-centered computer vision systems. Although deep neural networks (DNNs) achieve strong performance in optical flow estimation, they remain less robust than humans and rely on fundamentally different computational strategies. Visual motion illusions provide a powerful probe into these mechanisms, revealing how human and machine vision align or diverge. While recent DNN-based motion models can reproduce dynamic illusions such as reverse-phi, it remains unclear whether they can perceive illusory motion in static images, exemplified by the Rotating Snakes illusion. We evaluate several representative optical flow models on Rotating Snakes and show that most fail to generate flow fields consistent with human perception. Under simulated conditions mimicking saccadic eye movements, only the human-inspired Dual-Channel model exhibits the expected rotational motion, with the closest correspondence emerging during the saccade simulation. Ablation analyses further reveal that both luminance-based and higher-order color-feature-based motion signals contribute to this behavior and that a recurrent attention mechanism is critical for integrating local cues. Our results highlight a substantial gap between current optical-flow models and human visual motion processing, and offer insights for developing future motion-estimation systems with improved correspondence to human perception and human-centric AI.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper claims that while most optical flow models fail to generate flow fields consistent with human perception of the Rotating Snakes illusion in static images, the human-inspired Dual-Channel model succeeds under simulated saccadic eye movements, showing the expected rotational motion particularly during the simulation. Ablation studies indicate that both luminance-based and higher-order color-feature-based motion signals, along with a recurrent attention mechanism, are important for this behavior. The work suggests a gap between current models and human visual motion processing.

Significance. If the central results hold, the paper provides evidence of a substantial difference in how current DNN optical flow models and human vision handle static motion illusions, with the Dual-Channel model offering a closer alignment. The inclusion of ablation analyses to identify key components (luminance, color features, recurrent attention) is a strength that offers concrete insights for improving future motion estimation systems to better match human perception. This could advance the development of human-centric AI in computer vision.

major comments (3)
  1. [§4 (Results on Rotating Snakes and Saccade Simulation)] The claim that the Dual-Channel model exhibits the expected rotational motion with closest correspondence during the saccade simulation (Abstract and Results) is not supported by any quantitative metric correlating the generated flow fields with human perceptual reports (e.g., no reported Pearson correlation or agreement rate with direction judgments). This is load-bearing for the assertion that it outperforms other models.
  2. [§3.2 (Saccade Simulation Protocol)] The simulation of saccadic eye movements lacks an explicit validation step comparing the simulated displacements to recorded human saccade statistics on the Rotating Snakes stimulus (Methods). Without this check, the protocol may not accurately replicate the eye-movement conditions that trigger the illusion, weakening the interpretation of the results.
  3. [§5 (Ablation Analyses)] The ablation results on luminance-based vs. color-feature-based signals and the recurrent attention mechanism are described qualitatively but without specific quantitative impacts on flow consistency metrics before and after ablation (Ablation Analyses), making it difficult to assess their individual contributions to the model's behavior.
minor comments (2)
  1. [Abstract] The abstract refers to 'several representative optical flow models' without listing them explicitly, which would improve clarity on the breadth of the evaluation.
  2. [Figures] The visualizations of flow fields could include quantitative annotations, such as average flow magnitude or direction histograms, to facilitate direct comparison across models and conditions.
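The direction-histogram annotation suggested in minor comment 2 could be realized with a sketch like the following; the bin count and function name are illustrative assumptions rather than anything the paper specifies.

```python
import numpy as np

def direction_histogram(flow: np.ndarray, bins: int = 8) -> np.ndarray:
    """flow: (H, W, 2) array of (dx, dy) vectors. Returns normalized counts
    of flow directions over `bins` equal angular sectors of [0, 2*pi)."""
    angles = np.arctan2(flow[..., 1], flow[..., 0]) % (2 * np.pi)
    hist, _ = np.histogram(angles, bins=bins, range=(0.0, 2 * np.pi))
    return hist / hist.sum()

# Uniform rightward flow puts all mass in the first angular sector.
flow = np.zeros((4, 4, 2))
flow[..., 0] = 1.0
print(direction_histogram(flow))
```

Plotted per model and per viewing condition, such histograms would make a rotational flow field (mass spread tangentially around the ring) visually distinguishable from incoherent or near-zero flow.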

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which help clarify how to strengthen the presentation of our results. We address each major comment point by point below.

read point-by-point responses
  1. Referee: [§4 (Results on Rotating Snakes and Saccade Simulation)] The claim that the Dual-Channel model exhibits the expected rotational motion with closest correspondence during the saccade simulation (Abstract and Results) is not supported by any quantitative metric correlating the generated flow fields with human perceptual reports (e.g., no reported Pearson correlation or agreement rate with direction judgments). This is load-bearing for the assertion that it outperforms other models.

    Authors: We acknowledge that explicit quantitative correlation with human reports would provide stronger support. The original manuscript presented the comparison through direct visualization of flow fields, where only the Dual-Channel model produces coherent rotational flow matching the well-documented human perception of the illusion (clockwise or counterclockwise rotation depending on the stimulus variant), while other models produce near-zero or incoherent motion. To address the concern, we will add a quantitative directional agreement metric in the revised Results section: the fraction of flow vectors whose direction aligns with the expected illusory rotation, computed over the stimulus region. This will be reported for all models during the saccade simulation phase, allowing direct numerical comparison. revision: yes

  2. Referee: [§3.2 (Saccade Simulation Protocol)] The simulation of saccadic eye movements lacks an explicit validation step comparing the simulated displacements to recorded human saccade statistics on the Rotating Snakes stimulus (Methods). Without this check, the protocol may not accurately replicate the eye-movement conditions that trigger the illusion, weakening the interpretation of the results.

    Authors: We appreciate this observation. The protocol adopts standard saccade parameters (amplitude range 1–5°, velocity profiles, and inter-saccade intervals) drawn from the human eye-movement literature on natural scene viewing. We did not, however, include a direct empirical validation against eye-tracking recordings collected specifically on Rotating Snakes images. In the revision we will expand the Methods section to cite the precise human saccade statistics used, discuss their relevance to conditions known to elicit the illusion, and explicitly note the absence of stimulus-specific validation as a limitation that would require new eye-tracking data. revision: partial

  3. Referee: [§5 (Ablation Analyses)] The ablation results on luminance-based vs. color-feature-based signals and the recurrent attention mechanism are described qualitatively but without specific quantitative impacts on flow consistency metrics before and after ablation (Ablation Analyses), making it difficult to assess their individual contributions to the model's behavior.

    Authors: We agree that quantitative before-and-after metrics would make the ablation results more precise. The current manuscript supports the ablations with comparative flow-field visualizations showing the loss or retention of rotational motion. For the revised version we will augment the Ablation Analyses section with numerical measures, including (i) mean flow magnitude in the expected rotational direction and (ii) a directional consistency score (average cosine similarity of flow vectors to the illusory rotation field) computed on the full stimulus before and after each ablation. These values will be tabulated for the luminance channel, color-feature channel, and recurrent attention components. revision: yes
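The two metrics promised in responses 1 and 3 can be sketched together, under the assumption that the expected illusory percept is a tangential field around the image center; the 45° tolerance and all function names are illustrative, not taken from the paper.

```python
import numpy as np

def expected_rotation_field(h: int, w: int, clockwise: bool = False) -> np.ndarray:
    """Unit tangential vectors around the image center: an idealized
    illusory-rotation percept used as the reference field."""
    yy, xx = np.mgrid[0:h, 0:w].astype(float)
    dy, dx = yy - h / 2, xx - w / 2
    # Tangent of counterclockwise rotation is (-dy, dx); flip sign for clockwise.
    tang = np.stack([-dy, dx], axis=-1) * (-1.0 if clockwise else 1.0)
    norm = np.linalg.norm(tang, axis=-1, keepdims=True)
    return np.divide(tang, norm, out=np.zeros_like(tang), where=norm > 0)

def cosine_map(flow: np.ndarray, expected: np.ndarray) -> np.ndarray:
    """Per-pixel cosine similarity between a flow field and the reference."""
    num = np.sum(flow * expected, axis=-1)
    den = np.linalg.norm(flow, axis=-1) * np.linalg.norm(expected, axis=-1)
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)

def directional_agreement(flow, expected, tol_deg=45.0) -> float:
    """Response 1: fraction of flow vectors within tol_deg of the
    expected illusory rotation direction."""
    return float(np.mean(cosine_map(flow, expected) > np.cos(np.radians(tol_deg))))

def directional_consistency(flow, expected) -> float:
    """Response 3: average cosine similarity to the expected field."""
    return float(np.mean(cosine_map(flow, expected)))

expected = expected_rotation_field(64, 64)
print(directional_agreement(expected, expected))     # near 1.0 (center vector has zero length)
print(directional_consistency(-expected, expected))  # near -1.0 for reversed rotation
```

Both measures are scale-invariant in direction, so they separate "rotates the right way" from overall flow magnitude, which is the distinction the referee's comments turn on.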

Circularity Check

0 steps flagged

Empirical model evaluation on illusory motion shows no circular derivation chain

full rationale

The paper performs direct empirical comparisons: it runs several optical flow models on static Rotating Snakes images, applies a saccade simulation, and checks which outputs produce flow fields matching human perceptual reports of rotation. No mathematical derivation, equation, or parameter-fitting step is presented that reduces to its own inputs by construction. Ablation analyses are likewise direct removals of components followed by re-testing. The Dual-Channel model is labeled human-inspired, but its superiority is established by the observed outputs rather than by any self-referential definition or load-bearing self-citation that would make the result tautological. This is a standard model-comparison study whose central claim rests on external data (model computations and human reports) rather than on any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that human reports of rotational motion in the Rotating Snakes image constitute a valid benchmark for model evaluation, plus the validity of the saccade simulation as a proxy for human eye movements.

axioms (1)
  • domain assumption: Humans perceive rotational motion in the Rotating Snakes static image
    This is the established perceptual phenomenon used as the benchmark for model comparison.

pith-pipeline@v0.9.0 · 5538 in / 1238 out tokens · 41726 ms · 2026-05-10T17:48:56.765196+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

50 extracted references · 2 canonical work pages

  1. [1]

    An open-source software for the simulation of fixational eye movements.Current Directions in Biomedical Engineering, 10(4):25–28, 2024

    Stephan Allgeier, Fabian Anzlinger, Sebastian Bohn, Ralf Mikut, Oliver Neumann, Klaus-Martin Reichert, Oliver Stachs, and Karsten Sperlich. An open-source software for the simulation of fixational eye movements.Current Directions in Biomedical Engineering, 10(4):25–28, 2024. 10

  2. [2]

    Direction-specific fMRI adaptation reveals the visual cortical network underlying the “rotating snakes” illusion

    Hiroshi Ashida, Ichiro Kuriki, Ikuya Murakami, Rumi Hisakata, and Akiyoshi Kitaoka. Direction-specific fmri adaptation reveals the visual cortical network underlying the “rotating snakes” illusion.Neuroimage, 61(4):1143–1152, 2012. 2

  3. [3]

    Rotating Snakes Illusion—Quantitative Analysis Reveals a Region in Luminance Space With Opposite Illusory Rotation.i-Perception, 8(1):2041669517691779, 2017

    Lea Atala-Gérard and Michael Bach. Rotating Snakes Illusion—Quantitative Analysis Reveals a Region in Luminance Space With Opposite Illusory Rotation. i-Perception, 8(1):2041669517691779, 2017. 2, 4, 5, 6

  4. [4]

    The rotating snakes illusion is a straightforward consequence of nonlinearity in arrays of standard motion detectors

    Michael Bach and Lea Atala-Gérard. The rotating snakes illusion is a straightforward consequence of nonlinearity in arrays of standard motion detectors. 11(5):2041669520958025, 2020. 2, 9

  5. [5]

    Illusory motion from change over time in the response to contrast and luminance

    Benjamin T. Backus and Ipek Oruç. Illusory motion from change over time in the response to contrast and luminance. Journal of Vision, 5(11):10, 2005. 9

  6. [6]

    A database and evaluation methodology for optical flow.International journal of computer vision, 92(1):1–31, 2011

    Simon Baker, Daniel Scharstein, James P Lewis, Stefan Roth, Michael J Black, and Richard Szeliski. A database and evaluation methodology for optical flow.International journal of computer vision, 92(1):1–31, 2011. 22

  7. [7]

    Spatial remapping of the visual world across saccades.NeuroReport, 18(12):1207–1213, 2007

    Paul M Bays and Masud Husain. Spatial remapping of the visual world across saccades.NeuroReport, 18(12):1207–1213, 2007. 10

  8. [8]

    Motion estimation for large displacements and deformations.Scientific Reports, 12(1):19721,

    Qiao Chen and Charalambos Poullis. Motion estimation for large displacements and deformations.Scientific Reports, 12(1):19721,

  9. [9]

    Neural basis for a powerful static motion illusion.Journal of Neuroscience, 25(23):5651–5656, 2005

    Bevil R Conway, Akiyoshi Kitaoka, Arash Yazdanbakhsh, Christopher C Pack, and Margaret S Livingstone. Neural basis for a powerful static motion illusion. Journal of Neuroscience, 25(23):5651–5656, 2005. 2, 9

  10. [10]

    Human microsaccade-related visual brain responses.Journal of Neuroscience, 29(39):12321–12331, 2009

    Olaf Dimigen, Matteo Valsecchi, Werner Sommer, and Reinhold Kliegl. Human microsaccade-related visual brain responses.Journal of Neuroscience, 29(39):12321–12331, 2009. 4

  11. [11]

    Flownet: Learning optical flow with convolutional networks

    Alexey Dosovitskiy, Philipp Fischer, Eddy Ilg, Philip Hausser, Caner Hazirbas, Vladimir Golkov, Patrick Van Der Smagt, Daniel Cremers, and Thomas Brox. Flownet: Learning optical flow with convolutional networks. InProceedings of the IEEE international conference on computer vision, pages 2758–2766, 2015. 1

  12. [12]

    An integrated model of fixational eye movements and microsaccades

    Ralf Engbert, Konstantin Mergenthaler, Peter Sinn, and Carl J. J. Herrmann. An integrated model of fixational eye movements and microsaccades. Proceedings of the National Academy of Sciences, 108(5):E161–E168, 2011. 10

  13. [13]

    The peripheral drift illusion: A motion illusion in the visual periphery.Perception, 28(5): 617–621, 1999

    Jocelyn Faubert and Andrew M Herbert. The peripheral drift illusion: A motion illusion in the visual periphery.Perception, 28(5): 617–621, 1999. 2, 9, 10

  14. [14]

    Illusory motion due to causal time filtering.Vision Research, 50(3):315–329,

    Cornelia Fermüller, Hui Ji, and Akiyoshi Kitaoka. Illusory motion due to causal time filtering. Vision Research, 50(3):315–329,

  15. [15]

    Microsaccade-inspired event camera for robotics.Science Robotics, 9(76):eadj8124, 2024

    Botao He, Ze Wang, Yuan Zhou, Jingxi Chen, Chahat Deep Singh, Haojia Li, Yuman Gao, Shaojie Shen, Kaiwei Wang, Yanjun Cao, Chao Xu, Yiannis Aloimonos, Fei Gao, and Cornelia Fermüller. Microsaccade-inspired event camera for robotics. Science Robotics, 9(76):eadj8124, 2024. 8

  16. [16]

    The ouchi illusion: An anomaly in the perception of rigid motion for limited spatial frequencies and angles.Perception & Psychophysics, 59(3):448–455, 1997

    Trevor Hine, Michael Cook, and Garry T Rogers. The ouchi illusion: An anomaly in the perception of rigid motion for limited spatial frequencies and angles.Perception & Psychophysics, 59(3):448–455, 1997. 3, 9

  17. [17]

    Spatial scaling of illusory motion perceived in a static figure.Journal of Vision, 18(13):15–15,

    Rumi Hisakata and Ikuya Murakami. Spatial scaling of illusory motion perceived in a static figure.Journal of Vision, 18(13):15–15,

  18. [18]

    A lightweight optical flow cnn – revisiting data fidelity and regularization.IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 2555–2569, 2020

    Tak-Wai Hui, Xiaoou Tang, and Chen Change Loy. A lightweight optical flow cnn – revisiting data fidelity and regularization.IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 2555–2569, 2020. 3, 13

  19. [19]

    Ccmr: High resolution optical flow estimation via coarse-to-fine context-guided motion reasoning

    Azin Jahedi, Maximilian Luz, Marc Rivinius, and Andrés Bruhn. Ccmr: High resolution optical flow estimation via coarse-to-fine context-guided motion reasoning. In IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 6885–6894. IEEE, 2024. 3, 13

  20. [20]

    Inconsistent illusory motion in predictive coding deep neural networks.Vision Research, 206:108195, 2023

    OR Kirubeswaran and Katherine R Storrs. Inconsistent illusory motion in predictive coding deep neural networks.Vision Research, 206:108195, 2023. 1

  21. [21]

    Color-Dependent Motion Illusions in Stationary Images and Their Phenomenal Dimorphism.Perception, 43(9): 914–925, 2014

    Akiyoshi Kitaoka. Color-Dependent Motion Illusions in Stationary Images and Their Phenomenal Dimorphism.Perception, 43(9): 914–925, 2014. 2

  22. [22]

    Classify illusions.https://www.psy.ritsumei.ac.jp/akitaoka/classify.html, 2025

    Akiyoshi Kitaoka. Classify illusions. https://www.psy.ritsumei.ac.jp/akitaoka/classify.html, 2025. Accessed: 2025-11-14. 2, 10

  23. [23]

    Phenomenal characteristics of the peripheral drift illusion.Vision, 15(4):261–262, 2003

    Akiyoshi Kitaoka and Hiroshi Ashida. Phenomenal characteristics of the peripheral drift illusion.Vision, 15(4):261–262, 2003. 1, 2

  24. [24]

    A new anomalous motion illusion: the “central drift illusion”

    Akiyoshi Kitaoka and Hiroshi Ashida. A new anomalous motion illusion: the “central drift illusion”. https://www.psy.ritsumei.ac.jp/akitaoka/VSJ04w.html, 2004. Accessed: 2025-11-14. 3

  25. [25]

    Motion illusion-like patterns extracted from photo and art images using predictive deep neural networks.Scientific Reports, 12(1):3893, 2022

    Taisuke Kobayashi, Akiyoshi Kitaoka, Manabu Kosaka, Kenta Tanaka, and Eiji Watanabe. Motion illusion-like patterns extracted from photo and art images using predictive deep neural networks.Scientific Reports, 12(1):3893, 2022. 1

  26. [26]

    Deep Predictive Coding Networks for Video Prediction and Unsupervised Learning

    William Lotter, Gabriel Kreiman, and David Cox. Deep predictive coding networks for video prediction and unsupervised learning. arXiv preprint arXiv:1605.08104, 2016. 1

  27. [27]

    Flowdiffuser: Advancing optical flow estimation with diffusion models

    Ao Luo, Xin Li, Fan Yang, Jiangyu Liu, Haoqiang Fan, and Shuaicheng Liu. Flowdiffuser: Advancing optical flow estimation with diffusion models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 19167–19176, 2024. 3, 13

  28. [28]

    Integration biases in the ouchi and other visual illusions.Perception, 29(6):721–727, 2000

    George Mather. Integration biases in the ouchi and other visual illusions.Perception, 29(6):721–727, 2000. 9

  29. [29]

    Pupil dilation underlies the peripheral drift illusion.Journal of Vision, 25(2):13–13, 2025

    George Mather and Patrick Cavanagh. Pupil dilation underlies the peripheral drift illusion.Journal of Vision, 25(2):13–13, 2025. 2

  30. [30]

    Your head is there to move you around: Goal-driven models of the primate dorsal pathway

    Patrick Mineault, Shahab Bakhtiari, Blake A. Richards, and Christopher C. Pack. Your head is there to move you around: Goal-driven models of the primate dorsal pathway. InAdvances in Neural Information Processing Systems (NeurIPS), 2021. 3, 13

  31. [31]

    A positive correlation between fixation instability and the strength of illusory motion in a static display.Vision Research, 46(15):2421–2431, 2006

    Ikuya Murakami, Akiyoshi Kitaoka, and Hiroshi Ashida. A positive correlation between fixation instability and the strength of illusory motion in a static display.Vision Research, 46(15):2421–2431, 2006. 2

  32. [32]

    Open source simulation of fixational eye drift motion in OCT scans

    Merlin A. Nau, Stefan B. Ploner, Eric M. Moult, James G. Fujimoto, and Andreas K. Maier. Open source simulation of fixational eye drift motion in OCT scans: Towards better comparability and accuracy in retrospective OCT motion correction. In Bildverarbeitung für die Medizin 2020. Informatik aktuell, pages 254–259, 2020. 10

  33. [33]

    Saccades and microsaccades during visual fixation, exploration, and search: foundations for a common saccadic generator.Journal of vision, 8 (14):21–21, 2008

    Jorge Otero-Millan, Xoana G Troncoso, Stephen L Macknik, Ignacio Serrano-Pedraza, and Susana Martinez-Conde. Saccades and microsaccades during visual fixation, exploration, and search: foundations for a common saccadic generator.Journal of vision, 8 (14):21–21, 2008. 3

  34. [34]

    Microsaccades and blinks trigger illusory rotation in the “rotating snakes” illusion

    Jorge Otero-Millan, Stephen L. Macknik, and Susana Martinez-Conde. Microsaccades and blinks trigger illusory rotation in the “rotating snakes” illusion.Journal of Neuroscience, 32(17):6043–6051, 2012. 1, 2, 4

  35. [35]

    Attacking optical flow

    Anurag Ranjan, Joel Janai, Andreas Geiger, and Michael J Black. Attacking optical flow. In Proceedings of the IEEE/CVF international conference on computer vision, pages 2404–2413, 2019. 1

  36. [36]

    Unsupervised deep learning for optical flow estimation

    Zhe Ren, Junchi Yan, Bingbing Ni, Bin Liu, Xiaokang Yang, and Hongyuan Zha. Unsupervised deep learning for optical flow estimation. InProceedings of the AAAI conference on artificial intelligence, 2017. 1

  37. [37]

    What can we expect from a V1–MT feedforward architecture for optical flow estimation?

    Fabio Solari, Manuela Chessa, Narasimhan V. Medathati, and Pierre Kornprobst. What can we expect from a V1–MT feedforward architecture for optical flow estimation? Signal Processing: Image Communication, 39:257–268, 2015. 3, 13

  38. [38]

    The Ōuchi–Spillmann illusion revisited. Perception, 42(4):413–429, 2013

    Lothar Spillmann. The Ōuchi–Spillmann illusion revisited. Perception, 42(4):413–429, 2013. 9

  39. [39]

    Pwc-net: Cnns for optical flow using pyramid, warping, and cost volume

    Deqing Sun, Xiaodong Yang, Ming-Yu Liu, and Jan Kautz. Pwc-net: Cnns for optical flow using pyramid, warping, and cost volume. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 8934–8943, 2018. 3, 13

  40. [40]

    Modeling human visual motion processing with trainable motion energy sensing and a self-attention network

    Zitang Sun, Yen-Ju Chen, Yung-Hao Yang, and Shin’ya Nishida. Modeling human visual motion processing with trainable motion energy sensing and a self-attention network. InAdvances in Neural Information Processing Systems (NeurIPS), 2023. 1, 3, 13

  41. [41]

    Acquisition of second-order motion perception by learning to recognize the motion of objects made by non-diffusive materials.Journal of Vision, 24(10):374–374, 2024

    Zitang Sun, Yen-Ju Chen, Yung-Hao Yang, and Shin’ya Nishida. Acquisition of second-order motion perception by learning to recognize the motion of objects made by non-diffusive materials.Journal of Vision, 24(10):374–374, 2024. 13

  42. [42]

    Machine learning modelling for multi-order human visual motion processing.Nature Machine Intelligence, pages 1–16, 2025

    Zitang Sun, Yen-Ju Chen, Yung-Hao Yang, Yuan Li, and Shin’ya Nishida. Machine learning modelling for multi-order human visual motion processing.Nature Machine Intelligence, pages 1–16, 2025. 1, 3, 7, 9, 13

  43. [43]

    Object segmentation from common fate: Motion energy processing enables human-like zero-shot generalization to random dot stimuli

    Matthias Tangemann, Matthias Kümmerer, and Matthias Bethge. Object segmentation from common fate: Motion energy processing enables human-like zero-shot generalization to random dot stimuli. Advances in Neural Information Processing Systems, 37:137135–137160, 2024. 1, 9

  44. [44]

    Raft: Recurrent all-pairs field transforms for optical flow

    Zachary Teed and Jia Deng. Raft: Recurrent all-pairs field transforms for optical flow. InEuropean Conference on Computer Vision (ECCV), pages 402–419. Springer, 2020. 3, 13

  45. [45]

    Blue-yellow combination enhances perceived motion in rotating snakes illusion.i-Perception, 15(2):20416695241242346, 2024

    Maiko Uesaki, Arnab Biswas, Hiroshi Ashida, and Gerrit Maus. Blue-yellow combination enhances perceived motion in rotating snakes illusion.i-Perception, 15(2):20416695241242346, 2024. 2 11

  46. [46]

    Illusory motion reproduced by deep neural networks trained for prediction.Frontiers in psychology, 9:340023, 2018

    Eiji Watanabe, Akiyoshi Kitaoka, Kiwako Sakamoto, Masaki Yasugi, and Kenta Tanaka. Illusory motion reproduced by deep neural networks trained for prediction.Frontiers in psychology, 9:340023, 2018. 1

  47. [47]

    Motion illusions as optimal percepts.Nature neuroscience, 5(6):598–604,

    Yair Weiss, Eero P Simoncelli, and Edward H Adelson. Motion illusions as optimal percepts.Nature neuroscience, 5(6):598–604,

  48. [48]

    Fixational eye movements enhance the precision of visual information transmitted by the primate retina

    Eric G. Wu, Andra M. Rudzite, Martin O. Bohlen, Peter H. Li, Alexandra Kling, Sam Cooler, Colleen Rhoades, Nora Brackbill, Alex R. Gogliettino, Nishal P. Shah, Sasidhar S. Madugula, Alexander Sher, Alan M. Litke, Greg D. Field, and E. J. Chichilnisky. Fixational eye movements enhance the precision of visual information transmitted by the primate retina.Na...

  49. [49]

    Illusions in humans and ai: How visual perception aligns and diverges.arXiv preprint arXiv:2508.12422, 2025

    Jianyi Yang, Junyi Ye, Ankan Dash, and Guiling Wang. Illusions in humans and ai: How visual perception aligns and diverges.arXiv preprint arXiv:2508.12422, 2025. 1

  50. [50]

    Psychophysical measurement of perceived motion flow of naturalistic scenes

    Yung-Hao Yang, Taiki Fukiage, Zitang Sun, and Shin’ya Nishida. Psychophysical measurement of perceived motion flow of naturalistic scenes. Iscience, 26(12), 2023. 1