
arxiv: 2604.09829 · v1 · submitted 2026-04-10 · 💻 cs.RO

Perception Is All You Need: A Neuroscience Framework for Low Cost Sensorless Gaze in HRI

Pith reviewed 2026-05-10 16:47 UTC · model grok-4.3

classification 💻 cs.RO
keywords gaze following · human-robot interaction · sensorless robotics · convexity prior · predictive processing · low-cost HRI · perceptual illusion · neuroscience framework

The pith

A cardboard robot with concave painted eyes makes viewers perceive mutual gaze by exploiting their own brain's assumptions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a framework for sensorless gaze in human-robot interaction by reversing the brain's natural gaze computation process. It argues that a simple concave eye socket with a painted pupil on a low-cost cardboard robot will be perceived as making eye contact because of the visual system's strong convexity prior and predictive face processing. This turns the viewer's perception into the effective actuator for the robot, removing any need for sensors, computation, power sources, or data collection. Grounded in neuroscience evidence from face processing networks and depth perception overrides, the approach derives specific design rules and identifies where it is predicted to work or fail. If successful, established benefits of robot gaze in education and therapy become feasible at massive scale with minimal resources.

Core claim

The core claim is that implementing the brain's gaze direction computation in reverse, via a concave eye design, causes the distributed face processing network, including the superior temporal sulcus, to interpret the painted pupil as directed gaze. The high-precision convexity prior forces perception of the socket as convex, and top-down predictions override the actual depth signals from the concavity.

What carries the argument

The brain's high-precision convexity prior in the predictive processing hierarchy, which overrides bottom-up depth cues with top-down face knowledge to perceive concave eye sockets as convex and thereby compute mutual gaze direction.
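
One common way to make "a high-precision prior overrides bottom-up depth cues" concrete is Gaussian cue combination, in which the posterior mean is a precision-weighted average of the prior and the sensory estimate. The sketch below is a toy illustration under that assumption and is not a model taken from the paper. The sign convention (positive depth reads as convex), the function name `fuse_depth`, and all numeric values are hypothetical.

```python
def fuse_depth(prior_mean, prior_precision, cue_mean, cue_precision):
    """Combine a Gaussian prior and a Gaussian sensory cue.

    Returns the posterior mean and posterior precision. Precision is
    inverse variance, so a larger value means a more trusted source.
    """
    if prior_precision <= 0 or cue_precision <= 0:
        raise ValueError("precisions must be positive")
    posterior_precision = prior_precision + cue_precision
    posterior_mean = (
        prior_precision * prior_mean + cue_precision * cue_mean
    ) / posterior_precision
    return posterior_mean, posterior_precision


# Assumed convention: depth > 0 reads as convex, depth < 0 as concave.
CONVEX_PRIOR = 1.0   # top-down expectation: eyes bulge outward
CONCAVE_CUE = -1.0   # bottom-up depth signal from the real socket

# Illustrative precisions only; they are not estimates from any study.
strong_prior, _ = fuse_depth(CONVEX_PRIOR, 9.0, CONCAVE_CUE, 1.0)
weak_prior, _ = fuse_depth(CONVEX_PRIOR, 0.25, CONCAVE_CUE, 1.0)

print(f"strong prior -> perceived depth {strong_prior:+.2f}")
print(f"weak prior   -> perceived depth {weak_prior:+.2f}")
```

With these illustrative numbers the strongly weighted prior keeps the fused estimate on the convex side, while the weakly weighted prior lets the concave cue win. Whether real viewers sit in the first regime for a cardboard socket is exactly the open empirical question.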

If this is right

  • Existing findings on how robot gaze improves attention and learning in children can now be applied using platforms costing under one dollar.
  • Robot designs become open-source templates with interchangeable eye inserts that parameterize the effect.
  • Privacy concerns disappear since no sensors or data processing are involved.
  • Boundary conditions based on developmental stages, clinical populations, and viewing geometry predict where the effect holds.
  • Two decades of HRI research on gaze become deliverable without the previous cost and complexity barriers.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar perceptual hacks could apply to other robot behaviors like facial expressions to reduce hardware needs.
  • Field tests in real classrooms would reveal if the effect persists over time or with repeated exposure.
  • Variations in the illusion strength across different cultures or age groups could inform refinements to the eye insert designs.
  • Combining this with other low-tech robot elements might create fully functional educational robots from printed materials.

Load-bearing premise

The convexity prior and predictive processing hierarchy will reliably make people perceive mutual gaze from a concave painted eye across different angles, ages, and conditions in actual interactions.

What would settle it

A controlled study showing whether participants exhibit gaze-following behaviors or report eye contact with the concave-eye robot compared to control designs with flat or protruding eyes, especially under varying lighting or distances.
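
A comparison of this kind could be scored in several ways. The sketch below assumes one simple option: each participant gives a yes/no report of eye contact, and a two-sided permutation test compares the proportion of "yes" reports between two eye designs. The function name `permutation_test` and the response lists are hypothetical placeholders, not data from the paper or from any study.

```python
import random


def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test on the difference in mean response.

    Each group is a list of 0/1 outcomes (1 = participant reported
    eye contact). Returns (observed_difference, p_value).
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    n_b = len(group_b)
    observed = sum(group_a) / n_a - sum(group_b) / n_b
    pooled = list(group_a) + list(group_b)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b
        # Small tolerance guards against floating-point ties.
        if abs(diff) >= abs(observed) - 1e-12:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    p_value = (at_least_as_extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Placeholder responses for illustration only; not data from any study.
concave_eye = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]
flat_eye = [0, 1, 0, 0, 1, 0, 0, 1, 0, 0]

difference, p = permutation_test(concave_eye, flat_eye)
print(f"difference in proportion = {difference:+.2f}, p = {p:.4f}")
```

A permutation test is used here because it makes no distributional assumptions and works with small samples. A real study would also need to handle repeated measures per participant and the lighting and distance factors named above.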

Figures

Figures reproduced from arXiv: 2604.09829 by Mason Kadem.

Figure 1: The proposed eye. Third, we present an open sub-dollar platform with parameterized eye geometry (…
Figure 2: Solved these problems. A subcortical route through the superior colliculus, pulvinar, and amygdala provides rapid, coarse detection of whether someone is looking at you and reflexive social attention (Senju and Johnson, 2009). If the hollow-face percept produces the appearance of direct gaze, a fast-track system would engage, potentially triggering the full cascade of social-cognitive effects associated wit…
Figure 3: The perceptual pipeline. The perceptual processing pipeline from retinal input to the illusory gaze-following percept. The key insight is that the robot contributes only the physical stimulus (a concave eye socket with a painted pupil). Every subsequent processing step from face detection, gaze computation, depth inversion, to the resulting social-cognitive cascade is performed by the viewer’s own neural ar…
Figure 4: Physics-based geometric robot eye gaze design. Concave eye inserts with painted pupils produce …
original abstract

Gaze-following in child-robot interaction improves attention, recall, and learning, but requires expensive platforms ($30,000+), sensors, algorithms, and raises privacy concerns. We propose a framework that avoids sensors and computation entirely, instead relying on the human visual system's assumption of convexity to produce perceptual gaze-following between a robot and its viewer. Specifically, we motivate sub-dollar cardboard robot design that directly implements the brain's own gaze computation pipeline in reverse, making the viewer's perceptual system the robot's "actuator", with no sensors, no power, and no privacy concerns. We ground this framework in three converging lines of theoretical and empirical neuroscience evidence. Namely, the distributed face processing network that computes gaze direction via the superior temporal sulcus, the high-precision convexity prior that causes the brain to perceive concave faces as convex, and the predictive processing hierarchy in which top-down face knowledge overrides bottom-up depth signals. These mechanisms explain why a concave eye socket with a painted pupil produces the perception of mutual gaze from any viewing angle. We derive design constraints from perceptual science, present a sub-dollar open-template robot with parameterized interchangeable eye inserts, and identify boundary conditions (developmental, clinical, and geometric) that predict where the framework will succeed and where it will fail. If leveraged, two decades of HRI gaze findings become deliverable at population scale.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes a neuroscience-inspired framework for sensorless gaze-following in HRI that uses a sub-dollar cardboard robot with concave eye sockets and painted pupils. It claims that the human visual system's convexity prior, face-processing network (STS), and predictive-processing hierarchy will cause viewers to perceive mutual gaze from any angle, effectively making the viewer's perception the robot's actuator. The work grounds the idea in three lines of existing literature, derives design constraints, presents an open-template robot with interchangeable eye inserts, and identifies developmental, clinical, and geometric boundary conditions under which the approach is predicted to succeed or fail, with the goal of scaling established HRI gaze benefits without sensors, computation, power, or privacy issues.

Significance. If the proposed translation from neuroscience to this low-fidelity artifact holds, the framework could enable population-scale deployment of gaze-following robots in education and therapy at negligible cost. The manuscript earns credit for its explicit grounding in established neuroscience results, the provision of a parameterized open-source template, and the clear statement of falsifiable boundary conditions rather than overclaiming universality.

major comments (2)
  1. [Abstract and Neuroscience Framework] The central claim that the cardboard design 'produces the perception of mutual gaze from any viewing angle' (Abstract) rests on an untested extrapolation; the cited convexity-prior and predictive-processing studies use near-photorealistic or mask stimuli in static adult lab settings, and the manuscript supplies no user studies, eye-tracking data, or validation experiments confirming the effect survives cardboard geometry, low-contrast painting, child viewers, autism-spectrum populations, or dynamic HRI angles.
  2. [Boundary Conditions] Boundary Conditions section: the listed developmental, clinical, and geometric limits are not quantified against the concrete design parameters (socket depth, pupil placement, viewing distance, contrast) that the open template actually uses, so the reliability prediction for the specific artifact remains unsupported by data internal to the manuscript.
minor comments (1)
  1. [Design] The open-template description would benefit from explicit CAD or cut-file parameters (e.g., exact concavity depth in mm) so that boundary-condition tests can be reproduced.
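
To show what explicit parameters could make testable, the sketch below records two dimensions of an eye insert and derives one geometric limit from them. It assumes a cylindrical socket with the pupil painted at the centre of its floor, which is a simplification and not the paper's actual template geometry. The class name `EyeInsert`, its fields, and the example dimensions are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class EyeInsert:
    """Hypothetical parameters for one concave eye insert (millimetres)."""

    socket_depth_mm: float
    aperture_radius_mm: float

    def __post_init__(self):
        if self.socket_depth_mm <= 0 or self.aperture_radius_mm <= 0:
            raise ValueError("depth and radius must be positive")

    def max_pupil_visibility_angle_deg(self):
        """Largest off-axis viewing angle at which the pupil centre is
        still visible past the socket rim.

        Assumes a cylindrical socket with the pupil at the floor centre.
        A sight line leaving the floor centre at angle theta crosses the
        aperture plane at lateral offset depth * tan(theta); it clears
        the rim while that offset is smaller than the aperture radius.
        """
        return math.degrees(
            math.atan2(self.aperture_radius_mm, self.socket_depth_mm)
        )


# Example dimensions are illustrative, not taken from the template.
shallow = EyeInsert(socket_depth_mm=5.0, aperture_radius_mm=10.0)
deep = EyeInsert(socket_depth_mm=20.0, aperture_radius_mm=10.0)

print(f"shallow insert: {shallow.max_pupil_visibility_angle_deg():.1f} deg")
print(f"deep insert:    {deep.max_pupil_visibility_angle_deg():.1f} deg")
```

Under this simplified geometry a deeper socket hides the pupil at smaller viewing angles, so concavity depth trades off against usable viewing range. That is one reason exact cut-file dimensions matter for reproducing boundary-condition tests.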

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed review. The manuscript presents a theoretical neuroscience framework and open design template rather than an empirical validation study. We address each major comment below and outline targeted revisions to improve clarity and precision without altering the core contribution.

point-by-point responses
  1. Referee: [Abstract and Neuroscience Framework] The central claim that the cardboard design 'produces the perception of mutual gaze from any viewing angle' (Abstract) rests on an untested extrapolation; the cited convexity-prior and predictive-processing studies use near-photorealistic or mask stimuli in static adult lab settings, and the manuscript supplies no user studies, eye-tracking data, or validation experiments confirming the effect survives cardboard geometry, low-contrast painting, child viewers, autism-spectrum populations, or dynamic HRI angles.

    Authors: We agree that the manuscript contains no new empirical data and that the abstract phrasing could be read as asserting a proven outcome rather than a literature-derived prediction. The contribution is the reverse-engineering of established perceptual mechanisms (convexity prior, STS gaze computation, and predictive processing) into a low-cost artifact, with explicit boundary conditions stated as falsifiable predictions. In revision we will rephrase the abstract and introduction to emphasize that the mutual-gaze perception is a hypothesized outcome requiring future validation across the listed populations and conditions. We will also add a short paragraph in the discussion outlining planned or recommended empirical tests (e.g., forced-choice gaze-direction judgments with the open template). This change clarifies scope without weakening the grounding in the cited neuroscience. revision: yes

  2. Referee: [Boundary Conditions] Boundary Conditions section: the listed developmental, clinical, and geometric limits are not quantified against the concrete design parameters (socket depth, pupil placement, viewing distance, contrast) that the open template actually uses, so the reliability prediction for the specific artifact remains unsupported by data internal to the manuscript.

    Authors: The boundary conditions are currently stated at the level of the general perceptual literature rather than tied to the template's specific dimensions. We acknowledge this gap. In the revised manuscript we will expand the section to map each condition to approximate quantitative ranges drawn from the referenced studies (for example, effective distances for convexity reversal effects, contrast thresholds for pupil visibility, and age ranges for mature face processing). Where the literature does not supply exact values for cardboard geometry, we will explicitly note the absence and flag it as an empirical question for users of the template. This will give readers clearer guidance on expected reliability for the provided design parameters. revision: partial
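
The forced-choice gaze-direction judgments mentioned in the first response could be scored against chance with an exact binomial tail probability. The sketch below assumes a two-alternative task with a chance rate of 0.5 and independent trials. The function name `binomial_upper_tail` and the example counts are hypothetical and are not results from the paper.

```python
from math import comb


def binomial_upper_tail(successes, trials, chance=0.5):
    """Return P(X >= successes) for X ~ Binomial(trials, chance)."""
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    if not 0.0 <= chance <= 1.0:
        raise ValueError("chance must lie between 0 and 1")
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Placeholder counts for illustration only; not data from any study.
looking_at_me_responses = 16
total_trials = 20

p_value = binomial_upper_tail(looking_at_me_responses, total_trials)
print(f"P(X >= {looking_at_me_responses} of {total_trials} by chance) = {p_value:.4f}")
```

An exact tail is preferred here over a normal approximation because pilot studies with the template are likely to have few trials per participant.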

Circularity Check

0 steps flagged

No significant circularity; proposal applies external neuroscience results to new design without self-referential reduction

full rationale

The manuscript presents a conceptual framework that reverses known perceptual mechanisms (convexity prior, predictive processing hierarchy, STS gaze computation) to motivate a passive cardboard robot design. No equations, fitted parameters, or predictions are defined in terms of themselves. Design constraints are stated as derived from cited perceptual science literature rather than from any internal fit or self-citation chain. The central success claim is an untested extrapolation to HRI settings, but this is not circular by construction; it is simply unsupported by new data. No load-bearing step reduces to the paper's own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 3 axioms · 1 invented entity

The central claim rests on three standard neuroscience assumptions about face processing and convexity without new free parameters or invented physical entities beyond the proposed robot design itself.

axioms (3)
  • domain assumption: The distributed face processing network computes gaze direction via the superior temporal sulcus.
    Invoked to explain how the brain extracts gaze from faces.
  • domain assumption: The brain applies a high-precision convexity prior that causes concave faces to be perceived as convex.
    Central mechanism for why the concave eye socket produces mutual-gaze perception.
  • domain assumption: Top-down face knowledge overrides bottom-up depth signals in the predictive processing hierarchy.
    Explains why the illusion persists despite actual concave geometry.
invented entities (1)
  • Cardboard robot with concave eye sockets and painted pupils (no independent evidence)
    purpose: To serve as a passive actuator that triggers perceptual gaze-following
    New physical design introduced by the paper; no independent evidence outside the cited perceptual mechanisms.

pith-pipeline@v0.9.0 · 5539 in / 1538 out tokens · 48515 ms · 2026-05-10T16:47:36.357815+00:00 · methodology

