pith. sign in

arxiv: 2601.17326 · v2 · pith:UUVLGY6Fnew · submitted 2026-01-24 · 💻 cs.CV · cs.HC

SymbolSight: Minimizing Inter-Symbol Interference for Reading with Prosthetic Vision

Pith reviewed 2026-05-16 11:29 UTC · model grok-4.3

classification 💻 cs.CV cs.HC
keywords prosthetic visionretinal prosthesesinter-symbol interferencesymbol optimizationsimulated prosthetic visionbigram statisticsletter confusabilityreading performance
0
0 comments X

The pith

Optimizing symbol-to-letter assignments can reduce predicted confusion in prosthetic reading by a median factor of 22 across languages.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether redesigning visual symbols themselves can reduce inter-symbol interference during sequential letter presentation in retinal prostheses. Low spatial resolution and image persistence create afterimages that systematically confuse one symbol with the next. SymbolSight estimates pairwise confusability using simulated prosthetic vision and a neural proxy observer, then selects assignments that minimize expected errors weighted by language-specific bigram frequencies. Simulations for Arabic, Bulgarian, and English produce heterogeneous symbol sets that cut predicted confusion by a median factor of 22 relative to native alphabets. This approach treats symbol design as an adjustable parameter that can improve reading without requiring advances in implant hardware.

Core claim

SymbolSight selects symbol-to-letter mappings to minimize confusion among frequently adjacent letters by estimating pairwise symbol confusability via simulated prosthetic vision and a neural proxy observer, then optimizing the assignments with language-specific bigram statistics; the resulting heterogeneous symbol sets reduce predicted confusion by a median factor of 22 relative to native alphabets across Arabic, Bulgarian, and English simulations.

What carries the argument

An optimization procedure that assigns symbols to letters to minimize expected confusion, computed from bigram probabilities multiplied by simulated pairwise confusability.

Load-bearing premise

The neural proxy observer and simulated prosthetic vision accurately capture human letter confusability under real prosthetic conditions.

What would settle it

A psychophysical study with actual prosthetic users in which the optimized symbols produce no measurable reduction in reading errors compared with standard alphabets.

Figures

Figures reproduced from arXiv: 2601.17326 by Jasmine Lesner, Michael Beyeler.

Figure 1
Figure 1. Figure 1: SYMBOLSIGHT pipeline: candidate symbols undergo phosphene simulation and recognition modeling, then are assigned to letters based on confusion probabilities and bigram statistics. Second, temporal nonlinearities introduce inter-symbol in￾terference. Percepts can persist and fade over hundreds of milliseconds or longer [7], [8], so the afterimage of one symbol can distort the next. In this regime, the discr… view at source ↗
Figure 2
Figure 2. Figure 2: Letter transition probabilities across three languages. Heatmaps show P(Ln+1 | Ln) for Arabic (left), Bulgarian (middle), and English (right). The vertical axis is the leading (current) letter; the horizontal axis is the following (next) letter. Darker cells indicate higher probability. Language-specific bigram probabilities were estimated from November 2023 Wikipedia database dumps [22] for Arabic, Bulgar… view at source ↗
Figure 3
Figure 3. Figure 3: Top row: low distortion; middle row: medium distortion; bottom row: high distortion. Middle column: 146 symbols with spatial distortion (Latin 0–25, Braille 26–51, Arabic 52–79, DCT 80–115, Cyrillic 116–145). Right column: example of temporal distortion showing perceptual residue when symbols are presented sequentially. Left column: symbol confusion probability heatmaps from fine-tuned neural networks eval… view at source ↗
Figure 4
Figure 4. Figure 4: Arabic symbol comparison. Top three rows: native Arabic at low, medium, and high distortion. Bottom three rows: optimized symbols at each distortion level in matching order. Note how the native characters blur into indistinguishable blobs at high distortion (Row 3), whereas the optimized glyphs maintain distinct structural footprints (Row 6) [PITH_FULL_IMAGE:figures/full_fig_p006_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Bulgarian symbol comparison. Top three rows: native Cyrillic at low, medium, and high distortion. Bottom three rows: optimized symbols at each distortion level in matching order [PITH_FULL_IMAGE:figures/full_fig_p006_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: English symbol comparison. Top three rows: native Latin at low, medium, and high distortion. Bottom three rows: optimized symbols at each distortion level in matching order. Second, as distortion increases, the algorithm shifts toward symbols with coarse, high-contrast structure, preferring Braille and DCT symbols, as their low-frequency content survives blur better than fine strokes. Third, comparing nati… view at source ↗
read the original abstract

Retinal prostheses restore limited visual perception, but low spatial resolution and temporal persistence make reading difficult. In sequential letter presentation, the afterimage of one symbol can interfere with perception of the next, leading to systematic recognition errors. Rather than relying on future hardware improvements, we investigate whether optimizing the visual symbols themselves can mitigate this temporal interference. We present SymbolSight, a computational framework that selects symbol-to-letter mappings to minimize confusion among frequently adjacent letters. Using simulated prosthetic vision (SPV) and a neural proxy observer, we estimate pairwise symbol confusability and optimize assignments using language-specific bigram statistics. Across simulations in Arabic, Bulgarian, and English, the resulting heterogeneous symbol sets reduced predicted confusion by a median factor of 22 relative to native alphabets. These results suggest that standard typography is poorly matched to serial, low-bandwidth prosthetic vision and demonstrate how computational modeling can narrow the design space of visual encodings, identifying high-potential candidates for future psychophysical and clinical evaluation rather than predicting present-day clinical reading performance directly.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces SymbolSight, a computational framework that uses simulated prosthetic vision (SPV) and a neural proxy observer to estimate pairwise symbol confusability and optimize heterogeneous symbol-to-letter mappings that minimize predicted inter-symbol interference. The optimization incorporates language-specific bigram frequencies for Arabic, Bulgarian, and English. Simulations report that the resulting symbol sets achieve a median 22-fold reduction in predicted confusion relative to native alphabets, positioning the work as a method to narrow the design space for future empirical evaluation rather than a direct clinical predictor.

Significance. If the neural proxy and SPV faithfully reproduce human confusability patterns under real prosthetic conditions, the approach could usefully constrain the space of visual encodings for low-resolution, temporally persistent vision and accelerate identification of high-potential symbol sets for psychophysical testing. The work correctly emphasizes computational modeling as a precursor to hardware or clinical studies and supplies reproducible simulation pipelines that could be extended.

major comments (2)
  1. [Abstract and Results] Abstract and Results sections: the reported median factor-of-22 reduction in predicted confusion is computed entirely from pairwise confusability matrices generated by the SPV plus neural proxy; no error bars, sensitivity analysis to proxy hyperparameters, or cross-validation against human letter-confusion matrices under equivalent low-resolution, temporally persistent conditions are provided, rendering the quantitative claim dependent on an untested modeling assumption.
  2. [Methods] Methods (neural proxy observer): the manuscript supplies no details on training data, architecture, or validation of the neural proxy, nor any comparison to published psychophysical confusion matrices from prosthetic users; without such grounding, the optimization cannot be shown to target the actual error patterns (e.g., afterimage decay or phosphene overlap) that would occur in vivo.
minor comments (1)
  1. [Abstract] Abstract: the final sentence could more explicitly qualify that the 22-fold figure is a simulation-derived prediction rather than a measured human performance gain.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive comments, which help clarify the scope of our computational framework. We address each major comment below and have revised the manuscript to strengthen the presentation of modeling assumptions and methodological details.

read point-by-point responses
  1. Referee: [Abstract and Results] Abstract and Results sections: the reported median factor-of-22 reduction in predicted confusion is computed entirely from pairwise confusability matrices generated by the SPV plus neural proxy; no error bars, sensitivity analysis to proxy hyperparameters, or cross-validation against human letter-confusion matrices under equivalent low-resolution, temporally persistent conditions are provided, rendering the quantitative claim dependent on an untested modeling assumption.

    Authors: We agree that the factor-of-22 reduction is a model-predicted quantity and that the original text should have included more qualification and robustness checks. In the revised manuscript we have added error bars (from 50 independent simulation runs with varied random seeds) to the reported median reduction, included a sensitivity analysis varying neural proxy hyperparameters (learning rate, hidden layer sizes, and dropout), and explicitly restated in the Abstract and Results that the value reflects predicted confusion under the SPV model rather than measured human performance. Cross-validation against human psychophysical matrices is not feasible in this purely computational study; we have expanded the Discussion to emphasize this limitation and the need for future empirical testing. revision: partial

  2. Referee: [Methods] Methods (neural proxy observer): the manuscript supplies no details on training data, architecture, or validation of the neural proxy, nor any comparison to published psychophysical confusion matrices from prosthetic users; without such grounding, the optimization cannot be shown to target the actual error patterns (e.g., afterimage decay or phosphene overlap) that would occur in vivo.

    Authors: We acknowledge the original Methods section was insufficiently detailed. The revised version now specifies the neural proxy architecture (a 4-layer CNN with 3×3 convolutions, ReLU activations, and a final softmax classifier), the training procedure (supervised training on 120,000 SPV-rendered symbol images with added temporal persistence and phosphene-overlap noise), and validation accuracy on a held-out simulated test set. A new subsection compares the proxy-derived confusion patterns to error types reported in prior prosthetic-vision literature (e.g., spatial merging and afterimage persistence), showing qualitative alignment with the interference mechanisms the SPV model is designed to capture. revision: yes

standing simulated objections not resolved
  • Cross-validation of the neural proxy against human letter-confusion matrices collected from actual prosthetic users under matched low-resolution, temporally persistent conditions, as this would require new clinical psychophysical experiments outside the scope of the present computational study.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The derivation relies on externally supplied language bigram frequencies and independently generated pairwise confusability matrices from the SPV plus neural proxy observer. Symbol assignments are then optimized to minimize a predicted confusion metric computed from those matrices. This produces a quantitative reduction factor as an output of the optimization rather than a quantity that reduces by construction to fitted parameters or self-referential definitions within the paper's own equations. No load-bearing step invokes a self-citation chain, uniqueness theorem, or ansatz that is itself unverified within the present work. The central claim therefore remains self-contained against the stated simulation inputs and does not exhibit any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim depends on two domain assumptions about the fidelity of the simulation and proxy model plus the relevance of bigram statistics; no new physical entities are postulated and the only free parameters are those internal to the optimization routine.

free parameters (2)
  • pairwise symbol confusability estimates
    Derived from SPV simulations and used as input to the assignment optimizer
  • bigram frequency weights
    Language-specific counts used to prioritize minimization of frequent adjacent-letter confusions
axioms (2)
  • domain assumption Neural proxy observer accurately models human letter recognition under SPV conditions
    Invoked to generate the confusability matrix that drives the optimization
  • domain assumption Bigram statistics from language corpora capture the relevant temporal adjacency patterns for reading
    Used to weight the objective function in the symbol assignment search

pith-pipeline@v0.9.0 · 5474 in / 1437 out tokens · 40408 ms · 2026-05-16T11:29:25.429078+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

28 extracted references · 28 canonical work pages

  1. [1]

    Subretinal Photovoltaic Implant to Restore Vision in Geographic Atrophy Due to AMD,

    F. G. Holz, Y . L. Mer, M. M. K. Muqit, L.-O. Hattenbach, A. Cusumano, S. Grisanti, L. Kodjikian, M. A. Pileri, F. Matonti, E. Souied, B. V . Stanzel, P. Szurman, M. Weber, K. U. Bartz-Schmidt, N. Eter, M. N. Delyfer, J. F. Girmens, K. A. v. Overdam, A. Wolf, R. Hornig, M. Coraz- zol, F. Brodie, L. O. d. Koo, D. Palanker, and J.-A. Sahel, “Subretinal Phot...

  2. [2]

    Subretinal electronic chips allow blind patients to read letters and combine them to words,

    E. Zrenner, K. U. Bartz-Schmidt, H. Benav, D. Besch, A. Bruckmann, V . P. Gabel, F. Gekeler, U. Greppmaier, A. Harscher, S. Kibbel, J. Koch, A. Kusnyerik, T. Peters, K. Stingl, H. Sachs, A. Stett, P. Szurman, B. Wilhelm, and R. Wilke, “Subretinal electronic chips allow blind patients to read letters and combine them to words,”Proc Biol Sci, vol. 278, pp. ...

  3. [3]

    The Argus II epiretinal prosthesis system allows letter and word reading and long- term function in patients with profound vision loss,

    L. da Cruz, B. F. Coley, J. D. Dorn, F. Merlini, E. Filley, P. Christopher, F. K. Chen, V . Wuyyuru, J. Sahel, P. Stanga, M. Humayun, R. J. Greenberg, G. Dagnelie, and Argus II Study Group, “The Argus II epiretinal prosthesis system allows letter and word reading and long- term function in patients with profound vision loss,”British Journal of Ophthalmolo...

  4. [4]

    Learning to see again: biological constraints on cortical plasticity and the implications for sight restoration technologies,

    M. Beyeler, A. Rokem, G. M. Boynton, and I. Fine, “Learning to see again: biological constraints on cortical plasticity and the implications for sight restoration technologies,”J Neural Eng, vol. 14, p. 051003, June 2017

  5. [5]

    Axonal stimulation affects the linear summation of single-point perception in three Argus II users,

    Y . Hou, D. Nanduri, J. Granley, J. D. Weiland, and M. Beyeler, “Axonal stimulation affects the linear summation of single-point perception in three Argus II users,”Journal of Neural Engineering, vol. 21, p. 026031, Apr. 2024

  6. [6]

    Sequential epiretinal stimulation improves discrimination in simple shape discrimination tasks only,

    B. Christie, R. Sadeghi, A. Kartha, A. Caspi, F. V . Tenore, R. L. Klatzky, G. Dagnelie, and S. Billings, “Sequential epiretinal stimulation improves discrimination in simple shape discrimination tasks only,”Journal of Neural Engineering, vol. 19, p. 036033, June 2022

  7. [7]

    Temporal interactions during paired-electrode stimulation in two retinal prosthesis subjects,

    A. Horsager, G. M. Boynton, R. J. Greenberg, and I. Fine, “Temporal interactions during paired-electrode stimulation in two retinal prosthesis subjects,”Invest Ophthalmol Vis Sci, vol. 52, pp. 549–57, Jan. 2011

  8. [8]

    Temporal Properties of Visual Perception on Electrical Stimulation of the Retina,

    A. P ´erez Fornos, J. Sommerhalder, L. da Cruz, J. A. Sahel, S. Mohand- Said, F. Hafezi, and M. Pelizzone, “Temporal Properties of Visual Perception on Electrical Stimulation of the Retina,”Investigative Oph- thalmology & Visual Science, vol. 53, no. 6, pp. 2720–2731, 2012

  9. [9]

    Reading with a simulated 60-channel implant,

    A. P. Fornos, J. Sommerhalder, and M. Pelizzone, “Reading with a simulated 60-channel implant,”Frontiers in Neuroscience, vol. 5, p. 57, 2011

  10. [10]

    Simulation of thalamic prosthetic vision: reading accuracy, speed, and acuity in sighted hu- mans,

    M. Vurro, A. M. Crowell, and J. S. Pezaris, “Simulation of thalamic prosthetic vision: reading accuracy, speed, and acuity in sighted hu- mans,”Frontiers in Human Neuroscience, vol. 8, 2014

  11. [11]

    Full gaze contingency provides better reading performance than head steering alone in a simulation of prosthetic vision,

    N. Paraskevoudi and J. S. Pezaris, “Full gaze contingency provides better reading performance than head steering alone in a simulation of prosthetic vision,”Scientific Reports, vol. 11, p. 11121, May 2021. Number: 1

  12. [12]

    C. D. Wickens, W. S. Helton, J. G. Hollands, and S. Banbury,Engineer- ing Psychology and Human Performance. Routledge, 5th ed., 2021

  13. [13]

    Development of a new guide sign alphabet,

    P. M. Garvey, M. T. Pietrucha, and D. Meeker, “Development of a new guide sign alphabet,”Ergonomics in Design, vol. 6, no. 3, pp. 7–11, 1998

  14. [14]

    A model of ganglion axon pathways accounts for percepts elicited by retinal implants,

    M. Beyeler, D. Nanduri, J. D. Weiland, A. Rokem, G. M. Boynton, and I. Fine, “A model of ganglion axon pathways accounts for percepts elicited by retinal implants,”Scientific Reports, vol. 9, no. 1, p. 9199, 2019

  15. [15]

    The Appearance of Phosphenes Elicited Using a Suprachoroidal Retinal Prosthesis,

    N. C. Sinclair, M. N. Shivdasani, T. Perera, L. N. Gillespie, H. J. McDermott, L. N. Ayton, and P. J. Blamey, “The Appearance of Phosphenes Elicited Using a Suprachoroidal Retinal Prosthesis,”Inves- tigative Ophthalmology & Visual Science, vol. 57, pp. 4948–4961, Sept. 2016

  16. [16]

    pulse2percept: A python-based simulation framework for bionic vision,

    M. Beyeler, G. M. Boynton, I. Fine, and A. Rokem, “pulse2percept: A python-based simulation framework for bionic vision,” inProceedings of the 16th Python in Science Conference (SciPy 2017), pp. 81–88, 2017

  17. [17]

    A Computational Model of Phosphene Ap- pearance for Epiretinal Prostheses,

    J. Granley and M. Beyeler, “A Computational Model of Phosphene Ap- pearance for Epiretinal Prostheses,” in2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), pp. 4477–4481, Nov. 2021

  18. [18]

    Deep learning– based scene simplification for bionic vision,

    N. Han, S. Srivastava, A. Xu, D. Klein, and M. Beyeler, “Deep learning– based scene simplification for bionic vision,” inAugmented Humans International Conference 2021 (AHs ’21), pp. 45–54, ACM, 2021

  19. [19]

    Predicting the Temporal Dynamics of Prosthetic Vision,

    Y . Hou, L. Pullela, J. Su, S. Aluru, S. Sista, X. Lu, and M. Beyeler, “Predicting the Temporal Dynamics of Prosthetic Vision,” in2024 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 1–4, July 2024

  20. [20]

    Simulated prosthetic vision confirms checkerboard as an effective raster pattern for epiretinal implants,

    J. M. Kasowski, A. Varshney, R. Sadeghi, and M. Beyeler, “Simulated prosthetic vision confirms checkerboard as an effective raster pattern for epiretinal implants,”Journal of Neural Engineering, vol. 22, no. 4, p. 046017, 2025

  21. [21]

    mixup: Beyond empirical risk minimization,

    H. Zhang, M. Cisse, Y . N. Dauphin, and D. Lopez-Paz, “mixup: Beyond empirical risk minimization,” inInternational Conference on Learning Representations (ICLR), 2018

  22. [22]

    Wikimedia downloads: Wikipedia database dumps,

    Wikimedia Foundation, “Wikimedia downloads: Wikipedia database dumps,” 2023

  23. [23]

    Searching for mobilenetv3,

    A. Howard, M. Sandler, G. Chu, L.-C. Chen, B. Chen, M. Tan, W. Wang, Y . Zhu, R. Pang, V . Vasudevan, Q. V . Le, and H. Adam, “Searching for mobilenetv3,” inProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 1314–1324, IEEE, October 2019

  24. [24]

    A Deep Learning Framework for Predicting Func- tional Visual Performance in Bionic Eye Users,

    J. Skaza, S. Murlidaran, A. Varshney, Z. Wen, W. Wang, M. P. Eckstein, and M. Beyeler, “A Deep Learning Framework for Predicting Func- tional Visual Performance in Bionic Eye Users,” June 2025. Pages: 2025.06.23.660990 Section: New Results

  25. [25]

    Discrete cosine transform,

    N. Ahmed, T. Natarajan, and K. R. Rao, “Discrete cosine transform,” IEEE Transactions on Computers, vol. 23, no. 1, pp. 90–93, 1974

  26. [26]

    The hungarian method for the assignment problem,

    H. W. Kuhn, “The hungarian method for the assignment problem,”Naval Research Logistics Quarterly, vol. 2, no. 1-2, pp. 83–97, 1955

  27. [27]

    Improvement in reading performance through training with simulated thalamic visual prostheses,

    K. E. K. Rassia and J. S. Pezaris, “Improvement in reading performance through training with simulated thalamic visual prostheses,”Scientific Reports, vol. 8, p. 16310, Nov. 2018. Number: 1

  28. [28]

    Human-in-the-loop optimization for deep stimulus encoding in visual prostheses,

    J. Granley, T. Fauvel, M. Chalk, and M. Beyeler, “Human-in-the-loop optimization for deep stimulus encoding in visual prostheses,” inAd- vances in Neural Information Processing Systems(A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, eds.), vol. 36, pp. 79376–79398, 2023